DataFolder¶
- class Stoner.folders.DataFolder(*args, **kargs)¶
Bases: DataMethodsMixin, DiskBasedFolderMixin, baseFolder
Provide an interface for manipulating large numbers of data files stored within a directory structure on disc.
By default, the members of the DataFolder are instances of Stoner.Data. The DataFolder employs a lazy open strategy, so that files are only read in from disc when actually needed.
Attributes Summary
basenames – Return a list of just the filename parts of the objectFolder.
clone – Clone just does a deepcopy as a property for compatibility with Stoner.Core.DataFile.
debug – Just read the local debug value.
defaults – Build a single list of all of our defaults by iterating over the __mro__, caching the result.
depth – Give the maximum number of levels of group below the current objectFolder.
directory – Just alias directory to root now.
each – Return a Stoner.folders.each.item proxy object.
files – Return an iterator of potentially unloaded named objects.
groups – Subfolders are held in an ordered dictionary of groups.
instance – Return a default instance of the type of object in the folder.
is_empty – Return True if the folder is empty.
key – Override the parent class key to use the directory attribute.
layout – Return a tuple that describes the number of files and groups in the folder.
loaded – Iterate only over those members of the folder in memory.
loader – Return a callable that will load the files on demand.
ls – List just the names of the objects in the folder.
lsgrp – Return a list of the groups as a generator.
metadata – Return a Stoner.folders.metadata.MetadataProxy object.
mindepth – Give the minimum number of levels of group below the current objectFolder.
not_empty – Iterate over the objectFolder, checking whether the loaded metadataObject objects have any data.
not_loaded – Return an array of True/False for whether we've loaded a metadataObject yet.
objects – Return the objects in the folder, which are stored in a regexpDict.
pattern – Provide support for getting the pattern attribute.
root – Return the real folder root.
setas – Return the proxy for the setas attribute for each object in the folder.
shape – Return a data structure that is characteristic of the objectFolder's shape.
trunkdepth – Return the number of levels of group before a group with files is found.
type – Return the (sub)class of the Stoner.Core.metadataObject instances.
Methods Summary
add_group(key) – Add a new group to the current baseFolder with the given key.
all() – Iterate over all the files in the Folder and all its sub-folders recursively.
append(value) – Append an item to the folder object.
clear() – Clear the subgroups.
compress([base, key, keep_terminal]) – Compresses all empty groups from the root up until the first non-empty group is located.
concatenate([sort, reverse]) – Concatenates all the files in an objectFolder into a single metadataObject-like object.
count(value) – Provide a count method like a sequence.
extend(values) – Extend the sequence by appending elements from the iterable.
extract(*metadata, **kargs) – Extract metadata from each of the files in the terminal group.
fetch() – Preload the contents of the DiskBasedFolderMixin.
file(name, value[, create, pathsplit]) – Recursively add groups in order to put the named value into a virtual tree of baseFolder.
filter([filter, invert, copy, recurse, prune]) – Filter the current set of files by some criterion.
filterout(filter[, copy, recurse, prune]) – Synonym for self.filter(filter,invert=True).
flatten([depth]) – Compresses all the groups and sub-groups into a single flat file list.
gather([xcol, ycol]) – Collect x and y columns from the subfiles in the final group in the tree.
get(name[, default]) – Return either a sub-group or named object from this folder.
getlist(**kargs) – Scan the current directory, optionally recursively, to build a list of filenames.
group(key) – Sort files into a series of objectFolders according to the value of the key.
index(value[, start, end]) – Provide an index method like a sequence.
insert(index, value) – Implement the insert method with the option to append as well.
items() – Return the key, value pairs for the subgroups of this folder.
keep_latest() – Filter out earlier revisions of files with the same name.
keys() – Return the keys used to access the sub-groups of this folder.
make_name([value]) – Construct a name from the value object if possible.
on_load_process(tmp) – Carry out processing on a newly loaded file to set means and extra metadata.
pop([name, default]) – Return and remove either a subgroup or named object from this folder.
popitem() – Return the most recent subgroup from this folder.
prune([name]) – Remove any empty groups from the objectFolder (and subgroups).
remove(value) – Remove the first occurrence of value.
reverse() – Reverse the sequence in place.
save([root]) – Save the entire data folder out to disc using the groups as a directory tree.
select(*args, **kargs) – Select a subset of the objects in the folder based on flexible search criteria on the metadata.
setdefault(k[, d]) – Return or set a subgroup or named object.
slice_metadata(key[, output]) – Return an array of the metadata values for each item/file in the top level group.
sort([key, reverse, recurse]) – Sort the files by some key.
unflatten() – Take the file list and unflatten it according to the file paths.
unload([name]) – Remove the instance from memory without losing the name in the Folder.
update(other) – Update this folder with a dictionary or another folder.
values() – Return the sub-groups of this folder.
walk_groups(walker, **kargs) – Walk through a hierarchy of groups and call walker for each file.
zip_groups(groups) – Return a list of tuples of metadataObjects drawn from the specified groups.
Attributes Documentation
- basenames¶
Return a list of just the filename parts of the objectFolder.
- clone¶
Clone just does a deepcopy as a property for compatibility with Stoner.Core.DataFile.
- debug¶
Just read the local debug value.
- defaults¶
Build a single list of all of our defaults by iterating over the __mro__, caching the result.
- depth¶
Give the maximum number of levels of group below the current objectFolder.
- directory¶
Just alias directory to root now.
- each¶
Return a Stoner.folders.each.item proxy object.
This is for calling attributes of the member type of the folder.
- files¶
Return an iterator of potentially unloaded named objects.
- groups¶
Subfolders are held in an ordered dictionary of groups.
- instance¶
Return a default instance of the type of object in the folder.
- is_empty¶
Return True if the folder is empty.
- key¶
Override the parent class key to use the directory attribute.
- layout¶
Return a tuple that describes the number of files and groups in the folder.
- loaded¶
Iterate only over those members of the folder in memory.
- loader¶
Return a callable that will load the files on demand.
- ls¶
List just the names of the objects in the folder.
- lsgrp¶
Return a list of the groups as a generator.
- metadata¶
Return a Stoner.folders.metadata.MetadataProxy object.
This allows for operations on combined metadata.
- mindepth¶
Give the minimum number of levels of group below the current objectFolder.
- not_empty¶
Iterate over the objectFolder, checking whether the loaded metadataObject objects have any data.
Returns the next non-empty DataFile member of the objectFolder.
Note
not_empty will also silently skip over any cases where loading the metadataObject object raises an exception.
- not_loaded¶
Return an array of True/False for whether we’ve loaded a metadataObject yet.
- objects¶
Return the objects in the folder, which are stored in a regexpDict.
- pattern¶
Provide support for getting the pattern attribute.
- root¶
Return the real folder root.
- setas¶
Return the proxy for the setas attribute for each object in the folder.
- shape¶
Return a data structure that is characteristic of the objectFolder’s shape.
- trunkdepth¶
Return the number of levels of group before a group with files is found.
- type¶
Return the (sub)class of the Stoner.Core.metadataObject instances.
Methods Documentation
- add_group(key)¶
Add a new group to the current baseFolder with the given key.
- Parameters:
key (string) – A hashable value to be used as the dictionary key in the groups dictionary
- Returns:
A copy of the objectFolder
Note
If key already exists in the groups dictionary then no action is taken.
Todo
Propagate any extra attributes into the groups.
- all()¶
Iterate over all the files in the Folder and all its sub-folders recursively.
- Yields:
(path/filename,file)
- append(value)¶
Append an item to the folder object.
- clear()¶
Clear the subgroups.
- compress(base=None, key='.', keep_terminal=False)¶
Compresses all empty groups from the root up until the first non-empty group is located.
- Returns:
A copy of the now flattened DataFolder
- concatenate(sort=None, reverse=False)¶
Concatenates all the files in an objectFolder into a single metadataObject-like object.
- Keyword Arguments:
sort (column index, None or bool, or callable function) – Sort the resultant metadataObject by this column (if a column index), or by the x column if None or True, or not at all if False. sort is passed directly to the eponymous method as the order parameter.
reverse (bool) – Reverse the order of the sort (defaults to False)
- Returns:
The current objectFolder with only one metadataObject item containing all the data.
- count(value)¶
Provide a count method like a sequence.
- Parameters:
value (str, regexp, or Stoner.Core.metadataObject) – The thing to count matches for.
- Returns:
(int) – The number of matching metadataObject instances.
Notes
If value is a string, then matching is based on either an exact match of the name or, if it includes a * or ?, on a globbing match. value may also be a regular expression, in which case matches are made on the basis of the match with the name of the metadataObject. Finally, if value is a metadataObject, then it is matched with an equality test.
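The matching rules in the note can be illustrated with a standalone sketch that works on plain name strings and does not use the Stoner package. The helper names `matches` and `count_matches` are hypothetical, the use of `re.search` for the regular expression case is an assumption, and the metadataObject equality case is not modelled.

```python
import fnmatch
import re


def matches(name, value):
    """Return True if the string `name` matches `value` (hypothetical helper)."""
    if isinstance(value, re.Pattern):
        # Regular expression: test it against the object's name (assumed: search).
        return value.search(name) is not None
    if "*" in value or "?" in value:
        # A string containing * or ? is treated as a glob pattern.
        return fnmatch.fnmatchcase(name, value)
    # Any other string must match the name exactly.
    return name == value


def count_matches(names, value):
    """Count how many names match, in the manner of a sequence count."""
    return sum(1 for name in names if matches(name, value))


names = ["run_1.txt", "run_2.txt", "notes.md"]
exact = count_matches(names, "run_1.txt")
globbed = count_matches(names, "run_*.txt")
by_regexp = count_matches(names, re.compile(r"\.md$"))
```

The same three-way dispatch would apply to an index-style lookup, returning the position of the first match instead of a count.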
- extend(values)¶
S.extend(iterable) – extend sequence by appending elements from the iterable
- extract(*metadata, **kargs)¶
Extract metadata from each of the files in the terminal group.
Walks through the terminal group, gets the listed metadata from each file and constructs a replacement metadataObject.
- Parameters:
*metadata (str) – One or more metadata indices that should be used to construct the new data file.
- Keyword Arguments:
- copy (bool):
Take a copy of the DataFolder before starting the extract (default is True)
- Returns:
An instance of a metadataObject-like object.
- fetch()¶
Preload the contents of the DiskBasedFolderMixin.
With multiprocess enabled this will parallel load the contents of the folder into memory.
- file(name, value, create=True, pathsplit=None)¶
Recursively add groups in order to put the named value into a virtual tree of baseFolder.
- Parameters:
name (str) – A name (which may be a nested path) of the object to file.
value (metadataObject) – The object to be filed - it should be an instance of baseFolder.type.
- Keyword Parameters:
- create(bool):
Whether to create missing groups or to raise an error (default True to create groups).
- pathsplit(str or None):
Character to use to split the name into path components. Defaults to using os.path.split()
- Returns:
(baseFolder) – A reference to the group where the value was eventually filed
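As a rough illustration of filing a value under a nested path, the sketch below uses plain dictionaries in place of baseFolder objects and does not call the Stoner package. The `file_item` function and the `{"files": ..., "groups": ...}` layout are hypothetical, and splitting on "/" stands in for the default os.path.split() behaviour.

```python
def file_item(tree, name, value, create=True, pathsplit="/"):
    """File `value` under the nested path `name` in a dict-based tree (hypothetical)."""
    *group_names, leaf = name.split(pathsplit)
    node = tree
    for group_name in group_names:
        if group_name not in node["groups"]:
            if not create:
                # Mirror the documented choice: create groups or raise an error.
                raise KeyError(f"No group named {group_name!r}")
            node["groups"][group_name] = {"files": {}, "groups": {}}
        node = node["groups"][group_name]
    node["files"][leaf] = value
    # Return a reference to the group where the value was eventually filed.
    return node


tree = {"files": {}, "groups": {}}
filed_in = file_item(tree, "run1/cold/scan.txt", "DATA")
```

With `create=False`, the same call against an empty tree raises `KeyError` instead of building the intermediate groups.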
- filter(filter=None, invert=False, copy=False, recurse=False, prune=True)¶
Filter the current set of files by some criterion.
- Parameters:
filter (string or callable) – Either a string filename pattern or a callable function which takes a single parameter x which is an instance of a metadataObject and evaluates True or False
- Keyword Arguments:
invert (bool) – Invert the sense of the filter (done by doing an XOR with the filter condition).
copy (bool) – If set True then the DataFolder is copied before being filtered. Default is False - work in place.
recurse (bool) – If True, apply the filter recursively to all groups. Default False.
prune (bool) – If True, execute a baseFolder.prune() to remove empty groups after filtering.
- Returns:
The current objectFolder object
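The pattern-or-callable filter and the XOR used for invert can be sketched on a plain list of filenames. This is only an illustration under stated assumptions: `filter_names` is a hypothetical helper, not part of the Stoner package, and glob matching via `fnmatch` is assumed for string patterns.

```python
import fnmatch


def filter_names(names, filt, invert=False):
    """Keep the names for which the filter, XORed with `invert`, is True."""
    if callable(filt):
        test = filt
    else:
        # Assumed: a string filter is a glob-style filename pattern.
        def test(name):
            return fnmatch.fnmatchcase(name, filt)

    return [name for name in names if bool(test(name)) ^ invert]


names = ["a.txt", "b.dat", "c.txt"]
text_files = filter_names(names, "*.txt")
others = filter_names(names, "*.txt", invert=True)
short = filter_names(names, lambda name: name.startswith("a"))
```

Calling the helper with `invert=True` corresponds to what filterout is documented to do.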
- filterout(filter, copy=False, recurse=False, prune=True)¶
Synonym for self.filter(filter,invert=True).
- Parameters:
filter (string or callable) – Either a string filename pattern or a callable function which takes a single parameter x which is an instance of a metadataObject and evaluates True or False
- Keyword Arguments:
copy (bool) – If set True then the DataFolder is copied before being filtered. Default is False - work in place.
recurse (bool) – If True, apply the filter recursively to all groups. Default False.
prune (bool) – If True, execute a baseFolder.prune() to remove empty groups after filtering.
- Returns:
The current objectFolder object with the files in the file list filtered.
- flatten(depth=None)¶
Compresses all the groups and sub-groups into a single flat file list.
- Keyword Arguments:
depth – Only flatten sub-groups that are within depth of the deepest level.
- Returns:
A copy of the now flattened DataFolder
- gather(xcol=None, ycol=None)¶
Collect x and y columns from the subfiles in the final group in the tree.
Builds the collected data into a Stoner.Core.metadataObject.
- Keyword Arguments:
Notes
This is a wrapper around walk_groups that assembles the data into a single file for further analysis or plotting.
- get(name, default=None)¶
Return either a sub-group or named object from this folder.
- getlist(**kargs)¶
Scan the current directory, optionally recursively to build a list of filenames.
- Keyword Arguments:
recursive (bool) – Do a walk through all the directories for files
directory (string or False) – Either a string path to a new directory or False to open a dialog box or not set in which case existing directory is used.
flatten (bool) – After scanning the directory tree, flatten all the subgroups to make a flat file list (this is the previous behaviour of objectFolder.getlist()).
- Returns:
A copy of the current DataFolder directory with the files stored in the files attribute
getlist() scans a directory tree finding files that match the pattern. By default it will recurse through the entire directory tree finding sub directories and creating groups in the data folder for each sub directory.
- group(key)¶
Sort Files into a series of objectFolders according to the value of the key.
- Parameters:
key (string or callable or list) – Either a simple string or callable function or a list. If a string then it is interpreted as an item of metadata in each file. If a callable function then it takes a single argument x, which should be an instance of a metadataObject, and returns some value. If key is a list then the grouping is done recursively for each element in key.
- Returns:
A copy of the current objectFolder object in which the groups attribute is a dictionary of objectFolder objects with sub lists of files
Notes
If one of the grouping metadata keys does not exist in one file then no exception is raised - rather the files will be placed into the group with key None. Metadata keys that are generated from the filename are supported.
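The grouping behaviour described above, including recursive grouping for a list of keys and the None group for missing metadata, can be sketched with plain dictionaries standing in for files. The function `group_files` and the sample metadata are hypothetical and do not use the Stoner package.

```python
from collections import defaultdict


def group_files(files, key):
    """Group metadata dicts by a string key, a callable, or a list of either."""
    keys = key if isinstance(key, list) else [key]
    head, rest = keys[0], keys[1:]
    groups = defaultdict(list)
    for f in files:
        # A callable is applied to the file; a string is looked up as metadata.
        # dict.get returns None for a missing key, giving the None group.
        value = head(f) if callable(head) else f.get(head)
        groups[value].append(f)
    if rest:
        # Remaining keys are applied recursively inside each group.
        return {k: group_files(v, rest) for k, v in groups.items()}
    return dict(groups)


files = [
    {"name": "a", "T": 4, "B": 1},
    {"name": "b", "T": 4, "B": 2},
    {"name": "c", "T": 300, "B": 1},
    {"name": "d", "B": 1},  # no "T" metadata
]
by_temperature = group_files(files, "T")
nested = group_files(files, ["T", lambda f: f["B"] > 1])
```

Here the file named "d" ends up under the key None because it has no "T" entry, and `nested[4]` is itself split into two groups keyed by False and True.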
- index(value, start=None, end=None)¶
Provide an index method like a sequence.
- Parameters:
value (str, regexp, or Stoner.Core.metadataObject) – The thing to search for.
- Keyword Arguments:
start,end (int) – Limit the index search to a sub-range as per Python 3.5+ list.index
- Returns:
(int) – The index of the first matching metadataObject instance.
Notes
If value is a string, then matching is based on either an exact match of the name or, if it includes a * or ?, on a globbing match. value may also be a regular expression, in which case matches are made on the basis of the match with the name of the metadataObject. Finally, if value is a metadataObject, then it is matched with an equality test.
- insert(index, value)¶
Implement the insert method with the option to append as well.
- items()¶
Return the key, value pairs for the subgroups of this folder.
- keep_latest()¶
Filter out earlier revisions of files with the same name.
The CM group LabVIEW software will avoid overwriting files when measuring by inserting !#### where #### is an integer revision number just before the filename extension. This method will look for instances of several files which differ in name only by the presence of the revision number and will keep only the highest revision number. This is useful if several measurements of the same experiment have been carried out, but only the last file is the correct one.
- Returns:
A copy of the DataFolder.
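The revision-number rule can be illustrated on a plain list of filenames. This sketch does not use the Stoner package: `keep_latest_names` is a hypothetical helper, and the regular expression assumes the revision marker is "!" followed by digits at the very end of the filename stem.

```python
import re
from pathlib import PurePath

# Assumed form of the marker: "!####" immediately before the extension.
REVISION = re.compile(r"!(\d+)$")


def keep_latest_names(filenames):
    """Keep only the highest revision of each file (hypothetical helper)."""
    latest = {}
    for fname in filenames:
        path = PurePath(fname)
        match = REVISION.search(path.stem)
        # A file with no marker counts as older than any numbered revision.
        revision = int(match.group(1)) if match else -1
        base = REVISION.sub("", path.stem) + path.suffix
        key = str(path.with_name(base))
        if key not in latest or revision > latest[key][0]:
            latest[key] = (revision, fname)
    return [fname for _, fname in latest.values()]


files = ["scan.txt", "scan!0001.txt", "scan!0002.txt", "other.txt"]
kept = keep_latest_names(files)
```

In this example only the highest revision of `scan` survives, alongside `other.txt`, which has no competing revisions.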
- keys()¶
Return the keys used to access the sub-groups of this folder.
- make_name(value=None)¶
Construct a name from the value object if possible.
- on_load_process(tmp)¶
Carry out processing on a newly loaded file to set means and extra metadata.
- pop(name=-1, default=None)¶
Return and remove either a subgroup or named object from this folder.
- popitem()¶
Return the most recent subgroup from this folder.
- prune(name=None)¶
Remove any empty groups from the objectFolder (and subgroups).
- Returns:
A copy of the pruned objectFolder.
- remove(value)¶
S.remove(value) – remove first occurrence of value. Raise ValueError if the value is not present.
- reverse()¶
S.reverse() – reverse IN PLACE
- save(root=None)¶
Save the entire data folder out to disc using the groups as a directory tree.
Calls the save method for each file in turn.
- Parameters:
root (string) – The root directory to start creating files and subdirectories under. If set to None or not specified, the current folder’s directory attribute will be used.
- Returns:
A list of the saved files
- select(*args, **kargs)¶
Select a subset of the objects in the folder based on flexible search criteria on the metadata.
- Parameters:
args (various) – A single positional argument if present is interpreted as follows:
If a callable function is given, the entire metadataObject is presented to it. If it evaluates True then that metadataObject is selected. This allows arbitrary select operations
If a dict is given, then it and the kargs dictionary are merged and used to select the metadataObjects
- Keyword Arguments:
recurse (bool) – Also recursively select through the sub-groups.
kargs (various) – Arbitrary keyword arguments are interpreted as requesting matches against the corresponding metadata values. The keyword argument may have an additional __operator appended to it, which is interpreted as follows:
eq – metadata value equals argument value (this is the default test for a scalar argument)
ne – metadata value does not equal argument value
gt – metadata value is greater than argument value
lt – metadata value is less than argument value
ge – metadata value is greater than or equal to argument value
le – metadata value is less than or equal to argument value
contains – metadata value contains argument value
in – metadata value is in the argument value (this is the default test for non-tuple iterable arguments)
startswith – metadata value starts with argument value
endswith – metadata value ends with argument value
icontains, iin, istartswith, iendswith – as above but case insensitive
between – metadata value lies between the minimum and maximum values of the argument (the default test for 2-length tuple arguments)
ibetween, ilbetween, iubetween – as above but include both, lower or upper values
The syntax is inspired by the Django project for selecting, but is not quite as rich.
- Returns:
(baseFolder) – A new baseFolder instance that contains just the matching metadataObjects.
Note
If any of the tests is True, then the metadataObject will be selected, so the effect is a logical OR. To achieve a logical AND, you can chain two selects together:
d.select(temp__le=4.2,vti_temp__lt=4.2).select(field__gt=3.0)
will select metadata objects that have either temp or vti_temp metadata values below 4.2 AND field metadata values greater than 3.
There are a few cases where special treatment is needed:
If you need to select on a parameter called recurse, pass a dictionary of {"recurse":value} as the sole positional argument.
If you need to select on a metadata value that ends in an operator word, then append __eq in the keyword name to force the equality test.
If the metadata keys to select on are not valid python identifiers, then pass them via the first positional dictionary value.
If the metadata item being checked exists in a regular expression file pattern for the folder, then the files are not loaded and the metadata is evaluated based on the filename. This can speed up operations where a file load is not required.
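The keyword parsing and the OR/AND behaviour described in the note can be sketched on plain metadata dictionaries. This is not the library's implementation: `select_items`, `split_key` and `default_op` are hypothetical names, only a subset of the operators is modelled, and treating between as exclusive at both ends is an assumption.

```python
OPERATORS = {
    "eq": lambda m, a: m == a,
    "ne": lambda m, a: m != a,
    "gt": lambda m, a: m > a,
    "lt": lambda m, a: m < a,
    "ge": lambda m, a: m >= a,
    "le": lambda m, a: m <= a,
    "contains": lambda m, a: a in m,
    "in": lambda m, a: m in a,
    "startswith": lambda m, a: m.startswith(a),
    "endswith": lambda m, a: m.endswith(a),
    "between": lambda m, a: min(a) < m < max(a),  # assumed exclusive bounds
}


def split_key(key):
    """Split 'name__operator' into its parts; operator is None if absent."""
    name, sep, op = key.rpartition("__")
    if sep and op in OPERATORS:
        return name, op
    return key, None


def default_op(arg):
    """Choose the default test from the type of the argument."""
    if isinstance(arg, tuple) and len(arg) == 2:
        return "between"
    if isinstance(arg, (list, set, frozenset)):
        return "in"
    return "eq"


def select_items(items, **kargs):
    """Keep items for which ANY of the keyword tests is True (logical OR)."""
    chosen = []
    for metadata in items:
        for key, arg in kargs.items():
            name, op = split_key(key)
            op = op or default_op(arg)
            if name in metadata and OPERATORS[op](metadata[name], arg):
                chosen.append(metadata)
                break
    return chosen


items = [
    {"temp": 4.0, "field": 5.0},
    {"temp": 10.0, "vti_temp": 3.0, "field": 1.0},
    {"temp": 300.0, "vti_temp": 290.0, "field": 5.0},
]
cold = select_items(items, temp__le=4.2, vti_temp__lt=4.2)
# Chaining a second call gives the logical AND described in the note.
cold_high_field = select_items(cold, field__gt=3.0)
in_range = select_items(items, field=(2.0, 6.0))
```

Because `split_key` only recognises a suffix that is a known operator, a metadata name that itself ends in an operator word needs an explicit `__eq`, which is the special case the note describes.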
- setdefault(k, d=None)¶
Return or set a subgroup or named object.
- slice_metadata(key, output='smart')¶
Return an array of the metadata values for each item/file in the top level group.
- Parameters:
key (str, regexp or list of str) – the meta data key(s) to return
- Keyword Parameters:
- output (str):
Output format. Values are:
dict: return an array of dictionaries
list: return a list of lists
array: return a numpy array
Data: return a Stoner.Data object
smart: (default) return either a list if only one key or a list of dictionaries
- Returns:
(array of metadata) – If a single key is given and is an exact match then returns an array of the matching values. If the key results in a regular expression match, then returns an array of dictionaries of all matching keys. If key is a list or other iterable, then return a 2D array where each column corresponds to one of the keys.
Todo
Add options to recurse through all groups? Put back RCT’s values only functionality?
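The difference between a single key and a list of keys can be shown with a small sketch over plain metadata dictionaries. The function `slice_values` is hypothetical and models only exact-match keys; the regular expression case and the output formats are not covered, and nested lists stand in for arrays.

```python
def slice_values(items, key):
    """Collect metadata values for one key or for a list of keys."""
    if isinstance(key, str):
        # Single key: one value per item.
        return [item[key] for item in items]
    # Several keys: one row per item, one column per key.
    return [[item[k] for k in key] for item in items]


items = [{"T": 4.2, "B": 0.1}, {"T": 10.0, "B": 0.2}]
temperatures = slice_values(items, "T")
table = slice_values(items, ["T", "B"])
```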
- sort(key=None, reverse=False, recurse=True)¶
Sort the files by some key.
- Keyword Arguments:
key (string, callable or None) – Either a string or a callable function. If a string then this is interpreted as a metadata key; if callable then it is assumed that this is a function of one parameter x that is a Stoner.Core.metadataObject object and that returns a key value. If key is not specified (default), then a sort is performed on the filename.
reverse (bool) – Optionally sort in reverse order
recurse (bool) – If True (default) sort the sub-groups as well.
- Returns:
A copy of the current objectFolder object
- unflatten()¶
Take the file list and unflatten it according to the file paths.
- Returns:
A copy of the objectFolder
- unload(name=None)¶
Remove the instance from memory without losing the name in the Folder.
- Parameters:
name (string,int or None) – Specifies the entry to unload from memory. If set to None all loaded entries are unloaded.
- Returns:
(DataFolder) – returns a copy of itself.
- update(other)¶
Update this folder with a dictionary or another folder.
- values()¶
Return the sub-groups of this folder.
- walk_groups(walker, **kargs)¶
Walk through a hierarchy of groups and call walker for each file.
- Parameters:
walker (callable) – A callable object that takes either a metadataObject instance or an objectFolder instance.
- Keyword Arguments:
group (bool) – (default False) determines whether the walker function will expect to be given the objectFolder representing the lowest level group or individual metadataObject objects from the lowest level group
replace_terminal (bool) – If group is True and the walker function returns an instance of metadataObject then the return value is appended to the files and the group is removed from the current objectFolder. This will unwind the group hierarchy by one level.
only_terminal (bool) – Only execute the walker function on groups that have no sub-groups inside them (i.e. are terminal groups)
walker_args (dict) – A dictionary of static arguments for the walker function.
Notes
The walker function should have a prototype of the form:
walker(f,list_of_group_names,**walker_args)
where f is either an objectFolder or metadataObject.
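A walker with the prototype shown above might look like the following sketch, which walks a tree of plain dictionaries instead of a real objectFolder. The `walk_tree` driver, the `describe` walker and the tree layout are all hypothetical; only the walker's argument order follows the documented prototype.

```python
def walk_tree(folder, walker, walker_args=None, breadcrumb=()):
    """Call walker(f, list_of_group_names, **walker_args) for every file."""
    walker_args = walker_args or {}
    results = []
    for f in folder["files"]:
        results.append(walker(f, list(breadcrumb), **walker_args))
    for name, sub in folder["groups"].items():
        # Descend, extending the list of group names by the current group.
        results.extend(walk_tree(sub, walker, walker_args, (*breadcrumb, name)))
    return results


def describe(f, list_of_group_names, sep="/"):
    """Example walker: build a path-like label for each file."""
    return sep.join(list_of_group_names + [f])


tree = {
    "files": ["top.txt"],
    "groups": {
        "A": {
            "files": ["a1.txt"],
            "groups": {"B": {"files": ["b1.txt"], "groups": {}}},
        }
    },
}
paths = walk_tree(tree, describe, walker_args={"sep": "/"})
```

The static `walker_args` dictionary is unpacked into every call, so the walker can take extra configuration without closing over it.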
- zip_groups(groups)¶
Return a list of tuples of metadataObjects drawn from the specified groups.
- Parameters:
groups (list of strings) – A list of keys of groups in the objectFolder
- Returns:
A list of tuples of groups of files – [(grp_1_file_1, grp_2_file_1, …, grp_n_file_1), (grp_1_file_2, grp_2_file_2, …, grp_n_file_2), …, (grp_1_file_m, grp_2_file_m, …, grp_n_file_m)]
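The shape of the return value can be illustrated with the built-in `zip` over plain lists. The `zip_named_groups` helper and the group contents are hypothetical stand-ins and do not use the Stoner package.

```python
def zip_named_groups(groups, keys):
    """Pair up the i-th file from each named group into tuples."""
    return list(zip(*(groups[key] for key in keys)))


groups = {
    "up": ["up_1", "up_2", "up_3"],
    "down": ["down_1", "down_2", "down_3"],
}
pairs = zip_named_groups(groups, ["up", "down"])
```

Each tuple holds one file from every requested group, in the order the group keys were given.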