Data Management Plan API
Methods
-
class dmp.dmp.dmp(cnf_loc=u'', test=False)
API for management of files within the VRE.
-
add_file_metadata(user_id, file_id, key, value)
Add a key-value pair to the meta data for a file.
This allows a user to add extra information to the meta data to better describe the file.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
- key (str) – Unique key for the identification of the extra meta data. If the key matches a key already in the meta data, then the current value is overwritten.
- value – Value to be stored for the given key. This can be a str, int, list or dict.
Returns: This is an id for that file within the system and can be used for tracing this file and where it is used and where it has come from.
Return type: str
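The overwrite behaviour described above can be modelled with a plain dictionary. The sketch below is a hypothetical in-memory stand-in for a single file record; the record layout and the helper name add_metadata are assumptions for illustration, not part of the DMP API. It only shows how a repeated key replaces the existing value.

```python
from typing import Any, Dict


def add_metadata(record: Dict[str, Any], key: str, value: Any) -> str:
    """Hypothetical stand-in for add_file_metadata on an in-memory record.

    Stores value under key in record["meta_data"], overwriting any
    existing value, and returns the record's _id.
    """
    meta = record.setdefault("meta_data", {})
    meta[key] = value  # an existing key is simply overwritten
    return record["_id"]


record = {"_id": "aLongString", "file_path": "/tmp/example_file.fastq", "meta_data": {}}
add_metadata(record, "downloaded_from", "ftp://old.example.org")
add_metadata(record, "downloaded_from", "ftp://new.example.org")
print(record["meta_data"])  # {'downloaded_from': 'ftp://new.example.org'}
```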
-
get_file_by_file_path(user_id, file_path, rest=False)
Get a list of the file dictionary objects given a user_id and file_path.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_path (str) – File path (see validate_file)
Returns: - file_path : str
Location of the file in the file system
- file_type : str
File format (see validate_file)
- data_type : str
The type of information in the file (RNA-seq, ChIP-seq, etc)
- taxon_id : int
Taxon ID of the species that the file has been derived from
- compressed : str
Type of compression (None, gzip, zip)
- source_id : list
List of IDs of files that were processed to generate this file
- meta_data : dict
Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
- creation_time : list
Time at which the file was loaded into the system
Return type: dict
Example
from dmp import dmp
da = dmp()
files = da.get_file_by_file_path('user1', '/tmp/example_file.fastq')
-
get_file_by_id(user_id, file_id, rest=False)
Returns the data for a file given its unique file ID.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
Returns: - file_path : str
Location of the file in the file system
- path_type : str
File or Folder
- file_type : str
File format (see validate_file)
- size : int
Size of the file in bytes
- parent_dir : str
Location of the parent dir
- data_type : str
The type of information in the file (RNA-seq, ChIP-seq, etc)
- taxon_id : int
Taxon ID of the species that the file has been derived from
- compressed : str
Type of compression (None, gzip, zip)
- source_id : list
List of IDs of files that were processed to generate this file
- meta_data : dict
Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
- creation_time : list
Time at which the file was loaded into the system
Return type: dict
Example
from dmp import dmp
da = dmp()
# file_id is the value returned by set_file() or the _id of a retrieved file
file_data = da.get_file_by_id('user1', file_id)
-
get_file_history(user_id, file_id)
Returns the full chain of file IDs from the current file back to the original file(s).
The best format for declaring the history has not yet been finalised.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
Returns: List of lists representing the adjacency of child and parent files.
Return type: list
Example
from dmp import dmp
da = dmp()
history = da.get_file_history('user1', 'aLongString')
print(history)
Output:
[['aLongString', 'parentOfaLongString'], ['parentOfaLongString', 'parentOfParent']]
These IDs can then be passed to the get_file_by_id method to return the meta data and location of each file.
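Before requesting each ancestor with get_file_by_id, it can help to flatten the adjacency list into a list of unique parent IDs. The sketch below assumes, based on the example output above, that each inner pair is ordered [child_id, parent_id]; the helper name ancestor_ids is hypothetical and not part of the DMP API.

```python
from typing import List


def ancestor_ids(history: List[List[str]]) -> List[str]:
    """Return parent IDs from a get_file_history-style adjacency list.

    Each inner pair is assumed to be [child_id, parent_id]. IDs are
    returned in first-seen order without duplicates, so a parent shared
    by several children appears only once.
    """
    seen: List[str] = []
    for _child_id, parent_id in history:
        if parent_id not in seen:
            seen.append(parent_id)
    return seen


history = [['aLongString', 'parentOfaLongString'],
           ['parentOfaLongString', 'parentOfParent']]
print(ancestor_ids(history))  # ['parentOfaLongString', 'parentOfParent']
```

Each returned ID could then be looked up in turn, e.g. with da.get_file_by_id('user1', parent_id).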
-
get_files_by_assembly(user_id, assembly, rest=False)
Get a list of the file dictionary objects given a user_id and assembly.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- assembly (str) – Assembly of the species that the file has been derived from
Returns: - file_path : str
Location of the file in the file system
- file_type : str
File format (see validate_file)
- data_type : str
The type of information in the file (RNA-seq, ChIP-seq, etc)
- taxon_id : int
Taxon ID of the species that the file has been derived from
- compressed : str
Type of compression (None, gzip, zip)
- source_id : list
List of IDs of files that were processed to generate this file
- meta_data : dict
Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
- creation_time : list
Time at which the file was loaded into the system
Return type: dict
Example
from dmp import dmp
da = dmp()
files = da.get_files_by_assembly('user1', 'GCA_0000nnnn')
-
get_files_by_data_type(user_id, data_type, rest=False)
Get a list of the file dictionary objects given a user_id and data_type.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- data_type (str) – The type of information in the file (RNA-seq, ChIP-seq, etc)
Returns: - file_path : str
Location of the file in the file system
- file_type : str
File format (see validate_file)
- data_type : str
The type of information in the file (RNA-seq, ChIP-seq, etc)
- taxon_id : int
Taxon ID of the species that the file has been derived from
- compressed : str
Type of compression (None, gzip, zip)
- source_id : list
List of IDs of files that were processed to generate this file
- meta_data : dict
Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
- creation_time : list
Time at which the file was loaded into the system
Return type: dict
Example
from dmp import dmp
da = dmp()
files = da.get_files_by_data_type('user1', 'RNA-seq')
-
get_files_by_file_type(user_id, file_type, rest=False)
Get a list of the file dictionary objects given a user_id and file_type.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_type (str) – File format (see validate_file)
Returns: - file_path : str
Location of the file in the file system
- file_type : str
File format (see validate_file)
- data_type : str
The type of information in the file (RNA-seq, ChIP-seq, etc)
- taxon_id : int
Taxon ID of the species that the file has been derived from
- compressed : str
Type of compression (None, gzip, zip)
- source_id : list
List of IDs of files that were processed to generate this file
- meta_data : dict
Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
- creation_time : list
Time at which the file was loaded into the system
Return type: dict
Example
from dmp import dmp
da = dmp()
files = da.get_files_by_file_type('user1', 'fastq')
-
get_files_by_taxon_id(user_id, taxon_id, rest=False)
Get a list of the file dictionary objects given a user_id and taxon_id.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- taxon_id (int) – Taxon ID of the species that the file has been derived from
Returns: - file_path : str
Location of the file in the file system
- file_type : str
File format (see validate_file)
- data_type : str
The type of information in the file (RNA-seq, ChIP-seq, etc)
- taxon_id : int
Taxon ID of the species that the file has been derived from
- compressed : str
Type of compression (None, gzip, zip)
- source_id : list
List of IDs of files that were processed to generate this file
- meta_data : dict
Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
- creation_time : list
Time at which the file was loaded into the system
Return type: dict
Example
from dmp import dmp
da = dmp()
files = da.get_files_by_taxon_id('user1', 9606)
-
get_files_by_user(user_id, rest=False)
Get a list of the file dictionary objects given a user_id.
Parameters: user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
Returns: List of dict objects for each file that has been loaded by a user.
Return type: list
Example
from dmp import dmp
da = dmp()
files = da.get_files_by_user('user1')
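Because the list returned by get_files_by_user contains one dictionary per file, with the fields documented for the other get_files_by_* methods, it can be summarised client-side. The sketch below groups file paths by data_type; the records are hypothetical sample data shaped like those dictionaries, and group_by_data_type is an illustrative helper, not part of the DMP API.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_data_type(files: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each data_type to the file_path values of the records that have it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in files:
        # Records without a data_type are grouped under an empty string.
        groups[entry.get("data_type", "")].append(entry["file_path"])
    return dict(groups)


# Hypothetical records shaped like the file dictionaries documented above.
files = [
    {"file_path": "/tmp/a.fastq", "data_type": "RNA-seq"},
    {"file_path": "/tmp/b.bam", "data_type": "ChIP-seq"},
    {"file_path": "/tmp/c.fastq", "data_type": "RNA-seq"},
]
print(group_by_data_type(files))
# {'RNA-seq': ['/tmp/a.fastq', '/tmp/c.fastq'], 'ChIP-seq': ['/tmp/b.bam']}
```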
-
modify_column(user_id, file_id, key, value)
Update a key-value pair for the record.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
- key (str) – Unique key for the identification of the extra meta data. If the key matches a key already in the meta data, then the current value is overwritten.
- value – Value to be stored for the given key. This can be a str, int, list or dict.
Returns: This is an id for that file within the system and can be used for tracing this file and where it is used and where it has come from.
Return type: str
-
remove_file(user_id, file_id)
Removes a single file from the directory and returns the ID of the file that was removed.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
Returns: The file_id of the removed file.
Return type: str
Example
from dmp import dmp
da = dmp()
# file_id is the value returned by set_file() or the _id of a retrieved file
removed_id = da.remove_file('user1', file_id)
-
remove_file_metadata(user_id, file_id, key)
Remove a key-value pair from the meta data for a given file.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
- key (str) – Unique key for the identification of the extra meta data to be removed
Returns: This is an id for that file within the system and can be used for tracing this file and where it is used and where it has come from.
Return type: str
-
set_file(user_id, file_path, path_type, file_type=u'', size=0, parent_dir=u'', data_type=u'', taxon_id=u'', compressed=None, source_id=None, meta_data=None, **kwargs)
Adds a file to the data management API.
Parameters: - user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_path (str) – Location of the file in the file system
- path_type (str) – File or folder
- parent_dir (str) – _id of the parent directory
- file_type (str) – File format (see validate_file)
- size (int) – File size in bytes
- data_type (str) – The type of information in the file (RNA-seq, ChIP-seq, etc)
- taxon_id (int) – Taxon ID of the species that the file has been derived from
- compressed (str) – Type of compression (None, gzip, zip)
- source_id (list) – List of IDs of files that were processed to generate this file
- meta_data (dict) – Dictionary object containing the extra data related to the generation of the file or describing the way it was processed:
- assembly : str
Dependent parameter. If the sequence has been aligned at some point during the production of this file, then the assembly must be recorded.
Returns: This is an id for that file within the system and can be used for tracing this file and where it is used and where it has come from.
Return type: str
Example
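from dmp import dmp
da = dmp()
# path_type is the third positional parameter ("file" or "folder")
unique_file_id = da.set_file(
    'user1', '/tmp/example_file.fastq', 'file', file_type='fastq',
    data_type='RNA-seq', taxon_id=9606, compressed=None)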
If the file is the processed result of one or more other files, their file IDs can be listed in source_id:
>>> da.set_file('user1', '/tmp/example_file.fastq', 'file', file_type='fastq', data_type='RNA-seq', taxon_id=9606, compressed=None, source_id=[1, 2])
Meta data about the file can also be included to provide extra information about the file, its origins or how it was generated:
>>> da.set_file('user1', '/tmp/example_file.fastq', 'file', file_type='fastq', data_type='RNA-seq', taxon_id=9606, compressed=None, meta_data={'assembly' : 'GCA_0000nnnn', 'downloaded_from' : 'http://www.', })
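Because path_type is the third positional parameter of set_file, a purely positional call can bind values to the wrong parameters. The sketch below reproduces only the documented signature as a stub (self omitted, no behaviour) and uses Python's inspect.signature to show where each argument lands. The path_type value 'file' is an assumption based on the "File or folder" description.

```python
import inspect


def set_file(user_id, file_path, path_type, file_type=u'', size=0,
             parent_dir=u'', data_type=u'', taxon_id=u'', compressed=None,
             source_id=None, meta_data=None, **kwargs):
    """Stub with the documented set_file signature (self omitted, no behaviour)."""


sig = inspect.signature(set_file)

# Purely positional call: values bind in declaration order, so 'fastq'
# becomes path_type, 'RNA-seq' becomes file_type and 9606 becomes size.
positional = sig.bind('user1', '/tmp/example_file.fastq', 'fastq', 'RNA-seq', 9606)
print(positional.arguments['path_type'], positional.arguments['size'])  # fastq 9606

# Keyword call: each value lands on the intended parameter.
keyword = sig.bind('user1', '/tmp/example_file.fastq', 'file',
                   file_type='fastq', data_type='RNA-seq', taxon_id=9606)
print(keyword.arguments['file_type'], keyword.arguments['taxon_id'])  # fastq 9606
```

Passing the optional fields by keyword keeps calls correct even if later optional parameters are reordered.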
-
static validate_file(entry)
Validate that the required meta data for a given entry is present. If any data is missing, a ValueError exception is raised. This function checks that all required fields are defined and that, when various selections are made, the correct matching data is also present.
Parameters: entry (dict) –
- user_id : str
- Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
- file_path : str
- Location of the file in the file system
- path_type : str
- File or folder
- file_type : str
- File format (“amb”, “ann”, “bam”, “bb”, “bed”, “bt2”, “bw”, “bwt”, “cpt”, “csv”, “dcd”, “fa”, “fasta”, “fastq”, “gem”, “gff3”, “gz”, “hdf5”, “json”, “lif”, “pac”, “pdb”, “pdf”, “png”, “prmtop”, “sa”, “tbi”, “tif”, “tpr”, “trj”, “tsv”, “txt”, “wig”)
- size : int
- Size of the file in bytes
- data_type : str
- The type of information in the file (RNA-seq, ChIP-seq, etc)
- taxon_id : int
- Taxon ID of the species that the file has been derived from
- compressed : str
- Type of compression (None, gzip, zip)
- source_id : list
- List of IDs of files that were processed to generate this file
- meta_data : dict
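- Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
- assembly : str
- Dependent parameter (see set_file). Required if the sequence has been aligned at some point during the production of this file.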
Returns: - bool – Returns True if there are no errors with the entry
- If there are issues with the entry then a ValueError is raised.
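The checks described for validate_file can be sketched as a small standalone function. This is a hypothetical model, not the library's implementation: it assumes user_id, file_path and path_type are the required keys (they are the parameters of set_file without defaults), checks file_type against the formats listed above, and does not model the dependent assembly rule.

```python
from typing import Any, Dict

# File formats listed for file_type in the validate_file documentation.
FILE_TYPES = {
    "amb", "ann", "bam", "bb", "bed", "bt2", "bw", "bwt", "cpt", "csv",
    "dcd", "fa", "fasta", "fastq", "gem", "gff3", "gz", "hdf5", "json",
    "lif", "pac", "pdb", "pdf", "png", "prmtop", "sa", "tbi", "tif",
    "tpr", "trj", "tsv", "txt", "wig",
}
REQUIRED = ("user_id", "file_path", "path_type")  # assumed required keys


def validate_entry(entry: Dict[str, Any]) -> bool:
    """Hypothetical validator modelled on validate_file's description.

    Raises ValueError for missing required keys or an unlisted
    file_type; returns True otherwise.
    """
    missing = [key for key in REQUIRED if not entry.get(key)]
    if missing:
        raise ValueError("Missing required field(s): " + ", ".join(missing))
    file_type = entry.get("file_type")
    # An empty file_type (the set_file default) is accepted here.
    if file_type and file_type not in FILE_TYPES:
        raise ValueError("Unsupported file_type: " + str(file_type))
    return True


print(validate_entry({"user_id": "user1",
                      "file_path": "/tmp/example_file.fastq",
                      "path_type": "file", "file_type": "fastq"}))  # True
```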
-