Data Management Plan API

Methods

class dmp.dmp.dmp(cnf_loc=u'', test=False)

API for management of files within the VRE

add_file_metadata(user_id, file_id, key, value)

Add a key value pair to the meta data for a file

This way a user is able to add extra information to the meta data to better describe the file.

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
key (str) – Unique key for the identification of the extra meta data. If the key matches a key already in the meta data then it over-writes the current value.
value – Value to be stored for the given key. This can be a str, int, list or dict.

Returns:

ID of the file within the system. It can be used to trace the file, where it is used and where it has come from.

Return type:

str
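The over-write rule described for key can be pictured with a plain dictionary. The following is a minimal sketch only, not the DMP implementation: the helper add_metadata and the sample record are hypothetical, and no database is involved.

```python
def add_metadata(record, key, value):
    """Return a copy of record with key set inside its meta_data dict.

    Hypothetical helper for illustration; it mimics the documented
    over-write rule and does not call the DMP.
    """
    updated = dict(record)
    meta = dict(updated.get("meta_data") or {})
    meta[key] = value  # an existing key is over-written
    updated["meta_data"] = meta
    return updated


# Hypothetical file record with one existing meta data entry.
record = {
    "file_path": "/tmp/example_file.fastq",
    "meta_data": {"assembly": "GCA_0000nnnn"},
}
record = add_metadata(record, "downloaded_from", "http://www.")  # new key
record = add_metadata(record, "assembly", "updated_value")       # replaces old value
print(record["meta_data"])
```

A new key is appended to the meta data, while a repeated key replaces the stored value rather than creating a duplicate.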

get_file_by_file_path(user_id, file_path, rest=False)

Get a list of the file dictionary objects given a user_id and file_path

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_path (str) – File path (see validate_file)

Returns:

file_path : str: Location of the file in the file system
file_type : str: File format (see validate_file)
data_type : str: The type of information in the file (RNA-seq, ChIP-seq, etc)
taxon_id : int: Taxon ID of the species that the file has been derived from
compressed : str: Type of compression (None, gzip, zip)
source_id : list: List of IDs of files that were processed to generate this file
meta_data : dict: Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
creation_time : list: Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_file_by_file_path(<user_id>, <file_path>)

get_file_by_id(user_id, file_id, rest=False)

Returns the data for a file based on its unique file ID

Parameters:

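user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.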

Returns:

file_path : str: Location of the file in the file system
path_type : str: File or Folder
file_type : str: File format (see validate_file)
size : int: Size of the file
parent_dir : str: Location of the parent dir
data_type : str: The type of information in the file (RNA-seq, ChIP-seq, etc)
taxon_id : int: Taxon ID of the species that the file has been derived from
compressed : str: Type of compression (None, gzip, zip)
source_id : list: List of IDs of files that were processed to generate this file
meta_data : dict: Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
creation_time : list: Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_file_by_id(<user_id>, <unique_file_id>)

get_file_history(user_id, file_id)

Returns the full path of file_ids from the current file to the original file(s)

The format in which the history is best declared still needs to be defined.

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.

Returns:

List of lists representing the adjacency of child and parent files.

Return type:

list

Example

from dmp import dmp
da = dmp()
history = da.get_file_history(<user_id>, "aLongString")
print(history)

Output: [['aLongString', 'parentOfaLongString'], ['parentOfaLongString', 'parentOfParent']]

These IDs can then be requested to return the meta data and locations with the get_file_by_id method.
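As a rough sketch of how the adjacency list might be consumed, the function below finds the original file(s), that is, the IDs that appear as a parent but never as a child. It assumes each inner list is a [child_id, parent_id] pair as in the example output; original_files is a hypothetical helper and is not part of the DMP API.

```python
def original_files(history):
    """Return the IDs that appear as parents but never as children.

    Assumes history is a list of [child_id, parent_id] pairs, matching
    the example output of get_file_history.
    """
    children = {child for child, _parent in history}
    roots = []
    for _child, parent in history:
        if parent not in children and parent not in roots:
            roots.append(parent)
    return roots


history = [
    ["aLongString", "parentOfaLongString"],
    ["parentOfaLongString", "parentOfParent"],
]
print(original_files(history))  # ['parentOfParent']
```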

get_files_by_assembly(user_id, assembly, rest=False)

Get a list of the file dictionary objects given a user_id and assembly

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
assembly (str) – Assembly of the species that the file has been derived from

Returns:

file_path : str: Location of the file in the file system
file_type : str: File format (see validate_file)
data_type : str: The type of information in the file (RNA-seq, ChIP-seq, etc)
taxon_id : int: Taxon ID of the species that the file has been derived from
compressed : str: Type of compression (None, gzip, zip)
source_id : list: List of IDs of files that were processed to generate this file
meta_data : dict: Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
creation_time : list: Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_files_by_assembly(<user_id>, <assembly>)

get_files_by_data_type(user_id, data_type, rest=False)

Get a list of the file dictionary objects given a user_id and data_type

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
data_type (str) – The type of information in the file (RNA-seq, ChIP-seq, etc)

Returns:

file_path : str: Location of the file in the file system
file_type : str: File format (see validate_file)
data_type : str: The type of information in the file (RNA-seq, ChIP-seq, etc)
taxon_id : int: Taxon ID of the species that the file has been derived from
compressed : str: Type of compression (None, gzip, zip)
source_id : list: List of IDs of files that were processed to generate this file
meta_data : dict: Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
creation_time : list: Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_files_by_data_type(<user_id>, <data_type>)

get_files_by_file_type(user_id, file_type, rest=False)

Get a list of the file dictionary objects given a user_id and file_type

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_type (str) – File format (see validate_file)

Returns:

file_path : str: Location of the file in the file system
file_type : str: File format (see validate_file)
data_type : str: The type of information in the file (RNA-seq, ChIP-seq, etc)
taxon_id : int: Taxon ID of the species that the file has been derived from
compressed : str: Type of compression (None, gzip, zip)
source_id : list: List of IDs of files that were processed to generate this file
meta_data : dict: Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
creation_time : list: Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_files_by_file_type(<user_id>, <file_type>)

get_files_by_taxon_id(user_id, taxon_id, rest=False)

Get a list of the file dictionary objects given a user_id and taxon_id

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
taxon_id (int) – Taxon ID of the species that the file has been derived from

Returns:

file_path : str: Location of the file in the file system
file_type : str: File format (see validate_file)
data_type : str: The type of information in the file (RNA-seq, ChIP-seq, etc)
taxon_id : int: Taxon ID of the species that the file has been derived from
compressed : str: Type of compression (None, gzip, zip)
source_id : list: List of IDs of files that were processed to generate this file
meta_data : dict: Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
creation_time : list: Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_files_by_taxon_id(<user_id>, <taxon_id>)

get_files_by_user(user_id, rest=False)

Get a list of the file dictionary objects given a user_id

Parameters:	user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
Returns:	List of dict objects for each file that has been loaded by a user.
Return type:	list

Example

from dmp import dmp
da = dmp()
da.get_files_by_user(<user_id>)
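The returned list can be post-processed with ordinary Python. The sketch below groups file paths by file format; it assumes each dict carries the file_path and file_type keys listed for the other get_files_by_* methods, and it uses hypothetical records in place of a real DMP call. The helper group_paths_by_file_type is not part of the API.

```python
from collections import defaultdict


def group_paths_by_file_type(files):
    """Map each file_type to the list of file_path values that use it."""
    grouped = defaultdict(list)
    for entry in files:
        grouped[entry.get("file_type", "")].append(entry["file_path"])
    return dict(grouped)


# Hypothetical records standing in for the result of get_files_by_user.
files = [
    {"file_path": "/tmp/sample_1.fastq", "file_type": "fastq"},
    {"file_path": "/tmp/sample_2.fastq", "file_type": "fastq"},
    {"file_path": "/tmp/aligned.bam", "file_type": "bam"},
]
grouped = group_paths_by_file_type(files)
print(grouped["fastq"])
```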

modify_column(user_id, file_id, key, value)

Update a key value pair for the record

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
key (str) – Unique key for the identification of the extra meta data. If the key matches a key already in the meta data then it over-writes the current value.
value – Value to be stored for the given key. This can be a str, int, list or dict.

Returns:

ID of the file within the system. It can be used to trace the file, where it is used and where it has come from.

Return type:

str

remove_file(user_id, file_id)

Removes a single file from the directory. Returns the ID of the file that was removed

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.

Returns:

The file_id of the removed file.

Return type:

str

Example

from dmp import dmp
da = dmp()
da.remove_file(<user_id>, <file_id>)

remove_file_metadata(user_id, file_id, key)

Remove a key value pair from the meta data for a given file

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
key (str) – Unique key for the identification of the extra meta data to be removed

Returns:

ID of the file within the system. It can be used to trace the file, where it is used and where it has come from.

Return type:

str

set_file(user_id, file_path, path_type, file_type=u'', size=0, parent_dir=u'', data_type=u'', taxon_id=u'', compressed=None, source_id=None, meta_data=None, **kwargs)

Adds a file to the data management API.

Parameters:

user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_path (str) – Location of the file in the file system
path_type (str) – File or folder
parent_dir (str) – _id of the parent directory
file_type (str) – File format (see validate_file)
size (int) – File size in bytes
data_type (str) – The type of information in the file (RNA-seq, ChIP-seq, etc)
taxon_id (int) – Taxon ID of the species that the file has been derived from
compressed (str) – Type of compression (None, gzip, zip)
source_id (list) – List of IDs of files that were processed to generate this file
meta_data (dict) – Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
assembly : str: Dependent parameter. If the sequence has been aligned at some point during the production of this file then the assembly must be recorded.

Returns:

ID of the file within the system. It can be used to trace the file, where it is used and where it has come from.

Return type:

str

Example

from dmp import dmp
da = dmp()
unique_file_id = da.set_file(
    'user1', '/tmp/example_file.fastq', 'file', file_type='fastq',
    data_type='RNA-seq', taxon_id=9606, compressed=None)

If the file is the processed result of one or more other files then these can be specified by their file IDs using source_id:

>>> da.set_file(
...     'user1', '/tmp/example_file.fastq', 'file', file_type='fastq',
...     data_type='RNA-seq', taxon_id=9606, compressed=None,
...     source_id=[1, 2])

Meta data about the file can also be included to provide extra information about the file, origins or how it was generated:

>>> da.set_file(
...     'user1', '/tmp/example_file.fastq', 'file', file_type='fastq',
...     data_type='RNA-seq', taxon_id=9606, compressed=None,
...     meta_data={'assembly': 'GCA_0000nnnn',
...                'downloaded_from': 'http://www.'})

static validate_file(entry)

Validate that the required meta data for a given entry is present. If any data is missing then a ValueError exception is raised. This function checks that all required paths are defined and that, when certain selections are made, the matching data is also present.

Parameters:

entry (dict) –

user_id : str: Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
file_path : str: Location of the file in the file system
path_type : str: File or folder
file_type : str: File format (“amb”, “ann”, “bam”, “bb”, “bed”, “bt2”, “bw”, “bwt”, “cpt”, “csv”, “dcd”, “fa”, “fasta”, “fastq”, “gem”, “gff3”, “gz”, “hdf5”, “json”, “lif”, “pac”, “pdb”, “pdf”, “png”, “prmtop”, “sa”, “tbi”, “tif”, “tpr”, “trj”, “tsv”, “txt”, “wig”)
size : int: Size of the file in bytes
data_type : str: The type of information in the file (RNA-seq, ChIP-seq, etc)
taxon_id : int: Taxon ID of the species that the file has been derived from
compressed : str: Type of compression (None, gzip, zip)
source_id : list: List of IDs of files that were processed to generate this file
meta_data : dict: Dictionary object containing the extra data related to the generation of the file or describing the way it was processed
assembly : str

Returns:

bool – Returns True if there are no errors with the entry
If there are issues with the entry then a ValueError is raised.
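The kind of check described above can be sketched as follows. This is not the library's validate_file: the function check_entry is hypothetical, the choice of required keys (user_id, file_path, path_type) is an assumption based on the positional arguments of set_file, and only the file_type list is copied from the parameter description.

```python
# File formats copied from the file_type description above.
ALLOWED_FILE_TYPES = {
    "amb", "ann", "bam", "bb", "bed", "bt2", "bw", "bwt", "cpt", "csv",
    "dcd", "fa", "fasta", "fastq", "gem", "gff3", "gz", "hdf5", "json",
    "lif", "pac", "pdb", "pdf", "png", "prmtop", "sa", "tbi", "tif",
    "tpr", "trj", "tsv", "txt", "wig",
}

# Assumption: these keys are treated as mandatory in this sketch.
REQUIRED_KEYS = ("user_id", "file_path", "path_type")


def check_entry(entry):
    """Return True if entry passes the sketch checks, else raise ValueError."""
    missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
    if missing:
        raise ValueError("Missing required keys: " + ", ".join(missing))
    file_type = entry.get("file_type", "")
    if file_type and file_type not in ALLOWED_FILE_TYPES:
        raise ValueError("Unknown file_type: " + file_type)
    return True


try:
    check_entry({"user_id": "user1", "file_type": "fastq"})
except ValueError as error:
    print(error)
```

Raising an exception rather than returning False mirrors the documented behaviour, so callers cannot silently ignore an invalid entry.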