Data Management Plan API

Methods

class dmp.dmp.dmp(cnf_loc=u'', test=False)

API for management of files within the VRE

add_file_metadata(user_id, file_id, key, value)

Add a key-value pair to the meta data for a file.

This allows a user to add extra information to the meta data to better describe the file.

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
  • key (str) – Unique key for the identification of the extra meta data. If the key already exists in the meta data, its current value is overwritten.
  • value – Value to be stored for the given key. This can be a str, int, list or dict.
Returns:

The ID of the file within the system. It can be used to trace where the file has come from and where it is used.

Return type:

str
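The overwrite rule for an existing key can be pictured with a small in-memory sketch. This is not the DMP implementation: the dictionary-backed FILES store, the helper add_file_metadata_sketch and the sample record are hypothetical, and the ownership check on user_id is an assumption made for illustration.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for the DMP store, keyed by file ID.
FILES: Dict[str, Dict[str, Any]] = {
    "aLongString": {
        "user_id": "user1",
        "file_path": "/tmp/example_file.fastq",
        "meta_data": {"assembly": "GCA_0000nnnn"},
    }
}


def add_file_metadata_sketch(user_id: str, file_id: str, key: str, value: Any) -> str:
    """Set meta_data[key] = value for one file and return its file ID."""
    entry = FILES.get(file_id)
    # Assumption: the caller must pass the user_id the file was stored under.
    if entry is None or entry["user_id"] != user_id:
        raise KeyError("No file %r for user %r" % (file_id, user_id))
    # An existing key is replaced, matching the documented overwrite rule.
    entry["meta_data"][key] = value
    return file_id


fid = add_file_metadata_sketch("user1", "aLongString", "assembly", "GCA_0000mmmm")
print(FILES[fid]["meta_data"])  # {'assembly': 'GCA_0000mmmm'}
```

Because the store is a plain dictionary, adding a new key and overwriting an existing one are the same operation; only the key decides which happens.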

get_file_by_file_path(user_id, file_path, rest=False)

Get a list of the file dictionary objects given a user_id and file_path

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • file_path (str) – File path (see validate_file)
Returns:

file_path : str

Location of the file in the file system

file_type : str

File format (see validate_file)

data_type : str

The type of information in the file (RNA-seq, ChIP-seq, etc)

taxon_id : int

Taxon ID of the species from which the file has been derived

compressed : str

Type of compression (None, gzip, zip)

source_id : list

List of IDs of files that were processed to generate this file

meta_data : dict

Dictionary object containing the extra data related to the generation of the file or describing the way it was processed

creation_time : list

Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_file_by_file_path('user1', '/tmp/example_file.fastq')

get_file_by_id(user_id, file_id, rest=False)

Returns the data for a file, given its unique file ID.

Parameters:
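  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.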
Returns:

file_path : str

Location of the file in the file system

path_type : str

File or Folder

file_type : str

File format (see validate_file)

size : int

Size of the file in bytes

parent_dir : str

Location of the parent dir

data_type : str

The type of information in the file (RNA-seq, ChIP-seq, etc)

taxon_id : int

Taxon ID of the species from which the file has been derived

compressed : str

Type of compression (None, gzip, zip)

source_id : list

List of IDs of files that were processed to generate this file

meta_data : dict

Dictionary object containing the extra data related to the generation of the file or describing the way it was processed

creation_time : list

Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
# 'aLongString' stands for the ID returned by set_file (or a retrieved _id)
da.get_file_by_id('user1', 'aLongString')
get_file_history(user_id, file_id)

Returns the full chain of file IDs from the current file back to the original file(s).

The format used to declare the history still needs work and may change.

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
Returns:

List of lists representing the adjacency of child and parent files.

Return type:

list

Example

from dmp import dmp
da = dmp()
history = da.get_file_history('user1', 'aLongString')
print(history)

Output: [['aLongString', 'parentOfaLongString'], ['parentOfaLongString', 'parentOfParent']]

These IDs can then be passed to the get_file_by_id method to return the meta data and location of each file.
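The adjacency list in the output can be derived by following each file's source_id links back to files that have no sources. The sketch below does this breadth-first over a hypothetical in-memory SOURCES mapping; file_history_sketch is an illustrative name, not part of the DMP, and the real method may order or format the pairs differently.

```python
from collections import deque
from typing import Dict, List

# Hypothetical mapping of file ID -> source_id list, as recorded by set_file.
SOURCES: Dict[str, List[str]] = {
    "aLongString": ["parentOfaLongString"],
    "parentOfaLongString": ["parentOfParent"],
    "parentOfParent": [],
}


def file_history_sketch(file_id: str) -> List[List[str]]:
    """Return [child, parent] pairs, walking source_id links breadth-first."""
    history: List[List[str]] = []
    seen = {file_id}
    queue = deque([file_id])
    while queue:
        child = queue.popleft()
        for parent in SOURCES.get(child, []):
            history.append([child, parent])
            # Visit each ancestor once, even if several files share it.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return history


print(file_history_sketch("aLongString"))
# [['aLongString', 'parentOfaLongString'], ['parentOfaLongString', 'parentOfParent']]
```

Tracking visited IDs keeps the walk finite when two derived files share a common ancestor.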

get_files_by_assembly(user_id, assembly, rest=False)

Get a list of the file dictionary objects given a user_id and assembly

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • assembly (str) – Assembly of the species from which the file has been derived
Returns:

file_path : str

Location of the file in the file system

file_type : str

File format (see validate_file)

data_type : str

The type of information in the file (RNA-seq, ChIP-seq, etc)

taxon_id : int

Taxon ID of the species from which the file has been derived

compressed : str

Type of compression (None, gzip, zip)

source_id : list

List of IDs of files that were processed to generate this file

meta_data : dict

Dictionary object containing the extra data related to the generation of the file or describing the way it was processed

creation_time : list

Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_files_by_assembly('user1', 'GCA_0000nnnn')
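The set_file documentation records the assembly inside meta_data rather than as a top-level field, so a lookup by assembly has to look one level down. A minimal in-memory sketch of that filter follows; the FILES list and files_by_assembly_sketch are hypothetical, and the assumption that get_files_by_assembly reads meta_data['assembly'] is inferred from the set_file documentation, not confirmed by it.

```python
from typing import Any, Dict, List

# Hypothetical file entries, shaped like the dictionaries documented above.
FILES: List[Dict[str, Any]] = [
    {"_id": "f1", "user_id": "user1", "file_type": "bam",
     "meta_data": {"assembly": "GCA_0000nnnn"}},
    {"_id": "f2", "user_id": "user1", "file_type": "fastq", "meta_data": {}},
    {"_id": "f3", "user_id": "user2", "file_type": "bam",
     "meta_data": {"assembly": "GCA_0000nnnn"}},
]


def files_by_assembly_sketch(user_id: str, assembly: str) -> List[Dict[str, Any]]:
    """Return the user's files whose meta_data records the given assembly."""
    return [
        entry for entry in FILES
        if entry["user_id"] == user_id
        # Unaligned files (e.g. raw reads) have no "assembly" key at all.
        and entry.get("meta_data", {}).get("assembly") == assembly
    ]


print([f["_id"] for f in files_by_assembly_sketch("user1", "GCA_0000nnnn")])  # ['f1']
```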

get_files_by_data_type(user_id, data_type, rest=False)

Get a list of the file dictionary objects given a user_id and data_type

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • data_type (str) – The type of information in the file (RNA-seq, ChIP-seq, etc)
Returns:

file_path : str

Location of the file in the file system

file_type : str

File format (see validate_file)

data_type : str

The type of information in the file (RNA-seq, ChIP-seq, etc)

taxon_id : int

Taxon ID of the species from which the file has been derived

compressed : str

Type of compression (None, gzip, zip)

source_id : list

List of IDs of files that were processed to generate this file

meta_data : dict

Dictionary object containing the extra data related to the generation of the file or describing the way it was processed

creation_time : list

Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_files_by_data_type('user1', 'RNA-seq')
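get_files_by_data_type, get_files_by_file_type and get_files_by_taxon_id all describe the same lookup: keep the user's files whose top-level field equals the requested value. A minimal in-memory sketch of that shared pattern follows. The FILES list and files_by_field_sketch are hypothetical; the real methods query the DMP's own store, and whether they also return files stored under “common” is not stated here, so the sketch matches user_id exactly.

```python
from typing import Any, Dict, List

# Hypothetical file entries with the top-level fields documented above.
FILES: List[Dict[str, Any]] = [
    {"_id": "f1", "user_id": "user1", "data_type": "RNA-seq",
     "file_type": "fastq", "taxon_id": 9606},
    {"_id": "f2", "user_id": "user1", "data_type": "ChIP-seq",
     "file_type": "bam", "taxon_id": 9606},
    {"_id": "f3", "user_id": "user1", "data_type": "RNA-seq",
     "file_type": "bam", "taxon_id": 10090},
]


def files_by_field_sketch(user_id: str, field: str, value: Any) -> List[Dict[str, Any]]:
    """Return every file owned by user_id whose top-level field equals value."""
    return [e for e in FILES if e["user_id"] == user_id and e.get(field) == value]


rna = files_by_field_sketch("user1", "data_type", "RNA-seq")
print([e["_id"] for e in rna])  # ['f1', 'f3']
```

Passing "file_type" or "taxon_id" as the field gives the equivalent of the other two lookups; note that taxon_id is compared as an int.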

get_files_by_file_type(user_id, file_type, rest=False)

Get a list of the file dictionary objects given a user_id and file_type

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • file_type (str) – File format (see validate_file)
Returns:

file_path : str

Location of the file in the file system

file_type : str

File format (see validate_file)

data_type : str

The type of information in the file (RNA-seq, ChIP-seq, etc)

taxon_id : int

Taxon ID of the species from which the file has been derived

compressed : str

Type of compression (None, gzip, zip)

source_id : list

List of IDs of files that were processed to generate this file

meta_data : dict

Dictionary object containing the extra data related to the generation of the file or describing the way it was processed

creation_time : list

Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_files_by_file_type('user1', 'fastq')

get_files_by_taxon_id(user_id, taxon_id, rest=False)

Get a list of the file dictionary objects given a user_id and taxon_id

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • taxon_id (int) – Taxon ID of the species from which the file has been derived
Returns:

file_path : str

Location of the file in the file system

file_type : str

File format (see validate_file)

data_type : str

The type of information in the file (RNA-seq, ChIP-seq, etc)

taxon_id : int

Taxon ID of the species from which the file has been derived

compressed : str

Type of compression (None, gzip, zip)

source_id : list

List of IDs of files that were processed to generate this file

meta_data : dict

Dictionary object containing the extra data related to the generation of the file or describing the way it was processed

creation_time : list

Time at which the file was loaded into the system

Return type:

dict

Example

from dmp import dmp
da = dmp()
da.get_files_by_taxon_id('user1', 9606)

get_files_by_user(user_id, rest=False)

Get a list of the file dictionary objects given a user_id

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
Returns:

List of dict objects, one for each file that has been loaded by the user.

Return type:

list

Example

from dmp import dmp
da = dmp()
da.get_files_by_user('user1')

modify_column(user_id, file_id, key, value)

Update a key-value pair in the record for a file.

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
  • key (str) – Key of the value to update in the record. If the key already exists, its current value is overwritten.
  • value – Value to be stored for the given key. This can be a str, int, list or dict.
Returns:

The ID of the file within the system. It can be used to trace where the file has come from and where it is used.

Return type:

str

remove_file(user_id, file_id)

Removes a single file from the DMP and returns the ID of the file that was removed.

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
Returns:

The file_id of the removed file.

Return type:

str

Example

from dmp import dmp
da = dmp()
da.remove_file('user1', 'aLongString')

remove_file_metadata(user_id, file_id, key)

Remove a key-value pair from the meta data for a given file.

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • file_id (str) – ID of the file. This is the value returned when a file is loaded into the DMP or is the _id for a given file when the files have been retrieved.
  • key (str) – Unique key for the identification of the extra meta data to be removed
Returns:

The ID of the file within the system. It can be used to trace where the file has come from and where it is used.

Return type:

str

set_file(user_id, file_path, path_type, file_type=u'', size=0, parent_dir=u'', data_type=u'', taxon_id=u'', compressed=None, source_id=None, meta_data=None, **kwargs)

Adds a file to the data management API.

Parameters:
  • user_id (str) – Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users.
  • file_path (str) – Location of the file in the file system
  • path_type (str) – Whether the path is a file or a folder
  • file_type (str) – File format (see validate_file)
  • size (int) – File size in bytes
  • parent_dir (str) – _id of the parent directory
  • data_type (str) – The type of information in the file (RNA-seq, ChIP-seq, etc.)
  • taxon_id (int) – Taxon ID of the species from which the file has been derived
  • compressed (str) – Type of compression (None, gzip, zip)
  • source_id (list) – List of IDs of files that were processed to generate this file
  • meta_data (dict) –

    Dictionary object containing the extra data related to the generation of the file or describing the way it was processed

    assembly : string
    Dependent parameter. If the sequence has been aligned at some point during the production of this file, the assembly must be recorded.
Returns:

The ID of the file within the system. It can be used to trace where the file has come from and where it is used.

Return type:

str

Example

from dmp import dmp
da = dmp()
unique_file_id = da.set_file(
    'user1', '/tmp/example_file.fastq', 'file', 'fastq',
    data_type='RNA-seq', taxon_id=9606)

If the file is the processed result of one or more other files, those source files can be specified by their file IDs using source_id:

>>> da.set_file(
...     'user1', '/tmp/example_file.fastq', 'file', 'fastq',
...     data_type='RNA-seq', taxon_id=9606,
...     source_id=['<file_id_1>', '<file_id_2>'])

Meta data about the file can also be included to provide extra information about the file, origins or how it was generated:

>>> da.set_file(
...     'user1', '/tmp/example_file.fastq', 'file', 'fastq',
...     data_type='RNA-seq', taxon_id=9606,
...     meta_data={'assembly': 'GCA_0000nnnn',
...                'downloaded_from': 'http://www.'})

static validate_file(entry)

Validates that the required meta data for a given entry is present. If any required data is missing, a ValueError exception is raised. The function checks that all required fields are defined and that, when certain options are selected, the corresponding dependent data is also present.

Parameters:
  • entry (dict) –

    Dictionary object describing the file, with the following keys:

    user_id : str
    Identifier to uniquely locate the user's files. Can be set to “common” if the files can be shared between users
    file_path : str
    Location of the file in the file system
    path_type : str
    File or folder
    file_type : str
    File format (“amb”, “ann”, “bam”, “bb”, “bed”, “bt2”, “bw”, “bwt”, “cpt”, “csv”, “dcd”, “fa”, “fasta”, “fastq”, “gem”, “gff3”, “gz”, “hdf5”, “json”, “lif”, “pac”, “pdb”, “pdf”, “png”, “prmtop”, “sa”, “tbi”, “tif”, “tpr”, “trj”, “tsv”, “txt”, “wig”)
    size : int
    Size of the file in bytes
    data_type : str
    The type of information in the file (RNA-seq, ChIP-seq, etc.)
    taxon_id : int
    Taxon ID of the species from which the file has been derived
    compressed : str
    Type of compression (None, gzip, zip)
    source_id : list
    List of IDs of files that were processed to generate this file
    meta_data : dict
    Dictionary object containing the extra data related to the generation of the file or describing the way it was processed

      assembly : string
      Dependent parameter. If the sequence has been aligned at some point during the production of this file, the assembly must be recorded.
Returns:
  • bool – True if there are no errors with the entry
Raises:
  • ValueError – If required data is missing from the entry
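The checks described for validate_file can be sketched as a function that returns True or raises ValueError. This is a hypothetical illustration, not the DMP's code: the set of fields treated as required, the name validate_file_sketch, and the rule that treats "bam" as an aligned format needing meta_data['assembly'] are all assumptions; only the file-format list and the compression options come from the documentation.

```python
from typing import Any, Dict

# File formats listed in the validate_file documentation.
FILE_TYPES = {
    "amb", "ann", "bam", "bb", "bed", "bt2", "bw", "bwt", "cpt", "csv", "dcd",
    "fa", "fasta", "fastq", "gem", "gff3", "gz", "hdf5", "json", "lif", "pac",
    "pdb", "pdf", "png", "prmtop", "sa", "tbi", "tif", "tpr", "trj", "tsv",
    "txt", "wig",
}
COMPRESSION = (None, "gzip", "zip")
# Assumption: which keys count as "required" is a guess for illustration.
REQUIRED = ("user_id", "file_path", "path_type", "file_type")
# Hypothetical rule: formats assumed to hold aligned sequence, which (per the
# set_file docs) must record meta_data["assembly"].
ALIGNED_TYPES = {"bam"}


def validate_file_sketch(entry: Dict[str, Any]) -> bool:
    """Return True for a well-formed entry, otherwise raise ValueError."""
    missing = [key for key in REQUIRED if not entry.get(key)]
    if missing:
        raise ValueError("Missing required field(s): " + ", ".join(missing))
    if entry["file_type"] not in FILE_TYPES:
        raise ValueError("Unsupported file_type: %r" % entry["file_type"])
    if entry.get("compressed") not in COMPRESSION:
        raise ValueError("Unknown compression: %r" % entry.get("compressed"))
    # set_file defaults source_id to None, so treat None as an empty list.
    if not isinstance(entry.get("source_id") or [], list):
        raise ValueError("source_id must be a list of file IDs")
    meta_data = entry.get("meta_data") or {}
    if entry["file_type"] in ALIGNED_TYPES and "assembly" not in meta_data:
        raise ValueError("meta_data must record the assembly for aligned files")
    return True


ok = {"user_id": "user1", "file_path": "/tmp/example_file.fastq",
      "path_type": "file", "file_type": "fastq", "compressed": None}
print(validate_file_sketch(ok))  # True
```

Raising on the first problem keeps the sketch short; a variant could collect every problem and report them together.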