Custom Reader APIs
HDF5 Files
Hi-C Adjacency Files

class reader.hdf5_adjacency.adjacency(user_id, file_id, resolution=None, cnf_loc='')
Class for interacting directly with the HDF5 files. All required information should be passed to this class.

get_chromosome_from_array_index(index)
Identify the chromosome based on either the x or y coordinate in the array.
Parameters: index (int) – Location within the array
Returns: chr_id – Identity of the chromosome
Return type: str
Example
    from reader.hdf5_adjacency import adjacency
    r = adjacency('test', '', 10000)
    cid = r.get_chromosome_from_array_index(1234567890)
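The mapping from an array index back to a chromosome can be pictured as a lookup in a table of cumulative bin counts. The following is only a minimal sketch of that idea, not the reader's actual implementation: the helper names `build_offsets` and `chromosome_from_index`, the chromosome lengths and the 0-based indexing are all assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate
from math import ceil


def build_offsets(chrom_sizes, resolution):
    """Return chromosome names and the exclusive end index of each one in the array."""
    names = list(chrom_sizes)
    # Number of bins per chromosome; a partial final bin still counts as one bin
    bins = [ceil(chrom_sizes[name] / resolution) for name in names]
    ends = list(accumulate(bins))
    return names, ends


def chromosome_from_index(index, names, ends):
    """Map a 0-based array index to the chromosome whose bins contain it."""
    if index < 0 or index >= ends[-1]:
        raise IndexError("index outside the array")
    return names[bisect_right(ends, index)]


# Hypothetical chromosome lengths in base pairs, for illustration only
sizes = {"chr1": 250_000, "chr2": 100_000, "chr3": 55_000}
names, ends = build_offsets(sizes, resolution=10_000)

print(ends)                                    # [25, 35, 41]
print(chromosome_from_index(24, names, ends))  # chr1
print(chromosome_from_index(25, names, ends))  # chr2
```

Because the same offsets apply to both axes of a square adjacency matrix, one lookup of this kind would serve for either the x or the y coordinate.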

get_chromosome_parameters()
Return the chromosomes, their parameters and the available resolutions in a given HDF5 file
Returns:
- chromosomes : list
- chr_param : dict
- resolutions
Return type: dict
Example
    from reader.hdf5_adjacency import adjacency
    r = adjacency('test', '', 10000)
    value = r.get_chromosome_parameters()

get_chromosomes()
List of chromosomes that have models at a given resolution
Returns: chromosomes – List of chromosomes at the set resolution
Return type: list

get_range(chr_id, start, end, limit_chr=None, limit_start=None, limit_end=None, value_url='/api/getValue', no_links=None)
Get the interactions that happen within a defined region on a specific chromosome. Returns both inter- and intra-chromosomal interactions with the defined region.
Parameters:
- chr_id (str) – Chromosome name
- start (int) – Start position within the chromosome
- end (int) – End position within the chromosome
- limit_chr (str (Optional)) – Limit the results to a particular chromosome
- limit_start (int (Optional)) – Limit the range start position on the limit_chr parameter
- limit_end (int (Optional)) – Limit the range end position on the limit_chr parameter
- value_url (str (Optional)) – Define a custom URL snippet for the location of the file if different from the default
- no_links (bool (Optional)) – By default, URL links to the individual points within the adjacency matrix are returned. Where this would generate a large number of points, set this value to 1 to turn off generating the links.
Returns:
- log : list
  List of messages about the state for debugging
- results : list
  List of values for given positions within the adjacency matrix
Return type: dict
Example
    from reader.hdf5_adjacency import adjacency
    r = adjacency('test', '', 10000)
    # 'chr1' is a placeholder chromosome name
    value = r.get_range('chr1', 1000000, 2000000)
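Since the adjacency matrix is binned at a fixed resolution, a base-pair range corresponds to a span of bins. The sketch below shows one plausible conversion, assuming 0-based positions and bins numbered from the start of the chromosome; `range_to_bins` is a hypothetical helper and the reader itself may apply a different convention or a whole-genome offset.

```python
def range_to_bins(start, end, resolution):
    """Return (first_bin, last_bin), inclusive, covering the range [start, end]."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    if start > end:
        raise ValueError("start must not exceed end")
    # Integer division assigns each position to the bin that contains it
    return start // resolution, end // resolution


# A 1 Mb to 2 Mb region at 10 kb resolution
print(range_to_bins(1_000_000, 2_000_000, 10_000))  # (100, 200)
```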

get_resolution()
List the current level of resolution
Returns: resolution – Current level of resolution
Return type: int

get_resolutions()
List resolutions that models have been generated for
Returns: resolutions – Available levels of resolution that can be set
Return type: list

get_value(bin_i, bin_j)
Get a specific value for a given dataset and resolution
Parameters:
- bin_i (int) – Array position in the first dimension
- bin_j (int) – Array position in the second dimension
Returns: value – Value for a given cell in the adjacency array
Return type: int
Example
    from reader.hdf5_adjacency import adjacency
    r = adjacency('test', '', 10000)
    value = r.get_value(2000000, 1000000)

Hi-C Coordinate Files

class reader.hdf5_coord.coord(user_id, file_id, resolution=None, cnf_loc='')
Class for interacting directly with the HDF5 files. All required information should be passed to this class.

get_centroids(region_id)
List the centroid models for each cluster
Returns: centroids – List of the centroid models for each cluster
Return type: list

get_chromosomes()
List of chromosomes that have models at a given resolution
Returns: chromosomes – List of chromosomes at the set resolution
Return type: list

get_clusters(region_id)
List all clusters of models
Returns: clusters – List of models in each cluster
Return type: list

get_model(region_id, model_ids=None, page=0, mpp=10)
Get the coordinates within a defined region on a specific chromosome. If model_ids is not provided, the consensus models for that region are returned.
Parameters:
- region_id (str) – Region ID
- model_ids (list) – List of model IDs for the models that are required
- page (int) – Page number
- mpp (int) – Number of models per page (default: 10; max: 100)
Returns: array –
- model : dict
  - metadata : dict
    Relevant extra metadata added by TADbit
  - object : dict
    Key value pair of information about the region
  - models : list
    List of dictionaries for each model
  - clusters : list
    List of models for each cluster
  - centroids : list
    List of all centroid models
  - restraints : list
    List of restraints for each position
  - hic_data : dict
    Hi-C model data
- metadata : dict
  - model_count : int
    Count of the number of models for the defined region ID
  - page_count : int
    Number of pages
Return type: list
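The page and mpp parameters, together with the model_count and page_count values in the returned metadata, suggest a conventional paging scheme. The following is a sketch of how such paging might behave, assuming 0-based page numbers (the default is page=0) and that mpp is clamped to the documented maximum of 100; `paginate` is a hypothetical helper, not part of the reader.

```python
from math import ceil

MAX_MPP = 100  # documented upper bound on models per page


def paginate(model_ids, page=0, mpp=10):
    """Return one page of model IDs plus paging metadata (assumes 0-based pages)."""
    mpp = max(1, min(mpp, MAX_MPP))  # keep mpp within 1..MAX_MPP
    start = page * mpp
    return {
        "models": model_ids[start:start + mpp],
        "metadata": {
            "model_count": len(model_ids),
            "page_count": ceil(len(model_ids) / mpp),
        },
    }


ids = list(range(25))  # 25 placeholder model IDs
last_page = paginate(ids, page=2, mpp=10)
print(last_page["models"])    # [20, 21, 22, 23, 24]
print(last_page["metadata"])  # {'model_count': 25, 'page_count': 3}
```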

get_models(region_id)
List all models for a given region
Returns:
- model_id : int
- cluster_id : int
Return type: list

get_object_data(region_id)
Prepare the object header data structure ready for printing
Parameters: region_id (int) – Region that is getting downloaded
Returns: objectdata – All headers and values required for the JSON output
Return type: dict

get_region_order(chr_id=None, region=None)
List the regions on a given chromosome ID or region ID in the order that they are located on the chromosome
Parameters:
- chr_id (str) – Chromosome ID
- region (str) – Region ID
Returns:
- region_id : str
  List of the region IDs
Return type: list

get_regions(chr_id, start, end)
List regions that are within a given range on a chromosome
Parameters:
- chr_id (str) – Chromosome ID
- start (int) – Start position
- end (int) – Stop position
Returns: regions – List of region IDs whose parameters match those provided
Return type: list
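A range query of this kind is typically an interval filter. The sketch below treats "within a given range" as "overlaps the range", which is an assumption: the reader may instead require full containment. The `regions_in_range` helper and the region table are hypothetical and exist only to illustrate the test.

```python
def regions_in_range(regions, chr_id, start, end):
    """Return IDs of regions on chr_id that overlap the closed range [start, end]."""
    return [
        region_id
        for region_id, (r_chr, r_start, r_end) in regions.items()
        # Two closed intervals overlap when each starts before the other ends
        if r_chr == chr_id and r_start <= end and r_end >= start
    ]


# Hypothetical region table: region ID -> (chromosome, start, end)
regions = {
    "r1": ("chr1", 0, 100),
    "r2": ("chr1", 150, 300),
    "r3": ("chr2", 0, 100),
}

print(regions_in_range(regions, "chr1", 90, 160))   # ['r1', 'r2']
print(regions_in_range(regions, "chr1", 101, 149))  # []
```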

get_resolution()
List the current level of resolution
Returns: resolution – Current level of resolution
Return type: int

Text File Index
Lists all files that are available for a user in bed and wig formats, and lists the files that have data in a given region so that only the required files are requested by the client.

class reader.hdf5_reader.hdf5_reader(user_id, file_id, cnf_loc='')
Class for interacting directly with the HDF5 files. All required information should be passed to this class.

close()
Tidy function to close file handles
Example
    from reader.hdf5_reader import hdf5_reader
    h5r = hdf5_reader('test', '')
    h5r.close()

get_assemblies()
List all assemblies for which there are files that have been indexed
Returns: assembly – List of assemblies in the index
Return type: list
Example
    from reader.hdf5_reader import hdf5_reader
    h5r = hdf5_reader('test', '')
    asm = h5r.get_assemblies()

get_chromosomes(assembly)
List all chromosomes that are covered by the index
Parameters: assembly (str) – Genome assembly ID
Returns: chromosomes – List of the chromosomes for a given assembly in the index
Return type: list
Example
    from reader.hdf5_reader import hdf5_reader
    h5r = hdf5_reader('test', '')
    asm = h5r.get_assemblies()
    chr_list = h5r.get_chromosomes(asm[0])

get_files(assembly)
List all files for an assembly. If files are missing, they can either be loaded or the search can be performed directly on the bigBed files.
Parameters: assembly (str) – Genome assembly ID
Returns: file_ids – List of file IDs for a given assembly in the index
Return type: list
Example
    from reader.hdf5_reader import hdf5_reader
    h5r = hdf5_reader('test', '')
    asm = h5r.get_assemblies()
    file_list = h5r.get_files(asm[0])

get_regions(assembly, chromosome_id, start, end)
List files that have data in a given region.
Parameters:
- assembly (str) – Genome assembly ID
- chromosome_id (str) – Chromosome name as listed by the get_chromosomes function
- start (int) – Start position for the region of interest
- end (int) – End position for the region of interest
Returns: file_ids – List of the file_ids that have sequence features within the region of interest
Return type: list
Example
    from reader.hdf5_reader import hdf5_reader
    h5r = hdf5_reader('test', '')
    asm = h5r.get_assemblies()
    chr_list = h5r.get_chromosomes(asm[0])
    file_list = h5r.get_regions(asm[0], chr_list[0], 1000000, 1100000)

BigBed Files

class reader.bigbed.bigbed_reader(user_id, file_id, cnf_loc='')
Class for interacting directly with the BigBed files. All required information should be passed to this class.

close()
Tidy function to close file handles
Example
    from reader.bigbed import bigbed_reader
    bbr = bigbed_reader('test', '')
    bbr.close()

get_chromosomes()
List the chromosome names and lengths
Returns: chromosomes – Key value pairs where the key is the chromosome name and the value is the length of the chromosome.
Return type: dict

get_range(chr_id, start, end, file_type='bed')
Get entries in a given range
Parameters:
- chr_id (str) – Chromosome name
- start (int) – Start of the region to query
- end (int) – End of the region to query
- file_type (string (OPTIONAL)) – bed format, returning the result as a string, is the default option. list returns the bed rows as a list of lists.
Returns:
- bed (str (DEFAULT)) – List of strings for the rows in a bed file
- bed_array (list) – List of lists of each row for the bed file format
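The two output forms carry the same rows in different shapes. As an illustration of the relationship between them, the sketch below converts bed-formatted text into a list of lists. It assumes tab-separated columns with integer start and end positions in the second and third columns, as in the standard bed layout; `bed_to_rows` is a hypothetical helper and the rows the reader actually returns may differ in detail.

```python
def bed_to_rows(bed_text):
    """Split bed-formatted text into one list of fields per feature row."""
    rows = []
    for line in bed_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and track/browser header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Columns 2 and 3 of a bed row are the start and end coordinates
        fields[1] = int(fields[1])
        fields[2] = int(fields[2])
        rows.append(fields)
    return rows


# Placeholder bed content, for illustration only
bed = "chr1\t100\t200\tfeatureA\nchr1\t300\t400\tfeatureB\n"
print(bed_to_rows(bed))
# [['chr1', 100, 200, 'featureA'], ['chr1', 300, 400, 'featureB']]
```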

BigWig Files

class reader.bigwig.bigwig_reader(user_id, file_id, cnf_loc='')
Class for interacting directly with the BigWig files. All required information should be passed to this class.

get_chromosomes()
List the chromosome names and lengths
Returns: chromosomes – Key value pairs where the key is the chromosome name and the value is the length of the chromosome.
Return type: dict

get_range(chr_id, start, end, file_type='wig')
Get entries in a given range
Parameters:
- chr_id (str) – Chromosome name
- start (int) – Start of the region to query
- end (int) – End of the region to query
- file_type (string (OPTIONAL)) – wig format, returning the result as a string, is the default option. list returns the wig rows as a list of lists.
Returns:
- wig (str (DEFAULT)) – List of strings for the rows in a wig file
- wig_array (list) – List of lists of each row for the wig file format

Tabix Files

class reader.tabix.tabix(user_id, file_id, cnf_loc='')
Class for interacting directly with the Tabix files. All required information should be passed to this class.

get_range(chr_id, start, end, file_type='gff3')
Get entries in a given range
Parameters:
- chr_id (str) – Chromosome name
- start (int) – Start of the region to query
- end (int) – End of the region to query
- file_type (string (OPTIONAL)) – gff3 format, returning the result as a string, is the default option. list returns the gff3 rows as a list of lists.
Returns:
- gff3 (str (DEFAULT)) – List of strings for the rows in a gff3 file
- gff3_array (list) – List of each row for the gff3 file format