Custom Reader APIs

HDF5 Files

Hi-C Adjacency Files

class reader.hdf5_adjacency.adjacency(user_id, file_id, resolution=None, cnf_loc='')

Class that handles direct interaction with the HDF5 files. All required information should be passed to this class.

close()

Close the HDF5 data file handle

get_chromosome_from_array_index(index)

Identify the chromosome based on either the x or y coordinate in the array.

Parameters:	index (int) – Location within the array
Returns:	chr_id – Identity of the chromosome
Return type:	str

Example

from reader.hdf5_adjacency import adjacency
r = adjacency('test', '', 10000)
cid = r.get_chromosome_from_array_index(1234567890)
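The reference does not say how an array index maps to a chromosome. The sketch below assumes that chromosomes are laid out one after another along each axis of the adjacency array, each occupying ceil(length / resolution) bins. `chromosome_from_index` and the `chr_lengths` dictionary are hypothetical and not part of the reader API.

```python
import bisect
import math


def chromosome_from_index(index, chr_lengths, resolution):
    """Hypothetical helper: map an array index to the chromosome it falls in.

    Assumes chromosomes are concatenated along each axis of the adjacency
    array in the order given by `chr_lengths`, each taking
    ceil(length / resolution) bins.
    """
    names = list(chr_lengths)
    ends = []  # ends[k] is the first index after chromosome k
    total = 0
    for name in names:
        total += math.ceil(chr_lengths[name] / resolution)
        ends.append(total)
    if not 0 <= index < total:
        raise IndexError(f"index {index} is outside the array (0..{total - 1})")
    # bisect_right finds the first chromosome whose end lies beyond `index`
    return names[bisect.bisect_right(ends, index)]


lengths = {"chr1": 25000, "chr2": 10000}  # illustrative lengths
print(chromosome_from_index(2, lengths, 10000))  # chr1
print(chromosome_from_index(3, lengths, 10000))  # chr2
```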

get_chromosome_parameters()

Return the chromosomes, their parameters and the available resolutions in a given HDF5 file

Returns:

chromosomes : list
chr_param : dict
resolutions

Return type:

dict

Example

from reader.hdf5_adjacency import adjacency
r = adjacency('test', '', 10000)
value = r.get_chromosome_parameters()

get_chromosomes()

List of chromosomes that have models at a given resolution

Returns:	chromosomes – List of chromosomes at the set resolution
Return type:	list

get_details()

Return a list of the available resolutions in a given HDF5 file

get_range(chr_id, start, end, limit_chr=None, limit_start=None, limit_end=None, value_url='/api/getValue', no_links=None)

Get the interactions that happen within a defined region on a specific chromosome. Returns both inter- and intra-chromosomal interactions involving the defined region.

Parameters:

chr_id (str) – Chromosomal name
start (int) – Start position within the chromosome
end (int) – End position within the chromosome
limit_chr (str (Optional)) – Limit the results to a particular chromosome
limit_start (int (Optional)) – Limit the range start position on the limit_chr parameter
limit_end (int (Optional)) – Limit the range end position on the limit_chr parameter
value_url (str (Optional)) – Define a custom URL snippet used when generating the links to individual values, if different from the default ('/api/getValue')
no_links (bool (Optional)) – By default, URL links to the individual points within the adjacency matrix are returned. When this generates a large number of points, link generation can be turned off by setting this value to 1.

Returns:

log : list: List of messages about the state for debugging
results : list: List of values for given positions within the adjacency matrix

Return type:

dict

Example

from reader.hdf5_adjacency import adjacency
r = adjacency('test', '', 10000)
value = r.get_range('chr1', 1000000, 2000000)
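As a rough illustration of how a genomic range relates to positions in the adjacency array, the sketch below assumes fixed-width bins of `resolution` base pairs and an optional per-chromosome bin offset. `region_to_bins` and `chr_offset` are hypothetical names, and the arithmetic actually used inside get_range may differ.

```python
def region_to_bins(start, end, resolution, chr_offset=0):
    """Hypothetical: first and last bin indices covering [start, end].

    Assumes fixed-width bins of `resolution` bp and that the chromosome's
    first bin sits at array index `chr_offset`.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return chr_offset + start // resolution, chr_offset + end // resolution


print(region_to_bins(1000000, 2000000, 10000))  # (100, 200)
```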

get_resolution()

List the current level of resolution

Returns:	resolution – Current level of resolution
Return type:	int

get_resolutions()

List resolutions that models have been generated for

Returns:	list – Available levels of resolution that can be set
Return type:	list

get_value(bin_i, bin_j)

Get the value of a specific cell in the adjacency array for the current dataset and resolution

Parameters:

bin_i (int) – Array position in the first dimension
bin_j (int) – Array position in the second dimension

Returns:	value – Value for a given cell in the adjacency array
Return type:	int

Example

from reader.hdf5_adjacency import adjacency
r = adjacency('test', '', 10000)
value = r.get_value(2000000, 1000000)
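Hi-C contact counts are symmetric (the count for bins i and j equals the count for bins j and i), so a store only needs one triangle of the matrix. The sketch below assumes a hypothetical upper-triangle dictionary, not the layout actually used in the HDF5 files, and shows how a get_value-style lookup can normalise the bin order.

```python
def get_cell(upper, bin_i, bin_j):
    """Hypothetical: read one cell of a symmetric contact matrix.

    `upper` maps (i, j) pairs with i <= j to contact counts; missing cells are 0.
    """
    i, j = sorted((bin_i, bin_j))  # (j, i) and (i, j) share one stored cell
    return upper.get((i, j), 0)


upper = {(1, 3): 7, (2, 2): 4}
print(get_cell(upper, 3, 1))  # 7
print(get_cell(upper, 0, 5))  # 0
```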

set_resolution(resolution)

Set, or change, the resolution level

Parameters:	resolution (int) – Level of resolution
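The get_resolutions, get_resolution and set_resolution methods fit together as in the hypothetical stand-in class below. Whether set_resolution on the real readers rejects a resolution that has no models is not documented; this sketch assumes it does.

```python
class ResolutionState:
    """Hypothetical stand-in for the resolution methods of the reader classes."""

    def __init__(self, resolutions, resolution=None):
        self._resolutions = sorted(resolutions)
        self._resolution = None
        if resolution is not None:
            self.set_resolution(resolution)

    def get_resolutions(self):
        return list(self._resolutions)

    def get_resolution(self):
        return self._resolution

    def set_resolution(self, resolution):
        # Assumption: reject resolutions for which no models were generated
        if resolution not in self._resolutions:
            raise ValueError(
                f"resolution {resolution} not available; choose from {self._resolutions}"
            )
        self._resolution = resolution


state = ResolutionState([100000, 10000], resolution=10000)
state.set_resolution(100000)
print(state.get_resolution())  # 100000
```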

Hi-C Coordinate Files

class reader.hdf5_coord.coord(user_id, file_id, resolution=None, cnf_loc='')

Class that handles direct interaction with the HDF5 files. All required information should be passed to this class.

close()

Tidy function to close file handles

get_centroids(region_id)

List the centroid models for each cluster

Returns:	centroids – List of the centroid models for each cluster
Return type:	list

get_chromosomes()

List of chromosomes that have models at a given resolution

Returns:	chromosomes – List of chromosomes at the set resolution
Return type:	list

get_clusters(region_id)

List all clusters of models

Returns:	clusters – List of models in each cluster
Return type:	list

get_model(region_id, model_ids=None, page=0, mpp=10)

Get the coordinates within a defined region on a specific chromosome. If model_ids is not provided, the consensus models for that region are returned.

Parameters:

region_id (str) – Region ID
model_ids (list) – List of model IDs for the models that are required
page (int) – Page number
mpp (int) – Number of models per page (default: 10; max: 100)

Returns:

array –

model : dict

metadata : dict: Relevant extra meta data added by TADbit
object : dict: Key value pair of information about the region
models : list: List of dictionaries for each model
clusters : list: List of models for each cluster
centroids : list: List of all centroid models
restraints : list: List of restraints for each position
hic_data : dict: Hi-C model data

metadata : dict

model_count : int: Count of the number of models for the defined region ID
page_count : int: Number of pages

Return type:

list
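The page, mpp and page_count fields imply simple paging arithmetic. A minimal sketch, assuming zero-based page numbers (the default is page=0) and that an mpp above the documented maximum of 100 is clamped rather than rejected; `paginate` is a hypothetical helper, not the method's implementation.

```python
import math


def paginate(model_ids, page=0, mpp=10):
    """Hypothetical: return the model IDs on `page` and the total page count.

    Pages are zero-based and mpp is clamped to the range 1..100.
    """
    mpp = max(1, min(mpp, 100))
    page_count = math.ceil(len(model_ids) / mpp)
    first = page * mpp
    return model_ids[first:first + mpp], page_count


models, page_count = paginate(list(range(1, 26)), page=2, mpp=10)
print(models, page_count)  # [21, 22, 23, 24, 25] 3
```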

get_models(region_id)

List all models for a given region

Returns:

model_id : int
cluster_id : int

Return type:

list
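Assuming get_models returns (model_id, cluster_id) pairs, which the return description suggests but does not spell out, the models can be grouped per cluster in the same spirit as get_clusters. `group_by_cluster` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_cluster(models):
    """Hypothetical: group (model_id, cluster_id) pairs into {cluster_id: [model_id, ...]}."""
    clusters = defaultdict(list)
    for model_id, cluster_id in models:
        clusters[cluster_id].append(model_id)
    return dict(clusters)


print(group_by_cluster([(1, 0), (2, 1), (3, 0)]))  # {0: [1, 3], 1: [2]}
```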

get_object_data(region_id)

Prepare the object header data structure ready for printing

Parameters:	region_id (int) – Region that is getting downloaded
Returns:	objectdata – All headers and values required for the JSON output
Return type:	dict

get_region_order(chr_id=None, region=None)

List the regions on a given chromosome ID or region ID in the order that they are located on the chromosome

Parameters:

chr_id (str) – Chromosome ID
region (str) – Region ID

Returns:

region_id : str: List of the region IDs

Return type:

list
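Ordering regions along a chromosome amounts to sorting them by start position. The sketch mirrors the chr_id/region choice of get_region_order but assumes a hypothetical `regions` dictionary mapping each region ID to a (chr_id, start, end) tuple; the real method reads this information from the HDF5 file.

```python
def region_order(regions, chr_id=None, region=None):
    """Hypothetical: region IDs on a chromosome, ordered by start position.

    `regions` maps region_id -> (chr_id, start, end). If only `region` is
    given, its chromosome is used.
    """
    if chr_id is None:
        chr_id = regions[region][0]
    on_chr = [(start, region_id)
              for region_id, (region_chr, start, _end) in regions.items()
              if region_chr == chr_id]
    return [region_id for _start, region_id in sorted(on_chr)]


regions = {"r3": ("chr1", 5000, 5999), "r1": ("chr1", 0, 999), "r2": ("chr2", 0, 999)}
print(region_order(regions, "chr1"))  # ['r1', 'r3']
```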

get_regions(chr_id, start, end)

List regions that are within a given range on a chromosome

Parameters:

chr_id (str) – Chromosome ID
start (int) – Start position
end (int) – Stop position

Returns:	regions – List of region IDs whose parameters match those provided
Return type:	list
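The reference does not define which regions count as matching the query. A sketch assuming the intended test is interval overlap on the same chromosome, with inclusive coordinates; `regions_in_range` and the region dictionary layout are hypothetical.

```python
def regions_in_range(regions, chr_id, start, end):
    """Hypothetical: IDs of regions on chr_id that overlap [start, end].

    `regions` maps region_id -> (chr_id, region_start, region_end);
    coordinates are treated as inclusive in this sketch.
    """
    hits = []
    for region_id, (region_chr, region_start, region_end) in regions.items():
        # Two inclusive intervals overlap when each starts before the other ends
        if region_chr == chr_id and region_start <= end and region_end >= start:
            hits.append(region_id)
    return sorted(hits)


regions = {"r1": ("chr1", 0, 999), "r2": ("chr1", 1000, 1999), "r3": ("chr2", 0, 5000)}
print(regions_in_range(regions, "chr1", 500, 1200))  # ['r1', 'r2']
```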

get_resolution()

List the current level of resolution

Returns:	resolution – Current level of resolution
Return type:	int

get_resolutions()

List resolutions that models have been generated for

Returns:	list – Available levels of resolution that can be set
Return type:	list

set_resolution(resolution)

Set, or change, the resolution level

Parameters:	resolution (int) – Level of resolution

Text File Index

Lists all files in bed and wig formats that are available to a user, and lists the files that have data in a given region, so that the client requests only the files it needs

class reader.hdf5_reader.hdf5_reader(user_id, file_id, cnf_loc='')

Class that handles direct interaction with the HDF5 files. All required information should be passed to this class.

close()

Tidy function to close file handles

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
h5r.close()

get_assemblies()

List all assemblies for which there are files that have been indexed

Returns:	assembly – List of assemblies in the index
Return type:	list

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
asm = h5r.get_assemblies()

get_chromosomes(assembly)

List all chromosomes that are covered by the index

Parameters:	assembly (str) – Genome assembly ID
Returns:	chromosomes – List of the chromosomes for a given assembly in the index
Return type:	list

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
asm = h5r.get_assemblies()
chr_list = h5r.get_chromosomes(asm[0])

get_files(assembly)

List all files for an assembly. If files are missing, they can either be loaded or the search can be performed directly on the bigBed files

Parameters:	assembly (str) – Genome assembly ID
Returns:	file_ids – List of file ids for a given assembly in the index
Return type:	list

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
asm = h5r.get_assemblies()
file_list = h5r.get_files(asm[0])

get_regions(assembly, chromosome_id, start, end)

List files that have data in a given region.

Parameters:

assembly (str) – Genome assembly ID
chromosome_id (str) – Chromosome name as listed by the get_chromosomes function
start (int) – Start position for the region of interest
end (int) – End position for the region of interest
Returns:	file_ids – List of the file_ids that have sequence features within the region of interest
Return type:	list

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
asm = h5r.get_assemblies()
file_list = h5r.get_regions(asm[0], '1', 1000000, 1100000)
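The index described for this class answers the question of which files have data in a region. A minimal in-memory sketch of that lookup for one assembly and chromosome, assuming a hypothetical list of (start, end, file_id) intervals; keeping the intervals sorted by start lets a binary search skip every interval that begins after the region ends. The real class reads its index from an HDF5 file, and `RegionIndex` is not part of the reader API.

```python
import bisect


class RegionIndex:
    """Hypothetical per-chromosome index of (start, end, file_id) intervals."""

    def __init__(self, intervals):
        self._intervals = sorted(intervals)  # sorted by start position
        self._starts = [iv[0] for iv in self._intervals]

    def files_in_region(self, start, end):
        # Only intervals starting at or before `end` can overlap [start, end]
        limit = bisect.bisect_right(self._starts, end)
        return sorted({file_id for _s, iv_end, file_id in self._intervals[:limit]
                       if iv_end >= start})


idx = RegionIndex([(1000000, 1050000, "f2"), (0, 500, "f1"),
                   (1090000, 1200000, "f3"), (2000000, 2100000, "f4")])
print(idx.files_in_region(1000000, 1100000))  # ['f2', 'f3']
```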

BigBed Files

class reader.bigbed.bigbed_reader(user_id, file_id, cnf_loc='')

Class that handles direct interaction with the BigBed files. All required information should be passed to this class.

close()

Tidy function to close file handles

Example

from reader.bigbed import bigbed_reader
bbr = bigbed_reader('test', '')
bbr.close()

get_chromosomes()

List the chromosome names and lengths

Returns:	chromosomes – Dictionary mapping each chromosome name to the length of that chromosome
Return type:	dict

get_header()

Get the bigBed header

Returns:	header
Return type:	dict

get_range(chr_id, start, end, file_type='bed')

Get entries in a given range

Parameters:

chr_id (str) – Chromosome name
start (int) – Start of the region to query
end (int) – End of the region to query
file_type (str (Optional)) – 'bed' (the default) returns the matching rows as a single string in bed format; 'list' returns the rows as a list of lists.

Returns:

bed (str (DEFAULT)) – Rows in bed format, returned as a string
bed_array (list) – List of lists, one per row in bed format
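The two file_type options differ in presentation. Assuming the 'bed' output is tab-separated rows separated by newlines (the usual bed layout), a hypothetical helper can turn it into a list of lists. This sketch keeps every column as a string; whether the real 'list' option converts numeric columns is not documented.

```python
def bed_to_rows(bed_text):
    """Hypothetical: split bed-formatted text into a list of lists of column strings."""
    rows = []
    for line in bed_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        rows.append(line.split("\t"))
    return rows


bed = "chr1\t100\t200\tpeak1\nchr1\t300\t400\tpeak2\n"
print(bed_to_rows(bed))  # [['chr1', '100', '200', 'peak1'], ['chr1', '300', '400', 'peak2']]
```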

BigWig Files

class reader.bigwig.bigwig_reader(user_id, file_id, cnf_loc='')

Class that handles direct interaction with the BigWig files. All required information should be passed to this class.

get_chromosomes()

List the chromosome names and lengths

Returns:	chromosomes – Dictionary mapping each chromosome name to the length of that chromosome
Return type:	dict

get_header()

Get the bigWig header

Returns:	header
Return type:	dict

get_range(chr_id, start, end, file_type='wig')

Get entries in a given range

Parameters:

chr_id (str) – Chromosome name
start (int) – Start of the region to query
end (int) – End of the region to query
file_type (str (Optional)) – 'wig' (the default) returns the matching rows as a single string in wig format; 'list' returns the rows as a list of lists.

Returns:

wig (str (DEFAULT)) – Rows in wig format, returned as a string
wig_array (list) – List of lists, one per row in wig format
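The layout of the wig rows is not documented here. Assuming bedGraph-style rows of chromosome, start, end and value (an assumption, since wig also has fixedStep and variableStep layouts), a length-weighted mean signal over the queried range can be computed as in this hypothetical sketch.

```python
def mean_signal(rows):
    """Hypothetical: length-weighted mean of bedGraph-style [chrom, start, end, value] rows."""
    total = 0.0
    covered = 0
    for _chrom, start, end, value in rows:
        span = int(end) - int(start)
        total += span * float(value)
        covered += span
    # Return 0.0 when no bases are covered, to avoid dividing by zero
    return total / covered if covered else 0.0


rows = [["chr1", "0", "10", "1.0"], ["chr1", "10", "30", "4.0"]]
print(mean_signal(rows))  # 3.0
```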

Tabix Files

class reader.tabix.tabix(user_id, file_id, cnf_loc='')

Class that handles direct interaction with the Tabix-indexed files. All required information should be passed to this class.

get_range(chr_id, start, end, file_type='gff3')

Get entries in a given range

Parameters:

chr_id (str) – Chromosome name
start (int) – Start of the region to query
end (int) – End of the region to query
file_type (str (Optional)) – 'gff3' (the default) returns the matching rows as a single string in gff3 format; 'list' returns the rows as a list of lists.

Returns:

gff3 (str (DEFAULT)) – Rows in gff3 format, returned as a string
gff3_array (list) – List of lists, one per row in gff3 format
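GFF3 rows have nine tab-separated columns; the ninth holds tag=value attributes separated by semicolons, with special characters percent-encoded. A minimal sketch that turns the default gff3 string into dictionaries, assuming the string follows that layout. `parse_gff3` is hypothetical, keeps multi-value attributes as a single string and skips comment and directive lines.

```python
from urllib.parse import unquote


def parse_gff3(text):
    """Hypothetical: parse GFF3 text into a list of feature dictionaries."""
    features = []
    for line in text.splitlines():
        # Skip blank lines, comments and ## directives such as ##gff-version
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # this sketch silently skips malformed rows
        seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
        attributes = {}
        for pair in attrs.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # Tags and values may contain percent-encoded characters
                attributes[unquote(key)] = unquote(value)
        features.append({
            "seqid": seqid,
            "source": source,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "attributes": attributes,
        })
    return features


gff = "##gff-version 3\nchr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=ABC%3B1\n"
print(parse_gff3(gff)[0]["attributes"])  # {'ID': 'gene1', 'Name': 'ABC;1'}
```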