Custom Reader APIs

HDF5 Files

Hi-C Adjacency Files

class reader.hdf5_adjacency.adjacency(user_id, file_id, resolution=None, cnf_loc='')[source]

Class related to handling the functions for interacting directly with the HDF5 files. All required information should be passed to this class.

close()[source]

Close the HDF5 data file handle

get_chromosome_from_array_index(index)[source]

Identify the chromosome based on either the x or y coordinate in the array.

Parameters:index (int) – Location within the array
Returns:chr_id – Identity of the chromosome
Return type:str

Example

from reader.hdf5_adjacency import adjacency
r = adjacency('test', '', 10000)
cid = r.get_chromosome_from_array_index(1234567890)
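One way such a lookup can work is a binary search over cumulative bin offsets. This is a minimal sketch, assuming the chromosomes are laid end to end along each axis of the array. The chromosome names, bin counts and the helper `chromosome_from_index` are hypothetical and are not taken from the reader's implementation.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical bin counts per chromosome at one resolution (illustrative values only).
CHR_BINS = [("chr1", 25), ("chr2", 24), ("chr3", 20)]

# Exclusive end offset of each chromosome along one axis of the array.
CHR_ENDS = list(accumulate(count for _, count in CHR_BINS))


def chromosome_from_index(index):
    """Return the chromosome whose block of bins contains the array index."""
    if index < 0 or index >= CHR_ENDS[-1]:
        raise IndexError("index %d is outside the array" % index)
    # bisect_right finds the first chromosome whose end offset is greater than index.
    return CHR_BINS[bisect_right(CHR_ENDS, index)][0]
```

Because the same offsets apply to both axes of a square adjacency array, the helper can be used for either the x or the y coordinate.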
get_chromosome_parameters()[source]

Return a list of the available resolutions in a given HDF5 file

Returns:chromosomes : list, chr_param : dict, resolutions
Return type:dict

Example

from reader.hdf5_adjacency import adjacency
r = adjacency('test', '', 10000)
value = r.get_chromosome_parameters()
get_chromosomes()[source]

List of chromosomes that have models at a given resolution

Returns:chromosomes – List of chromosomes at the set resolution
Return type:list
get_details()[source]

Return a list of the available resolutions in a given HDF5 file

get_range(chr_id, start, end, limit_chr=None, limit_start=None, limit_end=None, value_url='/api/getValue', no_links=None)[source]

Get the interactions that happen within a defined region on a specific chromosome. Returns both inter- and intra-chromosomal interactions with the defined region.

Parameters:
  • chr_id (str) – Chromosomal name
  • start (int) – Start position within the chromosome
  • end (int) – End position within the chromosome
  • limit_chr (str (Optional)) – Limit the results to a particular chromosome
  • limit_start (int (Optional)) – Limit the range start position on the limit_chr parameter
  • limit_end (int (Optional)) – Limit the range end position on the limit_chr parameter
  • value_url (str (Optional)) – Define a custom URL snippet for the location of the file if different from the default
  • no_links (bool (Optional)) – By default the URL links to the individual points within the adjacency matrix are returned. Where this would generate a large number of points, link generation can be turned off by setting this value to 1.
Returns:
  • log (list) – List of messages about the state for debugging
  • results (list) – List of values for given positions within the adjacency matrix
Return type:dict

Example

from reader.hdf5_adjacency import adjacency
r = adjacency('test', '', 10000)
# 'chr1' is a placeholder chromosome name; start must not exceed end
value = r.get_range('chr1', 1000000, 2000000)
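The returned dictionary carries the two documented keys, log and results. The sketch below shows one way a caller might unpack it. The helper `summarise_range_response` and the sample dictionary are hypothetical, and the contents of each list element are placeholders, since their exact shape is not described here.

```python
def summarise_range_response(response):
    """Return (number of results, list of log messages) from a get_range-style dict."""
    results = response.get("results", [])
    log = response.get("log", [])
    return len(results), list(log)


# Hypothetical response with the two documented keys; element contents are placeholders.
sample = {
    "log": ["2 positions returned"],
    "results": [{"value": 3}, {"value": 7}],
}
count, messages = summarise_range_response(sample)
```

Checking the log list first can help when a query unexpectedly returns no results.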
get_resolution()[source]

List the current level of resolution

Returns:resolution – Current level of resolution
Return type:int
get_resolutions()[source]

List resolutions that models have been generated for

Returns:list – Available levels of resolution that can be set
Return type:str
get_value(bin_i, bin_j)[source]

Get a specific value for a given dataset and resolution

Parameters:
  • bin_i (int) – Array position in the first dimension
  • bin_j (int) – Array position in the second dimension
Returns:value – Value for a given cell in the adjacency array
Return type:int

Example

from reader.hdf5_adjacency import adjacency
r = adjacency('test', '', 10000)
value = r.get_value(2000000, 1000000)
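Since bin_i and bin_j are array positions rather than genomic coordinates, a caller starting from base-pair positions needs a conversion step. The sketch below assumes fixed-width bins whose width equals the resolution in base pairs, counted from the start of a chromosome. The helper `position_to_bin` is hypothetical, and a genome-wide array would additionally need a per-chromosome offset.

```python
def position_to_bin(position, resolution):
    """Map a 0-based genomic position to the index of the fixed-width bin containing it."""
    if resolution <= 0:
        raise ValueError("resolution must be a positive integer")
    if position < 0:
        raise ValueError("position must not be negative")
    return position // resolution


# Two positions on the same chromosome at a 10 kb resolution.
bin_i = position_to_bin(2000000, 10000)
bin_j = position_to_bin(1000000, 10000)
```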
set_resolution(resolution)[source]

Set, or change, the resolution level

Parameters:resolution (int) – Level of resolution
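A caller may want a resolution that has not been generated. A possible approach, sketched here, is to pick the nearest value from the list reported by get_resolutions and pass that to set_resolution. The helper `closest_resolution` is hypothetical, and the sketch assumes the available resolutions are integers.

```python
def closest_resolution(available, requested):
    """Pick the available resolution nearest to the requested one.

    Ties are resolved in favour of the smaller (finer) resolution.
    """
    if not available:
        raise ValueError("no resolutions available")
    return min(available, key=lambda res: (abs(res - requested), res))


# Illustrative list of resolutions; real values come from get_resolutions().
chosen = closest_resolution([10000, 100000, 1000000], 40000)
```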

Hi-C Coordinate Files

class reader.hdf5_coord.coord(user_id, file_id, resolution=None, cnf_loc='')[source]

Class related to handling the functions for interacting directly with the HDF5 files. All required information should be passed to this class.

close()[source]

Tidy function to close file handles

get_centroids(region_id)[source]

List the centroid models for each cluster

Returns:centroids – List of the centroid models for each cluster
Return type:list
get_chromosomes()[source]

List of chromosomes that have models at a given resolution

Returns:chromosomes – List of chromosomes at the set resolution
Return type:list
get_clusters(region_id)[source]

List all clusters of models

Returns:clusters – List of models in each cluster
Return type:list
get_model(region_id, model_ids=None, page=0, mpp=10)[source]

Get the coordinates within a defined region on a specific chromosome. If model_ids is not provided, the consensus models for that region are returned.

Parameters:
  • region_id (str) – Region ID
  • model_ids (list) – List of model IDs for the models that are required
  • page (int) – Page number
  • mpp (int) – Number of models per page (default: 10; max: 100)
Returns:array with the following entries:
  • model (dict)
      • metadata (dict) – Relevant extra metadata added by TADbit
      • object (dict) – Key-value pairs of information about the region
      • models (list) – List of dictionaries for each model
      • clusters (list) – List of models for each cluster
      • centroids (list) – List of all centroid models
      • restraints (list) – List of restraints for each position
      • hic_data (dict) – Hi-C model data
  • metadata (dict)
      • model_count (int) – Count of the number of models for the defined region ID
      • page_count (int) – Number of pages
Return type:list
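The page and mpp parameters suggest simple offset pagination. The sketch below shows how model_count, mpp and page_count could relate, assuming zero-based page numbers (the default is page=0) and that mpp is capped at the documented maximum of 100. The helpers `page_count` and `page_slice` are hypothetical and may differ from what the reader does internally.

```python
import math

MAX_MPP = 100  # documented maximum number of models per page


def page_count(model_count, mpp=10):
    """Number of pages needed to list model_count models at mpp models per page."""
    mpp = max(1, min(mpp, MAX_MPP))  # keep mpp within 1..MAX_MPP
    return math.ceil(model_count / mpp)


def page_slice(model_ids, page=0, mpp=10):
    """Return the model IDs that fall on a zero-based page."""
    mpp = max(1, min(mpp, MAX_MPP))
    start = page * mpp
    return model_ids[start:start + mpp]
```

With 25 models and the default of 10 models per page, this gives three pages, the last holding five models.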

get_models(region_id)[source]

List all models for a given region

Returns:model_id : int, cluster_id : int
Return type:list
get_object_data(region_id)[source]

Prepare the object header data structure ready for printing

Parameters:region_id (int) – Region that is getting downloaded
Returns:objectdata – All headers and values required for the JSON output
Return type:dict
get_region_order(chr_id=None, region=None)[source]

List the regions on a given chromosome ID or region ID in the order that they are located on the chromosome

Parameters:
  • chr_id (str) – Chromosome ID
  • region (str) – Region ID
Returns:region_id (str) – List of the region IDs
Return type:list

get_regions(chr_id, start, end)[source]

List regions that are within a given range on a chromosome

Parameters:
  • chr_id (str) – Chromosome ID
  • start (int) – Start position
  • end (int) – Stop position
Returns:regions – List of region IDs whose parameters match those provided
Return type:list

get_resolution()[source]

List the current level of resolution

Returns:resolution – Current level of resolution
Return type:int
get_resolutions()[source]

List resolutions that models have been generated for

Returns:list – Available levels of resolution that can be set
Return type:str
set_resolution(resolution)[source]

Set, or change, the resolution level

Parameters:resolution (int) – Level of resolution

Text File Index

Lists all files that are available for a user in bed and wig formats, and lists the files that have data in a given region so that only the required files are requested by the client.

class reader.hdf5_reader.hdf5_reader(user_id, file_id, cnf_loc='')[source]

Class related to handling the functions for interacting directly with the HDF5 files. All required information should be passed to this class.

close()[source]

Tidy function to close file handles

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
h5r.close()
get_assemblies()[source]

List all assemblies for which there are files that have been indexed

Returns:assembly – List of assemblies in the index
Return type:list

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
h5r.get_assemblies()
get_chromosomes(assembly)[source]

List all chromosomes that are covered by the index

Parameters:assembly (str) – Genome assembly ID
Returns:chromosomes – List of the chromosomes for a given assembly in the index
Return type:list

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
asm = h5r.get_assemblies()
chr_list = h5r.get_chromosomes(asm[0])
get_files(assembly)[source]

List all files for an assembly. If files are missing they can either get loaded or the search can be performed directly on the bigBed files

Parameters:assembly (str) – Genome assembly ID
Returns:file_ids – List of file ids for a given assembly in the index
Return type:list

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
asm = h5r.get_assemblies()
file_list = h5r.get_files(asm[0])
get_regions(assembly, chromosome_id, start, end)[source]

List files that have data in a given region.

Parameters:
  • assembly (str) – Genome assembly ID
  • chromosome_id (str) – Chromosome name as listed by the get_chromosomes function
  • start (int) – Start position for the region of interest
  • end (int) – End position for the region of interest
Returns:file_ids – List of the file_ids that have sequence features within the region of interest
Return type:list

Example

from reader.hdf5_reader import hdf5_reader
h5r = hdf5_reader('test', '')
asm = h5r.get_assemblies()
file_list = h5r.get_regions(asm[0], '1', 1000000, 1100000)

BigBed Files

class reader.bigbed.bigbed_reader(user_id, file_id, cnf_loc='')[source]

Class related to handling the functions for interacting directly with the BigBed files. All required information should be passed to this class.

close()[source]

Tidy function to close file handles

Example

from reader.bigbed import bigbed_reader
bbr = bigbed_reader('test', '')
bbr.close()
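To make sure close() is always called, even when a query raises, any object with a close() method can be wrapped in the standard library's contextlib.closing. The sketch below uses a stand-in class, `StubReader`, so that it runs without any data files. The stub and its header contents are hypothetical.

```python
from contextlib import closing


class StubReader:
    """Stand-in for a reader object; only the close() method matters here."""

    def __init__(self):
        self.closed = False

    def get_header(self):
        return {"version": 4}  # placeholder header contents

    def close(self):
        self.closed = True


reader = StubReader()
with closing(reader) as bbr:
    header = bbr.get_header()
# close() has been called by the time the with block exits,
# and it would also be called if get_header() had raised.
```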
get_chromosomes()[source]

List the chromosome names and lengths

Returns:chromosomes – Key value pair of chromosome name and the value is the length of the chromosome.
Return type:dict
get_header()[source]

Get the bigBed header

Returns:header
Return type:dict
get_range(chr_id, start, end, file_type='bed')[source]

Get entries in a given range

Parameters:
  • chr_id (str) – Chromosome name
  • start (int) – Start of the region to query
  • end (int) – End of the region to query
  • file_type (string (OPTIONAL)) – The default, bed, returns the matching rows as a single string in bed format. list returns the same rows as a list of lists.
Returns:

  • bed (str (DEFAULT)) – List of strings for the rows in a bed file
  • bed_array (list) – List of lists of each row for the bed file format
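The two output forms carry the same rows. As a rough illustration of how the string form relates to the list-of-lists form, the sketch below splits tab-separated BED text into rows. The helper `bed_text_to_rows` and the sample rows are hypothetical, and the reader may convert column types differently.

```python
def bed_text_to_rows(bed_text):
    """Split BED text into a list of rows, each a list of column strings."""
    rows = []
    for line in bed_text.splitlines():
        line = line.strip()
        # Skip blank lines and optional comment, track and browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        rows.append(line.split("\t"))
    return rows


sample = "chr1\t1000\t5000\tfeature_a\nchr1\t7000\t9000\tfeature_b\n"
rows = bed_text_to_rows(sample)
```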

BigWig Files

class reader.bigwig.bigwig_reader(user_id, file_id, cnf_loc='')[source]

Class related to handling the functions for interacting directly with the BigWig files. All required information should be passed to this class.

get_chromosomes()[source]

List the chromosome names and lengths

Returns:chromosomes – Key value pair of chromosome name and the value is the length of the chromosome.
Return type:dict
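The name-to-length mapping can be used to validate a query before calling get_range. The sketch below clips a requested range to the chromosome length. The helper `clamp_range` and the sample mapping are hypothetical, and whether the reader itself performs such clipping is not stated here.

```python
def clamp_range(chromosomes, chr_id, start, end):
    """Clip (start, end) to [0, chromosome length]; raise if the chromosome is unknown."""
    if chr_id not in chromosomes:
        raise KeyError("unknown chromosome: %s" % chr_id)
    length = chromosomes[chr_id]
    start = max(0, start)
    end = min(end, length)
    if start >= end:
        raise ValueError("empty range after clipping")
    return start, end


# Illustrative mapping in the shape returned by get_chromosomes().
chromosomes = {"chr1": 248956422, "chrM": 16569}
clipped = clamp_range(chromosomes, "chrM", 16000, 20000)
```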
get_header()[source]

Get the bigWig header

Returns:header
Return type:dict
get_range(chr_id, start, end, file_type='wig')[source]

Get entries in a given range

Parameters:
  • chr_id (str) – Chromosome name
  • start (int) – Start of the region to query
  • end (int) – End of the region to query
  • file_type (string (OPTIONAL)) – The default, wig, returns the matching rows as a single string in wig format. list returns the same rows as a list of lists.
Returns:

  • wig (str (DEFAULT)) – List of strings for the rows in a wig file
  • wig_array (list) – List of lists of each row for the wig file format

Tabix Files

class reader.tabix.tabix(user_id, file_id, cnf_loc='')[source]

Class related to handling the functions for interacting directly with the Tabix-indexed files. All required information should be passed to this class.

get_range(chr_id, start, end, file_type='gff3')[source]

Get entries in a given range

Parameters:
  • chr_id (str) – Chromosome name
  • start (int) – Start of the region to query
  • end (int) – End of the region to query
  • file_type (string (OPTIONAL)) – The default, gff3, returns the matching rows as a single string in gff3 format. list returns the same rows as a list of lists.
Returns:

  • gff3 (str (DEFAULT)) – List of strings for the rows in a gff3 file
  • gff3_array (list) – List of each row for the gff3 file format
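Rows returned in gff3 format follow the nine tab-separated columns of the GFF3 specification. The sketch below parses one such row into a dictionary, including the key=value pairs of the attributes column. The helper `parse_gff3_line` and the sample row are hypothetical and are not part of the reader.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Attributes are semicolon-separated key=value pairs with URL-style escaping.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


sample = "chr1\tsrc\tgene\t1000\t9000\t.\t+\t.\tID=gene0001;Name=EDEN"
record = parse_gff3_line(sample)
```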