VariantKey  5.4.1
Numerical Encoding for Human Genetic Variants
rsidvar.h File Reference

Functions to read VariantKey-rsID binary files.

#include "binsearch.h"
#include "variantkey.h"

Data Structures

struct  rsidvar_cols_t
 

Typedefs

typedef struct rsidvar_cols_t rsidvar_cols_t
 

Functions

static void mmap_vkrs_file (const char *file, mmfile_t *mf, rsidvar_cols_t *cvr)
 
static void mmap_rsvk_file (const char *file, mmfile_t *mf, rsidvar_cols_t *crv)
 
static uint64_t find_rv_variantkey_by_rsid (rsidvar_cols_t crv, uint64_t *first, uint64_t last, uint32_t rsid)
 
static uint64_t get_next_rv_variantkey_by_rsid (rsidvar_cols_t crv, uint64_t *pos, uint64_t last, uint32_t rsid)
 
static uint32_t find_vr_rsid_by_variantkey (rsidvar_cols_t cvr, uint64_t *first, uint64_t last, uint64_t vk)
 
static uint32_t get_next_vr_rsid_by_variantkey (rsidvar_cols_t cvr, uint64_t *pos, uint64_t last, uint64_t vk)
 
static uint32_t find_vr_chrompos_range (rsidvar_cols_t cvr, uint64_t *first, uint64_t *last, uint8_t chrom, uint32_t pos_min, uint32_t pos_max)
 

Detailed Description

The functions provided here allow fast lookup of rsID and VariantKey values in binary files made of adjacent constant-length binary blocks sorted in ascending order.

rsvk.bin: Lookup table to retrieve VariantKey from rsID. This binary file can be generated from a TSV file by the 'resources/tools/rsvk.sh' script. The file can also be in Apache Arrow File format with a single RecordBatch, or in Feather format. The first column must contain the rsID values sorted in ascending order.

vkrs.bin: Lookup table to retrieve rsID from VariantKey. This binary file can be generated from a TSV file by the 'resources/tools/vkrs.sh' script. The file can also be in Apache Arrow File format with a single RecordBatch, or in Feather format. The first column must contain the VariantKey values sorted in ascending order.
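
The sorted-column requirement above can be illustrated with a minimal sketch. The struct and function names below (demo_cols_t, demo_rs_is_sorted) are hypothetical and are not part of rsidvar.h; the sketch only assumes that a lookup table is exposed as two parallel columns of equal length and checks that the key column is in ascending order, which is the precondition for binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a lookup table as two parallel columns.
 * This is an illustration only, not the library's rsidvar_cols_t. */
typedef struct demo_cols_t {
    const uint32_t *rs; /* rsID column */
    const uint64_t *vk; /* VariantKey column */
    uint64_t nrows;     /* number of rows in both columns */
} demo_cols_t;

/* Returns 1 if the rsID column is in ascending (non-decreasing) order,
 * 0 otherwise. Duplicated rsIDs are allowed. */
static int demo_rs_is_sorted(demo_cols_t c)
{
    assert(c.nrows == 0 || c.rs != NULL);
    for (uint64_t i = 1; i < c.nrows; i++) {
        if (c.rs[i - 1] > c.rs[i]) {
            return 0;
        }
    }
    return 1;
}
```

A check like this is linear in the number of rows, so it is better suited to a one-off validation after generating a file than to every lookup.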

Typedef Documentation

typedef struct rsidvar_cols_t rsidvar_cols_t

Struct containing the RSVK or VKRS memory-mapped file column info.

Function Documentation

static inline uint64_t find_rv_variantkey_by_rsid (rsidvar_cols_t crv, uint64_t *first, uint64_t last, uint32_t rsid)

Searches for the specified rsID and returns the first occurrence of VariantKey in the RV file.

Parameters
    crv    Structure containing the pointers to the RSVK memory-mapped file columns (rsvk.bin).
    first  Pointer to the first element of the range to search (min value = 0). On return, it holds the position of the first record found.
    last   End of the range to search, exclusive (max value = nitems).
    rsid   rsID to search for.
Returns
    The VariantKey, or 0 if not found.
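
The search contract described above (a half-open range, an in/out position, and 0 as the "not found" value) can be sketched with a plain lower-bound binary search. The helper demo_find_first_vk is hypothetical and works on bare arrays; it is not the implementation in rsidvar.h, and its behavior of leaving the insertion point in *first when nothing is found is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of rsidvar.h.
 * Searches rs[*first .. last) for the first row whose rsID equals rsid.
 * On success, stores that row index in *first and returns vk at that row.
 * On failure, stores the insertion point in *first and returns 0. */
static uint64_t demo_find_first_vk(const uint32_t *rs, const uint64_t *vk,
                                   uint64_t *first, uint64_t last,
                                   uint32_t rsid)
{
    assert(*first <= last);
    uint64_t lo = *first;
    uint64_t hi = last;
    while (lo < hi) {
        uint64_t mid = lo + ((hi - lo) / 2);
        if (rs[mid] < rsid) {
            lo = mid + 1; /* match, if any, is to the right of mid */
        } else {
            hi = mid;     /* mid may be the first match */
        }
    }
    *first = lo;
    if ((lo < last) && (rs[lo] == rsid)) {
        return vk[lo];
    }
    return 0;
}
```

Because 0 doubles as the "not found" value, a caller of a function with this contract cannot distinguish a missing rsID from a stored VariantKey of 0 by the return value alone.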

static inline uint32_t find_vr_chrompos_range (rsidvar_cols_t cvr, uint64_t *first, uint64_t *last, uint8_t chrom, uint32_t pos_min, uint32_t pos_max)

Searches for the specified CHROM-POS range and returns the first occurrence of rsID in the VR file.

Parameters
    cvr      Structure containing the pointers to the VKRS memory-mapped file columns (vkrs.bin).
    first    Pointer to the first element of the range to search (min value = 0).
    last     Pointer to the end of the range to search, exclusive (max value = nitems).
    chrom    Chromosome encoded number.
    pos_min  Start reference position, with the first base having position 0.
    pos_max  End reference position, with the first base having position 0.
Returns
    rsID
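
A CHROM-POS range query works on a sorted VariantKey column because chromosome and position occupy the most significant bits of the key. The sketch below assumes the VariantKey layout of 5 CHROM bits, 28 POS bits and 31 REF+ALT bits; all demo_* names are hypothetical, and the half-open result range is a convention of this sketch rather than a statement about what find_vr_chrompos_range stores in first and last.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest key for (chrom, pos): all 31 REF+ALT bits set to zero.
 * Assumes 5 bits for CHROM and 28 bits for POS. */
static uint64_t demo_chrompos_min_key(uint8_t chrom, uint32_t pos)
{
    assert((chrom < 32u) && (pos < (1u << 28)));
    return ((uint64_t)chrom << 59) | ((uint64_t)pos << 31);
}

/* Largest key for (chrom, pos): all 31 REF+ALT bits set to one. */
static uint64_t demo_chrompos_max_key(uint8_t chrom, uint32_t pos)
{
    return demo_chrompos_min_key(chrom, pos) | UINT64_C(0x7FFFFFFF);
}

/* Narrows the half-open range [*first, *last) of the sorted column vk to
 * the rows with the given chrom and pos_min <= pos <= pos_max.
 * Returns the number of matching rows. */
static uint64_t demo_chrompos_range(const uint64_t *vk, uint64_t *first,
                                    uint64_t *last, uint8_t chrom,
                                    uint32_t pos_min, uint32_t pos_max)
{
    uint64_t kmin = demo_chrompos_min_key(chrom, pos_min);
    uint64_t kmax = demo_chrompos_max_key(chrom, pos_max);
    uint64_t lo = *first;
    uint64_t hi = *last;
    while (lo < hi) { /* lower bound: first row with vk >= kmin */
        uint64_t mid = lo + ((hi - lo) / 2);
        if (vk[mid] < kmin) { lo = mid + 1; } else { hi = mid; }
    }
    uint64_t begin = lo;
    hi = *last;
    while (lo < hi) { /* upper bound: first row with vk > kmax */
        uint64_t mid = lo + ((hi - lo) / 2);
        if (vk[mid] <= kmax) { lo = mid + 1; } else { hi = mid; }
    }
    *first = begin;
    *last = lo;
    return lo - begin;
}
```

With this convention, the rsIDs for the range are the rows from *first up to, but not including, *last in the parallel rsID column.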

static inline uint32_t find_vr_rsid_by_variantkey (rsidvar_cols_t cvr, uint64_t *first, uint64_t last, uint64_t vk)

Searches for the specified VariantKey and returns the first occurrence of rsID in the VR file.

Parameters
    cvr    Structure containing the pointers to the VKRS memory-mapped file columns (vkrs.bin).
    first  Pointer to the first element of the range to search (min value = 0). On return, it holds the position of the first record found.
    last   End of the range to search, exclusive (max value = nitems).
    vk     VariantKey to search for.
Returns
    The rsID, or 0 if not found.

static inline uint64_t get_next_rv_variantkey_by_rsid (rsidvar_cols_t crv, uint64_t *pos, uint64_t last, uint32_t rsid)

Gets the next VariantKey for the specified rsID in the RV file. This function should be used after find_rv_variantkey_by_rsid. It can be called in a loop to get all VariantKeys that are associated with the same rsID (if any).

Parameters
    crv   Structure containing the pointers to the RSVK memory-mapped file columns (rsvk.bin).
    pos   Pointer to the current item. On return, it holds the position of the next record.
    last  End of the range to search, exclusive (max value = nitems).
    rsid  rsID to search for.
Returns
    The VariantKey, or 0 if not found.
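
The find-then-iterate pattern described above can be sketched as follows. Both helpers are hypothetical stand-ins that operate on bare arrays and are not the rsidvar.h functions; the sketch assumes that "next" advances the position by one row and reports 0 once the rsID changes or the end of the range is reached.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a "get next" call, not part of rsidvar.h.
 * Advances *pos by one row. Returns vk at the new row if it is still
 * inside [.., last) and has the same rsID, otherwise returns 0. */
static uint64_t demo_next_vk(const uint32_t *rs, const uint64_t *vk,
                             uint64_t *pos, uint64_t last, uint32_t rsid)
{
    assert(*pos < UINT64_MAX);
    (*pos)++;
    if ((*pos < last) && (rs[*pos] == rsid)) {
        return vk[*pos];
    }
    return 0;
}

/* Collects up to cap VariantKeys that share rsid, starting from the row
 * index of the first match (as produced by a prior "find" call).
 * Returns the number of keys written to out. */
static uint64_t demo_collect_vk(const uint32_t *rs, const uint64_t *vk,
                                uint64_t first_match, uint64_t last,
                                uint32_t rsid, uint64_t *out, uint64_t cap)
{
    uint64_t n = 0;
    uint64_t pos = first_match;
    uint64_t key = ((pos < last) && (rs[pos] == rsid)) ? vk[pos] : 0;
    while ((key != 0) && (n < cap)) {
        out[n++] = key;
        key = demo_next_vk(rs, vk, &pos, last, rsid);
    }
    return n;
}
```

The loop stops on the first 0 it sees, so it relies on 0 never being stored as a real VariantKey for the rsID being collected.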

static inline uint32_t get_next_vr_rsid_by_variantkey (rsidvar_cols_t cvr, uint64_t *pos, uint64_t last, uint64_t vk)

Gets the next rsID for the specified VariantKey in the VR file. This function should be used after find_vr_rsid_by_variantkey. It can be called in a loop to get all rsIDs that are associated with the same VariantKey (if any).

Parameters
    cvr   Structure containing the pointers to the VKRS memory-mapped file columns (vkrs.bin).
    pos   Pointer to the current item. On return, it holds the position of the next record.
    last  End of the range to search, exclusive (max value = nitems).
    vk    VariantKey to search for.
Returns
    The rsID, or 0 if not found.

static inline void mmap_rsvk_file (const char *file, mmfile_t *mf, rsidvar_cols_t *crv)

Memory-maps the RSVK binary file (rsvk.bin).

Parameters
    file  Path to the file to map.
    mf    Structure containing the memory-mapped file.
    crv   Structure containing the pointers to the RSVK memory-mapped file columns.
Returns
    Nothing (void). The memory-mapped file descriptors are returned through mf and crv.

static inline void mmap_vkrs_file (const char *file, mmfile_t *mf, rsidvar_cols_t *cvr)

Memory-maps the VKRS binary file (vkrs.bin).

Parameters
    file  Path to the file to map.
    mf    Structure containing the memory-mapped file.
    cvr   Structure containing the pointers to the VKRS memory-mapped file columns.
Returns
    Nothing (void). The memory-mapped file descriptors are returned through mf and cvr.
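
For readers unfamiliar with memory mapping, the general technique behind functions of this kind can be sketched with POSIX calls. The demo_map_t struct and the demo_map_file and demo_unmap_file functions are hypothetical and much simpler than the library's mmfile_t handling: they map a file read-only as raw bytes and do not parse any header, column layout, Arrow or Feather metadata.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical descriptor of a read-only mapping (not mmfile_t). */
typedef struct demo_map_t {
    const uint8_t *src; /* start of the mapped bytes, NULL if unmapped */
    size_t size;        /* mapped length in bytes */
    int fd;             /* file descriptor, -1 if closed */
} demo_map_t;

/* Maps the whole file read-only. Returns 0 on success, -1 on failure. */
static int demo_map_file(const char *path, demo_map_t *m)
{
    assert((path != NULL) && (m != NULL));
    struct stat st;
    m->src = NULL;
    m->size = 0;
    m->fd = open(path, O_RDONLY);
    if (m->fd < 0) {
        return -1;
    }
    if ((fstat(m->fd, &st) != 0) || (st.st_size <= 0)) {
        close(m->fd);
        m->fd = -1;
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, m->fd, 0);
    if (p == MAP_FAILED) {
        close(m->fd);
        m->fd = -1;
        return -1;
    }
    m->src = (const uint8_t *)p;
    m->size = (size_t)st.st_size;
    return 0;
}

/* Releases the mapping and closes the file. Returns 0 on success. */
static int demo_unmap_file(demo_map_t *m)
{
    int rc = 0;
    if (m->src != NULL) {
        rc = munmap((void *)m->src, m->size);
    }
    if (m->fd >= 0) {
        rc |= close(m->fd);
    }
    m->src = NULL;
    m->size = 0;
    m->fd = -1;
    return rc;
}
```

Column pointers such as those held in rsidvar_cols_t would then point into the mapped region, so they remain valid only until the file is unmapped.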