VariantKey  5.4.1
Numerical Encoding for Human Genetic Variants
nrvk.h File Reference

Functions to retrieve REF and ALT values by VariantKey from a binary data file.

#include <stdio.h>
#include <string.h>
#include "binsearch.h"
#include "variantkey.h"

Data Structures

struct  variantkey_rev_t
 
struct  nrvk_cols_t
 

Macros

#define ALLELE_MAXSIZE   256
 Maximum allele length.
 

Typedefs

typedef struct variantkey_rev_t variantkey_rev_t
 
typedef struct nrvk_cols_t nrvk_cols_t
 

Functions

static void mmap_nrvk_file (const char *file, mmfile_t *mf, nrvk_cols_t *nvc)
 
static size_t get_nrvk_ref_alt_by_pos (nrvk_cols_t nvc, uint64_t pos, char *ref, size_t *sizeref, char *alt, size_t *sizealt)
 
static size_t find_ref_alt_by_variantkey (nrvk_cols_t nvc, uint64_t vk, char *ref, size_t *sizeref, char *alt, size_t *sizealt)
 
static size_t reverse_variantkey (nrvk_cols_t nvc, uint64_t vk, variantkey_rev_t *rev)
 
static size_t get_variantkey_ref_length (nrvk_cols_t nvc, uint64_t vk)
 
static uint32_t get_variantkey_endpos (nrvk_cols_t nvc, uint64_t vk)
 
static uint64_t get_variantkey_chrom_startpos (uint64_t vk)
 Get the CHROM + START POS encoding from VariantKey.
 
static uint64_t get_variantkey_chrom_endpos (nrvk_cols_t nvc, uint64_t vk)
 Get the CHROM + END POS encoding from VariantKey.
 
static size_t nrvk_bin_to_tsv (nrvk_cols_t nvc, const char *tsvfile)
 

Detailed Description

The functions provided here retrieve the REF and ALT strings for a given VariantKey from a binary file.

The input binary files can be generated from a normalized VCF file using the resources/tools/vkhexbin.sh script. The VCF file can be normalized using the resources/tools/vcfnorm.sh script.

The binary file can be generated by the resources/tools/nrvk.sh script from a TSV file with the following format:

[16 BYTE VARIANTKEY HEX]\t[REF STRING]\t[ALT STRING]\n...

for example:

b800c35bbcece603    AAAAAAAAGG      AG
1800c351f61f65d3    A       AAGAAAGAAAG
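
As an illustration of the TSV layout above, the following is a minimal sketch of how one row could be parsed in C. The tsv_row_t type and the parse_nrvk_tsv_line function are hypothetical names introduced for this example; they are not part of nrvk.h, and the sketch assumes the key is exactly 16 hexadecimal characters followed by tab-separated REF and ALT fields.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALLELE_MAXSIZE 256 /* same value as the macro documented in this file */

/* Hypothetical record type for one TSV row; not part of nrvk.h. */
typedef struct {
    uint64_t vk;
    char ref[ALLELE_MAXSIZE];
    char alt[ALLELE_MAXSIZE];
} tsv_row_t;

/* Parse "<16 hex chars>\t<REF>\t<ALT>" with an optional trailing newline.
 * Returns 1 on success, 0 on malformed input. */
static int parse_nrvk_tsv_line(const char *line, tsv_row_t *row)
{
    assert(line != NULL && row != NULL);
    char *end = NULL;
    row->vk = (uint64_t)strtoull(line, &end, 16);
    if ((end - line) != 16 || *end != '\t') {
        return 0; /* key must be exactly 16 hex characters followed by a tab */
    }
    const char *ref = end + 1;
    const char *tab = strchr(ref, '\t');
    if (tab == NULL) {
        return 0; /* missing ALT column */
    }
    size_t reflen = (size_t)(tab - ref);
    const char *alt = tab + 1;
    size_t altlen = strcspn(alt, "\r\n"); /* stop at the end of the line */
    if (reflen == 0 || altlen == 0 || reflen >= ALLELE_MAXSIZE || altlen >= ALLELE_MAXSIZE) {
        return 0; /* empty or oversized allele */
    }
    memcpy(row->ref, ref, reflen);
    row->ref[reflen] = '\0';
    memcpy(row->alt, alt, altlen);
    row->alt[altlen] = '\0';
    return 1;
}
```

Calling parse_nrvk_tsv_line on the first example row fills vk with 0xb800c35bbcece603, ref with AAAAAAAAGG and alt with AG.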

Macro Definition Documentation

#define ALLELE_MAXSIZE   256

Typedef Documentation

typedef struct nrvk_cols_t nrvk_cols_t

Struct containing the NRVK memory mapped file column info.

typedef struct variantkey_rev_t variantkey_rev_t

VariantKey decoded struct.

Function Documentation

static size_t find_ref_alt_by_variantkey ( nrvk_cols_t  nvc,
uint64_t  vk,
char *  ref,
size_t *  sizeref,
char *  alt,
size_t *  sizealt 
)
inline static

Retrieve the REF and ALT strings for the specified VariantKey.

Parameters
    nvc      Structure containing the pointers to the memory mapped file columns.
    vk       VariantKey to search.
    ref      REF string buffer to be returned.
    sizeref  Pointer to the size of the ref buffer, excluding the terminating null byte. This will contain the final ref size.
    alt      ALT string buffer to be returned.
    sizealt  Pointer to the size of the alt buffer, excluding the terminating null byte. This will contain the final alt size.
Returns
    REF+ALT length or 0 if the VariantKey is not found.

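The calling convention described above (size arguments that are both input and output, and a return value of 0 when the key is missing) can be sketched with an in-memory stand-in. Everything named demo_* below is hypothetical: the real nrvk_cols_t layout is not described in this section, and the binary search over a sorted key column is an assumption about how such a lookup could work, not the library's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the memory mapped columns. */
typedef struct {
    const uint64_t *vk;     /* VariantKeys, sorted in ascending order */
    const char *const *ref; /* REF string for each row */
    const char *const *alt; /* ALT string for each row */
    size_t nrows;
} demo_cols_t;

/* Copy at most *size bytes of src into dst, NUL-terminate it,
 * and store the number of bytes actually copied back into *size. */
static void demo_copy_allele(char *dst, size_t *size, const char *src)
{
    size_t len = strlen(src);
    if (len > *size) {
        len = *size;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    *size = len;
}

/* Returns the REF+ALT length, or 0 if vk is not present in the table. */
static size_t demo_find_ref_alt(demo_cols_t c, uint64_t vk,
                                char *ref, size_t *sizeref,
                                char *alt, size_t *sizealt)
{
    assert(ref != NULL && sizeref != NULL && alt != NULL && sizealt != NULL);
    size_t lo = 0;
    size_t hi = c.nrows;
    while (lo < hi) { /* lower-bound binary search on the sorted keys */
        size_t mid = lo + (hi - lo) / 2;
        if (c.vk[mid] < vk) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == c.nrows || c.vk[lo] != vk) {
        return 0; /* not found */
    }
    demo_copy_allele(ref, sizeref, c.ref[lo]);
    demo_copy_allele(alt, sizealt, c.alt[lo]);
    return *sizeref + *sizealt;
}
```

A caller would typically allocate ref and alt buffers of ALLELE_MAXSIZE bytes and pass ALLELE_MAXSIZE - 1 as the initial sizes, since the sizes exclude the terminating null byte.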
static size_t get_nrvk_ref_alt_by_pos ( nrvk_cols_t  nvc,
uint64_t  pos,
char *  ref,
size_t *  sizeref,
char *  alt,
size_t *  sizealt 
)
inline static

static uint64_t get_variantkey_chrom_endpos ( nrvk_cols_t  nvc,
uint64_t  vk 
)
inline static

Get the CHROM + END POS encoding from VariantKey.

Parameters
    nvc      Structure containing the pointers to the memory mapped file columns.
    vk       VariantKey code.
Returns
    CHROM + END POS.

static uint64_t get_variantkey_chrom_startpos ( uint64_t  vk)
inline static

Get the CHROM + START POS encoding from VariantKey.

Parameters
    vk       VariantKey code.
Returns
    CHROM + START POS.

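The CHROM + START POS value can be illustrated with plain bit operations. This sketch assumes the VariantKey bit layout of 5 bits CHROM, 28 bits POS and 31 bits REF+ALT; that layout comes from the VariantKey format and is not stated in this section. The demo_* names are hypothetical and are not the library's functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, most significant bits first:
 * 5 bits CHROM | 28 bits POS | 31 bits REF+ALT. */
#define DEMO_REFALT_BITS 31
#define DEMO_POS_BITS 28
#define DEMO_POS_MASK 0x0FFFFFFFULL

/* Drop the REF+ALT bits, leaving CHROM and POS packed in the low 33 bits. */
static uint64_t demo_chrom_startpos(uint64_t vk)
{
    return vk >> DEMO_REFALT_BITS;
}

/* Numeric chromosome code (top 5 bits). */
static uint8_t demo_chrom(uint64_t vk)
{
    return (uint8_t)(vk >> (DEMO_REFALT_BITS + DEMO_POS_BITS));
}

/* Start position (28 bits following the chromosome code). */
static uint32_t demo_pos(uint64_t vk)
{
    return (uint32_t)((vk >> DEMO_REFALT_BITS) & DEMO_POS_MASK);
}

/* Worked example using the first key from the TSV sample in this file. */
static void demo_chrom_startpos_example(void)
{
    const uint64_t vk = 0xb800c35bbcece603ULL;
    assert(demo_chrom_startpos(vk) == 0x1700186b7ULL);
    assert(demo_chrom(vk) == 23);
    assert(demo_pos(vk) == 100023);
}
```

Because CHROM occupies the most significant bits, the packed CHROM + START POS values sort by chromosome code first and by position second.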
static uint32_t get_variantkey_endpos ( nrvk_cols_t  nvc,
uint64_t  vk 
)
inline static

Get the VariantKey end position (POS + REF length).

Parameters
    nvc      Structure containing the pointers to the memory mapped file columns.
    vk       VariantKey.
Returns
    Variant end position (POS + REF length).

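The end position arithmetic (POS + REF length) and the CHROM + END POS packing can be sketched as follows. The REF length is passed in directly here, whereas the documented functions obtain it through nvc. The sketch assumes the 5/28/31-bit VariantKey layout (CHROM, POS, REF+ALT), which is not stated in this section, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* End position = start position + REF length.
 * Assumes POS is stored in the 28 bits above the 31 REF+ALT bits. */
static uint32_t demo_endpos(uint64_t vk, size_t ref_len)
{
    uint32_t pos = (uint32_t)((vk >> 31) & 0x0FFFFFFFULL);
    return pos + (uint32_t)ref_len;
}

/* Pack the 5-bit chromosome code above the 28-bit end position. */
static uint64_t demo_chrom_endpos(uint64_t vk, size_t ref_len)
{
    uint64_t chrom = vk >> 59;
    return (chrom << 28) | (uint64_t)demo_endpos(vk, ref_len);
}

/* Worked example: first sample row, REF = AAAAAAAAGG (10 bases). */
static void demo_endpos_example(void)
{
    const uint64_t vk = 0xb800c35bbcece603ULL;
    assert(demo_endpos(vk, 10) == 100033);
    assert(demo_chrom_endpos(vk, 10) == 0x1700186c1ULL);
}
```

The packed result uses the same layout as the CHROM + START POS value, so the two can be compared directly, for example when testing whether regions overlap.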
static size_t get_variantkey_ref_length ( nrvk_cols_t  nvc,
uint64_t  vk 
)
inline static

Retrieve the REF length for the specified VariantKey.

Parameters
    nvc      Structure containing the pointers to the memory mapped file columns.
    vk       VariantKey.
Returns
    REF length or 0 if the VariantKey is not reversible and not found.

static void mmap_nrvk_file ( const char *  file,
mmfile_t *  mf,
nrvk_cols_t *  nvc 
)
inline static

Memory map the NRVK binary file.

Parameters
    file     Path to the file to map.
    mf       Structure containing the memory mapped file.
    nvc      Structure containing the pointers to the memory mapped file columns.
Returns
    Nothing (the function is declared void); the memory-mapped file descriptor is stored in mf and the column pointers in nvc.

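For readers unfamiliar with memory mapping, the following is a minimal POSIX sketch of mapping a file read-only and releasing it again. It is not the implementation of mmap_nrvk_file: demo_mmfile_t is a hypothetical stand-in for mmfile_t (whose fields are not described in this section), and the parsing of the NRVK column layout into nrvk_cols_t is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for mmfile_t. */
typedef struct {
    unsigned char *src; /* start of the mapped bytes, NULL if not mapped */
    int fd;             /* file descriptor, -1 if closed */
    size_t size;        /* file size in bytes */
} demo_mmfile_t;

/* Map the whole file read-only. Returns 0 on success, -1 on error. */
static int demo_mmap_file(const char *file, demo_mmfile_t *mf)
{
    assert(file != NULL && mf != NULL);
    mf->src = NULL;
    mf->size = 0;
    mf->fd = open(file, O_RDONLY);
    if (mf->fd < 0) {
        return -1;
    }
    struct stat st;
    if (fstat(mf->fd, &st) != 0 || st.st_size <= 0) {
        close(mf->fd);
        mf->fd = -1;
        return -1; /* stat failed or the file is empty */
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, mf->fd, 0);
    if (p == MAP_FAILED) {
        close(mf->fd);
        mf->fd = -1;
        return -1;
    }
    mf->src = (unsigned char *)p;
    mf->size = (size_t)st.st_size;
    return 0;
}

/* Unmap and close. Returns 0 on success, -1 on error. */
static int demo_munmap_file(demo_mmfile_t *mf)
{
    assert(mf != NULL);
    int rc = 0;
    if (mf->src != NULL) {
        rc = munmap(mf->src, mf->size);
    }
    if (mf->fd >= 0 && close(mf->fd) != 0) {
        rc = -1;
    }
    mf->src = NULL;
    mf->fd = -1;
    mf->size = 0;
    return rc;
}
```

Unlike the documented function, this sketch reports failure through its return value; with a void function the caller would have to inspect the populated structure to detect an error.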
static size_t nrvk_bin_to_tsv ( nrvk_cols_t  nvc,
const char *  tsvfile 
)
inline static

Convert an NRVK binary file to a simple TSV file. For the reverse operation see the resources/tools/nrvk.sh script.

Parameters
    nvc      Structure containing the pointers to the memory mapped file columns.
    tsvfile  Output TSV file name. NOTE: existing files will be replaced.
Returns
    Number of written bytes or 0 in case of error.

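The row format produced by such a conversion can be sketched with snprintf. The helper below is hypothetical and formats a single row into a caller-supplied buffer instead of writing to a file; it assumes the 16-character zero-padded hexadecimal key and tab-separated columns described in the Detailed Description, and it mirrors the documented convention of returning the number of bytes written or 0 on error.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format one "<16 hex chars>\t<REF>\t<ALT>\n" row into buf.
 * Returns the number of bytes written (excluding the terminating
 * null byte), or 0 if the row does not fit in bufsize bytes. */
static size_t demo_format_tsv_row(char *buf, size_t bufsize, uint64_t vk,
                                  const char *ref, const char *alt)
{
    assert(buf != NULL && ref != NULL && alt != NULL);
    int n = snprintf(buf, bufsize, "%016" PRIx64 "\t%s\t%s\n", vk, ref, alt);
    if (n < 0 || (size_t)n >= bufsize) {
        return 0; /* encoding error or truncated output */
    }
    return (size_t)n;
}
```

A full converter would call a helper like this once per row and add up the returned byte counts.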
static size_t reverse_variantkey ( nrvk_cols_t  nvc,
uint64_t  vk,
variantkey_rev_t *  rev 
)
inline static

Reverse a VariantKey code and return the normalized components in a variantkey_rev_t structure.

Parameters
    nvc      Structure containing the pointers to the memory mapped file columns.
    vk       VariantKey code.
    rev      Structure containing the return values.
Returns
    The decoded components are returned through rev. The size_t result is likely the REF+ALT length, or 0 on failure, as for find_ref_alt_by_variantkey.