VCF, BCF and GMEF
VCF (Variant Call Format) is a text file format for storing genetic variant marker data: SNPs and INDELs (INsertions and DELetions). A VCF file consists of meta-information lines and a header line, followed by the marker data lines. Each marker data line contains the marker's position in the genome and the genotype of every accession, expressed in terms of the REFerence allele and/or the ALTernative allele(s).
BCF, the binary version of VCF, keeps the same information as VCF but is much more efficient to process.
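To make the layout of a marker data line concrete, here is a minimal Python sketch that splits one line into its position fields and per-accession genotypes. The function name `parse_vcf_line` and the example line are illustrative assumptions, not part of GMEF; the column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per accession) follows the VCF specification.

```python
def parse_vcf_line(line):
    """Split one VCF data line into position fields and per-accession genotypes."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, marker_id, ref, alt = fields[:5]
    alt_alleles = alt.split(",")
    # Allele index 0 is REF; indices 1..n are the ALT alleles in order.
    alleles = [ref] + alt_alleles

    # FORMAT (column 9) names the colon-separated sub-fields of each accession column.
    gt_index = fields[8].split(":").index("GT")

    genotypes = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        # '/' separates unphased alleles, '|' phased ones; '.' marks a missing call.
        calls = gt.replace("|", "/").split("/")
        genotypes.append([None if c == "." else alleles[int(c)] for c in calls])

    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt_alleles,
        "genotypes": genotypes,
    }


# Hypothetical data line with three accessions (made up for illustration).
EXAMPLE = "chr1\t1042\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:12\t0/1:9\t./.:0"
record = parse_vcf_line(EXAMPLE)
```

Here `record["genotypes"]` holds one list of alleles per accession, with `None` standing in for missing calls.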
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. The official development repository for BCFtools is available here.
Some users are interested only in certain kinds of variant markers and their genotypes. These users may also be unfamiliar with the scripting skills needed to use bcftools to extract the necessary genetic variant marker information.
GMEF (Genotype Marker Extracting and Filtering), a web-based pipeline, was developed for these users. GMEF hosts several genetic variant call datasets, including the Medicago HapMap data. The user only needs to select the corresponding .vcf/.bcf file, configure the necessary parameters, and submit. The GMEF pipeline then calls bcftools and some scripts to extract and filter the marker data.
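For readers curious what such an extraction step might look like when run by hand, the following Python sketch chains two bcftools subcommands: `bcftools view` to keep one variant type, and `bcftools query` to print position, alleles, and genotypes as tab-separated text. This is only an assumed illustration; the actual commands, parameters, and scripts used inside GMEF are not documented here, and the helper names `build_commands` and `run_extraction` are hypothetical.

```python
import subprocess


def build_commands(path, variant_type="snps", region=None):
    """Build the argument lists for a `bcftools view | bcftools query` chain."""
    # -v keeps only the given variant type (e.g. snps or indels);
    # -Ou emits uncompressed BCF, which is efficient for piping.
    view = ["bcftools", "view", "-v", variant_type, "-Ou"]
    if region:
        # -r restricts output to a region; this requires an indexed input file.
        view += ["-r", region]
    view.append(path)

    # One output row per marker: position, alleles, then one genotype per accession.
    # The trailing "-" tells bcftools query to read from standard input.
    query = ["bcftools", "query", "-f", r"%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n", "-"]
    return view, query


def run_extraction(path, variant_type="snps", region=None):
    """Run the chain and return the extracted rows as text (needs bcftools on PATH)."""
    view_cmd, query_cmd = build_commands(path, variant_type, region)
    view = subprocess.Popen(view_cmd, stdout=subprocess.PIPE)
    result = subprocess.run(
        query_cmd, stdin=view.stdout, capture_output=True, text=True, check=True
    )
    view.stdout.close()
    view.wait()
    return result.stdout
```

Calling `run_extraction("example.bcf", "indels")` with a hypothetical file name would return one line per INDEL marker. GMEF performs this kind of work on the server, so none of this is required of its users.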
When the run finishes, GMEF returns the accession information, the extracted genotype marker information, and the genotype matrix data. The accession and marker information are stored as AccessionAtlas.txt and MarkerAtlas.txt, respectively. The extracted genotype matrix is stored as GenotypeMatrix.txt and its compressed version, GenotypeMatrix.txt.gz.
The returned genotype matrix data can be used in our other tool, MAD-HiDTree, to distinguish multiple accessions.