GMEF is a web pipeline that extracts and filters genotype markers from original genetic variant files, such as .vcf and .bcf files. GMEF hosts several genetic variant datasets, including the Medicago HapMap data. Users only need to select the required variant call file, configure the parameters, and submit the job.
Users can specify the marker type as SNP and/or INDEL, whether to exclude markers from the chloroplast and/or scaffolds, and which criteria the markers must meet. The criteria are: MAF (Minor Allele Frequency), Heterozygous Genotype Ratio, Unknown Genotype Ratio, and Allele Length Difference. The first three criteria are calculated as percentages.
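The four criteria can be illustrated with a minimal Python sketch. It is not the GMEF implementation: the function name marker_stats and the exact definitions are assumptions made for illustration. Here MAF is taken as the frequency of the second most common allele among called alleles, the heterozygous and unknown ratios use all samples as the denominator, and the allele length difference is the largest difference between REF and any ALT allele.

```python
import re
from collections import Counter


def marker_stats(ref, alt, genotypes):
    """Per-marker statistics under assumed definitions (not GMEF's own code).

    genotypes is a list of GT strings such as "0/1", "1|1" or "./.".
    """
    allele_counts = Counter()
    het = 0
    unknown = 0
    for gt in genotypes:
        alleles = re.split(r"[/|]", gt)
        if "." in alleles:
            # Any missing allele makes the whole call unknown.
            unknown += 1
            continue
        allele_counts.update(alleles)
        if len(set(alleles)) > 1:
            het += 1

    n = len(genotypes)
    total = sum(allele_counts.values())
    ranked = allele_counts.most_common()
    # Second most common allele; zero if the site is monomorphic.
    minor = ranked[1][1] if len(ranked) > 1 else 0

    return {
        "maf_pct": 100.0 * minor / total if total else 0.0,
        "het_pct": 100.0 * het / n if n else 0.0,
        "unknown_pct": 100.0 * unknown / n if n else 0.0,
        "len_diff": max(abs(len(ref) - len(a)) for a in alt.split(",")),
    }
```

For example, marker_stats("A", "AT", ["0/0", "0/1", "1/1", "./."]) reports one heterozygous and one unknown call out of four samples, and a length difference of 1.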
Function Module Description
GMEF mainly consists of the following functional sub-procedures.
1. Call bcftools to extract the specified SNP/INDEL markers
The main commands to extract the specified SNP/INDEL markers are as follows:
bcftools query -H -i'%TYPE="indels"' -f "%CHROM %POS %REF %ALT %DP %QUAL %AF %AC %AN %TYPE[ %GT]\n" bcf_file.bcf > bcf_text.txt
bcftools query -H -i'%TYPE="snps"' -f "%CHROM %POS %REF %ALT %DP %QUAL %AF %AC %AN %TYPE[ %GT]\n" bcf_file.bcf > bcf_text.txt
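To show what a downstream step receives, the following Python sketch parses one line of the text produced by the query format above: ten space-separated fields followed by one genotype per sample. The helper name parse_marker_line is hypothetical, and the sketch assumes the header line written by -H starts with "#" and that each record occupies one line.

```python
# Fixed columns, in the order given to bcftools query -f above.
FIELDS = ["CHROM", "POS", "REF", "ALT", "DP", "QUAL", "AF", "AC", "AN", "TYPE"]


def parse_marker_line(line):
    """Parse one record line; return None for header or malformed lines."""
    if line.startswith("#"):
        return None  # header line (assumed to start with "#")
    parts = line.split()
    if len(parts) < len(FIELDS):
        return None  # too few columns to be a record
    record = dict(zip(FIELDS, parts))
    record["POS"] = int(record["POS"])
    # Everything after the fixed columns is one genotype per sample.
    record["GT"] = parts[len(FIELDS):]
    return record


# Hypothetical record, made up for illustration only.
example = "chr1 1024 A AT 35 99 0.25 2 8 INDEL 0/0 0/1 ./. 1/1"
rec = parse_marker_line(example)
print(rec["CHROM"], rec["POS"], rec["TYPE"], rec["GT"])
```

The other numeric columns are kept as strings here because fields such as AF and AC can hold comma-separated values at multi-allelic sites.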
2. Call scripts (mainly written in Perl and Bash) to exclude specific markers.
The main commands to exclude SNP/INDEL markers from scaffolds or the chloroplast are as follows:
cat bcf_text.txt | grep -v scaffold > bcf_text_exclude.txt
cat bcf_text.txt | grep -v chl > bcf_text_exclude.txt
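Each of the two commands above reads bcf_text.txt and writes bcf_text_exclude.txt, so running both in sequence would keep only the result of the second. When both exclusions are wanted, the filters need to be applied together. The Python sketch below does this in a single pass; the function name exclude_markers is hypothetical, and the matching mirrors grep -v by dropping any line that contains one of the patterns anywhere.

```python
def exclude_markers(lines, patterns=("scaffold", "chl")):
    """Yield only the lines that contain none of the given patterns."""
    for line in lines:
        if not any(p in line for p in patterns):
            yield line


def exclude_file(src_path, dst_path, patterns=("scaffold", "chl")):
    """Apply exclude_markers to a text file and write the kept lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(exclude_markers(src, patterns))
```

Calling exclude_file("bcf_text.txt", "bcf_text_exclude.txt") applies both exclusions; passing patterns=("scaffold",) or patterns=("chl",) reproduces either single command. Because the match is a plain substring test, it would also drop a line where the pattern appears outside the chromosome column, just as grep -v does.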
3. Call an executable program (written in C/C++) to filter out the markers that do not meet the specified criteria and output the results as three independent files.
The main command to call our executable program, which filters the marker lines and exports the three result files, is as follows:
GMEF -B bcf_text_exclude.txt -G genotype_matrix_result.txt -A accesion_atlas_result.txt -M marker_atlas_result.txt -S 10 -F 5 -U 5 -E 5 -D 5