Module Description

  • PIP_SNP is a set of SNP data preprocessing pipelines developed to address two challenges of SNP data: the high dimensionality of SNP markers and the high rate of missing genotypes. It relies on the biological concept of linkage disequilibrium (LD): SNP markers within a local range tend to be inherited together. The whole genome can therefore be mapped into LD bins, each containing several correlated SNP markers. A missing genotype can be inferred from the known genotypes in the same LD bin, and a representative marker for a bin can be generated by integrating all SNP markers in that bin. Functionally, PIP_SNP includes LD mapping across the whole genome, missing genotype imputation for each LD bin, and representative marker synthesis for each LD bin.
  • For flexibility, we developed two related but distinct pipelines, PIP_SNP_Venue1 and PIP_SNP_Venue2. PIP_SNP_Venue1 starts from the numerically genotyped SNP data alone and consists of three main modules: LD Bin Detecting and Mapping, LD Based Missing Genotype Imputing, and LD Based Marker Synthesizing. PIP_SNP_Venue2 starts from an existing LD mapping result together with the numerically genotyped SNP data and consists of two main modules: LD Based Missing Genotype Imputing and LD Based Marker Synthesizing.
  • To efficiently synthesize SNP data from a random population, such as HapMap, and achieve a higher synthesizing ratio, we consider the correlation of both directly connected neighboring SNPs and skipped SNPs. A separate program, Deep Synthesising, has been developed for this purpose; it can be selected as the synthesizing option to further synthesize the markers output by PIP_SNP.
  • PIP_SNP is developed in C++ and compiled on Linux with the open-source IDE Code::Blocks and on Windows with Visual Studio 2015. Users can download the appropriate version, compile it locally, and run it as a command-line program.
  • Deep Synthesising is also developed in C++ and compiled on Linux with the open-source IDE Code::Blocks. Users can download the appropriate version of the source code, compile it locally, and run it as a command-line program.
  • In this web interface version, we aim to provide a user-friendly experience. Because the original genotyped SNP data can be very large, we developed a module that runs in an HTML5 browser and implements resumable, multithreaded, chunked data uploading. In addition, when the original genotyped SNP data are stored on a remote cloud server such as Google Drive, PIP_SNP lets the user provide the shared URL.
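
The LD-bin idea from the first item above (infer a missing genotype from the known genotypes in the same bin, then integrate the bin into one representative marker) can be illustrated with a minimal C++ sketch. This is not the PIP_SNP implementation. It assumes a genotype coding of 0, 1, 2 with -1 for a missing call, assumes all markers in a bin are coded in the same orientation, and uses a simple per-sample majority vote as a stand-in for the actual imputing and synthesizing rules. The names kMissing, Bin, consensusGenotype, imputeBin, and synthesizeBin are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed coding (not specified by the source): 0, 1, 2 are genotype calls,
// kMissing marks an unknown genotype.
constexpr int kMissing = -1;

// One LD bin: bin[m][s] is the genotype of marker m in sample s.
using Bin = std::vector<std::vector<int>>;

// Most frequent known genotype of sample s across the markers of the bin.
// Ties resolve to the smaller code; returns kMissing if no call is known.
int consensusGenotype(const Bin& bin, std::size_t s) {
    std::array<int, 3> counts{0, 0, 0};
    for (const auto& marker : bin) {
        const int g = marker[s];
        if (g >= 0 && g <= 2) ++counts[static_cast<std::size_t>(g)];
    }
    int best = kMissing;
    int bestCount = 0;
    for (int g = 0; g < 3; ++g) {
        const int c = counts[static_cast<std::size_t>(g)];
        if (c > bestCount) { bestCount = c; best = g; }
    }
    return best;
}

// Hypothetical imputing step: fill each missing call with the bin consensus
// for that sample. A sample with no known call in the bin stays missing.
Bin imputeBin(const Bin& bin) {
    assert(!bin.empty());
    Bin out = bin;
    const std::size_t nSamples = bin.front().size();
    for (std::size_t s = 0; s < nSamples; ++s) {
        const int consensus = consensusGenotype(bin, s);
        for (auto& marker : out) {
            if (marker[s] == kMissing) marker[s] = consensus;
        }
    }
    return out;
}

// Hypothetical synthesizing step: one representative marker per bin,
// built from the per-sample consensus of all markers in the bin.
std::vector<int> synthesizeBin(const Bin& bin) {
    assert(!bin.empty());
    const std::size_t nSamples = bin.front().size();
    std::vector<int> rep(nSamples);
    for (std::size_t s = 0; s < nSamples; ++s) rep[s] = consensusGenotype(bin, s);
    return rep;
}
```

Calling imputeBin on a bin and then synthesizeBin on the result reduces several correlated markers to a single marker with fewer missing calls, which is the dimensionality reduction the pipelines aim for. A real implementation would also need to detect the bins and handle markers whose alleles are coded in opposite orientation.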

Download

User Manual

Test Dataset

Performance Evaluation

Source Code

Development Information

Language: C/C++
Current Version: V1.0
Platform: Linux (Code::Blocks) or Windows (Visual Studio 2015)
Licence: GPL 3.0
Status: Active
Last Update: 04/18/2020
Contact: Wenchao Zhang, wezhang AT noble.org