KMC1D: GPU Empowered Parallel Pipeline for Main Effect (1D) Kinship Matrix Calculation

Kinship Matrix, Main Effect (1D) Kinship Matrix, and GPU-Empowered Main Effect (1D) Kinship Matrix Calculation

In population genetics, a kinship matrix is a relatedness matrix that measures the degree of relationship between any two individuals. Suppose there are n individuals, each represented either explicitly by a node of a pedigree tree or by a large number (m) of genotypic markers. The kinship matrix K is a 2D matrix of dimension n x n. It can be generated from the genotypic markers, and its calculation is a critical step in genome-wide association studies (GWAS). The entry K(i,j) is a coefficient that assesses the genetic resemblance between individual i and individual j. Because K(i,j) = K(j,i), the kinship matrix is symmetric. We call this matrix the main effect (1D) kinship matrix, in contrast to the 2D epistatic effect (marker pair) kinship matrix that we define separately.

Recently, high-throughput sequencing, particularly next-generation sequencing (NGS) technology, has made it possible to sequence and discover a massive number of SNPs and to further explore within-species diversity by constructing haplotype maps and conducting GWAS. A typical GWAS may need to call millions of SNPs as genotypic markers. Kinship matrix calculation, as the first step of a GWAS, requires loading the massive genotypic data and then computing the pairwise relatedness of individuals. The calculation is therefore very time-consuming, especially when the number of individuals reaches several thousand and the number of genotypic markers reaches several million.

In recent years, GPUs (graphics processing units), with more than 1,000 hardware processor cores, have become a standard high-performance computing (HPC) solution for large-scale computing, e.g., large-scale matrix operations.

We have analyzed the mathematical principles and the complexity of marker-assisted kinship matrix calculation and developed this GPU-empowered pipeline, KMC1D, for calculating the main effect kinship matrix. Briefly, we first divide the ultra-high-dimensional markers into successive blocks. We then calculate the kinship matrix for each block and merge the block-wise kinship matrices to form the genome-wide kinship matrix. All matrix operations have been parallelized using GPU kernels on our NVIDIA GPU-accelerated server platform. Our performance analyses show that KMC1D can achieve a speedup of hundreds of times over conventional CPU-based computing.
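The block-wise strategy can be illustrated with a minimal CPU-only sketch in plain Python. It is not the KMC1D implementation: the function names, the 1 / 0 / -1 marker coding, and the normalization by the marker count m are all assumptions made for illustration, and the inner products computed here in loops are the part a GPU pipeline would run as parallel kernels.

```python
def kinship_from_block(block):
    """Unnormalized kinship contribution Z_b * Z_b^T for one marker block.

    `block` is a list of n rows (one per individual), each holding the
    numeric codes of the markers that fall in this block.
    """
    n = len(block)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # upper triangle only; the matrix is symmetric
            s = float(sum(a * b for a, b in zip(block[i], block[j])))
            k[i][j] = s
            k[j][i] = s
    return k


def blockwise_kinship(genotypes, block_size):
    """Sum the block-wise kinship matrices, then divide by the marker count.

    Dividing by m is an assumed normalization, chosen only for this sketch.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    total = [[0.0] * n for _ in range(n)]
    for start in range(0, m, block_size):
        # Slice the same marker columns out of every individual's row.
        block = [row[start:start + block_size] for row in genotypes]
        partial = kinship_from_block(block)
        for i in range(n):
            for j in range(n):
                total[i][j] += partial[i][j]
    return [[v / m for v in row] for row in total]


# Toy data: 3 individuals x 5 markers, coded 1 / 0 / -1 (assumed coding).
Z = [
    [1, 0, -1, 1, 0],
    [1, 1, -1, 0, 0],
    [-1, 0, 1, 1, 1],
]
K_blocked = blockwise_kinship(Z, block_size=2)  # three blocks: 2 + 2 + 1 markers
K_single = blockwise_kinship(Z, block_size=5)   # one block holding all markers
```

Because the sum over markers splits into a sum over blocks, `K_blocked` and `K_single` agree. This is why only one block of markers needs to be held in memory at a time.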

Users are required to upload one kind of genotype matrix file to compute one of the two main effect kinship matrices: additive or dominance.
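The difference between an additive and a dominance genotype matrix can be sketched as follows. The genotype labels and numeric codes below are assumptions chosen for illustration (a commonly used coding), not the file format or coding that KMC1D requires.

```python
# Hypothetical genotype labels and numeric codes (assumptions, not KMC1D's format).
ADDITIVE_CODE = {"AA": 1, "Aa": 0, "aa": -1}   # contrasts the two homozygotes
DOMINANCE_CODE = {"AA": 0, "Aa": 1, "aa": 0}   # flags heterozygotes


def encode(genotype_calls, code):
    """Map a matrix of genotype labels (individuals x markers) to numeric codes."""
    return [[code[g] for g in row] for row in genotype_calls]


# Toy data: 2 individuals x 3 markers.
calls = [
    ["AA", "Aa", "aa"],
    ["Aa", "Aa", "AA"],
]
additive_matrix = encode(calls, ADDITIVE_CODE)
dominance_matrix = encode(calls, DOMINANCE_CODE)
```

Under these assumptions, the same genotype calls yield two different numeric matrices, and each one would lead to its own main effect kinship matrix.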

Also, to assist in transferring large genotype matrix files, we implemented a "resumable multithreading-chunked uploading" function for HTML5-compatible browsers.

To calculate the epistatic effect (2D) kinship matrix, you may use our other GPU pipeline, KMC2D.

References

1. Xu S., "Mapping Quantitative Trait Loci by Controlling Polygenic Background Effects", Genetics, 195(4), pp. 1709-1723, 2013.

2. Zhang W., Dai X., Wang Q., Xu S., Zhao P.X., "PEPIS: A Pipeline for Estimating Epistatic Effects in Quantitative Trait Locus Mapping and Genome-Wide Association Studies", PLoS Comput Biol, 12(5), 2016.

3. Cecilia J.M., García J.M., Ujaldon M., "The GPU on the Matrix-Matrix Multiply: Performance Study and Contributions", in Parallel Computing: From Multicores and GPU's to Petascale, B. Chapman et al., Eds., Advances in Parallel Computing, vol. 19, pp. 331-340, 2010.

4. Dobravec T., Bulic P., "Comparing CPU and GPU Implementations of a Simple Matrix Multiplication Algorithm", IJCEE, vol. 9, pp. 430-438, 2017.