Kinship Matrix, Epistatic (2D) Kinship Matrix, and GPU-Empowered Epistatic (2D) Kinship Matrix Calculation
In population genetics, the kinship matrix is a relatedness matrix used to measure the degree of relationship between any two individuals. Suppose there are n individuals, each represented either explicitly by a node of a pedigree tree or by a large number (m) of genotypic markers. The kinship matrix K is a 2D matrix of dimension n × n. It can be generated from the genotypic markers, and its calculation is a critical step in genome-wide association studies (GWAS). The entry K(i,j) is a coefficient that assesses the genetic resemblance between individual i and individual j. Because K(i,j) = K(j,i), the kinship matrix is symmetric. Such a kinship matrix is called the main-effect (1D) kinship matrix, in contrast to the 2D epistatic-effect (marker-pair) kinship matrix defined here.

Xu et al. proposed a new polygenic linear mixed model (LMM) for epistatic-effect GWAS analysis. To solve the polygenic LMM, four kinds of epistatic kinship matrices were mathematically defined, together with the formulas for calculating them from marker pairs. With m markers, the number of marker pairs is C(m,2) = m(m-1)/2, which grows quadratically with m. Compared with the marker-based main-effect (1D) kinship matrix, the epistatic kinship matrix is therefore particularly time-consuming to calculate.

In recent years, GPUs (graphics processing units) with more than 1,000 hardware processor cores have become a standard high-performance computing (HPC) solution for large-scale computing, e.g., large-scale matrix operations. We analyzed the mathematical principles and the complexity of the marker-pair-based epistatic kinship matrix calculation and developed a GPU-empowered pipeline, KMC2D, to perform it. Briefly, we first divide the ultra-high-dimensional set of marker pairs into successive blocks.
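As a rough illustration of the block-wise strategy, the following Python sketch works through a toy additive-additive case on the CPU. It is not the KMC2D implementation, which runs as GPU kernels. The function names (pair_blocks, block_kinship, epistatic_kinship), the coding of a marker pair as the product of the two marker codes, the genotype values, and the normalization by the number of marker pairs are all assumptions made for illustration.

```python
from itertools import combinations, islice


def pair_blocks(m, block_size):
    """Yield successive lists of marker-index pairs (a, b) with a < b."""
    pairs = combinations(range(m), 2)
    while True:
        block = list(islice(pairs, block_size))
        if not block:
            return
        yield block


def block_kinship(first, second, block):
    """Unnormalized kinship contribution of one block of marker pairs.

    first[i][a] and second[i][b] are the codes of individual i at markers
    a and b; the pair covariate is assumed to be their product.
    """
    n = len(first)
    k_mat = [[0.0] * n for _ in range(n)]
    for a, b in block:
        x = [first[i][a] * second[i][b] for i in range(n)]
        for i in range(n):
            for j in range(n):
                k_mat[i][j] += x[i] * x[j]
    return k_mat


def epistatic_kinship(first, second, block_size=2):
    """Sum the block-wise matrices, then divide by the number of pairs.

    Dividing by the pair count is an assumed normalization, chosen only
    to keep the example simple.
    """
    n, m = len(first), len(first[0])
    total = [[0.0] * n for _ in range(n)]
    n_pairs = 0
    for block in pair_blocks(m, block_size):
        part = block_kinship(first, second, block)
        for i in range(n):
            for j in range(n):
                total[i][j] += part[i][j]
        n_pairs += len(block)
    return [[v / n_pairs for v in row] for row in total]


# Toy additive genotype codes: 3 individuals x 4 markers (made-up values).
Z = [
    [1, 0, -1, 1],
    [0, 1, 1, -1],
    [-1, 1, 0, 1],
]
K_aa = epistatic_kinship(Z, Z, block_size=2)
```

Because the block-wise matrices are simply summed, the result does not depend on the block size. Each block is therefore an independent unit of work, which is the property that makes this kind of calculation a natural fit for parallel hardware.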
We then calculate the kinship matrix for each block and merge the block-wise kinship matrices to form the genome-wide kinship matrix. All matrix operations are parallelized using GPU kernels on our NVIDIA GPU-accelerated server platform. Our performance analyses show that KMC2D is several hundred times faster than conventional CPU-based computing. Users are required to upload two kinds of genotype matrix files to compute one of the four epistatic kinship matrices: additive-additive, additive-dominance, dominance-additive, or dominance-dominance. To calculate the main-effect (1D) kinship matrix, you may use our other GPU pipeline, KMC1D.

References:
1. Xu S., "Mapping Quantitative Trait Loci by Controlling Polygenic Background Effects", Genetics, 2013. 195:1209-1222.
2. Zhang W., Dai X., Wang Q., Xu S., Zhao P.X., "PEPIS: A Pipeline for Estimating Epistatic Effects in Quantitative Trait Locus Mapping and Genome-Wide Association Studies", PLoS Comput Biol, 2016. 12(5).
3. Cecilia J.M., García J.M., Ujaldon M., "The GPU on the Matrix-Matrix Multiply: Performance Study and Contributions", in Parallel Computing: From Multicores and GPU's to Petascale, B. Chapman et al., Eds. Advances in Parallel Computing, vol. 19, pp. 331-340, 2010.
4. Dobravec T., Bulic P., "Comparing CPU and GPU Implementations of a Simple Matrix Multiplication Algorithm", IJCEE, vol. 9, pp. 430-438, 2017.