To support the continuous development of GPLEXUS, please cite: GPLEXUS: Enabling Genome-scale Gene Association Network Reconstruction and Analysis for Very Large-scale Expression Data, Nucleic Acids Research, 2013, doi: 10.1093/nar/gkt983.
The Zhao Bioinformatics Laboratory
GPLEXUS: Genome-scale Gene Association Network Reconstruction and Analysis for very large-scale expression data    

  Genome-scale Gene Association Network Reconstruction and Analysis for Very Large-scale Expression Data


Biological Significance

The availability of terabytes or even petabytes of gene expression data (big data) in public repositories, such as the Gene Expression Omnibus and the ArrayExpress Archive, has made it theoretically possible to reconstruct more accurate and unbiased genome-wide Gene Association Networks (GANs) and so decipher the regulatory interactions that govern complex traits and biological processes. Using a large number of samples can substantially increase the accuracy of gene network reconstruction when the gene space is very large (high dimension). However, the major obstacle to using the available big data is the lack of algorithms that are both accurate and computationally inexpensive, together with the lack of a sufficiently powerful computing platform. Developing accurate, ultrafast algorithms and parallelizing them on a powerful platform will make it practical to use big data to reveal how biological processes and complex traits are regulated.

Bioinformatics Challenges

Many algorithms have been developed for reconstructing GANs. However, few methods can identify the experimental conditions that help explain how and why genes are associated in the reconstructed networks. Furthermore, almost all current state-of-the-art computational methods for large-scale GAN reconstruction are resource hungry, placing heavy demands on both CPU and memory. For this reason, these algorithms are valuable for constructing local networks but can hardly be extended to high-dimensional gene expression data with large numbers of genes and samples. The difficulty is most acute for organisms with large genomes, such as plants, which in turn demand a large sample size for building accurate GANs. Gene reduction, itself a very challenging subject, can be performed to obtain subsets of genes for building local GANs, but functionally or topologically important subnetworks can be missed and bias can be introduced into the gene association analysis. Therefore, it is imperative to develop ultrafast methods that enable the construction of genome-wide GANs and help explain the mechanisms behind the genetic interactions they contain.

   The GPLEXUS: Genome-scale GAN Analysis Enabled and Empowered by Ultrafast Parallel Mutual Information Computing

We developed GPLEXUS, a novel, publicly and freely available web server that enables and empowers genome-scale GAN analysis. Key features of GPLEXUS include high-performance construction of Gene Association Networks (GANs), identification of functional subnetworks, and network analyses for novel biological discovery. Briefly, GPLEXUS integrates the following key components and functionalities:

  • GPLEXUS adopts a construction-followed-by-refinement procedure to build GANs from very large-scale gene expression data;

  • GPLEXUS adopts an ultrafast Spearman correlation-based transformation to estimate the mutual information (MI); two other methods, B-spline-based pairwise MI estimation and Gaussian kernel-based pairwise MI estimation, are also integrated in the platform;

  • GPLEXUS constructs GANs with high accuracy and sensitivity by efficiently removing potential false-positive edges through Data Processing Inequality (DPI) filtering;

  • GPLEXUS integrates the Markov Clustering Algorithm (MCL) for effective subnetwork identification and discovery;

  • Notably, GPLEXUS implements an innovative function to identify the experiment-specific conditions that contribute most to gene-gene associations in the constructed networks. Such analysis may greatly help scientists understand the regulatory mechanisms behind these interactions. This function is particularly important for studying plants, whose sessile lifestyle requires them to cope with ambient conditions. Learning subnetworks and their operating conditions will provide opportunities to understand and enhance plant adaptation to increasingly challenging environmental conditions.
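
The construction-followed-by-refinement idea in the list above can be illustrated with a small sketch. It is not the GPLEXUS C++ implementation. It assumes that the Spearman-based transformation is the Gaussian-style formula MI = -0.5 * ln(1 - rho^2) applied to the rank correlation rho, and that DPI drops the weakest edge of every fully connected gene triplet. All function names, the `mi_threshold` cutoff, and the `tolerance` parameter are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for idx in order[pos:end + 1]:
            result[idx] = mean_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mi_from_rho(rho):
    """Assumed transformation: MI = -0.5 * ln(1 - rho^2), clamped to stay finite."""
    rho2 = min(rho * rho, 1.0 - 1e-12)
    return -0.5 * math.log(1.0 - rho2)


def build_network(expr, mi_threshold):
    """Construction step: keep gene pairs whose estimated MI reaches the cutoff.

    expr maps a gene name to its list of expression values (one per sample).
    Returns {(gene_a, gene_b): mi} with gene_a < gene_b.
    """
    ranked = {gene: ranks(vals) for gene, vals in expr.items()}  # rank once per gene
    edges = {}
    for a, b in combinations(sorted(ranked), 2):
        mi = mi_from_rho(pearson(ranked[a], ranked[b]))  # Spearman = Pearson on ranks
        if mi >= mi_threshold:
            edges[(a, b)] = mi
    return edges


def dpi_filter(edges, tolerance=0.0):
    """Refinement step: in each fully connected triplet, drop the weakest edge."""
    genes = sorted({g for pair in edges for g in pair})
    doomed = set()
    for a, b, c in combinations(genes, 3):
        triple = [(a, b), (a, c), (b, c)]
        vals = [edges.get(p) for p in triple]
        if None in vals:
            continue  # DPI only applies when all three edges exist
        weakest = min(range(3), key=lambda i: vals[i])
        others = [vals[i] for i in range(3) if i != weakest]
        if vals[weakest] < min(others) * (1.0 - tolerance):
            doomed.add(triple[weakest])
    # Edges are removed only after every triplet has been inspected.
    return {p: v for p, v in edges.items() if p not in doomed}


if __name__ == "__main__":
    toy = {
        "A": [1, 2, 3, 4, 5, 6, 7, 8],
        "B": [2, 1, 3, 4, 5, 6, 7, 8],
        "C": [2, 1, 4, 3, 5, 6, 7, 8],
    }
    network = build_network(toy, mi_threshold=0.5)
    print(sorted(dpi_filter(network)))  # the indirect A-C edge is removed
```

In the toy data, gene C is derived from B and B from A, so the A-C association is the weakest of the three and is treated as indirect. Marking edges first and deleting them afterwards keeps the result independent of the order in which triplets are visited.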

Performance Demonstration

GPLEXUS was specially designed to enable genome-scale GAN analysis for organisms with large numbers of genes, and is capable of analyzing very large-scale expression profile data. All integrated algorithms were implemented in C++ with parallel programming techniques and deployed on a Linux cluster to achieve this high-performance computing capacity.
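
As a rough illustration of how pairwise computations of this kind can be divided among workers, the sketch below splits the upper triangle of the gene-by-gene matrix into row blocks with similar numbers of pairs and scores each block independently. This is an assumption about one possible decomposition, not a description of the GPLEXUS cluster code. The names `balanced_row_blocks`, `score_block`, and `pairwise_parallel` are hypothetical, and Python threads stand in for the cluster workers only to show the dispatch pattern (pure-Python scoring gains no real speedup from threads).

```python
from concurrent.futures import ThreadPoolExecutor


def balanced_row_blocks(n_genes, n_workers):
    """Return (start, stop) row ranges with roughly equal numbers of gene pairs.

    Row i of the upper triangle holds n_genes - 1 - i pairs, so early rows
    are heavier and blocks near the top contain fewer rows.
    """
    total = n_genes * (n_genes - 1) // 2
    target = total / n_workers
    blocks, start, acc = [], 0, 0
    for row in range(n_genes - 1):
        acc += n_genes - 1 - row
        if acc >= target and len(blocks) < n_workers - 1:
            blocks.append((start, row + 1))
            start, acc = row + 1, 0
    if start < n_genes - 1:
        blocks.append((start, n_genes - 1))  # remaining rows go to the last block
    return blocks


def score_block(block, profiles, score):
    """Score every pair (i, j) with i in the block's rows and j > i."""
    start, stop = block
    n = len(profiles)
    return [(i, j, score(profiles[i], profiles[j]))
            for i in range(start, stop)
            for j in range(i + 1, n)]


def pairwise_parallel(profiles, score, n_workers=4):
    """Dispatch one row block per task and concatenate the results."""
    blocks = balanced_row_blocks(len(profiles), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda b: score_block(b, profiles, score), blocks))
    return [item for part in parts for item in part]
```

Any pairwise measure can be passed as `score`, for example a Spearman-based MI estimate. Because each block touches a disjoint set of pairs, workers need no coordination beyond collecting the results.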

We demonstrated the effectiveness of GPLEXUS in constructing global GANs and in identifying valid, novel, and explainable biological subnetworks in the model plant Arabidopsis thaliana, including subnetworks involved in defense against biotic and abiotic stresses and the gene subnetwork controlling cell cycle and division. We constructed and analyzed genome-wide Arabidopsis GANs using 1848 arrays collected from the ArrayExpress Archive. The normalized gene expression data and array description are available for download here (around 700 MB) and here, respectively. The analysis results are available at

Funding by the National Science Foundation and the Oklahoma Center for the Advancement of Science & Technology. Additional funding by the Samuel Roberts Noble Foundation.

© Copyright 2013, the Samuel Roberts Noble Foundation, Inc. Site by Zhao Lab