Article quick-view

Incorporating prior knowledge into Gene Network Study.

ABSTRACT

A major goal in genomic research is to identify genes that may jointly influence a biological response. From many years of intensive biomedical research, a large body of biological knowledge, or pathway information, has accumulated in available databases. There is a strong interest in leveraging these pathways to improve the statistical power and interpretability in studying gene networks associated with complex phenotypes. This prior information is a valuable complement to large-scale genomic data such as gene expression data generated from microarrays. However, it is a non-trivial task to effectively integrate available biological knowledge into gene expression data when reconstructing gene networks.In this article, we developed and applied a Lasso method from a Bayesian perspective, a method we call prior Lasso (pLasso), for the reconstruction of gene networks. In this method, we partition edges between genes into two subsets: one subset of edges is present in known pathways, whereas the other has no prior information associated. Our method assigns different prior distributions to each subset according to a modified Bayesian information criterion that incorporates prior knowledge on both the network structure and the pathway information. Simulation studies have indicated that the method is more effective in recovering the underlying network than a traditional Lasso method that does not use the prior information. We applied pLasso to microarray gene expression datasets, where we used information from the Pathway Commons (PC) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as prior information for the network reconstruction, and successfully identified network hub genes associated with clinical outcome in cancer patients.The source code is available at http://nba.uth.tmc.edu/homepage/liu/pLasso.