A pinboard by
Juanjiangmeng Du

Postdoc | Computational biologist | Cologne Center for Genomics


An interactive web application for predicting the pathogenicity of variants identified in patients' DNA

Currently available algorithms for predicting the pathogenicity of DNA mutations cannot accurately distinguish pathogenic variants from benign rare variants in genes involved in disease etiology (e.g. epilepsy). This inability to predict the consequences of variants, and thus disease severity, reduces the diagnostic yield of genetic testing.

Here, we developed a novel statistical inference framework that improves pathogenic variant identification by integrating genetic data (e.g. exomes) from a large healthy population (138,632 individuals) with disease databases. Our method allows fast prioritization, narrowing the thousands of variants in a patient's whole exome down to a few variants with significant evidence of pathogenicity.

Our results will be available as an online resource, which will not only provide researchers, clinicians, patients and their families with improved and reliable information regarding the pathogenicity of variants found in disease-related genes, but also hold promise for early disease diagnostics, drug development and personalized therapy.


A unifying framework for evaluating the predictive power of genetic variants based on the level of heritability explained.

Abstract: An increasing number of genetic variants have been identified for many complex diseases. However, it is controversial whether risk prediction based on genomic profiles will be useful clinically. Appropriate statistical measures to evaluate the performance of genetic risk prediction models are required. Previous studies have mainly focused on the use of the area under the receiver operating characteristic (ROC) curve, or AUC, to judge the predictive value of genetic tests. However, AUC has its limitations and should be complemented by other measures. In this study, we develop a novel unifying statistical framework that connects a wide variety of predictive indices. We show that, given the overall disease probability and the level of variance in total liability (or heritability) explained by the genetic variants, we can analytically estimate a wide variety of prediction metrics, for example the AUC, the mean risk difference between cases and non-cases, the net reclassification improvement (the ability to reclassify people into high- and low-risk categories), the proportion of cases explained by a specific percentile of the population at the highest risk, the variance of predicted risks, and the risk at any percentile. We also demonstrate how to construct graphs to visualize the performance of risk models, such as the ROC curve, the density of risks, and the predictiveness curve (disease risk plotted against risk percentile). The results from simulations match our theoretical estimates very well. Finally, we apply the methodology to nine complex diseases, evaluating the predictive power of genetic tests based on known susceptibility variants for each trait.

Pub.: 15 Dec '10, Pinned: 27 Aug '17
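
The abstract above says that the AUC can be estimated analytically from the overall disease probability and the heritability explained, and that simulations agree with the theory. It does not give the formulas, so the following is only a minimal sketch of that idea under assumptions of my own, not the authors' derivation. It assumes a standard liability-threshold model (total liability is standard normal, and disease occurs above a threshold set by the prevalence) and treats the genetic score as approximately normal within cases and within non-cases. The function names `analytic_auc` and `simulated_auc` are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: total liability has mean 0, variance 1


def analytic_auc(prevalence, h2_explained):
    """Approximate AUC of a genetic score under a liability-threshold model.

    Assumption: the score is treated as normal within cases and non-cases,
    so this is an approximation, not an exact result.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if not 0.0 <= h2_explained <= 1.0:
        raise ValueError("h2_explained must be between 0 and 1")
    if h2_explained == 0.0:
        return 0.5  # an uninformative score cannot separate cases from non-cases

    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)  # liability cut-off for disease
    density = STD_NORMAL.pdf(threshold)
    mean_case = density / prevalence               # mean liability among cases
    mean_ctrl = -density / (1.0 - prevalence)      # mean liability among non-cases

    # Mean and variance of the genetic score after selection on liability.
    mean_diff = h2_explained * (mean_case - mean_ctrl)
    var_case = h2_explained * (1.0 - h2_explained * mean_case * (mean_case - threshold))
    var_ctrl = h2_explained * (1.0 - h2_explained * mean_ctrl * (mean_ctrl - threshold))
    return STD_NORMAL.cdf(mean_diff / math.sqrt(var_case + var_ctrl))


def simulated_auc(prevalence, h2_explained, n=20000, seed=0):
    """Monte Carlo check: simulate liabilities, then compute the rank-based AUC."""
    rng = random.Random(seed)
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    sd_g = math.sqrt(h2_explained)
    sd_e = math.sqrt(1.0 - h2_explained)
    people = []
    for _ in range(n):
        g = rng.gauss(0.0, sd_g)                   # genetic score
        liability = g + rng.gauss(0.0, sd_e)       # score plus residual
        people.append((g, liability > threshold))
    people.sort(key=lambda person: person[0])

    # Mann-Whitney statistic: sum of the ranks of cases, ordered by score.
    rank_sum = sum(rank for rank, (_, is_case) in enumerate(people, start=1) if is_case)
    n_case = sum(1 for _, is_case in people if is_case)
    n_ctrl = n - n_case
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("simulation produced no cases or no non-cases")
    return (rank_sum - n_case * (n_case + 1) / 2.0) / (n_case * n_ctrl)


if __name__ == "__main__":
    for h2 in (0.05, 0.2, 0.5):
        print(h2, round(analytic_auc(0.1, h2), 3), round(simulated_auc(0.1, h2), 3))
```

Running the script prints the approximate and simulated AUC side by side for a few illustrative values of heritability explained at a prevalence of 10%. These inputs are arbitrary choices for demonstration, not figures from the paper.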