Postdoctoral scholar, Tufts University
Analysis on our ability to build disease hierarchies from disease-gene association information
Disease taxonomies have played crucial roles in various research and clinical contexts, such as standardizing vocabularies in clinical and healthcare reports, systematically classifying disease symptoms, indexing Medline articles, and defining inter-relationships between biomedical terms. However, because existing disease taxonomies focus on traditional diagnostic criteria or physiological features of human disease, they fail to fully incorporate the rapidly growing amount of high-throughput data about the molecular causes of human disease. More modern disease taxonomies would presumably be built by incorporating new molecular knowledge about human disease. As a first step towards the ultimate goal of building more molecular disease taxonomies, we analyzed our ability to infer disease relationships from molecular data alone. We designed a new technique called Parent Promotion to infer relationships between disease terms using disease-gene association information and compared its performance with an established ontology inference method (CliXO) and a minimum weight spanning tree approach. Our results imply that disease-gene association information can form an important part of the foundation of the ultimate disease taxonomies. Our experiments provide insights into the inference algorithms, the impact of molecular data on their performance, and the current molecular content of existing disease taxonomies, and they suggest future work that may ultimately lead to truly modern disease taxonomies.
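To make the minimum weight spanning tree baseline mentioned above more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the disease names and gene sets are invented toy data, and using Jaccard distance between gene sets as the edge weight is an assumption made for illustration only. Parent Promotion and CliXO are not reproduced here.

```python
def jaccard_distance(genes_a, genes_b):
    """1 - |A ∩ B| / |A ∪ B|; an assumed edge weight, not taken from the paper."""
    union = genes_a | genes_b
    if not union:
        return 1.0
    return 1.0 - len(genes_a & genes_b) / len(union)


def minimum_spanning_tree(gene_sets):
    """Prim's algorithm on the complete graph whose nodes are diseases.

    Returns a list of (disease_u, disease_v, weight) edges.
    """
    names = sorted(gene_sets)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that connects the current tree to a new disease.
        weight, u, v = min(
            (jaccard_distance(gene_sets[u], gene_sets[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, weight))
        in_tree.add(v)
    return edges


# Hypothetical disease-gene associations (toy data, not from any database).
toy_associations = {
    "disease_A": {"G1", "G2", "G3"},
    "disease_B": {"G2", "G3", "G4"},
    "disease_C": {"G7", "G8"},
    "disease_D": {"G3", "G4", "G5"},
}

tree = minimum_spanning_tree(toy_associations)
for u, v, w in tree:
    print(f"{u} -- {v}  (distance {w:.2f})")
```

A spanning tree is undirected, so turning it into a hierarchy still requires choosing a root and orienting the edges; how that is done in the work described above is not specified here.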
Abstract: Pairwise similarity between diseases reveals the molecular relationships between them. For example, similar diseases have the potential to be treated by common therapeutic chemicals (TCs). In this paper, we introduce DisSim, an online system for exploring similar diseases and comparing their corresponding TCs. Currently, DisSim implements five state-of-the-art methods to measure the similarity between Disease Ontology (DO) terms and provides the significance of each similarity score. Furthermore, DisSim integrates TCs of diseases from the Comparative Toxicogenomics Database (CTD), which can help to identify potential relationships between TCs and similar diseases. The system can be accessed from http://220.127.116.11:8080/DisSim.
Pub.: 28 Jul '16, Pinned: 04 Jul '17
Abstract: Pre-eclampsia (PE) is a clinical syndrome characterized by new-onset hypertension and proteinuria at ≥20 weeks of gestation, and is a leading cause of maternal and perinatal morbidity and mortality. Previous studies have gathered abundant data about PE such as risk factors and pathological findings. However, most of these data are not semantically structured. Clinical data on PE patients are often generated with semantic heterogeneity such as using disparate terminology to describe the same phenomena. In clinical studies, interoperability of heterogenic clinical data is required in various situations. In such a situation, it is necessary to develop an interoperable and standardized semantic framework to research the pathology of PE more comprehensively and to achieve interoperability of heterogenic clinical data of PE patients. In this study, we developed an ontology representing clinical features, treatments, genetic factors, environmental factors, and other aspects of the current knowledge in the domain of PE. We call this pre-eclampsia ontology "PEO". To achieve interoperability with other ontologies, the core structure of PEO was compliant with the hierarchy of the Basic Formal Ontology (BFO). The PEO incorporates a wide range of key concepts and terms of PE from clinical and biomedical research in structuring the knowledge base that is specific to PE; therefore, PEO is expected to enhance PE-specific information retrieval and knowledge discovery in both clinical and biomedical research fields.
Pub.: 28 Oct '16, Pinned: 04 Jul '17
Abstract: Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson's correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
Pub.: 26 Jun '14, Pinned: 04 Jul '17
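The vector representation described in the gene ontology abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the term and gene identifiers are hypothetical, information content is estimated as the negative log of a term's annotation frequency in the toy corpus, annotations are not propagated to ancestor terms, and the weighted (min/max) form of the Jaccard index is an assumed generalization to real-valued vectors.

```python
import math


def information_content(annotations):
    """IC(t) = -log(fraction of gene products annotated with term t)."""
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log(count / n) for term, count in counts.items()}


def ic_vector(terms, ic, vocabulary):
    """One entry per vocabulary term: its IC if annotated, else 0."""
    return [ic[t] if t in terms else 0.0 for t in vocabulary]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def weighted_jaccard(x, y):
    """Sum of element-wise minima over sum of maxima (assumed variant)."""
    denom = sum(max(a, b) for a, b in zip(x, y))
    return sum(min(a, b) for a, b in zip(x, y)) / denom if denom else 0.0


# Hypothetical annotations: gene product -> set of ontology terms.
toy_annotations = {
    "geneA": {"term1", "term2"},
    "geneB": {"term1", "term2"},
    "geneC": {"term3"},
    "geneD": {"term1", "term3"},
}

ic = information_content(toy_annotations)
vocabulary = sorted(ic)
vectors = {g: ic_vector(t, ic, vocabulary) for g, t in toy_annotations.items()}

for name, measure in [("cosine", cosine), ("pearson", pearson),
                      ("jaccard", weighted_jaccard)]:
    print(name, round(measure(vectors["geneA"], vectors["geneD"]), 3))
```

In this toy setup, rarer terms carry higher information content, so a shared rare annotation contributes more to the similarity than a shared common one.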
Abstract: The extraction of complex events from biomedical text is a challenging task and requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short in testing the benefits from the use of domain knowledge. We developed a system that deduces implicit events from explicitly expressed events by using inference rules that encode domain knowledge. We evaluated the system with the inference module on three tasks: First, when tested against a corpus with manually annotated events, the inference module of our system contributes 53.2% of correct extractions, but does not cause any incorrect results. Second, the system overall reproduces 33.1% of the transcription regulatory events contained in RegulonDB (up to 85.0% precision) and the inference module is required for 93.8% of the reproduced events. Third, we applied the system with minimum adaptations to the identification of cell activity regulation events, confirming that the inference improves the performance of the system also on this task. Our research shows that the inference based on domain knowledge plays a significant role in extracting complex events from text. This approach has great potential in recognizing the complex concepts of such biomedical ontologies as Gene Ontology in the literature.
Pub.: 15 Dec '11, Pinned: 04 Jul '17
Abstract: Identification of novel drug targets is a critical step in drug development. Many recent studies have produced multiple types of data, which provide an opportunity to mine the relationships among them to predict drug targets. In this study, we present a novel integrative approach that combines ontology reasoning with network-assisted gene ranking to predict new drug targets. We utilized colorectal cancer (CRC) as a proof-of-concept use case to illustrate the approach. Starting from FDA-approved CRC drugs and the relationships among disease, drug, gene, pathway, and SNP in an ontology representing PharmGKB data, we inferred 113 potential CRC drug targets. We further prioritized these genes based on their relationships with CRC disease genes in the context of human protein-protein interaction networks. Thus, among the 113 potential drug targets, 15 were selected as promising drug targets, including some genes that are supported by previous studies. Among them, EGFR, TOP1 and VEGFA are known targets of FDA-approved drugs. Additionally, CCND1 (cyclin D1) and PTGS2 (prostaglandin-endoperoxide synthase 2) have been reported in the literature to be relevant to CRC or to be potential drug targets. These results indicate that our approach is promising for drug target prediction for CRC treatment and might be useful for other cancer therapeutics.
Pub.: 31 Mar '15, Pinned: 04 Jul '17
Abstract: Capture devices generate large-scale trajectory data from moving objects. These devices use different technologies such as global navigation satellite systems (GNSS), wireless communication, radio-frequency identification (RFID), and other sensors. Huge volumes of trajectory data are available today. In this paper, we use an ontological data modeling approach to build a trajectory ontology from such large data. To support reasoning over trajectories, the ontology must take into account mobile object, domain, and other knowledge. In our approach, we suggest expressing this knowledge in the form of rules. To annotate data with these rules, we need an inference mechanism over the trajectory ontology. Experiments on our model using domain and temporal rules expose the computational complexity of this inference. This complexity has two important factors: computation time and storage space. To reduce the inference complexity, we previously proposed optimizations such as domain constraints and temporal neighbor refinements. In this paper, we define a refinement specific to the application domain. We then evaluate our contribution on real trajectory data. Finally, the results show the positive impact of this last refinement on reducing the complexity of the inference mechanism. The refinement halves the computation time and thus allows larger data sets to be considered.
Pub.: 03 Mar '16, Pinned: 04 Jul '17