Graduate student, Brown University
Algorithms characterizing cellular evolution and metastatic migration in cancer patients.
Tumors develop through the accumulation of mutations during the lifetime of an individual through an evolutionary process. As a result, many tumors are heterogeneous, containing multiple populations of cells, each with its own unique combination of somatic mutations. This intra-tumor heterogeneity complicates the diagnosis and treatment of cancer. Accurate characterization of the evolutionary process is crucial to understanding cancer development and to providing targeted treatment for individual patients. Recent studies have shown that metastasis often originates from cell populations that make up a minor proportion of the tumor, and that patients being considered for specific treatments may already have cells within their tumor that are resistant to the therapy.
DNA sequencing of cancer genomes has played a large role in broadening our understanding of the process of cancer development. However, DNA sequencing data are large, complex and noisy. The human genome contains about 3 billion nucleotides, but we currently lack the technology to sequence a genome end-to-end; instead, we measure short snippets of it. Moreover, each sequencing sample is generally a mixture composed of thousands of tumor cells. These factors significantly complicate the analysis of tumors: individual tumor cells are highly heterogeneous, and we do not observe the mutational profiles of cells directly.
Like any evolutionary process, the mutational process giving rise to a tumor can be described by a phylogenetic tree, whose leaves correspond to present cells, and whose edges describe the ancestral relationships. However, due to the heterogeneity of tumors and the complexity of sequencing data, precise mathematical models and specialized algorithms are needed to accurately characterize tumor composition and reconstruct the evolutionary process.
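To make the mixture problem concrete, the sketch below computes the variant allele frequency (VAF) expected in a bulk sample from a clone tree and the fraction of cells belonging to each clone. It assumes diploid cells, heterozygous single-nucleotide mutations and no copy-number changes, so a mutation's VAF is half the total fraction of cells carrying it. The clone names, tree, fractions and function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

def carried_mutations(parent: Dict[str, Optional[str]],
                      new_muts: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """A clone carries the mutations it acquired plus those of all its ancestors."""
    carried = {}
    for clone in parent:
        muts: Set[str] = set()
        node = clone
        while node is not None:
            muts.update(new_muts.get(node, []))
            node = parent[node]
        carried[clone] = muts
    return carried

def expected_vafs(parent, new_muts, fractions):
    """Expected VAF of each mutation in a bulk sample (diploid, heterozygous, no CNAs)."""
    carried = carried_mutations(parent, new_muts)
    mutations = sorted({m for ms in new_muts.values() for m in ms})
    return {m: 0.5 * sum(f for clone, f in fractions.items() if m in carried[clone])
            for m in mutations}

# Hypothetical tree: normal -> A -> {B, C}
parent = {"normal": None, "A": "normal", "B": "A", "C": "A"}
new_muts = {"A": ["m1"], "B": ["m2"], "C": ["m3"]}
fractions = {"normal": 0.2, "A": 0.3, "B": 0.3, "C": 0.2}
vafs = expected_vafs(parent, new_muts, fractions)
```

This is only the forward direction. Recovering the tree and clone fractions from observed VAFs is the hard inverse problem, and different trees can produce identical VAFs.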
Abstract: While in principle a seemingly infinite variety of combinations of mutations could result in tumor development, in practice it appears that most human cancers fall into a relatively small number of "sub-types," each characterized by a roughly equivalent sequence of mutations through which it progresses in different patients. There is currently great interest in identifying the common sub-types and applying them to the development of diagnostics or therapeutics. Phylogenetic methods have shown great promise for inferring common patterns of tumor progression, but suffer from the limits of the technologies available for assaying differences between and within tumors. One approach to tumor phylogenetics uses differences between single cells within tumors, gaining valuable information about intra-tumor heterogeneity but allowing only a few markers per cell. An alternative approach uses tissue-wide measures of whole tumors to provide a detailed picture of averaged tumor state, but at the cost of losing information about intra-tumor heterogeneity. The present work applies "unmixing" methods, which separate complex data sets into combinations of simpler components, to attempt to gain the advantages of both tissue-wide and single-cell approaches to cancer phylogenetics. We develop an unmixing method to infer recurring cell states from microarray measurements of tumor populations and use the inferred mixtures of states in individual tumors to identify possible evolutionary relationships among tumor cells. Validation on simulated data shows the method can accurately separate small numbers of cell states and infer phylogenetic relationships among them.
Application to a lung cancer dataset shows that the method can identify cell states corresponding to common lung tumor types and suggest possible evolutionary relationships among them that show good correspondence with our current understanding of lung tumor development. Unmixing methods provide a way to make use of both intra-tumor heterogeneity and large probe sets for tumor phylogeny inference, establishing a new avenue towards the construction of detailed, accurate portraits of common tumor sub-types and the mechanisms by which they develop. These reconstructions are likely to have future value in discovering and diagnosing novel cancer sub-types and in identifying targets for therapeutic development.
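The abstract does not spell out its unmixing algorithm. As a generic stand-in, the sketch below uses a different, swapped-in technique: non-negative matrix factorization (NMF) with the Lee-Seung multiplicative updates. It approximates a non-negative probes × tumors matrix X as W H, where the columns of W are inferred cell-state profiles and the columns of H give each tumor's weights on those states. It assumes non-negative expression values (log-ratio microarray data would first need transforming) and a known number of states k; the function name, parameters and synthetic data are hypothetical.

```python
import numpy as np

def nmf_unmix(X: np.ndarray, k: int, n_iter: int = 2000, seed: int = 0, eps: float = 1e-9):
    """Factor non-negative X (probes x tumors) as W @ H with W, H >= 0.

    W (probes x k): inferred cell-state profiles.
    H (k x tumors): weight of each state in each tumor.
    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n_probes, n_tumors = X.shape
    W = rng.random((n_probes, k)) + eps
    H = rng.random((k, n_tumors)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
    return W, H, rel_err

# Synthetic check: 3 hypothetical cell states mixed into 20 tumors.
rng = np.random.default_rng(1)
true_states = rng.random((50, 3))
true_mix = rng.dirichlet(np.ones(3), size=20).T  # shape (3, 20); each column sums to 1
X = true_states @ true_mix
W, H, rel_err = nmf_unmix(X, k=3)
```

NMF is identifiable only up to scaling and permutation of the states, so in practice one would normalize the columns of H or match inferred states to known tumor types before relating them phylogenetically.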
Pub.: 22 Jan '10, Pinned: 30 Jun '17
Abstract: We present a latent feature allocation model to reconstruct tumor subclones subject to phylogenetic evolution that mimics tumor evolution. Similar to most current methods, we consider data from next-generation sequencing. Unlike most methods that use information in short reads mapped to single nucleotide variants (SNVs), we consider subclone reconstruction using pairs of two proximal SNVs that can be mapped by the same short reads. As part of the Bayesian inference model, we construct a phylogenetic tree prior. The use of the tree structure in the prior greatly strengthens inference. Only subclones that can be approximated by a phylogenetic tree are assigned non-negligible probability. The proposed Bayesian framework implies posterior distributions on the number of subclones, their genotypes, cellular proportions, and the phylogenetic tree spanned by the inferred subclones. The proposed method is validated against different sets of simulated and real-world data using single and multiple tumor samples. An open source software package is available at http://www.compgenome.org/pairclonetree
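The tree prior described above gives non-negligible probability only to subclones that can be approximated by a phylogenetic tree. A hard, simplified version of that constraint, under the infinite-sites assumption with an unmutated root, is the classic perfect-phylogeny condition: a binary subclone × SNV matrix fits a rooted tree exactly when the sets of subclones carrying any two SNVs are either nested or disjoint. The sketch below checks that condition. It is not the PairCloneTree prior, which is probabilistic and works with pairs of proximal SNVs; the function name and example matrices are hypothetical.

```python
from itertools import combinations
from typing import Sequence

def admits_perfect_phylogeny(genotypes: Sequence[Sequence[int]]) -> bool:
    """genotypes[r][j] == 1 if subclone r carries SNV j.

    Under infinite sites with an unmutated root, a rooted tree exists iff
    every pair of SNVs has carrier sets that are nested or disjoint.
    """
    if not genotypes:
        return True
    n_snvs = len(genotypes[0])
    carriers = [frozenset(r for r, row in enumerate(genotypes) if row[j])
                for j in range(n_snvs)]
    for a, b in combinations(carriers, 2):
        if (a & b) and not (a <= b or b <= a):
            return False  # the three-gamete pattern (1,0), (0,1), (1,1) occurs
    return True

# SNV 0 is truncal; SNVs 1 and 2 lie on separate branches.
print(admits_perfect_phylogeny([[1, 0, 0], [1, 1, 0], [1, 0, 1]]))  # True
# Subclones carry (1,1), (1,0) and (0,1): no single tree explains them.
print(admits_perfect_phylogeny([[1, 1], [1, 0], [0, 1]]))           # False
```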
Pub.: 10 Mar '17, Pinned: 30 Jun '17
Abstract: Determining the evolutionary history of metastases is a key problem in cancer biology. Several recent studies have presented inferences regarding the origin of metastases based on phylogenies of cancer lineages. Many of these studies have concluded that the observed monophyly of metastatic subclones favored metastasis-to-metastasis spread ("a metastatic cascade" rather than parallel metastases from the primary tumor). In this article, we argue that identifying a monophyletic clade of metastatic subclones does not provide sufficient evidence to unequivocally establish a history of metastatic cascades. In the absence of a complete phylogeny of the subclones within the primary tumor, a scenario of parallel metastatic events from the primary tumor is an equally plausible interpretation. Future phylogenetic studies on the origin of metastases should obtain a complete phylogeny of subclones within the primary tumor. This complete phylogeny may be obtainable by ultra-deep sequencing and phasing of large sections or by targeted sequencing of many small, spatially heterogeneous sections, followed by phylogenetic reconstruction using well-established molecular evolutionary models. In addition to resolving the evolutionary history of metastases, a complete phylogeny of subclones within the primary tumor facilitates the identification of driver mutations by application of phylogeny-based tests of natural selection.
Pub.: 12 Aug '15, Pinned: 30 Jun '17
Abstract: Clinical oncology is being revolutionized by the increasing use of molecularly targeted therapies. This paradigm holds great promise for improving cancer treatment; however, allocating specific therapies to the patients who are most likely to derive a durable benefit continues to represent a considerable challenge. Evidence continues to emerge that cancers are characterized by extensive intratumour genetic heterogeneity, and that patients being considered for treatment with a targeted agent might, therefore, already possess resistance to the drug in a minority of cells. Indeed, multiple examples of pre-existing subclonal resistance mutations to various molecularly targeted agents have been described, which we review herein. Early detection of pre-existing or emerging drug resistance could enable more personalized use of targeted cancer therapy, as patients could be stratified to receive the therapies that are most likely to be effective. We consider how monitoring of drug resistance could be incorporated into clinical practice to optimize the use of targeted therapies in individual patients.
Pub.: 21 Oct '15, Pinned: 30 Jun '17
Abstract: High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in the sample can be reconstructed from these SNV frequency measurements. However, automated methods to do this reconstruction are not available and the conditions under which reconstruction is possible have not been described. We describe the conditions under which the evolutionary history can be uniquely reconstructed from SNV frequencies from single or multiple samples from the tumor population and we introduce a new statistical model, PhyloSub, that infers the phylogeny and genotype of the major subclonal lineages represented in the population of cancer cells. It uses a Bayesian nonparametric prior over trees that groups SNVs into major subclonal lineages and automatically estimates the number of lineages and their ancestry. We sample from the joint posterior distribution over trees to identify evolutionary histories and cell population frequencies that have the highest probability of generating the observed SNV frequency data. When multiple phylogenies are consistent with a given set of SNV frequencies, PhyloSub represents the uncertainty in the tumor phylogeny using a partial order plot. Experiments on a simulated dataset and two real datasets comprising tumor samples from acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that PhyloSub can infer both linear (or chain) and branching lineages and its inferences are in good agreement with ground truth, where it is available.
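The abstract describes a Bayesian nonparametric prior over trees that groups SNVs into lineages without fixing their number in advance. One well-known prior of this kind is the tree-structured stick-breaking process (TSSB) of Adams, Ghahramani and Jordan; the sketch below assumes it for illustration and only draws from the prior, with no SNV-frequency likelihood or posterior sampling. Each data point (here, an SNV) walks down from the root: at a node of depth d it stops with probability ν ~ Beta(1, α0·λ^d), otherwise it picks a child by stick-breaking with weights ψ ~ Beta(1, γ), and nodes are created lazily as they are first visited. The class name, function name and parameter values are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

class TSSBNode:
    """Lazily instantiated node of a tree-structured stick-breaking process."""
    def __init__(self, depth: int):
        self.depth = depth
        self.nu: Optional[float] = None  # probability of stopping at this node
        self.psi: List[float] = []       # stick-breaking weights over children
        self.children: List["TSSBNode"] = []

def sample_path(root: TSSBNode, alpha0: float, lam: float, gamma: float,
                rng: random.Random) -> Tuple[int, ...]:
    """Return the path (child indices from the root) of the node a data point lands on."""
    node, path = root, []
    while True:
        if node.nu is None:
            node.nu = rng.betavariate(1.0, alpha0 * lam ** node.depth)
        if rng.random() < node.nu:
            return tuple(path)
        i = 0
        while True:  # child i is chosen with prob psi_i * prod_{j<i} (1 - psi_j)
            if i == len(node.children):
                node.psi.append(rng.betavariate(1.0, gamma))
                node.children.append(TSSBNode(node.depth + 1))
            if rng.random() < node.psi[i]:
                break
            i += 1
        path.append(i)
        node = node.children[i]

rng = random.Random(0)
root = TSSBNode(0)
paths = [sample_path(root, alpha0=1.0, lam=0.5, gamma=1.0, rng=rng) for _ in range(200)]
clusters = Counter(paths)  # SNVs sharing a node form one hypothetical lineage
```

Because the weights ν and ψ are stored on the nodes, all 200 draws share one random tree, so SNVs cluster on a small number of nodes whose count is not fixed in advance.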
Pub.: 02 Nov '13, Pinned: 30 Jun '17
Abstract: DNA sequencing of multiple samples from the same tumor provides data to analyze the process of clonal evolution in the population of cells that give rise to a tumor. We formalize the problem of reconstructing the clonal evolution of a tumor using single-nucleotide mutations as the variant allele frequency (VAF) factorization problem. We derive a combinatorial characterization of the solutions to this problem and show that the problem is NP-complete. We derive an integer linear programming solution to the VAF factorization problem in the case of error-free data and extend this solution to real data with a probabilistic model for errors. The resulting AncesTree algorithm is better able to identify ancestral relationships between individual mutations than existing approaches, particularly in ultra-deep sequencing data when high read counts for mutations yield high confidence VAFs. An implementation of AncesTree is available at: http://compbio.cs.brown.edu/software.
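The error-free case of VAF factorization can be illustrated directly. Assuming diploid, heterozygous SNVs, no copy-number aberrations, and one clone per mutation (clone i is where mutation i first appears), we take the VAF matrix F (samples × mutations) to factor as F = ½ U B, where B[i, j] = 1 if mutation j is present in clone i and U gives each clone's fraction of cells in each sample. Given a candidate tree, B is invertible, so U is determined, and the tree fits the data exactly when U is non-negative and each sample's clone fractions sum to at most 1. This is a minimal sketch of that feasibility check, not the AncesTree ILP or its error model; the function names and numbers are hypothetical.

```python
import numpy as np
from typing import Sequence, Tuple

def clonal_matrix(parent: Sequence[int]) -> np.ndarray:
    """B[i, j] = 1 if mutation j is present in clone i, i.e. j is i or an ancestor of i.

    parent[i] is the parent clone of clone i, or -1 if clone i descends directly
    from the normal (unmutated) population.
    """
    n = len(parent)
    B = np.zeros((n, n))
    for i in range(n):
        j = i
        while j != -1:
            B[i, j] = 1.0
            j = parent[j]
    return B

def usage_from_vafs(F: np.ndarray, parent: Sequence[int],
                    tol: float = 1e-9) -> Tuple[np.ndarray, bool]:
    """Solve F = 0.5 * U @ B for U and report whether U is a valid set of clone fractions."""
    B = clonal_matrix(parent)
    # U @ B = 2F  is equivalent to  B.T @ U.T = 2F.T
    U = np.linalg.solve(B.T, 2.0 * F.T).T
    feasible = bool(np.all(U >= -tol) and np.all(U.sum(axis=1) <= 1.0 + tol))
    return U, feasible

F = np.array([[0.45, 0.30, 0.25]])                    # one sample, three mutations
U_branch, ok_branch = usage_from_vafs(F, [-1, 0, 0])  # mutations 1 and 2 both below 0
U_chain, ok_chain = usage_from_vafs(F, [-1, 0, 1])    # linear chain 0 -> 1 -> 2
```

Here the branching tree would need a negative fraction for clone 0 (U ≈ [-0.2, 0.6, 0.5]), while the chain gives U ≈ [0.3, 0.1, 0.5]. With noisy data, exact feasibility gives way to a probabilistic error model, as the abstract describes.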
Pub.: 15 Jun '15, Pinned: 30 Jun '17
Abstract: Phylogenetic techniques are increasingly applied to infer the somatic mutational history of a tumor from DNA sequencing data. However, standard phylogenetic tree reconstruction techniques do not account for the fact that bulk sequencing data measures mutations in a population of cells. We formulate and solve the multi-state perfect phylogeny mixture deconvolution problem of reconstructing a phylogenetic tree given mixtures of its leaves, under the multi-state perfect phylogeny, or infinite alleles model. Our somatic phylogeny reconstruction using combinatorial enumeration (SPRUCE) algorithm uses this model to construct phylogenetic trees jointly from single-nucleotide variants (SNVs) and copy-number aberrations (CNAs). We show that SPRUCE addresses complexities in simultaneous analysis of SNVs and CNAs. In particular, there are often many possible phylogenetic trees consistent with the data, but the ambiguity decreases considerably with an increasing number of samples. These findings have implications for tumor sequencing strategies, suggest caution in drawing strong conclusions based on a single tree reconstruction, and explain difficulties faced by applying existing phylogenetic techniques to tumor sequencing data.
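The observation that many trees can fit the same data, and that ambiguity falls as samples are added, can be reproduced in a much simpler setting than SPRUCE's. The sketch below drops copy-number aberrations and multi-state characters, treats each SNV as a binary character with its own clone, and works with hypothetical cancer cell fractions (the fraction of cells carrying each SNV). A tree, given as a parent array, is accepted if in every sample each mutation's cell fraction is at least the sum of its children's and the top-level fractions sum to at most 1. All trees are enumerated by brute force, which is only practical for a handful of mutations; the function names and values are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

def is_acyclic(parent: Sequence[int]) -> bool:
    """True if following parent pointers from every node reaches the root (-1)."""
    for start in range(len(parent)):
        seen, v = set(), start
        while v != -1:
            if v in seen:
                return False
            seen.add(v)
            v = parent[v]
    return True

def satisfies_sum_condition(parent: Sequence[int], ccf: Sequence[Sequence[float]],
                            tol: float = 1e-9) -> bool:
    """Each node's fraction must cover its children's, in every sample."""
    n = len(parent)
    for sample in ccf:
        child_sum, root_sum = [0.0] * n, 0.0
        for v, p in enumerate(parent):
            if p == -1:
                root_sum += sample[v]
            else:
                child_sum[p] += sample[v]
        if root_sum > 1.0 + tol or any(child_sum[v] > sample[v] + tol for v in range(n)):
            return False
    return True

def consistent_trees(ccf: Sequence[Sequence[float]]) -> List[Tuple[int, ...]]:
    """Enumerate every parent array (-1 = attached to the normal root) that fits all samples."""
    n = len(ccf[0])
    trees = []
    for parent in product(range(-1, n), repeat=n):
        if any(parent[v] == v for v in range(n)):
            continue
        if is_acyclic(parent) and satisfies_sum_condition(parent, ccf):
            trees.append(parent)
    return trees

print(consistent_trees([[0.9, 0.4, 0.3]]))                    # [(-1, 0, 0), (-1, 0, 1)]
print(consistent_trees([[0.9, 0.4, 0.3], [0.8, 0.2, 0.5]]))   # [(-1, 0, 0)]
```

With one sample, mutation 2 could sit below either mutation 0 or mutation 1; the second sample, where mutation 2 is more prevalent than mutation 1, rules out the chain.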
Pub.: 29 Jul '16, Pinned: 30 Jun '17
Abstract: Cancer is a somatic evolutionary process characterized by the accumulation of mutations, which contribute to tumor growth, clinical progression, immune escape, and drug resistance development. Evolutionary theory can be used to analyze the dynamics of tumor cell populations and to make inference about the evolutionary history of a tumor from molecular data. We review recent approaches to modeling the evolution of cancer, including population dynamics models of tumor initiation and progression, phylogenetic methods to model the evolutionary relationship between tumor subclones, and probabilistic graphical models to describe dependencies among mutations. Evolutionary modeling helps to understand how tumors arise and will also play an increasingly important prognostic role in predicting disease progression and the outcome of medical interventions, such as targeted therapy.
Pub.: 09 Oct '14, Pinned: 30 Jun '17