A pinboard by
Gryte Satas

Graduate student, Brown University


Algorithms characterizing cellular evolution and metastatic migration in cancer patients.

Tumors develop through an evolutionary process in which mutations accumulate over the lifetime of an individual. As a result, many tumors are heterogeneous, containing multiple populations of cells, each with its own unique combination of somatic mutations. This intra-tumor heterogeneity complicates the diagnosis and treatment of cancer. Accurate characterization of the evolutionary process is crucial to understanding cancer development and to providing targeted treatment for individual patients. Recent studies have shown that metastasis often arises from cell populations that make up only a minor proportion of the tumor, and that patients being considered for specific treatments may have cells within their tumor that already possess resistance to the therapy.

DNA sequencing of cancer genomes has played a large role in broadening our understanding of the process of cancer development. However, DNA sequencing data is large, complex and noisy. The human genome contains about 3 billion nucleotides, but we currently lack the technology to sequence the genome end-to-end; instead, we measure short snippets of the genome of each cell. Moreover, each sequencing sample is generally a mixture composed of thousands of tumor cells. These factors significantly complicate the analysis of tumors: there is a high level of heterogeneity among individual tumor cells, and we do not observe the mutational profiles of those cells directly.
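To make the mixture problem concrete, here is a minimal sketch of how the fraction of reads carrying a mutation in a bulk sample could follow from hidden clone proportions. It rests on simplifying assumptions that are not part of the text above: a diploid genome, heterozygous mutations, no copy-number changes and no sequencing noise. The clone names, mutation labels and proportions are hypothetical.

```python
# Hypothetical clones: the mutations each one carries and its share of cells.
clone_mutations = {
    "normal": set(),
    "clone_A": {"m1"},
    "clone_B": {"m1", "m2"},
}
clone_proportions = {"normal": 0.2, "clone_A": 0.5, "clone_B": 0.3}


def expected_vaf(mutation, clone_mutations, clone_proportions):
    """Expected variant allele frequency of a mutation in the bulk sample."""
    # Fraction of cells in the sample that carry the mutation.
    cell_fraction = sum(
        share
        for clone, share in clone_proportions.items()
        if mutation in clone_mutations[clone]
    )
    # Assumption: only one of the two copies of the locus is mutated,
    # so half of the reads from a mutated cell show the variant.
    return cell_fraction / 2.0


vafs = {m: expected_vaf(m, clone_mutations, clone_proportions) for m in ("m1", "m2")}
print(vafs)
```

Only the frequencies in `vafs` would be observable in practice; the clone table that generated them is what the analysis has to infer.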

Like any evolutionary process, the mutational process giving rise to a tumor can be described by a phylogenetic tree, whose leaves correspond to the cells present today and whose edges describe the ancestral relationships among them. However, due to the heterogeneity of tumors and the complexity of sequencing data, precise mathematical models and specialized algorithms are needed to accurately characterize tumor composition and reconstruct the evolutionary process.


Applying unmixing to gene expression data for tumor phylogeny inference.

Abstract: While in principle a seemingly infinite variety of combinations of mutations could result in tumor development, in practice it appears that most human cancers fall into a relatively small number of "sub-types," each characterized by a roughly equivalent sequence of mutations by which it progresses in different patients. There is currently great interest in identifying the common sub-types and applying them to the development of diagnostics or therapeutics. Phylogenetic methods have shown great promise for inferring common patterns of tumor progression, but suffer from limits of the technologies available for assaying differences between and within tumors. One approach to tumor phylogenetics uses differences between single cells within tumors, gaining valuable information about intra-tumor heterogeneity but allowing only a few markers per cell. An alternative approach uses tissue-wide measures of whole tumors to provide a detailed picture of averaged tumor state but at the cost of losing information about intra-tumor heterogeneity. The present work applies "unmixing" methods, which separate complex data sets into combinations of simpler components, to attempt to gain advantages of both tissue-wide and single-cell approaches to cancer phylogenetics. We develop an unmixing method to infer recurring cell states from microarray measurements of tumor populations and use the inferred mixtures of states in individual tumors to identify possible evolutionary relationships among tumor cells. Validation on simulated data shows the method can accurately separate small numbers of cell states and infer phylogenetic relationships among them. Application to a lung cancer dataset shows that the method can identify cell states corresponding to common lung tumor types and suggest possible evolutionary relationships among them that show good correspondence with our current understanding of lung tumor development. Unmixing methods provide a way to make use of both intra-tumor heterogeneity and large probe sets for tumor phylogeny inference, establishing a new avenue towards the construction of detailed, accurate portraits of common tumor sub-types and the mechanisms by which they develop. These reconstructions are likely to have future value in discovering and diagnosing novel cancer sub-types and in identifying targets for therapeutic development.
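As a rough illustration of what "unmixing" means, the sketch below recovers the mixing fraction of a two-state mixture by least squares. This is a deliberately simplified stand-in and not the method of the paper: the abstract describes inferring the cell states themselves, whereas here the two state profiles are assumed to be known in advance. The profiles, the function names and the 0.7 fraction are all hypothetical.

```python
def mix(state_a, state_b, fraction_a):
    """Expression profile of a sample that is fraction_a state A, the rest state B."""
    return [fraction_a * a + (1.0 - fraction_a) * b for a, b in zip(state_a, state_b)]


def estimate_fraction(observed, state_a, state_b):
    """Least-squares estimate of the share of state A in a two-state mixture."""
    # Project (observed - B) onto the line from B to A.
    direction = [a - b for a, b in zip(state_a, state_b)]
    offset = [o - b for o, b in zip(observed, state_b)]
    denom = sum(d * d for d in direction)
    if denom == 0:
        raise ValueError("the two states are identical and cannot be separated")
    fraction = sum(o * d for o, d in zip(offset, direction)) / denom
    # A mixing fraction has to lie between 0 and 1.
    return min(1.0, max(0.0, fraction))


# Hypothetical expression levels of three probes in two pure cell states.
state_a = [1.0, 5.0, 2.0]
state_b = [4.0, 1.0, 2.0]

observed = mix(state_a, state_b, 0.7)  # what a tissue-wide measurement would see
estimated = estimate_fraction(observed, state_a, state_b)
print(estimated)
```

With noise-free data the estimate matches the fraction used to build the mixture; real microarray data would add noise and more than two states.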

Pub.: 22 Jan '10, Pinned: 30 Jun '17

Inferring the Origin of Metastases from Cancer Phylogenies.

Abstract: Determining the evolutionary history of metastases is a key problem in cancer biology. Several recent studies have presented inferences regarding the origin of metastases based on phylogenies of cancer lineages. Many of these studies have concluded that the observed monophyly of metastatic subclones favored metastasis-to-metastasis spread ("a metastatic cascade" rather than parallel metastases from the primary tumor). In this article, we argue that identifying a monophyletic clade of metastatic subclones does not provide sufficient evidence to unequivocally establish a history of metastatic cascades. In the absence of a complete phylogeny of the subclones within the primary tumor, a scenario of parallel metastatic events from the primary tumor is an equally plausible interpretation. Future phylogenetic studies on the origin of metastases should obtain a complete phylogeny of subclones within the primary tumor. This complete phylogeny may be obtainable by ultra-deep sequencing and phasing of large sections or by targeted sequencing of many small, spatially heterogeneous sections, followed by phylogenetic reconstruction using well-established molecular evolutionary models. In addition to resolving the evolutionary history of metastases, a complete phylogeny of subclones within the primary tumor facilitates the identification of driver mutations by application of phylogeny-based tests of natural selection.
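The role of sampling in this argument can be sketched with a small monophyly check. The two trees below are hypothetical and only illustrate the logic: metastatic subclones `M1` and `M2` form a clade when a single primary subclone `P1` is sampled, but stop doing so once a second, previously unsampled primary subclone `P2` is placed in the tree. The tree encoding (a dictionary from node to children) and the function names are assumptions made for this example.

```python
def leaves_under(children, node):
    """Set of leaf names in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    result = set()
    for kid in kids:
        result |= leaves_under(children, kid)
    return result


def is_monophyletic(children, root, group):
    """True if some node has exactly the members of group as its leaves."""
    group = set(group)
    stack = [root]
    while stack:
        node = stack.pop()
        if leaves_under(children, node) == group:
            return True
        stack.extend(children.get(node, []))
    return False


# Hypothetical phylogeny with only one primary subclone (P1) sampled.
sampled = {"root": ["P1", "X"], "X": ["M1", "M2"]}

# Hypothetical fuller phylogeny: primary subclone P2 nests beside M2.
complete = {"root": ["P1", "X"], "X": ["M1", "Y"], "Y": ["M2", "P2"]}

metastases = {"M1", "M2"}
print(is_monophyletic(sampled, "root", metastases))
print(is_monophyletic(complete, "root", metastases))
```

In the second tree `M2` shares its most recent ancestor with a primary subclone, which is the kind of detail a sparsely sampled primary tumor would hide.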

Pub.: 12 Aug '15, Pinned: 30 Jun '17

Inferring clonal evolution of tumors from single nucleotide somatic mutations

Abstract: High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in the sample can be reconstructed from these SNV frequency measurements. However, automated methods to do this reconstruction are not available and the conditions under which reconstruction is possible have not been described. We describe the conditions under which the evolutionary history can be uniquely reconstructed from SNV frequencies from single or multiple samples from the tumor population and we introduce a new statistical model, PhyloSub, that infers the phylogeny and genotype of the major subclonal lineages represented in the population of cancer cells. It uses a Bayesian nonparametric prior over trees that groups SNVs into major subclonal lineages and automatically estimates the number of lineages and their ancestry. We sample from the joint posterior distribution over trees to identify evolutionary histories and cell population frequencies that have the highest probability of generating the observed SNV frequency data. When multiple phylogenies are consistent with a given set of SNV frequencies, PhyloSub represents the uncertainty in the tumor phylogeny using a partial order plot. Experiments on a simulated dataset and two real datasets comprising tumor samples from acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that PhyloSub can infer both linear (or chain) and branching lineages and its inferences are in good agreement with ground truth, where it is available.
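One way to see how SNV frequencies can constrain the tree is a simple consistency check: in a single sample, the cells descending from a lineage cannot outnumber the cells of that lineage, so the frequencies of a node's children should not add up to more than the node's own frequency. This condition is an assumption adopted for this sketch and is not taken from the abstract, and the code is not PhyloSub; the lineage names and frequencies are hypothetical.

```python
def tree_is_consistent(parent, frequency, tolerance=1e-9):
    """Check that, at every node, the children's frequencies sum to at most its own."""
    child_sum = {}
    for child, par in parent.items():
        if par is not None:
            child_sum[par] = child_sum.get(par, 0.0) + frequency[child]
    return all(
        frequency[node] + tolerance >= total for node, total in child_sum.items()
    )


# Hypothetical fraction of cells carrying each lineage's mutations in one sample.
frequency = {"A": 0.9, "B": 0.6, "C": 0.5}

# Two candidate histories, written as child -> parent (None marks the root).
linear = {"A": None, "B": "A", "C": "B"}      # A -> B -> C
branching = {"A": None, "B": "A", "C": "A"}   # B and C both descend from A

print(tree_is_consistent(linear, frequency))
print(tree_is_consistent(branching, frequency))
```

Here the branching history is ruled out because 0.6 + 0.5 exceeds 0.9, leaving the linear chain as the only one of the two candidates that fits.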

Pub.: 02 Nov '13, Pinned: 30 Jun '17