Ph.D. Candidate at NYU working on the role of the three-dimensional structure of the genome in cancer.
I develop algorithms to measure TAD boundary strength and detect boundary disruptions in cancer.
When we hear about deoxyribonucleic acid, or simply DNA, the first thing that comes to mind is probably a spiral-staircase-shaped molecule that carries genetic information from generation to generation. When we picture DNA in three-dimensional space, however, it becomes even more fascinating. Although the total DNA of each human cell is around two meters long, it is compacted so tightly that it fits in the nucleus, a cellular compartment with a diameter on the micrometer scale. This compaction has some interesting implications. A molecular biology technique known as Hi-C has revealed that mammalian DNA is compartmentalized into neighborhoods of highly interacting chromatin known as topologically associating domains (TADs). TADs are demarcated by boundaries, which can be imagined as physical barriers (insulators) that prevent communication between adjacent TADs. Mounting evidence suggests that TAD boundaries are disrupted in cancer, resulting in aberrant activation of oncogenes.

During the first two years of my PhD, I focused on increasing the robustness and reproducibility of Hi-C data analysis in order to detect TADs precisely. To this end, I developed a computational platform called HiC-bench, which supports complete Hi-C analysis and allows parameter exploration and benchmarking of the various tools used in the multiple steps of this complicated analysis. This work was also presented at the Cold Spring Harbor Meeting on "Nuclear Organization & Function" in 2016. During the last two years, I have been developing a machine-learning approach to categorize TAD boundaries based on their insulating strength. In a new manuscript on which I am co-first author, we provide the first genome-wide analysis of boundary strength and demonstrate that weak boundaries are more prone to disruption, that superenhancers (key regulatory elements of cell identity) are protected by strong boundaries, and that superenhancers and key oncogenes are co-amplified in cancer. My work may reveal boundary disruptions that, if reversible, could constitute new drug targets.
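One common way to put a number on how strongly a boundary insulates, sketched here in Python, is a window-based insulation score: the average contact frequency between the bins just upstream and just downstream of a candidate position. This is a generic illustration, not the machine-learning approach described above; the function name, the window size `w`, and the toy matrix are all assumptions made for the example.

```python
def insulation_score(matrix, i, w):
    """Mean contact frequency between the w bins before bin i and the w bins from bin i onward.

    Lower values mean fewer contacts cross the position, i.e. stronger insulation.
    """
    n = len(matrix)
    upstream = range(max(0, i - w), i)
    downstream = range(i, min(n, i + w))
    values = [matrix[a][b] for a in upstream for b in downstream]
    return sum(values) / len(values) if values else float("nan")


# Toy 6-bin contact matrix (hypothetical data): two 3-bin domains with
# 10 contacts inside a domain and 1 contact across the boundary.
toy = [[10 if (a < 3) == (b < 3) else 1 for b in range(6)] for a in range(6)]

# Score every interior position and pick the most insulated one.
scores = {i: insulation_score(toy, i, 2) for i in range(1, 6)}
boundary = min(scores, key=scores.get)
```

In the toy matrix the score is lowest at bin 3, where the two domains meet, and rises again on either side.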
Lazaris C, Kelly S, Ntziachristos P, Aifantis I, Tsirigos A. BMC Genomics. 2017 Jan 5;18(1):22.
Gong Y, Lazaris C, Lozano A, Kam P, Ntziachristos P, Aifantis I, Tsirigos A. bioRxiv 141481; https://doi.org/10.1101/141481 (submitted to Nature Methods)
Full publication list: https://tinyurl.com/yc84hetb
Abstract: It is estimated that the human genome contains hundreds of thousands of enhancers, so understanding these gene-regulatory elements is a crucial goal. Several fundamental questions need to be addressed about enhancers, such as how do we identify them all, how do they work, and how do they contribute to disease and evolution? Five prominent researchers in this field look at how much we know already and what needs to be done to answer these questions.
Pub.: 19 Mar '13, Pinned: 29 Jun '17
Abstract: HiC-Pro is an optimized and flexible pipeline for processing Hi-C data from raw reads to normalized contact maps. HiC-Pro maps reads, detects valid ligation products, performs quality controls and generates intra- and inter-chromosomal contact maps. It includes a fast implementation of the iterative correction method and is based on a memory-efficient data format for Hi-C contact maps. In addition, HiC-Pro can use phased genotype data to build allele-specific contact maps. We applied HiC-Pro to different Hi-C datasets, demonstrating its ability to easily process large data in a reasonable time. Source code and documentation are available at http://github.com/nservant/HiC-Pro .
Pub.: 02 Dec '15, Pinned: 29 Jun '17
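The iterative correction step mentioned in the HiC-Pro abstract can be illustrated with a small matrix-balancing sketch in Python. It is not HiC-Pro's implementation: the function name, its parameters, and the toy matrix are assumptions, and the loop simply rescales the rows and columns of a symmetric contact matrix until every non-empty row has roughly the same total. Convergence is not guaranteed for arbitrary input.

```python
def iterative_correction(matrix, n_iter=50, tol=1e-6):
    """Balance a symmetric contact matrix; return (balanced matrix, per-bin bias)."""
    n = len(matrix)
    m = [row[:] for row in matrix]          # work on a copy
    bias = [1.0] * n                        # accumulated per-bin correction factors
    for _ in range(n_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left untouched.
        scale = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    return m, bias


# Toy input (hypothetical): uniform contacts distorted by per-bin biases 1, 2, 4.
b = [1.0, 2.0, 4.0]
raw = [[bi * bj for bj in b] for bi in b]
balanced, bias = iterative_correction(raw)
row_sums = [sum(row) for row in balanced]
```

For this toy input the balanced rows all have the same sum, and the recovered bias vector is proportional to the biases used to build the matrix.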
Abstract: During the cell cycle, the genome must undergo dramatic changes in structure, from a decondensed, yet highly organized interphase structure to a condensed, generic mitotic chromosome and then back again. For faithful cell division, the genome must be replicated and chromosomes and sister chromatids physically segregated from one another. Throughout these processes, there is feedback and tension between the information-storing role and the physical properties of chromosomes. With a combination of recent techniques in fluorescence microscopy, chromosome conformation capture (Hi-C), biophysical experiments, and computational modeling, we can now attribute mechanisms to many long-observed features of chromosome structure changes during cell division. Apparent conflicts that arise when integrating the concepts from these different proposed mechanisms emphasize that orchestrating chromosome organization during cell division requires a complex system of factors rather than a simple pathway. Cell division is both essential for and threatening to proper genome organization. As interphase three-dimensional (3D) genome structure is quite static at a global level, cell division provides an important window of opportunity to make substantial changes in 3D genome organization in daughter cells, allowing for proper differentiation and development. Mistakes in the process of chromosome condensation or rebuilding the structure after mitosis can lead to diseases such as cancer, premature aging, and neurodegeneration.
Pub.: 17 May '17, Pinned: 29 Jun '17
Abstract: Topologically associating domains (TADs) have been proposed to be the basic unit of chromosome folding and have been shown to play key roles in genome organization and gene regulation. Several different tools are available for TAD prediction, but their properties have never been thoroughly assessed. In this manuscript, we compare the output of seven different TAD prediction tools on two published Hi-C data sets. TAD predictions varied greatly between tools in number, size distribution and other biological properties. Assessed against a manual annotation of TADs, individual TAD boundary predictions were found to be quite reliable, but their assembly into complete TAD structures was much less so. In addition, many tools were sensitive to sequencing depth and resolution of the interaction frequency matrix. This manuscript provides users and designers of TAD prediction tools with information that will help guide the choice of tools and the interpretation of their predictions.
Pub.: 24 Mar '17, Pinned: 29 Jun '17
Abstract: How does the non-coding portion of the genome contribute to the regulation of genome architecture? A recent paper by Tan et al. focuses on the relationship between cis-acting complex-trait-associated lincRNAs and the formation of chromosomal contacts in topologically associating domains (TADs).
Pub.: 27 Mar '17, Pinned: 29 Jun '17
Abstract: The molecular mechanisms underlying folding of mammalian chromosomes remain poorly understood. The transcription factor CTCF is a candidate regulator of chromosomal structure. Using the auxin-inducible degron system in mouse embryonic stem cells, we show that CTCF is absolutely and dose-dependently required for looping between CTCF target sites and insulation of topologically associating domains (TADs). Restoring CTCF reinstates proper architecture on altered chromosomes, indicating a powerful instructive function for CTCF in chromatin folding. CTCF remains essential for TAD organization in non-dividing cells. Surprisingly, active and inactive genome compartments remain properly segregated upon CTCF depletion, revealing that compartmentalization of mammalian chromosomes emerges independently of proper insulation of TADs. Furthermore, our data support that CTCF mediates transcriptional insulator function through enhancer blocking but not as a direct barrier to heterochromatin spreading. Beyond defining the functions of CTCF in chromosome folding, these results provide new fundamental insights into the rules governing mammalian genome organization.
Pub.: 20 May '17, Pinned: 29 Jun '17
Abstract: Hi-C is a genome-wide sequencing technique used to investigate 3D chromatin conformation inside the nucleus. Computational methods are required to analyze Hi-C data and identify chromatin interactions and topologically associating domains (TADs) from genome-wide contact probability maps. We quantitatively compared the performance of 13 algorithms in their analyses of Hi-C data from six landmark studies and simulations. This comparison revealed differences in the performance of methods for chromatin interaction identification, but more comparable results for TAD detection between algorithms.
Pub.: 13 Jun '17, Pinned: 29 Jun '17
Abstract: Chromatin conformation capture techniques have evolved rapidly over the last few years and have provided new insights into genome organization at an unprecedented resolution. Analysis of Hi-C data is complex and computationally intensive, involving multiple tasks and requiring robust quality assessment. This has led to the development of several tools and methods for processing Hi-C data. However, most of the existing tools do not cover all aspects of the analysis and offer only a few quality assessment options. Additionally, the availability of a multitude of tools makes scientists wonder how these tools and associated parameters can be optimally used, and how potential discrepancies can be interpreted and resolved. Most importantly, investigators need to be assured that slight changes in parameters and/or methods do not affect the conclusions of their studies. To address these issues (compare, explore and reproduce), we introduce HiC-bench, a configurable computational platform for comprehensive and reproducible analysis of Hi-C sequencing data. HiC-bench performs all common Hi-C analysis tasks, such as alignment, filtering, contact matrix generation and normalization, identification of topological domains, and scoring and annotation of specific interactions, using both published tools and our own. We have also embedded various tasks that perform quality assessment and visualization. HiC-bench is implemented as a data flow platform with an emphasis on analysis reproducibility. Additionally, the user can readily perform parameter exploration and comparison of different tools in a combinatorial manner that takes into account all desired parameter settings in each pipeline task. This unique feature facilitates the design and execution of complex benchmark studies that may involve combinations of multiple tool/parameter choices in each step of the analysis. To demonstrate the usefulness of our platform, we performed a comprehensive benchmark of existing and new TAD callers exploring different matrix correction methods, parameter settings and sequencing depths. Users can extend our pipeline by adding more tools as they become available. HiC-bench is an easy-to-use and extensible platform for comprehensive analysis of Hi-C datasets. We expect that it will facilitate current analyses and help scientists formulate and test new hypotheses in the field of three-dimensional genome organization.
Pub.: 07 Jan '17, Pinned: 28 Jun '17
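The combinatorial parameter exploration described in the HiC-bench abstract can be pictured as a Cartesian product over the options chosen for each pipeline step. The Python sketch below illustrates that idea only; it does not reflect HiC-bench's actual configuration format, and the step names, tool names, and values are hypothetical placeholders.

```python
from itertools import product


def expand_grid(steps):
    """Return one dict per combination of options, one option per pipeline step."""
    names = list(steps)
    combos = product(*(steps[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical options for three pipeline steps (placeholder names).
steps = {
    "normalization": ["raw", "iterative_correction"],
    "tad_caller": ["caller_a", "caller_b", "caller_c"],
    "resolution_kb": [40, 100],
}

runs = expand_grid(steps)   # 2 * 3 * 2 = 12 pipeline configurations
```

Each entry in `runs` describes one complete pipeline configuration, so a benchmark amounts to executing and comparing all of them.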
Abstract: Mammalian genomes are organized into megabase-scale topologically associated domains (TADs). We demonstrate that disruption of TADs can rewire long-range regulatory architecture and result in pathogenic phenotypes. We show that distinct human limb malformations are caused by deletions, inversions, or duplications altering the structure of the TAD-spanning WNT6/IHH/EPHA4/PAX3 locus. Using CRISPR/Cas genome editing, we generated mice with corresponding rearrangements. Both in mouse limb tissue and patient-derived fibroblasts, disease-relevant structural changes cause ectopic interactions between promoters and non-coding DNA, and a cluster of limb enhancers normally associated with Epha4 is misplaced relative to TAD boundaries and drives ectopic limb expression of another gene in the locus. This rewiring occurred only if the variant disrupted a CTCF-associated boundary domain. Our results demonstrate the functional importance of TADs for orchestrating gene expression via genome architecture and indicate criteria for predicting the pathogenicity of human structural variants, particularly in non-coding regions of the human genome.
Pub.: 12 May '15, Pinned: 28 Jun '17
Abstract: The spatial organization of the genome is intimately linked to its biological function, yet our understanding of higher order genomic structure is coarse, fragmented and incomplete. In the nucleus of eukaryotic cells, interphase chromosomes occupy distinct chromosome territories, and numerous models have been proposed for how chromosomes fold within chromosome territories. These models, however, provide only few mechanistic details about the relationship between higher order chromatin structure and genome function. Recent advances in genomic technologies have led to rapid advances in the study of three-dimensional genome organization. In particular, Hi-C has been introduced as a method for identifying higher order chromatin interactions genome wide. Here we investigate the three-dimensional organization of the human and mouse genomes in embryonic stem cells and terminally differentiated cell types at unprecedented resolution. We identify large, megabase-sized local chromatin interaction domains, which we term 'topological domains', as a pervasive structural feature of the genome organization. These domains correlate with regions of the genome that constrain the spread of heterochromatin. The domains are stable across different cell types and highly conserved across species, indicating that topological domains are an inherent property of mammalian genomes. Finally, we find that the boundaries of topological domains are enriched for the insulator binding protein CTCF, housekeeping genes, transfer RNAs and short interspersed element (SINE) retrotransposons, indicating that these factors may have a role in establishing the topological domain structure of the genome.
Pub.: 13 Apr '12, Pinned: 28 Jun '17
Abstract: Chromosome conformation capture methods have identified subchromosomal structures of higher-order chromatin interactions called topologically associated domains (TADs) that are separated from each other by boundary regions. By subdividing the genome into discrete regulatory units, TADs restrict the contacts that enhancers establish with their target genes. However, the mechanisms that underlie partitioning of the genome into TADs remain poorly understood. Here we show by chromosome conformation capture (capture Hi-C and 4C-seq methods) that genomic duplications in patient cells and genetically modified mice can result in the formation of new chromatin domains (neo-TADs) and that this process determines their molecular pathology. Duplications of non-coding DNA within the mouse Sox9 TAD (intra-TAD) that cause female to male sex reversal in humans, showed increased contact of the duplicated regions within the TAD, but no change in the overall TAD structure. In contrast, overlapping duplications that extended over the next boundary into the neighbouring TAD (inter-TAD), resulted in the formation of a new chromatin domain (neo-TAD) that was isolated from the rest of the genome. As a consequence of this insulation, inter-TAD duplications had no phenotypic effect. However, incorporation of the next flanking gene, Kcnj2, in the neo-TAD resulted in ectopic contacts of Kcnj2 with the duplicated part of the Sox9 regulatory region, consecutive misexpression of Kcnj2, and a limb malformation phenotype. Our findings provide evidence that TADs are genomic regulatory units with a high degree of internal stability that can be sculptured by structural genomic variations. This process is important for the interpretation of copy number variations, as these variations are routinely detected in diagnostic tests for genetic disease and cancer. 
This finding also has relevance in an evolutionary setting because copy-number differences are thought to have a crucial role in the evolution of genome complexity.
Pub.: 06 Oct '16, Pinned: 28 Jun '17