Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly.

Research paper by Yingrui Li, Hancheng Zheng, Ruibang Luo, Honglong Wu, Hongmei Zhu, Ruiqiang Li, Hongzhi Cao, Boxin Wu, Shujia Huang, Haojing Shao, Hanzhou Ma, Fan Zhang, Shuijian Feng, Wei Zhang, Hongli Du, et al.

Indexed on: 26 Jul '11
Published on: 26 Jul '11
Published in: Nature Biotechnology


Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise breakpoints, and in contrast to other methods, can resolve complex rearrangements. In total, we identified 277,243 SVs ranging in length from 1-23 kb. Validation using computational and experimental methods suggests that we achieve an overall false-positive rate of <6% and a false-negative rate of <10% in genomic regions that can be assembled, which outperforms other methods. Analysis of the SVs in the genomes of 106 individuals sequenced as part of the 1000 Genomes Project suggests that SVs account for a greater fraction of the diversity between individuals than do single-nucleotide polymorphisms (SNPs). These findings demonstrate that whole-genome de novo assembly is a feasible approach to deriving more comprehensive maps of genetic variation.