SV-STAT accurately detects structural variation via alignment to reference-based assemblies.

Research paper by Caleb F. Davis, Deborah I. Ritter, David A. Wheeler, Hongmei Wang, Yan Ding, Shannon P. Dugan, Matthew N. Bainbridge, Donna M. Muzny, Pulivarthi H. Rao, Tsz-Kwong Man, Sharon E. Plon, Richard A. Gibbs, Ching C. Lau

Indexed on: 23 Jun '16
Published on: 23 Jun '16
Published in: Source Code for Biology and Medicine


Genomic deletions, inversions, and other rearrangements known collectively as structural variations (SVs) are implicated in many human disorders. Technologies for sequencing DNA provide a potentially rich source of information in which to detect breakpoints of structural variations at base-pair resolution. However, accurate prediction of SVs remains challenging, and existing informatics tools predict rearrangements with significant rates of false positives or negatives.

To address this challenge, we developed 'Structural Variation detection by STAck and Tail' (SV-STAT), which implements a novel scoring metric. The software uses this statistic to quantify evidence for structural variation in genomic regions suspected of harboring rearrangements. To demonstrate SV-STAT, we used targeted and genome-wide approaches. First, we applied a custom capture array followed by Roche/454 and SV-STAT to three pediatric B-lineage acute lymphoblastic leukemias, identifying five structural variations joining known and novel breakpoint regions. Next, we detected SVs genome-wide in paired-end Illumina data collected from additional tumor samples. SV-STAT showed predictive accuracy as high as or higher than leading alternatives. The software is freely available under the terms of the GNU General Public License version 3 at https://gitorious.org/svstat/svstat.

SV-STAT works across multiple sequencing chemistries, paired and single-end technologies, targeted or whole-genome strategies, and it complements existing SV-detection software. The method is a significant advance towards accurate detection and genotyping of genomic rearrangements from DNA sequencing data.