Indexed on: 26 Jun '15
Published on: 26 Jun '15
Published in: PLoS ONE
Tumor single nucleotide polymorphism (SNP) arrays are a common platform for investigating genomic aberrations in cancer and the functionally important genes they alter. Raw SNP array signals are usually corrupted by noise and need to be deconvolved into an absolute copy number profile by analytical methods. Unfortunately, despite the popularity of Affymetrix SNP arrays for tumor studies, few methods are designed specifically for this platform. The complex noise characteristics of the signals are one of the difficulties in dissecting tumor Affymetrix SNP array data: the noise inevitably blurs the distinction between aberrations and creates an obstacle to copy number aberration (CNA) identification.

We propose a tool named TAFFYS for comprehensive analysis of tumor Affymetrix SNP array data. TAFFYS introduces a wavelet-based de-noising approach and a copy number-specific signal variance model for suppressing and modelling the noise in the signals. A hidden Markov model is then employed for copy number inference. Finally, using the absolute copy number profile, the statistical significance of each aberration region is calculated for each aberration type, including amplification, deletion and loss of heterozygosity (LOH).

The results show that the copy number-specific variance model and the wavelet de-noising algorithm fit Affymetrix SNP array signals well, leading to more accurate estimation for diluted tumor samples (even those with only 30% cancer cells) than other existing methods. The evaluations also demonstrate good compatibility with, and extensibility to, different Affymetrix SNP array platforms. Application to 35 breast tumor samples shows that TAFFYS can automatically dissect the tumor samples and reveal statistically significant aberration regions in which cancer-related genes are located.

TAFFYS provides an efficient and convenient tool for identifying copy number alterations and allelic imbalance, and for assessing recurrent aberrations, in tumor Affymetrix SNP array data.
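
The two core steps named above, wavelet de-noising followed by hidden Markov model inference with a copy number-specific variance, can be illustrated with a minimal sketch. This is not the TAFFYS implementation: the abstract does not specify the wavelet family, the thresholding rule or the HMM parameterisation, so the Haar wavelet, the universal soft threshold, the uniform transition matrix and all function names and numeric values below are assumptions chosen only for illustration.

```python
import math
import statistics


def haar_forward(x):
    """One Haar level: return (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(signal, levels=3):
    """Haar de-noising with a universal soft threshold (assumed scheme)."""
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)
    # Noise scale from the finest detail level (median absolute value).
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(n))
    for d in reversed(details):
        approx = haar_inverse(approx, [soft_threshold(c, thr) for c in d])
    return approx


def viterbi_copy_number(obs, means, sds, stay=0.99):
    """Most likely state path of a Gaussian HMM with per-state variance."""
    k = len(means)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (k - 1))

    def log_emit(x, s):
        z = (x - means[s]) / sds[s]
        return -0.5 * z * z - math.log(sds[s])  # constant term dropped

    def log_trans(r, s):
        return log_stay if r == s else log_move

    score = [-math.log(k) + log_emit(obs[0], s) for s in range(k)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(k):
            best = max(range(k), key=lambda r: prev[r] + log_trans(r, s))
            ptr.append(best)
            score.append(prev[best] + log_trans(best, s) + log_emit(x, s))
        back.append(ptr)
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical log2-ratio signal: 16 copy-neutral probes, then 16 probes
# in a one-copy loss, with a small alternating perturbation as "noise".
raw = [(0.0 if i < 16 else -1.0) + (0.1 if i % 2 == 0 else -0.1)
       for i in range(32)]
smooth = wavelet_denoise(raw, levels=3)
# States 0, 1, 2 stand for copy numbers 1, 2, 3; the means and standard
# deviations are illustrative values, not estimates from the paper.
states = viterbi_copy_number(smooth, means=[-1.0, 0.0, 0.58],
                             sds=[0.15, 0.10, 0.12])
copy_numbers = [s + 1 for s in states]
```

Giving each state its own standard deviation is the simplest way to express a copy number-specific variance model in the emission term. A real analysis would also model allelic signals and tumor purity, which this sketch leaves out.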