Indexed on: 14 Jun '16 | Published on: 11 Jun '16 | Published in: Neurocomputing
The 14-3-3 proteins are a highly conserved family of homodimeric and heterodimeric molecules expressed in all eukaryotic cells. In human cells, this family consists of seven distinct but highly homologous 14-3-3 isoforms. 14-3-3σ is the only isoform directly linked to cancer in epithelial cells, and it is regulated by a major tumor suppressor gene. For each 14-3-3 isoform, 1,000 peptide motifs with experimentally measured binding affinity values are available. In this paper, we present a novel method for identifying peptide motifs that bind to the 14-3-3σ isoform. First, we select nine physicochemical properties of amino acids to describe each peptide motif, and we use auto-cross covariance to extract correlations between the properties of amino acids at any two positions. Then, a similarity-based undersampling approach and a SMOTE-like oversampling approach are used to deal with the imbalanced distribution of the known peptide motifs. Finally, we apply locally weighted regression, which combines the simplicity of linear least squares regression with the flexibility of nonlinear regression, to predict the affinity values of peptide motifs. We test our method on the 1,000 peptide motifs binding to each of the seven 14-3-3 isoforms. On the 14-3-3σ isoform, our method achieves an overall Pearson product-moment correlation coefficient (PCC) and root mean squared error (RMSE) of 0.83 and 258.31 for the N-terminal sublibrary, and 0.80 and 250.89 for the C-terminal sublibrary. We identify phosphopeptides that preferentially bind to 14-3-3σ over the other isoforms. At several positions, the identified peptide motifs carry the same amino acid as the experimentally determined substrate specificity of phosphopeptides binding to 14-3-3σ. Our method is a fast and reliable computational approach that can be used for peptide-protein binding identification in proteomics research.
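The auto-cross covariance step described above can be illustrated with a minimal sketch. It assumes the commonly used lag-based definition, in which the covariance between property i at position n and property j at position n + lag is averaged over the peptide. The abstract does not give the exact normalization, the lag range, or the nine property scales the authors used, so the function name `auto_cross_covariance`, the `max_lag` parameter, and the input layout (one row of numeric values per property, one column per peptide position) are assumptions made for illustration only.

```python
from itertools import product


def auto_cross_covariance(props, max_lag):
    """Compute lag-based auto-cross covariance features for one peptide.

    props   -- list of P rows; row i holds the value of property i at each
               of the L peptide positions (hypothetical input layout).
    max_lag -- largest position offset to consider (1 <= max_lag < L).

    Returns a dict mapping (i, j, lag) to the covariance between property i
    at position n and property j at position n + lag, averaged over n.
    """
    if not props or not props[0]:
        raise ValueError("props must contain at least one non-empty row")
    length = len(props[0])
    if any(len(row) != length for row in props):
        raise ValueError("all property rows must have the same length")
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < peptide length")

    # Mean of each property over all positions of the peptide.
    means = [sum(row) / length for row in props]

    features = {}
    # i == j gives the auto-covariance terms, i != j the cross-covariance terms.
    for i, j in product(range(len(props)), repeat=2):
        for lag in range(1, max_lag + 1):
            total = sum(
                (props[i][n] - means[i]) * (props[j][n + lag] - means[j])
                for n in range(length - lag)
            )
            features[(i, j, lag)] = total / (length - lag)
    return features
```

With P properties and a maximum lag of G, this yields P × P × G features per peptide regardless of which residues occupy each position, which is what allows a fixed-length descriptor to feed a regression model. For example, nine properties with a hypothetical `max_lag` of 2 would give 162 features.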