Pattern Analysis for the Prediction of Fungal Pro-peptide Cleavage Sites
Ozogur S., Shawe-Taylor J., Weber G.-W., Ogel Z. B.
to appear in a special issue of Discrete Applied Mathematics on "Networks in Computational Biology"
Support vector machines (SVMs) have many applications in the analysis of biological data, from gene expression arrays to EEG signals of sleep stages. In this paper, we develop an application that supports the prediction of the pro-peptide cleavage site of fungal extracellular proteins, which mostly display a monobasic or dibasic processing site. Many secretory proteins and peptides are synthesized as inactive precursors and become active after post-translational processing. A collection of fungal pro-protein sequences is used as the training data set. A specifically designed kernel is expressed as an application of the well-known Gaussian kernel via feature spaces defined for our problem. Rather than fixing the kernel parameters with cross-validation or other methods, we introduce a novel approach that performs model selection simultaneously with the testing of accuracy and of confidence levels. This yields higher accuracy at significantly reduced training times. The results of this study are compared with those of the ProP1.0 server, which predicts pro-peptide cleavage sites. A similar mathematical approach may be adapted to pro-peptide cleavage prediction in other eukaryotes.