Protein Fold Recognition using Residue-based Alignments of Sequence and Secondary Structure
Aydin, Z., Erdogan, H., Altunbasak, Y.
Protein structure prediction aims to determine the three-dimensional structure of proteins from their amino acid sequences. When a protein does not have similarity (homology) to any known fold, threading or fold recognition methods are used to predict its structure. Fold recognition methods frequently employ secondary structure, solvent accessibility, and evolutionary information to enhance the accuracy and quality of the predictions. In this paper, we present a residue-based alignment method as an alternative to the state-of-the-art SSEA method, originally introduced by Przytycka et al. and further modified by McGuffin et al. We introduce a residue-based score function that can incorporate amino acid similarity matrices such as BLOSUM into secondary structure similarity scoring and compute joint alignments. We show that the power of the SSEA method comes from its length normalization rather than from the element alignment technique, and that similar performance can be achieved using residue-based alignments of secondary structures by optimizing gap costs. In simulations with two benchmark datasets, our method performs slightly better than SSEA in terms of fold recognition accuracy. When the secondary structure similarity matrix is combined with the amino acid based BLOSUM30 matrix, the accuracy of our method improves further (4% for the McGuffin set and 10% for the Ding and Dubchak set). The ability to align the amino acid and secondary structure sequences jointly offers a better starting point for more elaborate techniques that employ profile-profile alignments and machine learning methods [3,4].
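To make the idea of a joint residue-based score concrete, the following is a minimal sketch and not the score function used in this paper. It assumes a simple global (Needleman-Wunsch) alignment with a linear gap cost, a weighted sum of an amino acid term and a secondary structure term, and normalization by the mean sequence length. The weight `w`, the gap cost, the secondary structure scores, and the identity-based amino acid score (a stand-in for a real matrix such as BLOSUM30) are all hypothetical values chosen for illustration.

```python
def ss_score(a: str, b: str) -> float:
    """Hypothetical secondary structure similarity (H=helix, E=strand, C=coil)."""
    if a == b:
        return 1.0
    if "C" in (a, b):
        return 0.0   # coil against helix or strand: neutral
    return -1.0      # helix against strand: penalized


def aa_score(a: str, b: str) -> float:
    """Stand-in for an amino acid matrix such as BLOSUM30 (identity scoring only)."""
    return 1.0 if a == b else -1.0


def residue_score(aa1: str, ss1: str, aa2: str, ss2: str, w: float) -> float:
    """Joint score for one residue pair: weighted sum of the two terms."""
    return w * ss_score(ss1, ss2) + (1.0 - w) * aa_score(aa1, aa2)


def joint_alignment_score(aa_a: str, ss_a: str, aa_b: str, ss_b: str,
                          gap: float = -1.0, w: float = 0.5) -> float:
    """Raw global alignment score over (amino acid, secondary structure) pairs."""
    if len(aa_a) != len(ss_a) or len(aa_b) != len(ss_b):
        raise ValueError("each amino acid string needs a same-length SS string")
    n, m = len(aa_a), len(aa_b)
    prev = [j * gap for j in range(m + 1)]        # row 0: only gaps
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m              # column 0: only gaps
        for j in range(1, m + 1):
            pair = prev[j - 1] + residue_score(
                aa_a[i - 1], ss_a[i - 1], aa_b[j - 1], ss_b[j - 1], w)
            curr[j] = max(pair, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]


def normalized_score(aa_a: str, ss_a: str, aa_b: str, ss_b: str,
                     gap: float = -1.0, w: float = 0.5) -> float:
    """Raw score divided by the mean length (an assumed normalization)."""
    mean_len = (len(aa_a) + len(aa_b)) / 2.0
    if mean_len == 0:
        return 0.0
    return joint_alignment_score(aa_a, ss_a, aa_b, ss_b, gap, w) / mean_len


if __name__ == "__main__":
    print(normalized_score("ACDE", "HHEE", "ACDE", "HHEE"))
    print(normalized_score("ACDE", "HHEE", "WWWW", "EEHH"))
```

Setting `w = 1.0` in this sketch reduces it to a pure secondary structure alignment, while intermediate values combine both sources of information in a single alignment. In practice the gap cost and the weight would need to be tuned on a benchmark set.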