Bayesian Protein Secondary Structure Prediction With Near-Optimal Segmentations
Aydin, Z., Altunbasak, Y., Erdogan, H.
IEEE Transactions on Signal Processing
Secondary structure prediction is an invaluable tool in determining the 3-D structure and function of proteins. Typically, protein secondary structure prediction methods suffer from low accuracy in beta-strand predictions, where nonlocal interactions play a significant role. There is a considerable need to model such long- range interactions that contribute to the stabilization of a protein molecule. In this paper, we introduce an alternative decoding technique for the hidden semi-Markov model (HSMM) originally employed in the BSPSS algorithm, and further developed in the IPSSP algorithm. The proposed method is based on the N-best paradigm where a set of most likely segmentations is computed. To generate suboptimal segmentations (i.e., alternative prediction sequences), we developed two N-best search algorithms. The first one is an A* stack decoder algorithm that extends paths (or hypotheses) by one symbol at each iteration. The second algorithm locally keeps the end positions of the highest scoring K previous segments and performs backtracking. Both algorithms employ the hidden semi- Markov model described in Aydin etal. , and use Viterbi scoring to compute the N-best list. The availability of near-optimal segmentations and the utilization of the Viterbi scoring enable the sequences to be rescored using more complex dependency models that characterize nonlocal interactions in beta-sheets. After the score update, one can either keep the segmentations to be employed in 3-D structure prediction or predict the secondary structure by applying a weighted voting procedure to a set of top scoring M ges 1 segmentations. The accuracy measures of the N-best method when used to predict the secondary structure are shown to be comparable or better than the classical Viterbi decoder (MAP estimator), tested under the single-sequence condition. When no rescoring is applied, the stack decoder algorithm with sufficiently large M improves the overall sensitivity measure (Q3) of the Viterbi algorithm by 1.1%. At the same M value, the N-best Viterbi algorithm improves the Q3 measure by 0.25% as well as the sensitivity measures specific for each secondary structure type (Qobsalpha, Qobsbeta, QobsL). When the sequences are rescored using the posterior probability distribution computed by the posterior decoding algorithm (MPM estimator), N-best Viterbi improves the Q3 measure of the Viterbi algorithm by 2.6%. The rescored N-best list approach also enables us to generate suboptimal segmentations that are valid sequences (i.e., realizable from the hidden semi-Markov model). Although the N-best algorithms and the score update procedure brought significant improvements over the Viterbi algorithm, they were not able to outperform the posterior decoding algorithm in the single-sequence condition. Further improvements in the prediction accuracy should be possible with the incorporation of sophisticated models for nonlocal interactions and other physical constraints that stabilize the overall structure of a protein.