Protein Secondary Structure Prediction Using Super-chains in PDB
Abstract
The completeness of the protein structures in the current Protein Data Bank (PDB) library is examined with respect to predicting the secondary structure of proteins of unknown structure. To this end, several batches of 1000 protein chains are chosen at random from the PDB. For each protein chain in a batch, the chains in the PDB dataset that contain the query chain as a subsequence are identified and named super-chains. The secondary structure of the query protein is then predicted from the corresponding subsequences of these super-chains' secondary structure sequences. The procedure is also repeated for well-known datasets such as CB513, FC699, 640, 25PDB, SCOP, and 1189. Around 18% of the protein sequences in a batch are found to be present in other chains of the PDB dataset, and the average prediction accuracy of the method is 80%. Therefore, an unknown protein has roughly a 20% chance of having a super-chain in the PDB, and if it has one, its secondary structure can be predicted with around 80% accuracy.
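The super-chain idea can be illustrated with a short sketch. It is not the authors' implementation: it assumes that "subsequence" means a contiguous substring, that each chain is stored as a pair of an amino-acid string and an equal-length 3-state secondary structure string (H, E, C), that overlapping occurrences in several super-chains are combined by a per-position majority vote, and that accuracy is measured as Q3 (fraction of correctly predicted residues). All function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: chain id -> (amino-acid sequence, 3-state SS string of equal length)
Chains = Dict[str, Tuple[str, str]]


def find_super_chains(query_id: str, chains: Chains) -> List[str]:
    """Ids of other chains whose sequence contains the query as a contiguous substring."""
    query_seq = chains[query_id][0]
    return [cid for cid, (seq, _) in chains.items()
            if cid != query_id and query_seq in seq]


def predict_from_super_chains(query_seq: str,
                              supers: List[Tuple[str, str]]) -> Optional[str]:
    """Majority vote (an assumption) over the SS slices aligned to every occurrence."""
    votes = [Counter() for _ in query_seq]
    for seq, ss in supers:
        start = seq.find(query_seq)
        while start != -1:  # count every occurrence, including overlapping ones
            for i, state in enumerate(ss[start:start + len(query_seq)]):
                votes[i][state] += 1
            start = seq.find(query_seq, start + 1)
    if not votes or not votes[0]:
        return None  # no super-chain: this method makes no prediction
    return "".join(v.most_common(1)[0][0] for v in votes)


def q3(predicted: str, actual: str) -> float:
    """Fraction of residues whose predicted state matches the observed state."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def evaluate_batch(batch_ids: List[str], chains: Chains) -> Tuple[float, float]:
    """Return (fraction of the batch with a super-chain, mean Q3 over those chains)."""
    accuracies = []
    for cid in batch_ids:
        seq, ss = chains[cid]
        supers = [chains[h] for h in find_super_chains(cid, chains)]
        pred = predict_from_super_chains(seq, supers)
        if pred is not None:
            accuracies.append(q3(pred, ss))
    coverage = len(accuracies) / len(batch_ids)
    mean_acc = sum(accuracies) / len(accuracies) if accuracies else 0.0
    return coverage, mean_acc


if __name__ == "__main__":
    toy = {  # made-up chains, not PDB data
        "A": ("MKTAY", "CHHHC"),
        "B": ("GGMKTAYLL", "CCCHHHHCC"),
        "C": ("PPPP", "CCCC"),
    }
    print(evaluate_batch(list(toy), toy))
```

On the toy data, only chain A has a super-chain (B), so coverage is 1/3, and the slice of B's secondary structure ("CHHHH") matches 4 of A's 5 states, giving a Q3 of 0.8. The two returned numbers correspond to the coverage (around 18%) and average accuracy (80%) figures reported in the abstract, though the real computation runs over batches of PDB chains.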
Keywords
Protein Secondary Structure Prediction; PDB; Super-chains
DOI: http://dx.doi.org/10.21533/scjournal.v5i1.101
Copyright (c) 2016 Faruk Berat Akcesme, Mehmet Can
ISSN 2233-1859
Digital Object Identifier DOI: 10.21533/scjournal
This work is licensed under a Creative Commons Attribution 4.0 International License