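Abstract:
Predicting protein secondary structure is of great importance for studying protein function and for the human life sciences. Secondary structure prediction was first proposed in 1951, and by 1983 its accuracy was still only 50%. After years of development and continual refinement of the prediction methods, accuracy now exceeds 80%. However, many online prediction servers are now available, and Continuous Automated Model EvaluatiOn (CAMEO) evaluates only the servers' tertiary structure predictions; secondary structure evaluation has not yet been implemented. To address this gap, six servers were selected (PSRSM, MUFOLD, SPIDER, RAPTORX, JPRED, and PSIPRED) and their predicted secondary structures were evaluated. To ensure that the test set did not overlap the training sets, the experimental data were drawn from the proteins most recently released in the Protein Data Bank (PDB). In experiments based on 30%, 50%, and 70% protein homology, PSRSM achieved Q3 accuracies of 91.44%, 88.12%, and 90.17%, respectively, which are 3.19%, 1.33%, and 2.19% higher than those of MUFOLD, the best of the other servers, demonstrating that PSRSM predicts better than the other servers on data of the same homology class. The experiments also showed that its Sov accuracy is higher than that of the other servers. Comparing the methods and results of the servers leads to the conclusion that future research on protein secondary structure prediction should focus on big data, templates, and deep learning.
Keywords: Protein secondary structure; Prediction; Online server; Accuracy; Evaluation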
DOI: 10.12113/j.issn.1672-5565.201808002
CLC number: Q518.1
Document code: A
Funding:
Protein secondary structure online server predictive evaluation |
ZHU Shuping, LIU Yihui
(School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China)
Abstract:
The prediction of protein secondary structure is of great significance for studying protein function and for the human life sciences. Secondary structure prediction was first proposed in 1951, yet its accuracy was still only 50% in 1983. Over years of development, prediction methods have been continuously improved, and accuracy now exceeds 80%. However, there are many online prediction servers, and Continuous Automated Model EvaluatiOn (CAMEO) evaluates only the servers' tertiary structure predictions; secondary structure evaluation has not yet been implemented. To address this gap, six servers (PSRSM, MUFOLD, SPIDER, RAPTORX, JPRED, and PSIPRED) were selected and their predicted secondary structures were evaluated. The most recently released proteins in the Protein Data Bank (PDB) were used as experimental data to ensure that the test set was not included in any training set. In the experiments at 30%, 50%, and 70% protein homology, PSRSM achieved Q3 accuracies of 91.44%, 88.12%, and 90.17%, respectively. These values are higher than those of MUFOLD, the best of the other servers, by 3.19%, 1.33%, and 2.19%, respectively, which demonstrates that PSRSM predicts better than the other servers on data of the same homology class. The experiments also showed that its Sov accuracy is higher than that of the other servers. Comparing the methods and results of the servers leads to the conclusion that future research on protein secondary structure prediction should focus on big data, templates, and deep learning.
Key words: Protein secondary structure; Prediction; Online server; Accuracy; Evaluation
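
The Q3 figures quoted in the abstract are per-residue accuracies over three states. As a minimal sketch, assuming the standard Q3 definition (the percentage of residues whose predicted state among helix H, strand E, and coil C matches the observed state), the score for one protein could be computed as below. The function name `q3_accuracy` and the two example strings are hypothetical illustrations, not the evaluation code or data used in the paper.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the Q3 score: percentage of residues whose three-state
    label (H = helix, E = strand, C = coil) is predicted correctly.

    Assumes both strings are already reduced to the three states and
    aligned residue by residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 17-residue example (not data from the paper).
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
score = q3_accuracy(predicted, observed)
print(f"Q3 = {score:.2f}%")
```

In this made-up example two of the 17 residues are mispredicted, so the score is 15/17 of 100%. Sov is a segment-based measure and needs a separate, more involved calculation that this sketch does not cover.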