Cite this article: WANG Jian, CHENG Jinyong, ZHAO Zhigang, LU Wenpeng. Protein secondary structure prediction based on CNN and LSTM models[J]. Chinese Journal of Bioinformatics, 2018, 16(2): 130-135.

DOI: 10.3969/j.issn.1672-5565.201712004
CLC number: TP391.4
Document code: A
Funding: National Natural Science Foundation of China (61375013); Natural Science Foundation of Shandong Province (ZR2013FM020).

Protein secondary structure prediction based on CNN and LSTM models |
WANG Jian¹, CHENG Jinyong¹, ZHAO Zhigang², LU Wenpeng¹

(1. College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China; 2. Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250101, China)

Abstract:
Protein structure prediction is of great significance for understanding the composition and biological function of proteins, and protein secondary structure prediction is an important step in protein structure prediction. Because the position-specific scoring matrix (PSSM) is widely used to encode a protein's primary sequence as input samples, each residue can be represented as a two-dimensional data plane, so a convolutional neural network (CNN) can be trained on this representation. We also design a second type of CNN, in which long short-term memory (LSTM) networks read the last convolutional feature maps of the CNN both horizontally and vertically; their outputs are combined with the CNN's fully connected layer to perform the classification. Finally, an ensemble method integrates the two types of CNN models, and the final ensemble contains six models of these two types. The Q3 accuracy obtained on the CB513 dataset is 77.2%.
Key words: CNN; LSTM; protein secondary structure prediction; ensemble method
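The abstract describes three steps that surround the networks themselves: turning each residue's PSSM context into a two-dimensional plane, combining the predictions of several models in an ensemble, and scoring the result with Q3. The Python sketch below illustrates only these steps; it does not implement the CNN or LSTM models. It assumes a centred sliding window with zero padding at the chain ends (the window size of 13 is a hypothetical default), a PSSM with 20 columns, and soft voting (averaging class probabilities) as the ensemble rule. None of these choices is specified in the abstract, and the function names `residue_planes`, `ensemble_predict`, and `q3` are hypothetical.

```python
from typing import List, Sequence

STATES = "HEC"  # 3-state secondary structure: helix, strand, coil


def residue_planes(pssm: Sequence[Sequence[float]], window: int = 13) -> List[List[List[float]]]:
    """Turn an L x 20 PSSM into L two-dimensional planes of shape window x 20.

    ASSUMPTION: centred sliding window, zero rows past the chain ends;
    window=13 is a hypothetical default, not a value from the paper.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the residue sits in the centre")
    half = window // 2
    zero_row = [0.0] * len(pssm[0])
    planes = []
    for i in range(len(pssm)):
        # Rows i-half .. i+half; positions outside the chain become zero rows.
        plane = [list(pssm[j]) if 0 <= j < len(pssm) else list(zero_row)
                 for j in range(i - half, i + half + 1)]
        planes.append(plane)
    return planes


def ensemble_predict(model_probs: Sequence[Sequence[Sequence[float]]]) -> str:
    """Average per-residue class probabilities over models, then take the argmax.

    ASSUMPTION: soft voting; the paper's rule for combining its six models may differ.
    model_probs[m][i][k] is model m's probability of state STATES[k] at residue i.
    """
    n_models = len(model_probs)
    labels = []
    for i in range(len(model_probs[0])):
        avg = [sum(model_probs[m][i][k] for m in range(n_models)) / n_models
               for k in range(len(STATES))]
        labels.append(STATES[avg.index(max(avg))])
    return "".join(labels)


def q3(predicted: str, observed: str) -> float:
    """Q3: percentage of residues whose 3-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy 5-residue PSSM with 20 columns (made-up values, illustration only).
    pssm = [[float(r)] * 20 for r in range(5)]
    planes = residue_planes(pssm, window=3)
    print(len(planes), len(planes[0]), len(planes[0][0]))  # 5 3 20

    # Two hypothetical models, 4 residues, probabilities for H, E, C.
    probs = [
        [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6], [0.5, 0.1, 0.4]],
        [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]],
    ]
    pred = ensemble_predict(probs)
    print(pred, q3(pred, "HECH"))  # HECC 75.0
```

Averaging probabilities is only one common ensemble rule; majority voting over the models' hard labels is another reasonable reading of "ensemble method" in the abstract.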