Cite this article: | ZHANG Guangneng, ZHANG Yufang, ZHANG Bao. A graph convolutional network model based on representing learning for compound-protein interaction prediction[J]. Chinese Journal of Bioinformatics, 2024, 22(4): 287-295. |
|
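Abstract: |
Identifying compound-protein interactions is essential for drug discovery, target identification, network pharmacology, and the elucidation of protein function. This paper develops a graph neural network model, based on representation learning, for predicting compound-protein interactions. First, the Word2vec representation learning method is used to extract features of compounds and proteins automatically; these features are then fed into a graph neural network prediction model, which is compared with traditional machine learning methods and with previous state-of-the-art methods. The results show that the model performs better on evaluation metrics such as area under the curve and accuracy. The model was then used to predict interaction probabilities for all unknown compound-protein pairs in the Binding-DB database; four of the five top-scoring pairs were supported by external evidence, further demonstrating the model's robustness and effectiveness. The model can make full use of aggregated neighbor information and node features and can adaptively capture the topological structure of the compound-protein space, thereby achieving high accuracy. These results offer new ideas and methods for research on identifying compound-protein interactions. |
Keywords: compound-protein interaction; representation learning; graph neural network; drug discovery |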
DOI:10.12113/202307005 |
CLC number: R96; TP18 |
Document code: A |
Funding: |
|
A graph convolutional network model based on representing learning for compound-protein interaction prediction |
ZHANG Guangneng¹, ZHANG Yufang², ZHANG Bao¹
|
(1.School of Public Health, Southern Medical University, Guangzhou 511495, China;2.School of Mathematical Science, Shanghai Jiaotong University, Shanghai 200240, China)
|
Abstract: |
The identification of compound-protein interactions is crucial for drug discovery, target identification, network pharmacology, and the elucidation of protein function. In this paper, we develop a representation-learning-based graph neural network model for predicting compound-protein interactions. First, the Word2vec representation learning method is used to extract features of compounds and proteins automatically. These features are then used as input to a graph neural network prediction model. Compared with traditional machine learning methods and previous advanced methods, the model achieves better results in AUC, accuracy, and other evaluation metrics. We also predicted the interaction probability of all unknown compound-protein pairs in the Binding-DB database; four of the five pairs with the highest prediction scores were confirmed by external evidence, which further supports the robustness and effectiveness of the model. The model can fully utilize aggregated neighbor information and node features and can adaptively capture the topological structure of the compound-protein space, thereby achieving high accuracy. The results of this study provide new ideas and methods for the identification of compound-protein interactions. |
Key words: Compound-protein interactions; Representation learning; Graph neural network; Drug discovery |
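The two stages named in the abstract (turning sequences into "words" for a Word2vec-style model, then aggregating neighbor information over a graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how sequences are tokenized or which graph layer is used, so the overlapping 3-mer tokenization, the plain mean aggregation, the function names `kmer_tokens` and `mean_aggregate`, and the toy embeddings are all assumptions made for illustration.

```python
from typing import Dict, List


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers, usable as "words" for a Word2vec-style model.

    Assumption: k-mer tokenization with k=3; the source does not specify the scheme.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mean_aggregate(
    features: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """One round of neighbor aggregation: average each node's vector with its neighbors' vectors.

    Assumption: a generic mean aggregator with no learned weights, shown only to
    illustrate what "aggregating neighbor information" means.
    """
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors.get(node, [])]
        updated[node] = [
            sum(v[d] for v in group) / len(group) for d in range(len(vec))
        ]
    return updated


# Toy data (hypothetical): 2-dimensional embeddings for one compound and two proteins.
features = {
    "compound_A": [1.0, 0.0],
    "protein_X": [0.0, 1.0],
    "protein_Y": [1.0, 1.0],
}
# Known interactions define the graph edges; protein_Y has no known partner.
neighbors = {
    "compound_A": ["protein_X"],
    "protein_X": ["compound_A"],
    "protein_Y": [],
}

tokens = kmer_tokens("MKTAY")
smoothed = mean_aggregate(features, neighbors)
print(tokens)
print(smoothed)
```

In a full pipeline, the token lists would be passed to a Word2vec trainer to obtain the initial node features, and the aggregation step would carry learned weights and a nonlinearity; both are omitted here to keep the sketch dependency-free.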