Cite this article: | ZHAO Boxuan, LIU Ming, LI Jianwei. Research on an early diagnosis and prediction model of gastric cancer based on bioinformatics[J]. Chinese Journal of Bioinformatics, 2022, 20(4): 274-283. |
|
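Abstract: |
Gastric cancer (GC) gene expression datasets were collected from the public data of The Cancer Genome Atlas and Genotype-Tissue Expression, genes closely associated with early gastric cancer were screened, and an early diagnosis prediction model for gastric cancer was built. Differentially expressed genes in early gastric cancer were screened with the DESeq2 package, and GO and KEGG enrichment analyses were performed on them. A protein-protein interaction network of these genes was built with the STRING database, key subnetworks were extracted with Cytoscape to obtain candidate key genes, and MedCalc was then used to confirm the key genes for early diagnosis of gastric cancer. Using the 10 key genes, early diagnosis prediction models were built with six algorithms: support vector machine, random forest, naive Bayes, K-nearest neighbors, extreme gradient boosting (XGBoost), and adaptive boosting. Each classifier was evaluated with the ROC curve, accuracy, and other metrics, and validation on an independent test set identified the extreme gradient boosting model as the best. The results offer new ideas and methods for research on improving the early diagnosis of gastric cancer. |
Keywords: gastric cancer; key genes; bioinformatics; diagnosis prediction model; extreme gradient boosting |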
DOI:10.12113/202108017 |
Classification number: Q344+.13 |
Document code: A |
Funding: National Natural Science Foundation of China (No. 62072154). |
|
Research on an early diagnosis and prediction model of gastric cancer based on bioinformatics |
ZHAO Boxuan, LIU Ming, LI Jianwei
|
(School of Artificial Intelligence,Hebei University of Technology,Tianjin 300401, China)
|
Abstract: |
Gastric cancer (GC) gene expression data were retrieved from The Cancer Genome Atlas and Genotype-Tissue Expression public databases. Genes closely related to early gastric cancer were screened and used to construct an early diagnosis and prediction model for gastric cancer. Differentially expressed genes in early gastric cancer were screened with the DESeq2 software package, and GO and KEGG enrichment analyses were performed on them. A protein-protein interaction network of the differentially expressed genes was established with the STRING database. Key subnetworks were extracted from this network with Cytoscape to obtain candidate key genes, and ten key genes were identified with MedCalc software. Based on the ten key genes, six early diagnosis and prediction models of gastric cancer were constructed with the Support Vector Machine, Random Forest, Naive Bayes, K-nearest neighbor, XGBoost, and Adaptive Boosting algorithms. Each model was evaluated by ROC curve, accuracy, and other indicators. Validation on an independent test set identified the XGBoost-based diagnosis and prediction model as the optimal model. The results of this study provide new ideas and methods for research on improving the early diagnosis of gastric cancer. |
Key words: Gastric cancer Key genes Bioinformatics Diagnosis and prediction model XGBoost |
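
The evaluation step named in the abstract (comparing classifiers by ROC curve and accuracy on an independent test set) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, the model names, and the toy scores are all hypothetical, and the area under the ROC curve is computed here with the rank-sum (Mann-Whitney) identity rather than with MedCalc or any machine-learning library.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = tumor, 0 = normal. scores: higher means more tumor-like.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples classified correctly at an assumed threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)


# Hypothetical test-set labels and predicted probabilities (not study data).
labels = [1, 1, 1, 0, 0, 0]
model_scores: Dict[str, Sequence[float]] = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.6, 0.7, 0.3, 0.65, 0.4, 0.2],
}

# Score every candidate model, then pick the one with the highest AUC.
results: Dict[str, Tuple[float, float]] = {
    name: (roc_auc(labels, s), accuracy(labels, s))
    for name, s in model_scores.items()
}
best = max(results, key=lambda name: results[name][0])

for name, (auc, acc) in results.items():
    print(f"{name}: AUC={auc:.3f} accuracy={acc:.3f}")
print("best by AUC:", best)
```

In practice the scores would come from each trained classifier's predicted probabilities on the held-out test set. Ranking by AUC first and reporting accuracy alongside it is one reasonable choice, not necessarily the criterion used in the study.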