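Supervising authority: Ministry of Industry and Information Technology; Sponsor: Harbin Institute of Technology; Editor-in-chief: REN Nanqi; ISSN 1672-5565; CN 23-1513/Q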

Cite this article: ZHAO Boxuan, LIU Ming, LI Jianwei. Research on an early diagnosis and prediction model of gastric cancer based on bioinformatics[J]. Chinese Journal of Bioinformatics, 2022, 20(4): 274-283.
Research on an early diagnosis and prediction model of gastric cancer based on bioinformatics
ZHAO Boxuan, LIU Ming, LI Jianwei
(School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China)
DOI: 10.12113/202108017
CLC number: Q344+.13
Document code: A
Funding: National Natural Science Foundation of China (No. 62072154).
Abstract:
Gastric cancer (GC) gene expression data were retrieved from two public databases, The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). These data were used to screen genes closely related to early gastric cancer and to construct an early diagnosis and prediction model. Differentially expressed genes (DEGs) of early gastric cancer were screened with the DESeq2 package. GO and KEGG enrichment analyses were then performed on the DEGs. A protein-protein interaction (PPI) network of the DEGs was built with the STRING database, and key subnetworks were extracted from it with Cytoscape to obtain candidate key genes. Ten key genes for the early diagnosis of gastric cancer were then confirmed with MedCalc software. Based on these ten key genes, six early diagnosis and prediction models were constructed using the following algorithms: Support Vector Machine, Random Forest, Naive Bayes, K-Nearest Neighbors, eXtreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost). Each classifier was evaluated with ROC curves, accuracy, and other metrics. Validation on an independent test set identified the XGBoost-based model as the optimal one. This study provides new ideas and methods for improving the early diagnosis of gastric cancer.
Key words: Gastric cancer; Key genes; Bioinformatics; Diagnosis and prediction model; XGBoost
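The abstract says differentially expressed genes were screened with DESeq2. DESeq2 (an R/Bioconductor package) first normalizes raw counts with median-of-ratios size factors and then fits a negative binomial model. The Python sketch below reproduces only the normalization step, followed by a naive log2 fold change. It is not DESeq2's statistical test and not the authors' code. The function names, the pseudocount and the "tumor"/"normal" labels are hypothetical, and the abstract gives no fold-change or p-value cutoffs.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw read counts (one per sample).
    Genes with a zero count in any sample are skipped because their
    geometric mean is zero, as in DESeq2's default ("ratio") estimator.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every gene has a zero count in some sample")
    # Median in log space, then back-transform to a multiplicative factor.
    return [math.exp(median(r)) for r in log_ratios]


def log2_fold_changes(counts, groups, pseudocount=1.0):
    """Naive per-gene log2(tumor mean / normal mean) on normalized counts.

    groups: one hypothetical label per sample, "tumor" or "normal".
    This is NOT DESeq2's model-based (negative binomial) estimate.
    """
    sf = size_factors(counts)
    result = []
    for row in counts:
        norm = [c / s for c, s in zip(row, sf)]
        tumor = [v for v, g in zip(norm, groups) if g == "tumor"]
        normal = [v for v, g in zip(norm, groups) if g == "normal"]
        mean_t = sum(tumor) / len(tumor)
        mean_n = sum(normal) / len(normal)
        result.append(math.log2((mean_t + pseudocount) / (mean_n + pseudocount)))
    return result


# Made-up toy counts: 3 genes x 4 samples, first two samples normal.
toy = [[10, 10, 40, 40], [20, 20, 5, 5], [30, 30, 30, 30]]
labels = ["normal", "normal", "tumor", "tumor"]
print(size_factors(toy))  # all approximately 1.0 for this toy matrix
print(log2_fold_changes(toy, labels))
```

Each size factor comes from the median over genes, so a few extremely variable genes cannot dominate the normalization. This robustness is the main reason for preferring median-of-ratios over simple total-count scaling.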
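The abstract reports GO and KEGG enrichment analyses but does not name the tool or the test. A common choice for such over-representation analysis is a one-sided hypergeometric test with Benjamini-Hochberg correction, so the sketch below assumes that approach. The term-to-gene mapping, the background size and the function names are hypothetical inputs. The code also assumes every annotated gene belongs to the background.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: M background genes, n of them
    annotated to the term, N genes drawn (the DEG list)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def enrichment(degs, term_to_genes, background_size):
    """Return {term: (overlap, p_value, bh_adjusted_p)} for each term."""
    N = len(degs)
    raw = {}
    for term, genes in term_to_genes.items():
        k = len(degs & genes)
        raw[term] = (k, hypergeom_sf(k, background_size, len(genes), N))
    # Benjamini-Hochberg: walk from the largest p-value down, keeping the
    # running minimum of p * m / rank so adjusted values stay monotone.
    ordered = sorted(raw, key=lambda t: raw[t][1])
    m = len(ordered)
    adjusted, running_min = {}, 1.0
    for rank in range(m, 0, -1):
        term = ordered[rank - 1]
        running_min = min(running_min, raw[term][1] * m / rank)
        adjusted[term] = running_min
    return {t: (raw[t][0], raw[t][1], adjusted[t]) for t in raw}


# Made-up toy example: 4 DEGs in a background of 20 genes.
degs = {"a", "b", "c", "d"}
terms = {"term_1": {"a", "b", "c", "x"}, "term_2": {"y", "z"}}
print(enrichment(degs, terms, background_size=20))
```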
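The abstract says candidate key genes were obtained by extracting key subnetworks from the STRING PPI network in Cytoscape, but it does not say which plug-in or ranking method was used. One simple and widely used criterion is node degree. The sketch below ranks proteins by degree after filtering edges on a STRING-style combined score. The 0.4 cutoff (STRING's "medium confidence" level), the protein names and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def rank_hubs(edges, min_score=0.4, top_n=10):
    """Rank proteins by degree in an undirected PPI network.

    edges: iterable of (protein_a, protein_b, combined_score), scores in
    [0, 1]. Interactions below min_score and self-loops are dropped, and
    duplicate pairs (A-B listed again as B-A) are counted once.
    """
    neighbours = defaultdict(set)
    for a, b, score in edges:
        if a == b or score < min_score:
            continue
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Highest degree first; ties broken alphabetically for reproducibility.
    ranked = sorted(neighbours, key=lambda g: (-len(neighbours[g]), g))
    return [(g, len(neighbours[g])) for g in ranked[:top_n]]


# Made-up toy network with hypothetical protein names.
toy_edges = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("A", "D", 0.5),
    ("B", "C", 0.95), ("C", "D", 0.3), ("B", "A", 0.9),
]
print(rank_hubs(toy_edges, top_n=3))  # [('A', 3), ('B', 2), ('C', 2)]
```

Degree is only one of several hub-ranking criteria. Cluster-based subnetwork extraction (for example MCODE-style dense-region detection) can select a different gene set.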
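The abstract says MedCalc was used to confirm the key genes, but it does not name the analysis. Assuming it was a single-gene ROC analysis (a typical use of MedCalc), each gene's discriminative power can be summarized by its area under the ROC curve. The sketch below computes the AUC through its Mann-Whitney interpretation. It then keeps genes whose AUC (or 1 - AUC, for down-regulated genes) reaches a threshold. The 0.7 threshold and all names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0), with ties counting one half.
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_genes(expr, labels, min_auc=0.7):
    """expr: {gene: [expression per sample]}; labels: 1 = tumor, 0 = normal.
    Returns {gene: max(AUC, 1 - AUC)} for genes reaching min_auc
    (min_auc is a hypothetical threshold, not taken from the paper)."""
    kept = {}
    for gene, values in expr.items():
        auc = roc_auc(values, labels)
        discrim = max(auc, 1.0 - auc)
        if discrim >= min_auc:
            kept[gene] = discrim
    return kept


# Made-up toy expression values for three hypothetical genes.
toy_expr = {"g_up": [5, 6, 1, 2], "g_down": [1, 2, 5, 6], "g_flat": [3, 3, 3, 3]}
print(screen_genes(toy_expr, [1, 1, 0, 0]))  # {'g_up': 1.0, 'g_down': 1.0}
```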
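The abstract says the six classifiers were compared with ROC curves, accuracy and other metrics on an independent test set. The sketch below covers only the threshold-based part of that comparison. From each model's predicted tumor probabilities it computes accuracy, sensitivity and specificity at an assumed 0.5 cutoff, and then picks the model with the best score on a chosen metric. The model names and probabilities are made up and are not results from the paper. Training the actual SVM, random forest, XGBoost and other models is outside this sketch.

```python
def confusion_metrics(probs, labels, threshold=0.5):
    """Accuracy, sensitivity and specificity at a probability cutoff.
    labels: 1 = tumor (positive), 0 = normal (negative)."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def pick_best(model_probs, labels, key="accuracy"):
    """Score every model on the same held-out labels; return the best name
    and the full metric table."""
    scored = {name: confusion_metrics(p, labels) for name, p in model_probs.items()}
    best = max(scored, key=lambda name: scored[name][key])
    return best, scored


# Made-up test-set probabilities for two hypothetical models.
test_labels = [1, 1, 0, 0]
probs = {"model_a": [0.9, 0.6, 0.4, 0.2], "model_b": [0.9, 0.4, 0.6, 0.2]}
best, table = pick_best(probs, test_labels)
print(best)  # model_a
```

These metrics are computed on the independent test set rather than the training data, which keeps the comparison from simply rewarding the model that overfits most.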
