| Cite this article: | HUANG Xinmeng, SU Linghao, YANG Yifan, QIAO Wenhui, HE Jialin, ZHAO Peiyuan, LIU Xihong. Machine learning combined with bioinformatics identifies key genes in multiple sclerosis[J]. Chinese Journal of Bioinformatics, 2026, 24(1): 44-56. |
| Machine learning combined with bioinformatics identifies key genes in multiple sclerosis |
|
HUANG Xinmeng1, SU Linghao2, YANG Yifan2, QIAO Wenhui2, HE Jialin2, ZHAO Peiyuan3, LIU Xihong3
|
|
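(1. The Second Clinical Medical College, Henan University of Chinese Medicine, Zhengzhou 450046; 2. The First Clinical Medical College (College of Integrated Traditional Chinese and Western Medicine), Henan University of Chinese Medicine, Zhengzhou 450046; 3. College of Traditional Chinese Medicine (Zhongjing College), Henan University of Chinese Medicine, Zhengzhou 450046)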
|
|
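| Abstract: |
| This study used bioinformatics and machine learning methods to identify key genes in multiple sclerosis (MS). The MS gene expression profiles GSE21942 and GSE32988 were obtained from the GEO database, with GSE32988 serving as the validation dataset. Principal component analysis (PCA) was used to assess sample clustering. Differentially expressed genes (DEGs) were screened and subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. Weighted gene co-expression network analysis (WGCNA) was used to identify gene modules closely related to MS, and these modules were intersected with the DEGs to obtain candidate genes. The least absolute shrinkage and selection operator (LASSO) algorithm and random forest (RF) were applied to the candidate genes to obtain potential key genes, whose differential expression was then validated in the third-party dataset GSE32988. Key genes were obtained after validation with receiver operating characteristic (ROC) curves, and their expression levels were verified in an MS animal model. The results showed that GSE21942 had good reproducibility and correlation, and a total of 506 DEGs were obtained. Enrichment analysis indicated that the DEGs were mainly enriched in biological functions such as B cell activation, glutamic acid (GLU) metabolism, and oxidative stress (OS), in cellular components such as the endoplasmic reticulum (ER), and in Epstein-Barr virus (EBV) infection, among others. Machine learning screening of the 29 candidate genes yielded 5 potential key genes, and validation with GSE32988 yielded 4 key genes: GLUD1, VDAC1, DDX3X, and LAMP1. The expression levels of DDX3X, LAMP1, GLUD1, and VDAC1 measured by RT-qPCR were consistent with the results of the bioinformatics analysis of the mRNA microarrays. DDX3X, LAMP1, GLUD1, and VDAC1 may therefore become new targets for MS therapy. |
| Keywords: multiple sclerosis; bioinformatics; key genes; machine learning |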
| DOI:10.12113/202409001 |
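| Classification number: R744.5+1 |
| Document code: A |
| Funding: Young Scientists Fund of the National Natural Science Foundation of China (No. 82104579); General Program of the China Postdoctoral Science Foundation (No. 2023M731024); Natural Science Foundation of Henan Province (No. 202300410258); Training Program for Young Backbone Teachers in Higher Education Institutions of Henan Province (No. 2023GGJS080). |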
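
The candidate-gene step in the abstract (intersecting the WGCNA module genes with the DEGs, then narrowing the candidates with LASSO and RF) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the gene labels are made-up placeholders rather than study data, the helper `intersect_ordered` is hypothetical, and combining the LASSO and RF selections by intersection is only one plausible reading, since the abstract does not state how the two selections were merged.

```python
def intersect_ordered(first, second):
    """Return the items of `first` that also occur in `second`, keeping the order of `first`."""
    lookup = set(second)
    return [gene for gene in first if gene in lookup]


# Placeholder gene labels (not taken from the study).
module_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # genes in an MS-related WGCNA module
degs = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]          # differentially expressed genes

# Candidate genes: module genes that are also DEGs.
candidates = intersect_ordered(module_genes, degs)

# Placeholder outputs of the two feature-selection methods.
lasso_selected = ["GENE_B", "GENE_D"]
rf_selected = ["GENE_D", "GENE_C", "GENE_B"]

# Assumption: a gene must be picked by both methods to count as a potential key gene.
potential_key_genes = intersect_ordered(
    intersect_ordered(candidates, lasso_selected), rf_selected
)

print(candidates)
print(potential_key_genes)
```

Requiring agreement between both methods is the stricter choice; taking the union instead would keep more genes for the later validation step.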
|
| Machine learning combined with bioinformatics identifies key genes in multiple sclerosis |
|
HUANG Xinmeng1, SU Linghao2, YANG Yifan2, QIAO Wenhui2, HE Jialin2, ZHAO Peiyuan3, LIU Xihong3
|
|
(1. The Second Clinical Medical College of Henan University of Chinese Medicine, Zhengzhou 450046, China; 2. The First Clinical Medical (Integrative Medicine) College of Henan University of Chinese Medicine, Zhengzhou 450046, China; 3. The Traditional Chinese Medicine (Zhong Jing) College of Henan University of Chinese Medicine, Zhengzhou 450046, China)
|
| Abstract: |
| To explore the key genes of multiple sclerosis (MS) using bioinformatics and machine learning methods, the MS gene expression profiles GSE21942 and GSE32988 were obtained from the GEO database, with GSE32988 used as the validation dataset. Sample clustering was assessed using principal component analysis (PCA). Differentially expressed genes (DEGs) were screened and analyzed for GO and KEGG enrichment. Gene modules closely related to MS were identified using weighted gene co-expression network analysis (WGCNA), and the intersection of these modules with the DEGs yielded the candidate genes. The candidate genes were screened with two machine learning algorithms, the least absolute shrinkage and selection operator (LASSO) and random forest (RF), to obtain potential key genes. The third-party dataset GSE32988 was used to validate the differential expression of the potential key genes, and key genes were obtained after receiver operating characteristic (ROC) curve validation. An MS animal model was used to verify the expression levels of the key genes. GSE21942 showed good repeatability and correlation, and a total of 506 DEGs were obtained. Enrichment analysis showed that the DEGs were mainly enriched in biological functions such as B cell activation, glutamic acid (GLU) metabolism, and oxidative stress (OS), as well as in EBV infection and the B cell receptor signaling pathway. Screening of the 29 candidate genes with the machine learning algorithms yielded five potential key genes, and four key genes, GLUD1, VDAC1, DDX3X, and LAMP1, were obtained after validation with GSE32988. The expression levels of DDX3X, LAMP1, GLUD1, and VDAC1 determined by RT-qPCR were consistent with the results of the bioinformatics analysis of the mRNA microarrays. Consequently, DDX3X, LAMP1, GLUD1, and VDAC1 may become new targets for MS therapy. |
| Key words: Multiple sclerosis; Bioinformatics; Key genes; Machine learning |
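
The ROC validation mentioned in the abstract can be illustrated for a single gene with a short Python sketch. It computes the area under the ROC curve (AUC) as the probability that a randomly chosen MS sample has a higher expression value than a randomly chosen control, counting ties as one half, which is equivalent to the area under the empirical ROC curve. The function name `roc_auc` and the expression values are hypothetical and made up for illustration; they are not data from GSE21942 or GSE32988, and the abstract does not say which software the authors used.

```python
def roc_auc(case_values, control_values):
    """AUC for one gene: P(case > control) + 0.5 * P(case == control)."""
    if not case_values or not control_values:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Made-up expression values for one gene (illustration only).
ms_expression = [7.9, 8.4, 8.1, 7.2]
control_expression = [7.0, 7.5, 6.8, 7.3]

auc = roc_auc(ms_expression, control_expression)
print(auc)  # 0.875
```

An AUC near 0.5 means the gene does not separate the groups. For a gene that is down-regulated in MS the value falls below 0.5, and 1 minus the AUC then describes its discriminative ability.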
|
|
|
|