Cite this article: CAO Acheng, LI Xiaoqin, GAO Bin. Identification of low-grade glioma driver genes based on gene mutation frequency[J]. Chinese Journal of Bioinformatics, 2023, 21(1): 37-44.
|
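Abstract:
Cancer is usually driven by the accumulation of genetic variants, and identifying its driver mutations effectively remains a major challenge. Most existing methods identify driver genes either by comparing the mutation rate observed in a genomic region with the rate expected from the background mutation rate (BMR) or by testing for functional impact, so the driver genes they report are essentially genes that are statistical outliers. Moreover, differences in driver genes between the subclasses of cancers that already have a well-defined classification have not been studied. This paper introduces an association rule algorithm to search for valid rules linking the mutation of a gene to a patient developing a particular subclass of low-grade glioma. The algorithm relates mutation data to the cancer outcome, and the resulting rules are then screened and evaluated with three metrics (support, confidence, and lift) to predict candidate driver genes and the differences in driver genes between subclasses. Finally, using somatic mutation data from 491 low-grade glioma cases, 22 driver genes associated with the outcome were obtained, together with the subclasses they belong to. Sensitivity and false-positive results were better than those of existing single algorithms, and all 22 genes have important biological functions. A method for identifying low-grade glioma subclasses based on the 22 genes was also established; its overall accuracy reached 98.99%, and it effectively distinguishes the three subclasses.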
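Keywords: driver genes; association rules; Apriori; low-grade glioma
DOI: 10.12113/202110020
CLC number: Q7
Document code: A
Funding: National Key Research and Development Program of China (No. 2017YFC0111104); National Natural Science Foundation of China (No. 61931013).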
|
Identification of low-grade glioma driver genes based on gene mutation frequency
CAO Acheng, LI Xiaoqin, GAO Bin
|
(Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China)
|
Abstract:
Cancer is often driven by the accumulation of genetic variants, and effectively identifying driver mutations in cancer is a great challenge. Current methods identify driver genes mainly by comparing the mutation rates observed in genomic regions with those expected from the background mutation rate (BMR) or by conducting functional impact tests, so the genes they report are essentially statistically abnormal genes. In addition, driver genes have not been compared between the subclasses of cancers with a well-defined classification. In this study, an association rule algorithm was introduced to find effective rules relating the mutation of a gene to a patient developing a given subclass of low-grade glioma, and the algorithm was used to establish the relationship between mutation data and cancer outcome. Then, three metrics (support, confidence, and lift) were used to screen and evaluate the resulting rules in order to predict candidate driver genes and between-class differences in driver genes. Finally, using somatic mutation data from 491 cases of low-grade glioma, we obtained 22 driver genes associated with the outcome, together with their subclasses. The sensitivity and false-positive results were better than those of existing single algorithms, and all 22 genes had important biological functions. In addition, a subclass identification method for low-grade glioma based on the 22 genes was established. The overall accuracy of the model was 98.99%, and the method could effectively distinguish the three subclasses.
Key words: driver genes; association rule algorithm; Apriori; low-grade glioma
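
The three screening metrics named in the abstract can be illustrated with a minimal sketch. It assumes the simplest rule form, "gene G is mutated => patient belongs to subclass S", and uses the standard definitions: support = P(G and S), confidence = P(S | G), lift = confidence / P(S). This is not the authors' pipeline and it does not run the full Apriori itemset search; the gene names (GENE1 to GENE3), subclass labels (S1, S2), function names, and thresholds are all hypothetical placeholders.

```python
from collections import Counter


def rule_metrics(samples):
    """Compute (support, confidence, lift) for every rule 'gene => subclass'.

    samples: list of (set_of_mutated_genes, subclass_label) pairs.
    Returns a dict keyed by (gene, subclass).
    """
    n = len(samples)
    gene_count = Counter()   # number of samples in which a gene is mutated
    class_count = Counter()  # number of samples in each subclass
    joint_count = Counter()  # number of samples with the gene mutated AND in the subclass
    for genes, label in samples:
        class_count[label] += 1
        for gene in genes:
            gene_count[gene] += 1
            joint_count[(gene, label)] += 1

    metrics = {}
    for (gene, label), joint in joint_count.items():
        support = joint / n                        # P(gene and subclass)
        confidence = joint / gene_count[gene]      # P(subclass | gene)
        lift = confidence / (class_count[label] / n)  # confidence / P(subclass)
        metrics[(gene, label)] = (support, confidence, lift)
    return metrics


def select_rules(metrics, min_support, min_confidence, min_lift):
    """Keep rules that reach both minimums and whose lift exceeds min_lift."""
    return {
        rule: m
        for rule, m in metrics.items()
        if m[0] >= min_support and m[1] >= min_confidence and m[2] > min_lift
    }


# Toy data with placeholder names; not taken from the 491-case dataset.
samples = [
    ({"GENE1", "GENE2"}, "S1"),
    ({"GENE1"}, "S1"),
    ({"GENE2"}, "S2"),
    ({"GENE3"}, "S2"),
]
metrics = rule_metrics(samples)
# Illustrative thresholds only; the paper's actual cut-offs are not given here.
selected = select_rules(metrics, min_support=0.3, min_confidence=0.8, min_lift=1.0)
```

In this toy example only the rule GENE1 => S1 survives: GENE2 is split evenly across both subclasses (lift of 1, no association), and GENE3 is mutated in too few samples to reach the support threshold. A lift above 1 means the subclass is more frequent among carriers of the mutation than in the cohort overall, which is why lift is checked in addition to confidence.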