Cite this article: YAN Xuexin, WANG Dawei, LUAN Zhenghao, WANG Tong, SONG Jia, SHANG Ce, LIANG Desen. Establishment and application of serine protease inhibitor (SERPIN) family related genes in a prognostic model of colon adenocarcinoma[J]. Chinese Journal of Bioinformatics, 2024, 22(1): 37-46.

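Abstract:
Bioinformatics methods were applied to construct a prognostic model of serine protease inhibitor (SERPIN) family related genes for colon adenocarcinoma (COAD). Transcriptome and clinical data for COAD were downloaded from the TCGA and GEO databases, and consensus clustering of COAD patients was performed according to the expression of SERPIN family genes. The data were randomly divided into equal training (Train) and validation (Test) sets. Based on the differentially expressed genes between the two subtypes, COX regression and Lasso regression were applied to the Train set to construct the prognostic model. Samples were divided into high-risk and low-risk groups by the median model risk score, and survival curves were plotted for both groups. The predictive ability of the model was evaluated with ROC curves, and the model was validated on the Test set. A nomogram was constructed, and the agreement between predicted and actual patient survival was assessed. The ESTIMATE and CIBERSORT algorithms were used to evaluate the correlation of the risk score with the tumor microenvironment (TME) and immune infiltration. Two subtypes were identified from 34 SERPIN genes, 436 prognosis-related differentially expressed genes were screened between the two subtypes, and Lasso regression selected 11 prognosis-related genes for the risk model. The high-risk and low-risk groups defined by the model score differed markedly in survival, and the nomogram accurately predicted 1-year, 3-year, and 5-year survival rates. Tumor microenvironment and immune infiltration analyses showed poor immune activity in patients with high risk scores. The risk score model built from SERPIN family related genes can predict the prognosis of COAD, which may help guide the clinical diagnosis and treatment of COAD and improve patient survival.
Keywords: serine protease inhibitor; colon adenocarcinoma; prognostic model; immune infiltration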
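DOI: 10.12113/202212002
Classification number: R73
Document code: A
Funding: Outstanding (Distinguished) Young Medical Talents Training Program of the First Affiliated Hospital of Harbin Medical University (No. HYD2020JQ0011).
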
Establishment and application of serine protease inhibitor (SERPIN) family related genes in a prognostic model of colon adenocarcinoma |
YAN Xuexin, WANG Dawei, LUAN Zhenghao, WANG Tong, SONG Jia, SHANG Ce, LIANG Desen
(Department of Anal and Intestinal Surgery, The First Hospital of Harbin Medical University, Harbin 150001, China)

Abstract:
A prognostic model of serine protease inhibitor (SERPIN) family related genes in colon adenocarcinoma (COAD) was constructed using bioinformatics methods. Transcriptomic and clinical data for COAD were downloaded from the TCGA and GEO databases, and consensus clustering of COAD patients was performed based on the expression of SERPIN family genes. The data were randomly and equally divided into a training set (Train) and a validation set (Test). Based on the differentially expressed genes between the two subtypes, COX regression and Lasso regression were used to construct a prognostic model on the Train set. The samples were divided into high-risk and low-risk groups according to the median risk score of the model, and survival curves were drawn for both groups. The predictive ability of the model was evaluated by ROC curves, and the model was validated using the Test set. A nomogram was constructed to assess the consistency between predicted and actual patient survival rates. The ESTIMATE and CIBERSORT algorithms were used to assess the correlation of the risk score with the tumor microenvironment (TME) and immune infiltration. Two subtypes were identified based on 34 SERPIN genes, and 436 prognosis-related differentially expressed genes were screened between the two subtypes. Lasso regression identified 11 prognosis-related genes for the construction of the risk model, and the high-risk and low-risk groups distinguished by the model score showed significant survival differences. In addition, the nomogram could accurately predict 1-year, 3-year, and 5-year survival rates. Tumor microenvironment analysis and immune infiltration analysis showed poor immune activity in patients in the high-risk group. In conclusion, the risk score model constructed from SERPIN family related genes can predict the prognosis of COAD, which may help guide the clinical diagnosis and treatment of COAD and improve patient survival.
Key words: serine protease inhibitor; prognosis model; colon adenocarcinoma; immune infiltration
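
The sample split and risk stratification described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names `GENE_A` and `GENE_B`, their coefficients, and the expression values are hypothetical placeholders, not the 11 genes or coefficients of the published model, and the function names are invented for this sketch. It also assumes the usual linear form of a Cox-based risk score (sum of coefficient times expression), which the abstract does not spell out. Fitting the COX and Lasso regressions themselves is not shown.

```python
import random
from statistics import median


def split_train_test(sample_ids, seed=0):
    """Shuffle sample IDs and cut them into two halves of (near-)equal size."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def risk_score(expression, coefficients):
    """Assumed linear risk score: sum of coefficient * expression over model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify_by_median(scores):
    """Label a sample 'high' if its score is above the median, otherwise 'low'."""
    cutoff = median(scores.values())
    return {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
expression = {
    "S1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "S2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "S3": {"GENE_A": 3.0, "GENE_B": 2.0},
    "S4": {"GENE_A": 8.0, "GENE_B": 4.0},
}

train_ids, test_ids = split_train_test(expression, seed=1)
scores = {sid: risk_score(expr, coefficients) for sid, expr in expression.items()}
groups = stratify_by_median(scores)
```

Here the median cutoff is computed from whatever scores are passed in. Whether the validation set reuses the training-set cutoff or its own median is a design choice that the abstract does not specify.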