Cite this article: PENG Xian, HE Jianfeng. Flora analysis based on Dirichlet polynomial process model and K-means[J]. Chinese Journal of Bioinformatics, 2024, 22(1): 47-57.
DOI: 10.12113/202202014
CLC number: TP181
Document code: A
Flora analysis based on Dirichlet polynomial process model and K-means |
PENG Xian, HE Jianfeng
|
(School of Information Engineering and Automation, Kunming University of Technology, Kunming 650000, China)
|
Abstract:
Population typing is an effective way to better understand complex biological problems such as human physical and mental health. Clustering, which groups samples to reduce complexity, is a common way to define enterotypes. However, the traditional K-means clustering algorithm offers no principled way to choose the number of clusters K. This paper improves the traditional K-means algorithm by combining it with a Dirichlet process mixture model and validates the improved algorithm on a public dataset. Experimental results show that the improved algorithm resolves the problem of choosing K and significantly improves the stability, accuracy, and quality of the clustering results. When applied to OTU data from the intestinal flora, the improved model not only effectively distinguishes the similarities among samples from patients with type 2 diabetes, but also identifies the OTUs that contribute most to the heterogeneity of flora structure, offering a new perspective for the clinical management of type 2 diabetes.
Key words: K-means algorithm; Dirichlet process mixture model; flora analysis; population typing; clustering
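The abstract does not spell out how a Dirichlet process removes the need to fix K in advance. One well-known way to connect the two ideas is DP-means (Kulis and Jordan, 2012), a K-means variant derived from the small-variance limit of a Dirichlet process mixture: instead of fixing K, one sets a penalty λ, and any point whose squared distance to every existing centroid exceeds λ opens a new cluster. The sketch below is a swapped-in illustration of that general idea in plain Python, not the authors' Dirichlet multinomial/K-means method; the function name `dp_means` and the parameter `lam` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Sequence[float]]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def dp_means(data: List[Point], lam: float, max_iter: int = 100):
    """Illustrative DP-means sketch (not the paper's method).

    Like K-means, but K is not given: a point whose squared distance to
    every centroid exceeds `lam` starts a new cluster at that point.
    Returns (centroids, labels).
    """
    centroids: List[Point] = [mean(data)]  # start with one global cluster
    labels = [0] * len(data)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest centroid, or open a new cluster.
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centroids]
            j = min(range(len(centroids)), key=dists.__getitem__)
            if dists[j] > lam:
                centroids.append(tuple(x))
                j = len(centroids) - 1
            if labels[i] != j:
                labels[i] = j
                changed = True
        # Update step: recompute centroids, dropping empty clusters
        # and renumbering the surviving ones 0..K-1.
        members: Dict[int, List[Point]] = {}
        for i, j in enumerate(labels):
            members.setdefault(j, []).append(data[i])
        keep = sorted(members)
        remap = {old: new for new, old in enumerate(keep)}
        centroids = [mean(members[j]) for j in keep]
        labels = [remap[j] for j in labels]
        if not changed:
            break
    return centroids, labels
```

On two well-separated groups of three 2-D points each, `dp_means(data, lam=4.0)` returns two clusters; a very large `lam` collapses everything into one cluster and a very small one gives every point its own. The choice of K is therefore replaced by the choice of λ, which is expressed in distance units and can be easier to reason about, but still has to be tuned.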