Cite this article: | ZHANG Lin, SHI Yue, WANG Fei, LI Qi, WAN Sulei, WANG Xuesong. KL-FCM clustering analysis in Illumina Golden Gate DNA methylation microarray[J]. Chinese Journal of Bioinformatics, 2014, 12(02): 106-109. |
|
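Abstract: |
DNA methylation is an important epigenetic modification whose level has been found to be closely associated with the onset and progression of disease. Clustering analysis of methylation data is expected to reveal new disease subtypes and to support effective methods for disease prediction and prognosis. Fuzzy C-means (FCM), one of the traditional clustering methods, suits feature spaces that are spherically or ellipsoidally distributed and therefore lacks generality. The Illumina Golden Gate platform describes the degree of methylation by computing the methylation percentage at each methylation site of a gene; these values lie in (0,1) and follow a beta mixture distribution, so FCM cannot be applied to them directly. This paper therefore proposes KL-FCM, a clustering algorithm based on a K-L feature measure that uses the K-L distance between samples as the criterion for partitioning them. Finally, KL-FCM is applied to cluster the IRIS test dataset and gene-level DNA methylation data. The experimental results show that the method achieves better classification than k-means and traditional FCM at a lower computational load. |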
Keywords: fuzzy C-means ILLUMINA DNA methylation microarray K-L distance |
DOI:10.3969/j.issn.1672-5565.2014-02.20140205 |
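CLC number: TP181 |
Funding: Supported by the China Postdoctoral Science Foundation General Program (2012M511336, 2012M511335); the Jiangsu Province College Students' Innovation and Entrepreneurship Training Program; and the Fok Ying Tung Education Foundation Young Teachers Fund (121066). |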
|
KL-FCM clustering analysis in Illumina Golden Gate DNA methylation microarray |
ZHANG Lin, SHI Yue, WANG Fei, LI Qi, WAN Sulei, WANG Xuesong
|
(School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China)
|
Abstract: |
DNA methylation is an important epigenetic modification that has been found to be closely related to the occurrence and development of disease. Clustering analysis of DNA methylation data is expected to reveal novel disease subtypes and to yield new methods for prediction and prognosis. Fuzzy C-means (FCM) is one of the common clustering methods; however, it is best suited to feature spaces with a spherical or ellipsoidal distribution, which limits its generality. The Illumina Golden Gate platform describes the methylation level by the methylation percentage of each locus in each gene; these values lie in (0,1) and follow a beta mixture distribution, so FCM cannot be applied directly. This paper introduces the KL-FCM clustering method, which uses the K-L distance between samples as the partition measure. KL-FCM is used to cluster the IRIS test dataset and DNA methylation profile data. The validation results show that KL-FCM, with a lower computational load, achieves better clustering performance than k-means and traditional FCM. |
Key words: Fuzzy C-means DNA methylation expression microarray K-L distance |
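
The abstract describes KL-FCM only at a high level: fuzzy C-means in which the K-L distance between samples replaces the usual Euclidean measure. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that each methylation value in (0,1) is treated as a Bernoulli parameter, that the one-directional divergence KL(x || c) summed over loci stands in for the squared distance, and that cluster centres are membership-weighted means. The names `bernoulli_kl` and `kl_fcm` and the parameters `m`, `n_iter`, `tol`, and `seed` are hypothetical.

```python
import numpy as np


def bernoulli_kl(X, C, eps=1e-9):
    """KL(x || c) summed over features, for every sample/centre pair.

    Assumption: each entry in (0,1) is read as a Bernoulli parameter.
    X has shape (n, d), C has shape (k, d); the result has shape (n, k).
    """
    x = np.clip(X, eps, 1 - eps)[:, None, :]  # (n, 1, d)
    c = np.clip(C, eps, 1 - eps)[None, :, :]  # (1, k, d)
    kl = x * np.log(x / c) + (1 - x) * np.log((1 - x) / (1 - c))
    return kl.sum(axis=2)


def kl_fcm(X, k, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy C-means with a K-L divergence in place of squared distance.

    Returns the membership matrix U (n, k) and the centres C (k, d).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.dirichlet(np.ones(k), size=n)  # random memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        # Centres as membership-weighted means of the samples.
        C = (W.T @ X) / W.sum(axis=0)[:, None]
        # Divergence of every sample to every centre (small offset avoids 0).
        D = bernoulli_kl(X, C) + 1e-12
        # Standard FCM membership update, with D playing the role of d^2.
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, C
```

A hard assignment can be read off with `U.argmax(axis=1)`. A symmetrised K-L distance could be substituted in `bernoulli_kl`, in which case the weighted-mean centre update would be a heuristic rather than an exact minimiser.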