Cite this article: LIU Yadong, LIU Zhongyu, CUI Miao, LU Zhenhao, YANG Zhongbo, WANG Yadong. A deep learning-based DNA methylation detection for PacBio sequencing data[J]. Chinese Journal of Bioinformatics, 2025, 23(3): 175-183.
DOI: 10.12113/202409010
CLC number: TP399
Document code: A
|
A deep learning-based DNA methylation detection for PacBio sequencing data |
LIU Yadong1,2, LIU Zhongyu1, CUI Miao1, LU Zhenhao1, YANG Zhongbo1, WANG Yadong1,2

(1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China; 2. Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China)
|
Abstract:
DNA methylation is a key epigenetic modification in eukaryotic cells that regulates gene expression without altering the DNA sequence and influences organismal development. Studying DNA methylation helps reveal the mechanisms of epigenetic regulation during development and provides important support for disease diagnosis and precision medicine. Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing detects methylation on individual molecules without chemical conversion (such as bisulfite conversion), preserving more complete epigenetic information; it also produces reads with an average length of 10,000 bp, which helps span complex regions and yields more contiguous methylation information. However, few methylation detection tools are available for PacBio data, and their limited detection accuracy remains a bottleneck. This study therefore proposes a deep learning-based DNA methylation detection method for PacBio sequencing data. Using a cross-attention fusion mechanism together with Transformer and BiGRU networks, the method fuses base features with signal features to identify potential methylation sites, enabling efficient and accurate DNA methylation detection. Applied to whole-genome PacBio sequencing data from the GIAB reference sample HG002, the method achieved the highest detection accuracy among current mainstream tools at both the single-read and the genome-wide level. In addition, parallel processing substantially reduces analysis time while keeping memory usage under control, providing important technical support for efficient methylation detection in large-scale populations.
Key words: DNA methylation; PacBio sequencing technology; deep learning
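The abstract names cross-attention fusion of base and signal features as the core mechanism but gives no details. The following is a minimal, hypothetical sketch of what such a fusion step can look like, written under our own assumptions: a single attention head, per-base features used as queries, signal features used as keys and values, and the attended signal context added back to the base features as a residual. The weight matrices are supplied rather than learned, the toy "signal" values merely stand in for per-base PacBio kinetics (for example inter-pulse duration and pulse width, which is our assumption about what "signal features" means here), and the Transformer and BiGRU layers the authors combine with this step are omitted. Pure Python is used only to keep the example dependency-free.

```python
# Hypothetical sketch of cross-attention fusion (not the authors' implementation).
# Assumptions: single head, base features as queries, signal features as
# keys/values, residual addition as the fusion step, weights given (not learned).
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(base: Matrix, signal: Matrix, w_q: Matrix,
                         w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Fuse base features with signal features by scaled dot-product cross-attention.

    base:   L_b x d_b matrix of per-base features (queries)
    signal: L_s x d_s matrix of per-position signal features (keys/values)
    w_q: d_b x d,  w_k: d_s x d,  w_v: d_s x d_b (so the output matches `base`)
    Returns (fused features, L_b x d_b; attention weights, L_b x L_s).
    """
    q = matmul(base, w_q)
    k = matmul(signal, w_k)
    v = matmul(signal, w_v)
    scale = math.sqrt(len(w_q[0]))
    # Similarity of every base position to every signal position.
    scores = matmul(q, [list(col) for col in zip(*k)])
    weights = [softmax([s / scale for s in row]) for row in scores]
    # Each base position receives a weighted average of the signal values.
    context = matmul(weights, v)
    # Residual fusion: keep the base features and add the attended signal context.
    fused = [[b + c for b, c in zip(b_row, c_row)]
             for b_row, c_row in zip(base, context)]
    return fused, weights


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Toy inputs: 3 base positions with 2-dim embeddings and 3 signal positions
    # with 2 values each (hypothetical stand-ins for normalized IPD and PW).
    base = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    signal = [[0.2, 0.1], [1.5, 0.9], [0.3, 0.2]]
    fused, weights = cross_attention_fuse(base, signal, identity, identity, identity)
    for row in weights:
        print([round(w, 3) for w in row])
    print(fused)
```

In a trained model the three weight matrices would be learned, and the fused per-position features would then pass through sequence layers (in this paper, Transformer and BiGRU networks) before a per-site methylation classifier. In practice a framework layer such as PyTorch's `torch.nn.MultiheadAttention` would replace the hand-written matrix code.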
|
|
|
|