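Abstract: |
Copy number variation (CNV) refers to an increase or decrease in the copy number of large DNA segments in the genome. Existing studies show that CNVs cause a variety of human diseases and are closely associated with the mechanisms of their onset and progression. The advent of high-throughput sequencing has provided the technical basis for CNV detection, and in fields such as human disease research and clinical diagnosis and treatment it has become the mainstream CNV detection technology. Although new algorithms and software based on high-throughput sequencing continue to be developed, their accuracy remains unsatisfactory. This paper comprehensively reviews CNV detection methods based on high-throughput sequencing data, including read depth-based, paired-end mapping-based, split read-based and de novo assembly-based methods, as well as methods that combine these four approaches. It discusses in depth the principles of each class of methods, representative software tools, the data each class is suited to, and their advantages and disadvantages, and it outlines future directions of development. |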
Keywords: high-throughput sequencing data; genomic variation; copy number variation detection |
DOI:10.12113/202206012 |
CLC number: TP391 |
Document code: A |
Funding: |
|
A review of methods for copy number variation detection using high-throughput sequencing data |
LIU Zhen¹, LIU Yongzhuang²
|
(1. Harbin Genars Technology Co., Ltd., Harbin 150001, China; 2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
|
Abstract: |
Copy number variation refers to an increase or decrease in the copy number of large DNA segments in the genome. Previous studies have shown that copy number variation is a cause of many human diseases and is closely related to the mechanisms of their onset and progression. The emergence of high-throughput sequencing technology has provided technical support for copy number variation detection, and high-throughput sequencing has become the mainstream copy number variation detection technology in human disease research and clinical diagnosis. Although new algorithms and software tools based on high-throughput sequencing keep being developed, their accuracy remains unsatisfactory. This paper presents a comprehensive review of copy number variation detection methods based on high-throughput sequencing data, covering methods based on read depth, paired-end mapping, split reads and de novo assembly, as well as methods that combine these four approaches. The principles of each type of method, representative software tools, the data each type is suited to, and their advantages and disadvantages are discussed in depth. Finally, future directions for the field are explored. |
Key words: next-generation sequencing data; genomic structural variation; copy number variation detection |
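The read depth-based approach mentioned in the abstract rests on the idea that the number of reads mapped to a genomic window is roughly proportional to that window's copy number. The Python sketch below illustrates this under loudly stated assumptions: a diploid genome, per-bin depths that are already computed, the median bin being copy-neutral, and a hypothetical `margin` threshold. The function names `copy_number_ratios` and `call_cnv_segments` are illustrative only and do not belong to any tool covered in the review. Each bin's copy number is estimated as ploidy × depth / median depth, and consecutive gained or lost bins are merged into segments.

```python
from statistics import median


def copy_number_ratios(bin_depths, ploidy=2):
    """Convert per-bin read depth to an estimated copy number.

    Assumes depth is proportional to copy number and that the median bin
    is in the normal (ploidy) state, so CN = ploidy * depth / median.
    """
    baseline = median(bin_depths)
    if baseline <= 0:
        raise ValueError("median bin depth must be positive")
    return [ploidy * d / baseline for d in bin_depths]


def call_cnv_segments(bin_depths, ploidy=2, margin=0.5):
    """Return (first_bin, last_bin, 'gain' | 'loss') for runs of abnormal bins.

    `margin` is a hypothetical threshold: a bin counts as a gain if its
    estimated copy number is at least ploidy + margin, and as a loss if it
    is at most ploidy - margin. Adjacent bins with the same label are merged.
    """
    segments = []
    for i, cn in enumerate(copy_number_ratios(bin_depths, ploidy)):
        if cn >= ploidy + margin:
            label = "gain"
        elif cn <= ploidy - margin:
            label = "loss"
        else:
            continue  # copy-neutral bin: nothing to report
        # Extend the previous segment if it is adjacent and has the same label.
        if segments and segments[-1][2] == label and segments[-1][1] == i - 1:
            segments[-1] = (segments[-1][0], i, label)
        else:
            segments.append((i, i, label))
    return segments


if __name__ == "__main__":
    depths = [30, 31, 29, 30, 45, 46, 44, 30, 15, 14, 30]
    print(call_cnv_segments(depths))  # [(4, 6, 'gain'), (8, 9, 'loss')]
```

In this toy input, bins 4 to 6 have about 1.5 times the median depth, which corresponds to an estimated copy number near 3. Bins 8 and 9 have about half the median depth, which corresponds to a copy number near 1. The fixed threshold is a deliberate simplification. Practical read depth tools also correct for biases such as GC content and mappability, and they replace per-bin thresholds with statistical segmentation, for example circular binary segmentation or hidden Markov models.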