Cite this article:
LU Junzeng (吕俊增), CAO Shuqi (曹舒淇), JIANG Tao (姜涛). A method for Chinese-specific reference genome construction based on large-scale population genomic variants [J]. Chinese Journal of Bioinformatics (生物信息学), 2025, 23(2): 88-95.

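Abstract:
Genomic variation lies at the core of biological genetic diversity and is central to understanding the evolution of life, differences among individuals within a species, and disease mechanisms. As the frame of reference for genetic research, a reference genome directly affects how accurately genetic variants are identified through the representational power of its sequence. The human reference genome in wide use today is built mainly from samples of Western populations and resolves variants specific to Chinese populations poorly, so a new reference genome carrying Chinese genetic characteristics is urgently needed to support deeper study of the genetics and evolution of Chinese populations. This study proposes a method for modifying a reference genome on the basis of population genomic variants. It uses three types of East Asian population variant data, namely single nucleotide variants (SNVs), short insertions and deletions (Indels), and structural variants (SVs), to modify the GRCh38 human reference genome. After multiple rounds of filtering and revision, it yields a series of Chinese reference genomes that differ in variant frequency threshold and variant type. The modified reference genomes were tested by aligning sequencing data from Chinese samples of different geographic regions. The best alignment results were obtained when GRCh38 was modified with East Asian SVs, Indels, and SNVs whose variant frequencies exceeded 2/3, 1/2, and 1/2, respectively. Integrating all variants at these frequency thresholds into the reference genome produced the optimal Chinese reference genome. The Chinese reference genome established here is expected to improve variant identification across large-scale Chinese population genomes and to provide an effective method for subsequent Chinese reference genome construction. Details of the method: https://github.com/azheasir/Chinese-specific-reference-genome-construction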
Keywords: large-scale population; reference genome; genome sequence alignment
DOI: 10.12113/202403010
CLC number: TP399
Document code: A

A method for Chinese-specific reference genome construction based on large-scale population genomic variants
LU Junzeng¹, CAO Shuqi¹, JIANG Tao¹,²

(1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China; 2. Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China)

Abstract:
Genomic variation is the core source of genetic diversity and is central to understanding evolution, individual differences within a species, and disease mechanisms. A reference genome serves as the frame of reference for genetic research, so the representational power of its sequence directly affects the accuracy of variant identification. The widely used human reference genome is derived mainly from samples of Western populations and resolves variants specific to Chinese populations poorly. A new reference genome that reflects Chinese genetic characteristics is therefore urgently needed to support in-depth research on the genetics and evolution of Chinese populations. This study proposes a method for modifying the GRCh38 human reference genome using three types of East Asian population variants: single nucleotide variants (SNVs), short insertion-deletion variants (Indels), and structural variants (SVs). After multi-stage screening and revision, a series of Chinese reference genomes covering different allele-frequency thresholds and variant types was established. Sequencing data from Chinese samples of different regions were aligned to each modified reference genome as a benchmark. The best read mapping results were obtained when GRCh38 was modified with East Asian SVs, Indels, and SNVs whose frequencies exceeded 2/3, 1/2, and 1/2, respectively. The optimal Chinese reference genome was obtained by incorporating all variants above these thresholds into GRCh38. The Chinese reference genome established in this study is expected to improve variant identification across large-scale Chinese population genomes and to provide an effective method for subsequent Chinese reference genome construction. Further details on the methodology can be found at: https://github.com/azheasir/Chinese-specific-reference-genome-construction
Keywords: large-scale population; reference genome; sequencing read alignment
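
The frequency-threshold selection described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the `Variant` record, the 0-based coordinate convention, the function names, and the toy sequence are all hypothetical, and overlapping variants are simply skipped, which may differ from the paper's screening rules. Only the per-type cut-offs (SV > 2/3, Indel > 1/2, SNV > 1/2) come from the abstract.

```python
from dataclasses import dataclass
from fractions import Fraction

# Per-type allele-frequency cut-offs reported in the abstract.
THRESHOLDS = {"SV": Fraction(2, 3), "Indel": Fraction(1, 2), "SNV": Fraction(1, 2)}


@dataclass(frozen=True)
class Variant:
    pos: int       # 0-based start on the reference (assumed convention)
    ref: str       # reference allele
    alt: str       # alternate allele
    af: Fraction   # allele frequency in the population
    kind: str      # "SNV", "Indel" or "SV"


def select_variants(variants, thresholds=THRESHOLDS):
    """Keep variants whose frequency strictly exceeds the cut-off for their type."""
    return [v for v in variants if v.af > thresholds[v.kind]]


def apply_variants(reference, variants):
    """Replace ref alleles with alt alleles, working right to left so that
    length-changing edits never shift the coordinates still to be processed."""
    seq = reference
    next_start = len(reference)  # left edge of the most recently applied variant
    for v in sorted(variants, key=lambda v: v.pos, reverse=True):
        end = v.pos + len(v.ref)
        if end > next_start:
            continue  # overlaps a variant already applied; skipped in this sketch
        if seq[v.pos:end] != v.ref:
            raise ValueError(f"reference allele mismatch at position {v.pos}")
        seq = seq[:v.pos] + v.alt + seq[end:]
        next_start = v.pos
    return seq


# Toy data standing in for GRCh38 and an East Asian variant call set.
reference = "ACGTACGTAC"
variants = [
    Variant(2, "G", "T", Fraction(3, 5), "SNV"),      # 3/5 > 1/2, kept
    Variant(4, "AC", "A", Fraction(3, 5), "SV"),      # 3/5 <= 2/3, dropped
    Variant(7, "T", "TGG", Fraction(3, 4), "Indel"),  # 3/4 > 1/2, kept
]
selected = select_variants(variants)
custom = apply_variants(reference, selected)
print(custom)
```

Applying edits from the highest coordinate downward keeps every remaining position valid in the original coordinate system, which is why no offset bookkeeping is needed in this sketch.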