| Cite this article: | LIU Yanjun, YANG Yuefei, HU Yingxin. High-density DNA storage encoding method based on unnatural nucleic acids[J]. Chinese Journal of Bioinformatics, 2026, 24(1): 70-84. |
|
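| Abstract: |
| To accommodate storage that uses both unnatural nucleic acids (dP, dZ, dS, dB) and natural nucleic acids (A, T, C, G), and to further increase DNA storage density, this paper proposes a high-density DNA storage encoding method based on unnatural nucleic acids. The method first designs a dynamic rotation mapping algorithm that controls GC content and homopolymer length. Building on this mapping, it then proposes a base-7 (septenary) Huffman algorithm adapted to the eight nucleic acids, which raises the storage density. Finally, it uses a base-7 Hamming code for error correction. For text, image, and audio files encoded with the method, the average coding density reaches 3.38 bits/nt, the content of each base pair stays within 23% to 26.5%, and no single-base repeats occur. Simulation results show that the proposed method increases DNA storage density, constrains GC content and homopolymer length, and has good error-correction capability. |
| Keywords: DNA storage; artificial nucleotides; Huffman coding; mapping method; GC content constraint; homopolymer |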
| DOI:10.12113/202407007 |
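| CLC number: TP399 |
| Document code: A |
| Funding: Key Project of the Natural Science Foundation of the Hebei Provincial Department of Education (No. ZD2022098). |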
|
| High-density DNA storage encoding method based on unnatural nucleic acids |
|
LIU Yanjun, YANG Yuefei, HU Yingxin
|
|
(College of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, China)
|
| Abstract: |
| To accommodate storage that uses both unnatural nucleic acids (dP, dZ, dS, dB) and natural nucleic acids (A, T, C, G), and to further improve DNA storage density, this paper proposes a high-density DNA storage encoding method based on unnatural nucleic acids. The method first designs a dynamic rotation mapping algorithm to control GC content and homopolymer length. Based on this mapping algorithm, a septenary (base-7) Huffman algorithm adapted to the eight nucleic acids is then proposed to increase the DNA storage density. Finally, error correction is realized with a septenary Hamming code. After files in text, audio, and image formats are encoded with this method, the average coding density reaches 3.38 bits/nt, the content of each base pair remains stable between 23% and 26.5%, and no single-base repeats occur. Simulation experiments show that the proposed method improves DNA storage density, constrains GC content and homopolymer length, and has good error-correction capability. |
| Key words: DNA storage; artificial nucleotide; Huffman coding; mapping method; GC content constraints; homopolymer |
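The abstract does not spell out the dynamic rotation mapping, so the following is only a minimal sketch of one common rotation scheme, not the authors' algorithm. It illustrates why base-7 digits pair naturally with an eight-letter alphabet: each digit selects one of the seven bases that differ from the previous base, so no base is ever repeated consecutively. The names `ALPHABET`, `encode_digits`, and `decode_bases`, the single-letter codes P, Z, S, B for dP, dZ, dS, dB, and the virtual starting base are all assumptions made for illustration. The sketch does not attempt to balance GC content.

```python
# Hypothetical eight-letter alphabet; P, Z, S, B stand in for dP, dZ, dS, dB.
ALPHABET = ["A", "T", "C", "G", "P", "Z", "S", "B"]


def encode_digits(digits, start="A"):
    """Map base-7 digits to bases so that no base equals its predecessor.

    `start` is a virtual predecessor for the first digit (an assumption
    of this sketch); it is not emitted.
    """
    prev = start
    out = []
    for d in digits:
        if not 0 <= d <= 6:
            raise ValueError(f"not a base-7 digit: {d}")
        # The seven bases that differ from the previous one, in fixed order.
        candidates = [b for b in ALPHABET if b != prev]
        prev = candidates[d]
        out.append(prev)
    return "".join(out)


def decode_bases(seq, start="A"):
    """Invert encode_digits: recover the base-7 digits from a base sequence."""
    prev = start
    digits = []
    for b in seq:
        candidates = [x for x in ALPHABET if x != prev]
        digits.append(candidates.index(b))
        prev = b
    return digits


if __name__ == "__main__":
    digits = [0, 0, 0, 6, 6, 3]
    seq = encode_digits(digits)
    print(seq)                # TATBSG
    print(decode_bases(seq))  # [0, 0, 0, 6, 6, 3]
```

Even a run of identical digits (the three leading zeros above) yields alternating bases, which is the homopolymer constraint the abstract refers to. A full pipeline along the lines described would additionally need the base-7 Huffman stage before this mapping and a base-7 Hamming code for error correction.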