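Abstract:
In recent years, high-throughput single-cell sequencing has led to profound new discoveries in biological research, and the rapid development of single-cell RNA sequencing has made it possible to quantify genome-wide transcription levels at single-cell resolution. How to extract useful information about cells and genomes from high-dimensional single-cell RNA sequencing data is a central question in the analysis of such data. Researchers have studied this problem extensively and have developed many computational methods and corresponding software packages. This review summarizes the commonly used methods for single-cell RNA sequencing data analysis and the mathematics they rely on, covering preprocessing of raw data, clustering analysis, copy number variation analysis, and non-negative matrix factorization, among others. It focuses on the mathematical principles behind these data processing methods and shows how mathematical tools are used to extract information from single-cell RNA sequencing data, so that mathematicians can carry out single-cell data analysis and further explore and improve existing methods.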
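Keywords: single-cell transcriptome sequencing data; data preprocessing; clustering analysis; mathematical foundations
DOI: 10.12113/202307007
Classification number: Q522; R318.04
Document code: A
Funding: National Natural Science Foundation of China (No. 11831015).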
|
Mathematics in single-cell RNA transcriptome analysis
LUO Qinqin, LEI Jinzhi
|
(School of Mathematical Sciences, Center for Applied Mathematics, Tiangong University, Tianjin 300398, China)
|
Abstract:
In recent years, high-throughput single-cell sequencing technology has brought profound new findings to biological research. The rapid development of single-cell RNA sequencing enables researchers to study the transcriptome at the resolution of individual cells. How to extract useful information about cells and genomes from high-dimensional single-cell RNA sequencing data is a central question in the analysis of such data. Researchers have studied this problem extensively and have developed many computational methods and corresponding software packages. This review summarizes the classical computational methods used in single-cell RNA sequencing data analysis and their mathematical basis, including data preprocessing, dimensionality reduction, clustering analysis, pseudo-time analysis, copy number variation analysis, and non-negative matrix factorization. We focus on the mathematical principles behind these data processing methods, show how mathematical tools can be applied to extract information from single-cell RNA sequencing data, and provide guidance for mathematicians who wish to analyze single-cell data and to further develop and improve existing methods.
Key words: Single-cell RNA-seq data; Data preprocessing; Clustering analysis; Mathematical basis