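Abstract:
Cell type annotation is a fundamental task in single-cell RNA sequencing (scRNA-seq) analysis. To address the performance degradation and high computational complexity that arise when handling sparse data, this paper proposes scTransformer, a tool for cell type identification and annotation of scRNA-seq data built on the Transformer deep learning model. The model contains four modules: gene embedding, positional encoding, a Transformer encoder, and a classification layer. The gene embedding step converts K highly variable genes (HVGs; K = 2 000) into N sub-vectors. Unassigned rate, F1 score, accuracy, kappa score, and AUR serve as evaluation criteria for a systematic comparison of the model with nine other tools. The results show that, within datasets, scTransformer reaches an accuracy of 96.59%, higher than the other tools, with an unassigned rate of 0.18%. Possibly because of sample imbalance, its average F1 score is 93.46%, lower than that of CHETAH, Clustifyr, and SciBet. In cross-platform tests on the same tissue and in tests across entirely different tissues (pancreas, blood), scTransformer has the best accuracy, F1 score, and kappa coefficient (>0.99). In mouse brain, pancreas, and lung tissues, the AUR and unassigned rate of scTransformer are second only to those of Seurat and Clustifyr. The scTransformer source code and data are available at https://github.com/nanjingyuanbao/scTransformer. In summary, this paper proposes and systematically evaluates a new Transformer-based cell type annotation tool.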
Keywords: cell type annotation; Transformer model; single-cell RNA sequencing
DOI: 10.12113/202402008
CLC number: Q811.4
Document code: A
Funding:
|
scTransformer: A deep learning based method for single cell type annotation
YUAN Jiaxin, LIU Hongde
|
(School of Biological Science and Medical Engineering, Southeast University, Nanjing 211189, China)
|
Abstract:
Cell type annotation is an essential task in single-cell RNA sequencing (scRNA-seq) analysis. To overcome the performance degradation and high computational complexity that occur when dealing with sparse data, we propose scTransformer, a Transformer-based tool for cell type identification and annotation of scRNA-seq data. The model comprises four modules: gene embedding, positional encoding, a Transformer encoder, and a classification layer. The gene embedding step converts K highly variable genes (HVGs; K = 2 000) into N sub-vectors. Unassigned rate, F1 score, accuracy, kappa score, and AUR are used as evaluation criteria to systematically compare the model with nine other tools. Within datasets, scTransformer achieves an accuracy of 96.59%, higher than the other tools, with an unassigned rate of 0.18%. Probably because of sample imbalance, its average F1 score is 93.46%, lower than that of CHETAH, Clustifyr, and SciBet. In cross-platform tests on the same tissue and in tests across entirely different tissues (pancreas, blood), scTransformer has the best accuracy, F1 score, and kappa coefficient (>0.99). In mouse brain, pancreas, and lung tissues, scTransformer's AUR and unassigned rate are second only to those of Seurat and Clustifyr. The scTransformer source code and data are available at https://github.com/nanjingyuanbao/scTransformer. In conclusion, this paper presents and systematically evaluates a new Transformer-based cell type annotation tool.
Keywords: cell type annotation; Transformer; scRNA-seq
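
For readers unfamiliar with the evaluation criteria named in the abstract, the following is a minimal sketch of how unassigned rate, accuracy, macro-averaged F1 score, and Cohen's kappa could be computed from true and predicted cell-type labels. It is not taken from the scTransformer repository. The label string "unassigned", the function names, and the choice to count unassigned cells as errors in accuracy are assumptions made for illustration; the paper may define these quantities differently. AUR is omitted because the abstract does not define it.

```python
from collections import Counter

# Hypothetical marker for cells the annotator declines to label (assumption).
UNASSIGNED = "unassigned"


def unassigned_rate(pred):
    """Fraction of cells whose prediction is the unassigned marker."""
    return sum(p == UNASSIGNED for p in pred) / len(pred)


def accuracy(true, pred):
    """Fraction of cells labelled correctly; unassigned cells count as errors."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def macro_f1(true, pred):
    """Unweighted mean of per-class F1 over the classes present in `true`."""
    scores = []
    for c in sorted(set(true)):
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        fn = sum(t == c and p != c for t, p in zip(true, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(true, pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(true)
    p_observed = accuracy(true, pred)
    true_counts, pred_counts = Counter(true), Counter(pred)
    labels = set(true_counts) | set(pred_counts)
    p_chance = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_chance == 1.0:
        return 0.0  # degenerate single-class case; 0.0 is this sketch's choice
    return (p_observed - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    true_labels = ["alpha", "alpha", "beta", "beta", "delta", "delta"]
    pred_labels = ["alpha", "alpha", "beta", UNASSIGNED, "delta", "beta"]
    print("unassigned rate:", unassigned_rate(pred_labels))
    print("accuracy:", accuracy(true_labels, pred_labels))
    print("macro F1:", macro_f1(true_labels, pred_labels))
    print("kappa:", cohen_kappa(true_labels, pred_labels))
```

Macro averaging gives every cell type equal weight regardless of its size, which is one way an imbalanced dataset can yield a high accuracy alongside a lower average F1 score, as the abstract reports.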