Identification of circular RNAs using genomic sequence features

ZHOU Jing; XIE Xueying; GU Wanjun


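Administered by: Ministry of Industry and Information Technology; Sponsored by: Harbin Institute of Technology; Editor-in-chief: REN Nanqi; ISSN 1672-5565; CN 23-1513/Q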


Cite this article: ZHOU Jing, XIE Xueying, GU Wanjun. Identification of circular RNAs using genomic sequence features[J]. Chinese Journal of Bioinformatics, 2018, 16(2): 113-118.

ZHOU Jing¹, XIE Xueying²,³, GU Wanjun²,³
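(1. Research Center for Learning Sciences, Southeast University, Nanjing 210096, China; 2. State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, China; 3. National Demonstration Center for Experimental Biomedical Engineering Education (Southeast University), Nanjing 210096, China)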

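Abstract:

Circular RNAs (circRNAs) are a recently discovered class of RNAs with important biological functions. Existing circRNA identification tools rely on high-throughput sequencing data, and because of limitations in both the data and the detection strategies they commonly suffer from insufficient accuracy, low reproducibility across methods, and high false-positive and false-negative rates. To address this problem, we built a model that predicts circRNAs de novo from intrinsic sequence features rather than from sequencing data. We selected 100 sequence features associated with RNA circularization, including the lengths of the introns upstream and downstream of the splice sites, A-to-I editing density, and Alu repeats; built machine learning models; identified circRNAs in the human genome; and compared the classification performance of two machine learning methods, random forest (RF) and support vector machine (SVM). The results show that the selected sequence features can effectively discriminate whether an RNA circularizes, and that different features contribute differently to the model's predictive power. RF classified more accurately than SVM.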

Key words: circular RNA; sequence features; machine learning; random forest; support vector machine

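DOI: 10.3969/j.issn.1672-5565.201709002

CLC number: Q522+.6

Document code: A

Funding: National Natural Science Foundation of China (…, 61571109); Key Research and Development Program of Jiangsu Province (BE2016002-3); Fundamental Research Funds for the Central Universities (2242017K3DN04).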
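The abstract lists flanking-intron length, A-to-I editing density and Alu repeats among the 100 features but does not say how they were computed. The sketch below shows one plausible way to derive such features for a single candidate exon. Everything in it is an assumption for illustration: the `Interval`, `AluHit` and `exon_features` names are hypothetical, coordinates are half-open and on one strand, and the Alu feature is a simple count of opposite-orientation Alu pairs across the two flanking introns, standing in for the pairing score used by the authors.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical half-open genomic interval [start, end) on one chromosome."""
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start

    def contains(self, pos: int) -> bool:
        return self.start <= pos < self.end


@dataclass(frozen=True)
class AluHit:
    """Hypothetical Alu annotation: its interval and strand ("+" or "-")."""
    interval: Interval
    strand: str


def flanking_introns(exon: Interval, introns: list[Interval]) -> tuple[Optional[Interval], Optional[Interval]]:
    """Introns ending exactly at the exon start (upstream) or starting at the exon end (downstream)."""
    up = next((i for i in introns if i.end == exon.start), None)
    down = next((i for i in introns if i.start == exon.end), None)
    return up, down


def exon_features(exon: Interval, introns: list[Interval],
                  editing_sites: list[int], alus: list[AluHit]) -> dict[str, float]:
    """Compute a few illustrative back-splicing features for one candidate exon."""
    up, down = flanking_introns(exon, introns)
    flanks = [i for i in (up, down) if i is not None]
    flank_bp = sum(i.length() for i in flanks)

    # A-to-I editing sites inside either flanking intron, per kilobase of flanking intron.
    n_edits = sum(1 for pos in editing_sites if any(i.contains(pos) for i in flanks))
    edit_density = 1000.0 * n_edits / flank_bp if flank_bp else 0.0

    # Alu elements fully contained in a given intron (empty if the intron is missing).
    def alus_in(intron: Optional[Interval]) -> list[AluHit]:
        if intron is None:
            return []
        return [a for a in alus
                if intron.start <= a.interval.start and a.interval.end <= intron.end]

    # Opposite-strand Alu pairs across the two introns can form inverted repeats
    # that base-pair and bring the back-splice sites close together.
    up_alus, down_alus = alus_in(up), alus_in(down)
    inverted_pairs = sum(1 for a in up_alus for b in down_alus if a.strand != b.strand)

    return {
        "upstream_intron_len": float(up.length()) if up else 0.0,
        "downstream_intron_len": float(down.length()) if down else 0.0,
        "a_to_i_density_per_kb": edit_density,
        "alu_inverted_pairs": float(inverted_pairs),
    }
```

For example, an exon at [5000, 5200) flanked by introns [1000, 5000) and [5200, 9000), with one "+" Alu in the upstream intron and one "-" plus one "+" Alu downstream, yields one inverted Alu pair. A real pipeline would compute many more features per exon and feed the resulting vectors to the classifiers.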

Identification of circular RNAs using genomic sequence features

ZHOU Jing¹, XIE Xueying²,³, GU Wanjun²,³

(1. Research Center for Learning Sciences, Southeast University, Nanjing 210096, China; 2. State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, China; 3. National Demonstration Center for Experimental Biomedical Engineering Education (Southeast University), Nanjing 210096, China)

Abstract:

Circular RNAs (circRNAs) are a class of novel RNAs with important biological functions. Current circRNA identification tools depend on high-throughput sequencing data. However, because of limitations in the data and in the detection strategies, these tools generally suffer from low accuracy, low overlap between methods, and high false-positive and false-negative rates. To solve this problem, we built a model that identifies circRNAs de novo from intrinsic features of the genomic sequence rather than from sequencing data. We selected 100 genomic sequence features related to circRNA formation, including the lengths of the flanking introns, the density of A-to-I RNA editing sites, and the pairing score of Alu elements in the flanking introns; built machine learning models; identified circRNAs in the human genome; and compared the classification performance of two machine learning algorithms, random forest (RF) and support vector machine (SVM). The results showed that the selected features could effectively identify circRNAs and that different sequence features contributed differently to the identification. In addition, the RF model outperformed the SVM model in identifying circRNAs.

Key words: circular RNAs; sequence feature; machine learning; random forest; support vector machines
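Comparing RF with SVM, and judging the false-positive and false-negative rates the abstract criticizes in sequencing-based tools, comes down to standard confusion-matrix metrics. The following is a minimal, dependency-free sketch of those metrics; the function names are hypothetical, labels are assumed binary (1 = circRNA, 0 = linear RNA), and the abstract does not state which metrics the authors actually reported.

```python
from __future__ import annotations

import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels where 1 = circRNA, 0 = linear RNA."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, FPR, FNR and Matthews correlation coefficient."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(num: float, den: float) -> float:
        # Define an undefined ratio (empty class) as 0.0 instead of raising.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }
```

As a usage example with toy labels (not data from the paper), `binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])` gives an accuracy of 0.75 and an MCC of 0.5; running the same function on each classifier's held-out predictions puts RF and SVM on a common scale.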
