Cite this article: YAN Lingjuan, CHEN Yingli, YAN Dongxue, FAN Zhiyu. Prediction of plant long non-coding RNA by fusing multiple features[J]. Chinese Journal of Bioinformatics, 2021, 19(2): 128-135.
DOI: 10.12113/202006007
CLC number: Q61
Document code: A
Funding: National Natural Science Foundation of China (No. 61861035; 31870838).
Prediction of plant long non-coding RNA by fusing multiple features
YAN Lingjuan, CHEN Yingli, YAN Dongxue, FAN Zhiyu
(School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China)
Abstract:
Long non-coding RNAs (lncRNAs) are RNA transcripts longer than 200 nt that have no protein-coding ability. Studies have shown that lncRNAs play important roles in regulating plant growth and development, epigenetic responses, and responses to various stresses. Compared with research in humans and animals, however, the study of plant lncRNAs is still in its infancy, and accurately identifying lncRNAs among large numbers of transcripts remains one of the important open problems in plant lncRNA research. In this study, we constructed a new dataset of plant lncRNAs and mRNAs, analyzed the sequence and structural features of the plant lncRNAs in the dataset, and extracted k-mer frequency, secondary structure, open reading frame (ORF), and geometric flexibility features from the sequences. Plant lncRNAs were then predicted with a Support Vector Machine (SVM) classifier evaluated by the jackknife test, and the effect of fusing different features on prediction performance was measured; the accuracy reached 96.14%.
Key words: plant lncRNA; feature extraction; multiple feature fusion; support vector machine
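The abstract names k-mer frequency and ORF features and their fusion but gives no details. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It uses normalized k-mer frequencies for k = 1 to 3 (the paper's actual k values are not stated here), the longest AUG-to-stop ORF on the forward strand, and fusion by simple concatenation of the feature vectors. All function names are hypothetical. Secondary-structure and geometric-flexibility features are not covered.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # assumption: RNA alphabet; any T is converted to U
STOP_CODONS = {"UAA", "UAG", "UGA"}


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Hypothetical helper: normalized counts of all 4**k k-mers, lexicographic order."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def longest_orf(seq: str) -> tuple[int, float]:
    """Hypothetical helper: length (nt) of the longest AUG...stop ORF over the
    three forward frames, and its coverage (ORF length / transcript length)."""
    seq = seq.upper().replace("T", "U")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i  # first start codon after the previous stop
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # ORF length includes the stop codon
                start = None
    coverage = best / len(seq) if seq else 0.0
    return best, coverage


def fused_features(seq: str, ks: tuple[int, ...] = (1, 2, 3)) -> list[float]:
    """Hypothetical fusion: concatenate k-mer frequencies for each k with ORF features."""
    features: list[float] = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    orf_len, orf_cov = longest_orf(seq)
    features.extend([float(orf_len), orf_cov])
    return features


if __name__ == "__main__":
    vec = fused_features("AUGGCCUAAGG")
    print(len(vec))  # 4 + 16 + 64 k-mer features + 2 ORF features
```

Vectors built this way could then be passed to an SVM and evaluated with the jackknife test, that is, leave-one-out cross-validation in which each sample is predicted by a model trained on all the remaining samples.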