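Supervising authority: Ministry of Industry and Information Technology; Sponsor: Harbin Institute of Technology; Editor-in-Chief: REN Nanqi; ISSN 1672-5565; CN 23-1513/Q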

Cite this article: SONG Dongguang, LU Bobin, CHEN Liuting. Gene expression analysis from high-throughput RNA-Seq sequencing by Unix text-aligning[J]. Chinese Journal of Bioinformatics, 2018, 16(2): 119-129.
Gene expression analysis from high-throughput RNA-Seq sequencing by Unix text-aligning
SONG Dongguang, LU Bobin, CHEN Liuting
(Department of Horticulture, Foshan University, Foshan 528231, Guangdong, China)
DOI: 10.3969/j.issn.1672-5565.201709003
CLC number: Q344+.13
Document code: A
Abstract:
Methods for aligning and assembling short high-throughput RNA-Seq reads into longer transcripts, and for determining gene expression from them, are still being refined as transcriptome sequencing becomes widespread. In this study, combinations of text-processing commands on a Unix-like system were used to align, assemble, and quantify the expression of transcriptome sequences from leaves and petals of Camellia at the flowering stage. First, the sequencing reads were near-randomly sorted in blocks of 10 000, and 100 000 reads were selected as query sequences and aligned against 1 million reads. For each query sequence, 9 randomly selected 20-mers were matched against the 1 million reads, and the number of matching reads after duplicate removal was taken as the transcript count of that query. The first and last 20-mers of each query sequence were then used to join contigs from the reads matched to that query. The longest sequence after the first round of joining was 410 mer. Where two or more joined sequences were obtained, they were aligned against one another and joined again, giving a longest sequence of 1 174 mer. The number of matched reads for each query sequence was used as its expression level before and after assembly, and it was comparable to the minus-strand expression obtained by aligning with the complementary strand. Gene annotations for the joined sequences were obtained by BLAST searches against the remote NCBI server. The results indicate that text alignment on a Unix-like system can be used effectively to analyze gene expression from high-throughput sequencing and to perform de novo sequence assembly.
Key words: RNA-Seq; Text-aligning; Gene expression; Contig-joining; Unix-like system
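
The abstract gives the counting and joining steps only in outline, and the Unix command combinations used in the study are not shown on this page. The following Python sketch is a stand-in for that text-matching idea, not the authors' pipeline: it samples k-mers from a query read, collects the distinct reads that contain any of them (roughly what a fixed-string pattern search followed by duplicate removal would do), repeats the search with reverse-complemented k-mers for the minus strand, and extends the query by one step using its first and last k-mers. All function names, the seeded random generator, and the "keep the longest extension" rule are assumptions made for illustration.

```python
import random

# Translation table for complementing DNA bases (uppercase A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_kmers(query, k=20, n=9, rng=None):
    """Pick up to n k-mers at distinct random start positions of the query."""
    rng = rng or random.Random(0)  # fixed seed: an assumption, for repeatability
    starts = range(len(query) - k + 1)
    picks = rng.sample(starts, min(n, len(starts)))
    return [query[i:i + k] for i in picks]


def matching_reads(kmers, reads):
    """Distinct reads containing at least one k-mer as an exact substring."""
    # Building a set removes duplicated reads before counting.
    return {read for read in reads if any(kmer in read for kmer in kmers)}


def expression(query, reads, k=20, n=9, rng=None):
    """Return (plus, minus) counts of distinct reads matched by the query."""
    kmers = sample_kmers(query, k, n, rng)
    plus = matching_reads(kmers, reads)
    minus = matching_reads([reverse_complement(km) for km in kmers], reads)
    return len(plus), len(minus)


def extend_right(query, reads, k=20):
    """Extend the query past its last k-mer; keep the longest extension."""
    tail = query[-k:]
    best = query
    for read in reads:
        pos = read.find(tail)
        if pos != -1:
            candidate = query + read[pos + k:]
            if len(candidate) > len(best):
                best = candidate
    return best


def extend_left(query, reads, k=20):
    """Extend the query before its first k-mer; keep the longest extension."""
    head = query[:k]
    best = query
    for read in reads:
        pos = read.find(head)
        if pos != -1:
            candidate = read[:pos] + query
            if len(candidate) > len(best):
                best = candidate
    return best


if __name__ == "__main__":
    # Toy data with k=4 so the example stays readable; the study used 20-mers.
    reads = ["GGAACAGG", "GGAACAGG", "CCGTAATCC", "TTTTTTTT", "GATTACGGGG"]
    query = "AACAGATTAC"
    print(expression(query, reads, k=4, n=9))
    print(extend_right(query, reads, k=4))
```

Exact substring matching tolerates no mismatches, so a sequencing error inside a sampled k-mer makes that k-mer miss the read. Sampling several k-mers per query, as the abstract describes, reduces the chance that a read is missed entirely.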
