Cite this article: | 宋东光,卢博彬,陈柳婷.Unix文本比对分析高通量RNA-Seq测序基因表达[J].生物信息学,2018,16(2):119-129. |
| SONG Dongguang,LU Bobin,CHEN Liuting.Gene expression analysis from high-throughput RNA-Seq sequencing by Unix Text-aligning[J].Chinese Journal of Bioinformatics,2018,16(2):119-129. |
|
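Abstract: |
Methods for aligning and assembling short high-throughput RNA-Seq reads into longer transcripts and for determining gene expression levels are still being improved as transcriptome sequencing becomes widespread. In this study, combinations of text-processing commands in Unix-like systems were used to align, assemble, and quantify the expression of transcriptome sequences from leaves and petals of Camellia at the flowering stage. The sequencing reads were first sorted quasi-randomly in blocks of 10 000, and 100 000 selected sequences were each aligned against 1 million sequences. Nine groups of 20-mers were randomly selected from each query sequence and aligned against the 1 million sequences; after duplicate removal, the matches gave the transcript count for that sequence. Using the first and last 20-mers of each query sequence, assembly was carried out from the matching aligned contigs. The longest sequence from the first round of assembly was 410 mer; where two or more assembled sequences were obtained, they were aligned against one another and reassembled, giving a longest sequence of 1 174 mer. The number of alignment matches of a query sequence was used to represent its expression level before and after assembly, and was comparable to the minus-strand expression level obtained by aligning with the complementary strand. Gene annotations were obtained by online NCBI BLAST alignment of the assembled sequences. The results indicate that text alignment in Unix-like systems can be used effectively to analyze gene expression from high-throughput sequencing and to perform de novo sequence assembly. |
Keywords: RNA-Seq Text alignment Gene expression level Contig assembly Unix-like system |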
DOI:10.3969/j.issn.1672-5565.201709003 |
Classification number: Q344+.13 |
Document code: A |
Funding: |
|
Gene expression analysis from high-throughput RNA-Seq sequencing by Unix Text-aligning |
SONG Dongguang, LU Bobin, CHEN Liuting
|
(Department of Horticulture, Foshan University, Foshan 528231, Guangdong, China)
|
Abstract: |
With the rapid development of transcriptome sequencing, methods for aligning and assembling short high-throughput RNA-Seq sequences into longer transcripts and for estimating gene expression are still being improved. This study reports preliminary sequence alignment, assembly, and expression analysis of the leaf and petal transcriptome of Camellia at the flowering stage, carried out with combinations of text-filtering commands in a Unix-like operating system. First, the sequences were sorted near-randomly in blocks of 10 000; then 100 000 sequences were each aligned to 1 million sequences. Nine groups of 20-mers, randomly selected from each query sequence, were aligned to the 1 million sequences, and transcripts were counted after removing duplicated sequences. Using the first and last 20-mers of each query sequence, assembly was conducted from the matching contigs of each aligned group. The longest sequence in the first assembly was 410 mer. After re-aligning and reassembling cases with two or more joined sequences, the longest sequence was 1 174 mer. The number of matched alignments of each query sequence was used as its expression before and after assembly, and it was approximately equal to the minus-strand expression obtained by aligning with the complementary strand. Gene annotations were obtained by aligning the joined sequences against the remote NCBI BLAST server. The results show that gene expression and de novo assembly can be analyzed effectively by text-aligning in a Unix-like system. |
Key words: RNA-Seq Text-aligning Gene expression Contig-joining Unix-like system |
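
The text-matching idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Unix command pipeline: plain substring search stands in for the text-filtering commands, and the function names (`reverse_complement`, `sample_kmers`, `count_matches`, `extend_right`) and the fixed random seed are assumptions made only for illustration. The defaults `k=20` and `groups=9` follow the 20-mers and nine groups mentioned in the abstract.

```python
import random

# Translation table for complementing A/C/G/T bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_kmers(query, k=20, groups=9, rng=None):
    """Pick up to `groups` k-mers at distinct random positions of the query."""
    rng = rng or random.Random(0)  # fixed seed: an assumption for repeatability
    starts = range(len(query) - k + 1)  # empty if the query is shorter than k
    picks = rng.sample(starts, min(groups, len(starts)))
    return [query[i:i + k] for i in picks]


def count_matches(query, reads, k=20, groups=9, rng=None):
    """Count reads containing at least one sampled k-mer of the query.

    A read that matches several k-mers is counted once, which plays the
    role of the duplicate-removal step. Only the given strand is searched;
    pass reverse_complement(query) to count minus-strand matches.
    """
    kmers = sample_kmers(query, k, groups, rng)
    return sum(1 for read in reads if any(kmer in read for kmer in kmers))


def extend_right(query, reads, k=20):
    """Extend the query with the longest read suffix following its last k-mer."""
    tail = query[-k:]
    best = ""
    for read in reads:
        pos = read.find(tail)
        if pos != -1:
            suffix = read[pos + k:]  # bases that continue past the query's end
            if len(suffix) > len(best):
                best = suffix
    return query + best
```

Calling `count_matches(query, reads)` and `count_matches(reverse_complement(query), reads)` gives plus- and minus-strand counts that can be compared, and applying `extend_right` repeatedly (and its mirror image on the first 20-mer) grows a query into a longer joined sequence. Exact substring matching tolerates no sequencing errors, so this is only a sketch of the principle.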