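Construction and annotation of the full-length transcriptome of Nosema ceranae based on third-generation Nanopore sequencing technology
First published: 2020-04-17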
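Abstract: This study aimed to assemble and annotate a high-quality reference transcriptome of Nosema ceranae using Oxford Nanopore sequencing technology. Purified spores of N. ceranae were sequenced on the Nanopore PromethION system. Guppy was used for base calling of the raw reads. Clean reads were obtained by filtering out short fragments and low-quality raw reads. Full-length transcript sequences were identified by recognizing the primers at both ends. Full-length transcripts were aligned against the Nr, Swissprot, KOG, eggNOG, Pfam, GO and KEGG databases with BLAST to obtain the corresponding annotations. Long non-coding RNAs (lncRNAs) were predicted with four methods, CPC, CNCI, CPAT and Pfam, and the intersection of the four was taken as the set of high-confidence lncRNAs. Nanopore sequencing yielded 6 988 795 raw reads, from which 6 953 469 clean reads were obtained after quality control, including 5 143 999 full-length clean reads. A total of 10 243 non-redundant full-length transcripts were identified, with an N50 of 1 042 bp, a mean length of 894 bp and a maximum length of 4 855 bp. Of these, 9 342, 4 038, 4 283, 2 569, 4 859 and 3 450 full-length transcripts could be annotated in the Nr, KOG, eggNOG, Pfam, GO and KEGG databases, respectively. The species with the largest numbers of annotated full-length transcripts were N. ceranae, Nosema apis and Nosema bombycis. In total, 87 high-confidence lncRNAs were identified, comprising 49 sense lncRNAs, 25 antisense lncRNAs and 13 long intergenic non-coding RNAs (lincRNAs). The sequencing output of this study was sufficient to detect all expressed full-length transcripts, and the expression levels of full-length transcripts ranged from 0.1 to more than 10 000. A high-quality reference transcriptome of N. ceranae was constructed and annotated, providing a key foundation for comparative transcriptome analysis of the pathogen, analysis of alternative splicing and alternative polyadenylation of transcripts, mining of SSR loci, refinement of gene structure, and cloning and functional study of full-length gene sequences.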
Keywords: third-generation sequencing technology; Nanopore sequencing; Nosema ceranae; full-length transcriptome
Construction and annotation of the full-length transcriptome of Nosema ceranae based on third-generation Nanopore sequencing technology
Abstract: This study aimed to assemble and annotate a high-quality full-length transcriptome of Nosema ceranae using Oxford Nanopore sequencing technology. Clean spores of N. ceranae were sequenced on the Nanopore PromethION system. Guppy was used for base calling of the raw reads. Clean reads were obtained by filtering out short fragments and low-quality raw reads. Full-length transcripts were identified by recognizing the primers at both ends of the clean reads. Full-length transcripts were aligned to the Nr, Swissprot, KOG, eggNOG, Pfam, GO and KEGG databases to obtain the corresponding annotations. Four methods (CPC, CNCI, CPAT and Pfam) were used to predict lncRNAs, and their intersection was taken as the set of high-confidence lncRNAs. Nanopore sequencing produced 6 988 795 raw reads, and 6 953 469 clean reads were retained after quality control, including 5 143 999 full-length clean reads. In total, 10 243 non-redundant full-length transcripts were identified, with an N50 of 1 042 bp, an average length of 894 bp and a maximum length of 4 855 bp. Of these, 9 342, 4 038, 4 283, 2 569, 4 859 and 3 450 full-length transcripts could be annotated in Nr, KOG, eggNOG, Pfam, GO and KEGG, respectively. The species to which the most full-length transcripts were annotated were N. ceranae, Nosema apis and Nosema bombycis. In total, 87 lncRNAs were identified, including 49 sense lncRNAs, 25 antisense lncRNAs and 13 long intergenic RNAs (lincRNAs). The sequencing depth in this study was sufficient to detect all expressed full-length transcripts, and expression levels ranged from 0.1 to more than 10 000. The high-quality reference transcriptome of N. ceranae constructed and annotated in this work lays a key foundation for comparative transcriptome analysis, investigation of alternative splicing and alternative polyadenylation of transcripts, identification of SSR loci, optimization of gene structure, and full-length cloning and functional study of genes.
Keywords: third-generation sequencing technology; Nanopore sequencing; Nosema ceranae; full-length transcriptome
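
Two of the summary steps in the abstract, the N50 statistic and taking the intersection of the four lncRNA predictions, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy transcript IDs and the input format (one collection of transcript IDs that each tool called non-coding) are assumptions made for illustration only.

```python
from typing import Iterable, Sequence, Set


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def high_confidence_lncrnas(*noncoding_calls: Iterable[str]) -> Set[str]:
    """Keep only transcript IDs that every tool called non-coding.

    Each argument is a hypothetical collection of transcript IDs that one
    tool (for example CPC, CNCI, CPAT or a Pfam domain scan) judged to
    lack coding potential.
    """
    if not noncoding_calls:
        return set()
    return set.intersection(*(set(calls) for calls in noncoding_calls))


if __name__ == "__main__":
    # Toy data, not taken from the study.
    print(n50([2, 3, 4, 5, 6]))
    cpc = {"t1", "t2", "t3"}
    cnci = {"t2", "t3"}
    cpat = {"t2", "t3", "t9"}
    pfam = {"t3"}
    print(high_confidence_lncrnas(cpc, cnci, cpat, pfam))
```

Taking the intersection rather than the union trades sensitivity for specificity: a transcript is kept only if no method finds evidence of coding potential.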