Transcriptome Analysis for Phyllanthus emblica Distributed in Dry-hot Valleys in Yunnan, China (云南干热河谷地区余甘子转录组分析)
Cited by: 1
Document type: Journal article
Chinese title: 云南干热河谷地区余甘子转录组分析
English title: Transcriptome Analysis for Phyllanthus emblica Distributed in Dry-hot Valleys in Yunnan, China
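Authors: Liu Xiongfang (刘雄芳)[1], Li Taiqiang (李太强)[1], Li Zhenghong (李正红)[1], Wan Youming (万友名)[1], Liu Xiuxian (刘秀贤)[1], Zhang Xu (张序)[1], An Jing (安静)[1], Ma Hong (马宏)[1]
First author: Liu Xiongfang (刘雄芳)
Affiliation: [1] Research Institute of Resource Insects, Chinese Academy of Forestry (中国林业科学研究院资源昆虫研究所)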
Year: 2018
Volume: 31
Issue: 5
Pages: 1-8
Journal (Chinese name): 林业科学研究
Journal (English name): Forest Research
Indexed in: CSTPCD; Scopus; PKU Core Journals (北大核心, 2017 edition); CSCD (2017-2018)
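Funding: Special Fund for Basic Scientific Research of Central Public-Interest Research Institutes (CAFYBB2016ZX003-2); Yunnan Province "Technological Innovation Talent" Training Program (2016HB007)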
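Language: Chinese
Chinese keywords: 余甘子 (Phyllanthus emblica); 转录组 (transcriptome); Unigene; 功能注释 (functional annotation); 编码序列 (coding sequence); 转录因子 (transcription factor); 抗性基因 (resistance gene)
English keywords: Phyllanthus emblica; transcriptome; unigene; functional annotation; CDS; transcription factor; R-Gene
Classification number: S722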
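Abstract: [Objective] To characterize the transcriptome of Phyllanthus emblica from the dry-hot valleys of Yunnan, in order to provide comprehensive background information for developing microsatellite markers and mining functional genes in P. emblica. [Method] Transcriptome sequencing of P. emblica leaves was performed on the Illumina HiSeq 4000 platform. After filtering of the raw data, de novo assembly, and clustering to remove redundancy, the resulting Unigenes were aligned against public databases and subjected to basic functional annotation, CDS prediction, prediction of transcription factor (TF) coding capacity, and resistance gene (R-Gene) prediction. [Result] A total of 10.52 Gb of clean reads were obtained, with Q20 and Q30 of 98.47% and 95.28%, respectively. Assembly and redundancy removal yielded 76,881 Unigenes with an average length of 713 nt and an N50 of 1,257 nt. By alignment against the NR, COG, KEGG, and SwissProt databases, 44,768 Unigenes were functionally annotated. Based on COG annotation, the Unigenes fell into roughly 25 categories; based on GO annotation, they were assigned to 3 main categories (biological process, cellular component, and molecular function) and 47 subcategories; based on KEGG annotation, they were grouped into 6 major pathway categories and 21 pathway classes, about 3/5 of which were metabolism-related. From these annotation results, 42,953 CDS were detected, and ESTScan prediction on the remaining unaligned Unigenes yielded a further 2,058 CDS. In addition, 56 TF families and 18 types of R-Gene were predicted. [Conclusion] The Unigene sequences obtained in this study are of high assembly quality and good completeness, gene-rich, and functionally diverse. They greatly expand the gene information available for P. emblica and provide important baseline data for future work on functional gene mining, resistance mechanism analysis, molecular marker development, and molecular-assisted breeding in P. emblica and other Phyllanthus species.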
[Objective] To provide comprehensive genetic information for the development of microsatellite markers and the mining of functional genes in Phyllanthus emblica by characterizing the transcriptome of P. emblica in the dry-hot valleys of Yunnan. [Method] Transcriptome sequencing was conducted on young leaves of P. emblica using the Illumina HiSeq 4000, followed by filtering, de novo assembly, and clustering. Sequence similarity analysis and annotation of the resulting Unigenes were performed against the NCBI non-redundant (NR) protein database, Gene Ontology (GO), Clusters of Orthologous Groups (COG), the KEGG database, SwissProt, PlantTFDB, and PRGdb. [Result] In total, 10.52 Gb of clean reads were generated, with a Q20 of 98.47% and a Q30 of 95.28%. De novo assembly and clustering of the clean reads yielded 76,881 Unigenes with an average length of 713 nt and an N50 of 1,257 nt. Of these, 44,768 Unigenes were functionally annotated against four protein databases. The Unigenes were divided into roughly 25 categories according to COG function, and were grouped into three functional categories (biological process, cellular component, and molecular function) and 47 subcategories based on GO annotation. KEGG analysis assigned the Unigenes to six categories and 21 metabolic pathways, of which about 3/5 belonged to Metabolism. A total of 42,953 CDS were detected from the functional annotation results, and a further 2,058 CDS were predicted with ESTScan from the remaining Unigenes. In addition, 56 transcription factor families and 18 types of resistance gene were predicted. [Conclusion] The transcriptome Unigenes of P. emblica show high quality, good integrity, abundant genes, and diverse functions, laying an important foundation for further study of functional gene mining, resistance mechanism analysis, molecular marker development, and molecular-assisted breeding of P. emblica and other congeneric species.
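The abstract summarizes the assembly with an average Unigene length and an N50. As a reminder of how these two statistics are conventionally defined, here is a minimal Python sketch. It is not the authors' pipeline: the record does not say which tool computed the values, the function names `n50` and `mean_length` are hypothetical, and the toy lengths are invented for illustration rather than taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition: sort lengths from longest to shortest and
    return the length at which the running total first reaches at least
    half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    return sum(lengths) / len(lengths)


# Invented toy lengths (nt); NOT data from the study.
toy_lengths = [2000, 1500, 1257, 800, 400, 300, 200]
print("N50:", n50(toy_lengths))
print("Mean length:", round(mean_length(toy_lengths), 2))
```

Because N50 is weighted by sequence length, it normally exceeds the mean when an assembly contains many short sequences, which is consistent with the gap between the two values the abstract reports.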