Abstract: RNA-Seq is a new experimental technique for transcriptome research based on high-throughput sequencing, and it is increasingly used to study alternative splicing variation. Two difficulties arise in the analysis of RNA-Seq data: read-isoform multimapping, and the non-uniform distribution of reads along the gene reference sequence. This paper proposes a new method, LDAseq, that calculates isoform expression levels based on LDA (Latent Dirichlet Allocation), a model commonly used for text corpora. To deal with read-isoform multimapping, LDAseq uses the known gene-isoform annotation to constrain the hyperparameters. To model the non-uniform distribution of reads along the reference sequence, LDAseq introduces fixed-length "probes" that break up the long reference sequence. We applied LDAseq to a mouse dataset and a human breast cancer dataset and compared its performance with currently used alternatives such as Cufflinks and RSEM. The results show that the computational accuracy of LDAseq was 75.5% and 62.8% higher than that of Cufflinks and RSEM, respectively.
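The two ideas named in the abstract, fixed-length probes and annotation-constrained hyperparameters, can be illustrated with a minimal sketch. This is not the LDAseq implementation: the function names, the half-open probe intervals, and the choice to set the prior weight to zero for isoforms not annotated to a gene are all assumptions made here for illustration.

```python
def split_into_probes(ref_length, probe_length):
    """Cut a reference of ref_length bases into fixed-length probes.

    Returns half-open (start, end) intervals; the last probe may be shorter.
    (Hypothetical helper, not taken from the paper.)
    """
    if probe_length <= 0:
        raise ValueError("probe_length must be positive")
    return [(start, min(start + probe_length, ref_length))
            for start in range(0, ref_length, probe_length)]


def count_reads_per_probe(read_starts, ref_length, probe_length):
    """Count reads per probe by read start position (0-based).

    Reads starting outside the reference are ignored. Per-probe counts
    expose how unevenly reads fall along the reference.
    """
    counts = [0] * len(split_into_probes(ref_length, probe_length))
    for pos in read_starts:
        if 0 <= pos < ref_length:
            counts[pos // probe_length] += 1
    return counts


def constrained_alpha(gene_isoforms, all_isoforms, base_alpha):
    """Build a Dirichlet hyperparameter vector restricted by annotation.

    Assumption: isoforms annotated to the gene get base_alpha and all
    other isoforms get 0.0, so a read from this gene cannot be assigned
    to an unrelated isoform.
    """
    annotated = set(gene_isoforms)
    return [base_alpha if iso in annotated else 0.0 for iso in all_isoforms]


if __name__ == "__main__":
    print(split_into_probes(10, 4))
    print(count_reads_per_probe([0, 3, 4, 9, 12], 10, 4))
    print(constrained_alpha(["a", "c"], ["a", "b", "c"], 0.5))
```

In this sketch the probe counts play the role of word counts in a text corpus and the isoforms play the role of topics; how LDAseq actually maps these quantities onto LDA is described in the paper itself, not here.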