Abstract: The key to modeling a genetic system is identifying the causal relationships among its genes. In the third Dialogue for Reverse Engineering Assessments and Methods (DREAM3) competition, an E. coli dataset was generated from a 'true' biological gene network. The aim of this work is to recover the gene network structure from these data. We present a statistical independence measure based on reproducing kernel Hilbert space (RKHS), the Hilbert-Schmidt independence criterion (HSIC). Unlike other approaches, which rely either on the classification rate or on parameterized models, the proposed measure is a nonparametric, direct measure of independence. Comparative experiments showed that the method recovers the regulatory relationships between genes efficiently, even with a small data sample. Specifically, HSIC achieved a better result than both the classical Granger causality (GC) method and the differential-equation-based method, which was the best performer in the DREAM3 contest. The AUROC obtained by HSIC was 23 percent higher than that of the GC method and 3.9 percent higher than that of the best performer in the contest. In addition, the computational efficiency of the HSIC method was three orders of magnitude higher than that of the differential-equation-based method.
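To make the independence measure concrete, the following is a minimal sketch of a biased empirical HSIC estimate between two expression profiles. It is an illustration under stated assumptions, not the procedure used in the paper: the Gaussian kernel, the bandwidths `sigma_x` and `sigma_y`, the `1/(n-1)^2` normalization, and the function names `gaussian_gram` and `hsic_biased` are all choices made here for illustration, and the abstract does not specify how HSIC scores are converted into network edges.

```python
import numpy as np

def gaussian_gram(x, sigma):
    """Gaussian (RBF) Gram matrix for n samples (1-D or n-by-d array)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic_biased(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    Assumption: Gaussian kernels with fixed, hand-picked bandwidths.
    """
    n = len(x)
    K = gaussian_gram(x, sigma_x)
    L = gaussian_gram(y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Synthetic demonstration (not DREAM3 data): a nonlinear dependence
# versus an independent pair of variables.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_dependent = x ** 2 + 0.1 * rng.normal(size=200)
y_independent = rng.normal(size=200)
print("HSIC, dependent pair:  ", hsic_biased(x, y_dependent))
print("HSIC, independent pair:", hsic_biased(x, y_independent))
```

A larger value indicates stronger statistical dependence. Note that the statistic is symmetric in its two arguments, so on its own it scores the strength of a relationship rather than its direction.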
FAN ShuangXi, HAN Bin, LI Li Hua, ZHU Lei, JIN Li Yan, LI YanE, WANG Sheng, YING NanJiao. Identifying E. coli Gene Regulatory Network Based on Hilbert-Schmidt Independence Criterion [J]. Chinese Journal of Biomedical Engineering, 2013, 32(2): 141-148.