Abstract: Changes in the composition and function of the human microbiome have an important impact on human phenotypes and disease. When studying the association of microorganisms with human phenotypes or diseases, one should consider not only the dynamics of individual microbes but also the overall effect of the community at each taxonomic level. In this work, a regression method for microbial substructure based on tree-based LASSO is proposed to analyze the association between the microbial community and human phenotype. First, a new penalty function was constructed from the phylogenetic tree structure, and the tree was analyzed node by node. Second, 148 samples were used to test regression on complex and sparse substructures and to evaluate the regression coefficients. The regression results for strains in different substructures were analyzed and compared with those of the traditional LASSO method. The results showed that the proposed method highlights the tree structure of microbial communities. Its regression coefficients on the test nodes were 0.122 and 0.127, higher than those of the traditional LASSO method (0.106 and 0.118), which supports its advantage in identifying microbial structure. In conclusion, the method can better analyze the association between microbial communities and human phenotypes or diseases.
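The abstract does not state the form of the tree-based penalty, so the following is only a minimal sketch of one common construction, a tree-guided group penalty, and not necessarily the penalty used in the paper. It assumes each tree node contributes the Euclidean norm of the coefficients of the taxa beneath it; the names `tree_group_penalty`, `lasso_penalty`, and `TOY_TREE`, the unit node weights, and the toy coefficients are all hypothetical.

```python
import math


def lasso_penalty(beta):
    """Ordinary LASSO penalty: sum of absolute coefficients (ignores the tree)."""
    return sum(abs(b) for b in beta)


def tree_group_penalty(beta, node_leaves, weights=None):
    """Tree-guided group penalty (illustrative assumption, not the paper's formula).

    node_leaves maps each tree node to the indices of the leaf taxa below it.
    Each node adds weight * L2 norm of the coefficients of those leaves, so
    coefficients that share an ancestor are shrunk together.
    """
    total = 0.0
    for node, leaves in node_leaves.items():
        w = 1.0 if weights is None else weights[node]
        total += w * math.sqrt(sum(beta[j] ** 2 for j in leaves))
    return total


# Hypothetical toy tree: four taxa (leaves 0..3) under two genera and one root.
TOY_TREE = {
    "root": [0, 1, 2, 3],
    "genus_A": [0, 1],
    "genus_B": [2, 3],
    "taxon_0": [0],
    "taxon_1": [1],
    "taxon_2": [2],
    "taxon_3": [3],
}

# Made-up coefficients: only the taxa under genus_A are active.
beta = [3, 4, 0, 0]

print("LASSO penalty:", lasso_penalty(beta))
print("Tree-guided penalty:", tree_group_penalty(beta, TOY_TREE))
```

In this sketch the subtree under `genus_B` contributes nothing because all of its coefficients are zero, whereas `genus_A` is penalized both as a group and through its individual taxa. This is the sense in which a tree-based penalty can select or discard whole substructures rather than isolated taxa.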
Xu Xiaomin, Lin Yong. Regression Analysis of Microbial Substructure Based on Tree-Based LASSO[J]. Chinese Journal of Biomedical Engineering, 2020, 39(1): 40-49.