Classification of Breast Cancer Gene Data Based on gcForest
Qin Xiwen1,2*, Wang Rui1,3, Zhang Siqi1,3
1(Institute of Big Data Science, Changchun University of Technology, Changchun 130012, China) 2(Graduate School, Changchun University of Technology, Changchun 130012, China) 3(School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China)
Abstract: The classification of breast cancer gene data is of great importance in clinical medicine. Gene data are structurally complex, high-dimensional, and available only in small samples. To address these characteristics, this paper proposes a gene data classification method that combines max-relevance and min-conditional redundancy (mRMCR) feature selection with the multi-grained cascade forest (gcForest). A total of 98 samples were selected from the breast cancer gene expression data set of the Broad Gene Research Institute, and each sample contained 1,213 feature genes. First, the data are standardized; then feature subsets are selected with mRMCR; finally, the feature subsets are classified by gcForest. With random forest, support vector machine, and BP neural network as comparison methods, the results show that the proposed combination of mRMCR and gcForest reaches a best classification accuracy of 93.78%, clearly higher than that of the comparison methods. The method can effectively improve the classification accuracy of breast cancer gene data and has both theoretical significance and practical value for breast cancer classification based on gene data.
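The first two stages of the pipeline described in the abstract (standardization, then mutual-information-based feature selection) can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not give the mRMCR criterion, so the code uses a plain greedy mRMR-style score (relevance minus mean pairwise redundancy) as a stand-in, not the authors' conditional-redundancy measure; the binary discretization, the function names, and the toy data are hypothetical; and the gcForest classification stage is omitted.

```python
import math
from collections import Counter


def standardize(column):
    """Z-score a list of floats using the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = math.sqrt(var) or 1.0  # constant columns map to all zeros
    return [(x - mean) / std for x in column]


def discretize(column):
    """Assumed binarization: 1 if the z-score is above zero, else 0."""
    return [1 if z > 0 else 0 for z in column]


def mutual_information(a, b):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def select_features(columns, labels, k):
    """Greedy mRMR-style stand-in: relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for j, col in enumerate(columns):
            if j in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(mutual_information(col, columns[s])
                                 for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Hypothetical toy data: 8 samples, 3 "genes" (one list per gene).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
gene0 = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 1.0]   # informative
gene1 = [2 * x for x in gene0]                      # redundant copy of gene0
gene2 = [5.0, 4.8, 5.1, 9.0, 9.2, 8.9, 5.0, 9.1]   # weaker, complementary

columns = [discretize(standardize(g)) for g in (gene0, gene1, gene2)]
selected = select_features(columns, labels, k=2)
print(selected)  # [0, 2]
```

On this toy data the redundant copy (index 1) is skipped in favor of the less relevant but complementary feature (index 2), which is the behavior a redundancy-penalizing criterion is meant to produce. In the paper's pipeline, the selected columns would then be passed to gcForest for classification.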
Qin Xiwen, Wang Rui, Zhang Siqi. Classification of Breast Cancer Gene Data Based on gcForest [J]. Chinese Journal of Biomedical Engineering, 2022, 41(2): 177-185.