An Improved AdaBoost Cascade Classifier for Identifying Breath Signals of Liver Cancer
Hao Lijun1,2#, Zhu Geng1, Huang Gang3*, Yan Jiayong1,2*
1(Medical Instrumentation College, Shanghai University of Medicine & Health Sciences, Shanghai 201318, China) 2(School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China) 3(Shanghai Key Laboratory of Molecular Imaging, Jiading District Central Hospital Affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, China)
Abstract: To reduce the false negative rate of breath detection techniques in liver cancer screening, an improved AdaBoost cascade classifier was designed and applied to discriminate between the breath signals of healthy volunteers and those of liver cancer patients. First, a set of training subsets was obtained by bootstrap sampling of the training samples. Based on these training subsets, multiple sub-classifiers were obtained in succession using different machine learning algorithms with K-fold cross-training and voting. Next, the sub-classifiers were weighted and combined to obtain an improved AdaBoost classifier. Then, the training samples were resampled by the bootstrap method, and a second AdaBoost classifier was trained on the new training subset. Finally, the two AdaBoost classifiers were connected in series to form a cascade classifier. When test samples were fed into this cascade classifier, potentially anomalous samples were screened repeatedly according to the cascade rule. In this study, the Relief-optimized feature set of the breath signals of 120 volunteers, collected with an electronic nose (eNose), was used as the training set to construct the improved AdaBoost cascade classifier, which was then used to discriminate 40 test samples. The results showed that the classifier effectively distinguished the exhaled breath signals of liver cancer patients from those of healthy people in the test group. The average sensitivity reached 93.42%, significantly better than that of the traditional AdaBoost cascade classifier, and the false negative rate was significantly reduced. In addition, the cascade classifier was stable: the coefficient of variation of its precision was only 3.95%. In conclusion, the improved AdaBoost cascade classifier effectively improved the discrimination accuracy for liver cancer breath signals, which is important for research on breath-based noninvasive universal screening for liver cancer.
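The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the data are synthetic Gaussian features rather than eNose signals, the classes are balanced, single-feature threshold stumps stand in for the "different machine learning algorithms", K-fold cross-training is omitted, the sub-classifier weights use the standard AdaBoost formula 0.5·ln((1−err)/err), and the cascade rule is assumed to be "re-screen every sample that the first stage labels healthy". All function names are hypothetical.

```python
import math
import random

def bootstrap(data, rng):
    """Resample with replacement until both classes are present."""
    while True:
        sample = [rng.choice(data) for _ in data]
        if len({y for _, y in sample}) == 2:
            return sample

def train_stump(data, feature):
    """Stand-in sub-classifier: threshold one feature at the midpoint of the class means."""
    pos = [x[feature] for x, y in data if y == 1]
    neg = [x[feature] for x, y in data if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    thr = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x[feature] - thr) > 0 else 0

def train_ensemble(train, n_features, rng):
    """Weighted vote of sub-classifiers, each fitted on its own bootstrap subset."""
    members = []
    for f in range(n_features):
        clf = train_stump(bootstrap(train, rng), f)
        err = sum(clf(x) != y for x, y in train) / len(train)
        err = min(max(err, 1e-6), 1 - 1e-6)       # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)   # AdaBoost-style weight (assumption)
        members.append((alpha, clf))

    def predict(x):
        score = sum(a * (1 if clf(x) == 1 else -1) for a, clf in members)
        return 1 if score > 0 else 0
    return predict

def cascade_predict(stage1, stage2, x):
    """Assumed cascade rule: samples that stage 1 calls healthy are re-screened by stage 2."""
    return 1 if stage1(x) == 1 else stage2(x)

def sensitivity(y_true, y_pred):
    """TP / (TP + FN); the false negative rate is 1 minus this value."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

def make_data(n_per_class, n_features, rng):
    """Synthetic two-class data; class 1 ("patient") has shifted feature means."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(1.5 * label, 1.0) for _ in range(n_features)]
            data.append((x, label))
    return data

rng = random.Random(0)
train = make_data(60, 5, rng)   # 120 training samples (sizes mirror the abstract only)
test = make_data(20, 5, rng)    # 40 test samples
stage1 = train_ensemble(train, 5, rng)
stage2 = train_ensemble(train, 5, rng)   # second classifier from fresh bootstrap subsets

y_true = [y for _, y in test]
single = [stage1(x) for x, _ in test]
cascaded = [cascade_predict(stage1, stage2, x) for x, _ in test]
print("stage-1 sensitivity:", sensitivity(y_true, single))
print("cascade sensitivity:", sensitivity(y_true, cascaded))
```

Under the assumed rule, a sample is labeled healthy only if both stages agree, so the cascade's sensitivity can never be lower than that of the first stage alone; the price is a possible increase in false positives. Whether this matches the cascade rule used in the paper cannot be determined from the abstract.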