A Stroke Mortality Prediction Model Based on Causal Features
Wang Ziyang1, Yang Lin1,2, Li Jiao1,2
1(Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College, Beijing 100020, China) 2(Key Laboratory of Medical Information Intelligence Technology, Chinese Academy of Medical Sciences, Beijing 100020, China)
Abstract: This work aimed to apply causal learning methods to select causal features and thereby improve the robustness and generalizability of model predictions. Using the MIMIC database as the data source, we proposed a stroke outcome prediction method that integrates causal features. The method applies greedy equivalence search (GES) to generate causal diagrams, selects causal features according to the theory of Markov boundaries, and feeds the selected features to classifiers to obtain the final predicted probability of death. Causal feature selection was compared with baseline feature selection methods using classification metrics such as the area under the ROC curve (AUROC) and the F1 score. From 6 021 stroke records in the MIMIC database, the causal feature selection method selected 26 causal features of stroke death in the training set, achieving an AUROC of 0.9 on the test set and an AUROC of 0.83 on the external validation data, both higher than those obtained with the baseline method. In predicting stroke death, the proposed feature selection method showed better prediction performance, robustness, and generalization than the commonly used feature selection method. Causal networks can also uncover potential causal relationships between features and stroke death.
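The Markov boundary step described in the abstract can be illustrated with a minimal sketch. It assumes the causal diagram is already available as a fully oriented directed acyclic graph given as (cause, effect) pairs; GES itself returns an equivalence class of graphs, so orienting the remaining edges is outside this sketch. The function name `markov_boundary`, the edge list, and the feature names are hypothetical and are not taken from the graph learned in the paper. In a DAG, the Markov boundary of a target consists of its parents, its children, and the other parents of its children.

```python
from collections import defaultdict


def markov_boundary(edges, target):
    """Return the Markov boundary of `target` in a DAG.

    `edges` is an iterable of (cause, effect) pairs. The boundary is the
    union of the target's parents, its children, and the other parents
    of those children (often called spouses).
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    for cause, effect in edges:
        parents[effect].add(cause)
        children[cause].add(effect)

    boundary = set(parents[target]) | set(children[target])
    for child in children[target]:
        boundary |= parents[child]  # adds spouses (and the target itself)
    boundary.discard(target)
    return boundary


# Hypothetical toy graph, for illustration only (no clinical meaning implied).
toy_edges = [
    ("age", "death"),
    ("gcs_verbal", "death"),
    ("death", "length_of_stay"),
    ("ventilation", "length_of_stay"),
    ("age", "white_blood_cell"),
    ("sodium", "anion_gap"),
]

selected = sorted(markov_boundary(toy_edges, "death"))
print(selected)  # ['age', 'gcs_verbal', 'length_of_stay', 'ventilation']
```

In this toy graph, `white_blood_cell`, `sodium`, and `anion_gap` fall outside the boundary and would be dropped before training a classifier; only the columns in `selected` would be passed on as features.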
Wang Ziyang, Yang Lin, Li Jiao. A Stroke Mortality Prediction Model Based on Causal Features[J]. Chinese Journal of Biomedical Engineering, 2025, 44(3): 312-324.