Research Progress on Transformer-Based Deep Learning Models for Medical Image Segmentation
Zhou Lazhen1, Chen Hongchi1, Li Qiuxia1, Li Fangzuo1,2*
1(School of Medical Information Engineering, Gannan Medical University, Ganzhou 341000, Jiangxi, China) 2(Key Laboratory of Prevention and Treatment of Cardiovascular and Cerebrovascular Diseases, Ministry of Education, Gannan Medical University, Ganzhou 341000, Jiangxi, China)
Abstract: Accurate segmentation of medical images is a crucial step in clinical diagnosis and treatment. Over the past decade, convolutional neural networks (CNNs) have been widely applied to medical image segmentation and have achieved excellent performance. However, the inherent inductive bias of CNN architectures limits their ability to model long-range dependencies in images. In contrast, Transformer architectures, which capture global information and model long-range dependencies, have demonstrated outstanding performance in biomedical image segmentation. This review introduces the components of the Transformer architecture and its applications in medical image segmentation. From the perspectives of fully supervised, unsupervised, and semi-supervised learning, the application value and performance of Transformer architectures in abdominal multi-organ segmentation, cardiac segmentation, and brain tumor segmentation are summarized and analyzed. Finally, the limitations of Transformer models in segmentation tasks and directions for future optimization are discussed.
Zhou Lazhen, Chen Hongchi, Li Qiuxia, Li Fangzuo. Research Progress on Transformer-Based Deep Learning Models for Medical Image Segmentation. Chinese Journal of Biomedical Engineering, 2024, 43(4): 467-476.
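The abstract's central contrast, CNN locality versus the Transformer's ability to model long-range dependencies, rests on scaled dot-product self-attention, in which every token attends to every other token. A minimal NumPy sketch follows; the token count and embedding size are illustrative only, and a real model would add learned projections, multiple heads, and positional encodings:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2D token matrices."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity between all token positions
    # Numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix over *all* input positions

# Toy example: 4 "image patch" tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (4, 8)
```

Because the attention weights couple every position to every other in a single layer, the effective receptive field is global from the start, whereas a CNN must stack many convolutional layers to relate distant regions of an image.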