Research on Lightweight Transformer Medical Image Segmentation with Multi-Scale Feature Fusion
Wang Xiaowei1, Xing Shuli1,2, Mao Guojun1,2*
1(School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China) 2(Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou 350118, China)
Abstract: UNet has been widely used in medical image segmentation, and its U-shaped encoder-decoder structure has become one of the most popular frameworks. However, the classification and localization accuracy of UNet is limited by the local receptive field of convolutions, which restricts its ability to capture long-range dependencies effectively. The Transformer has demonstrated outstanding capability in capturing long-range dependencies and serves as the core technology underpinning current large language models, addressing this limitation of convolutional neural networks. In this paper, a novel medical image segmentation model named MoFormer was proposed. Built on the encoder-decoder structure of UNet, the model integrated the Transformer learning mechanism into the encoder to expand its context-aware field of view and to enhance multi-scale extraction of local and global features. With random initialization, the proposed MoFormer achieved an average Dice coefficient of 0.823 on the BTCV dataset of 50 abdominal CT images. On the ISIC2017 dataset containing 2 750 dermoscopy images, it performed on par with TransFuse while using 10.91 M fewer parameters. On the polyp dataset of 2 590 endoscopic images, it outperformed popular comparison models such as PraNet, improving the mIoU by an average of 0.123. Overall, this model balances parameter count against segmentation accuracy and demonstrates strong generalization across various medical image datasets.
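As a point of reference for the evaluation metrics reported above, the following is a minimal NumPy sketch of how the Dice coefficient and IoU are commonly computed for binary segmentation masks (mIoU averages the IoU over classes or test images). It is a generic illustration under these assumptions, not the authors' evaluation code; the function names and the toy masks are introduced only for the example.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks (illustrative helper)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """IoU = |P ∩ G| / |P ∪ G| for binary masks; mIoU averages this value."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))


if __name__ == "__main__":
    # Toy masks standing in for a predicted mask and a ground-truth annotation.
    pred = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 0]])
    gt = np.array([[0, 1, 1], [0, 0, 0], [0, 1, 0]])
    print(f"Dice: {dice_coefficient(pred, gt):.3f}")
    print(f"IoU:  {iou(pred, gt):.3f}")
```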
Wang Xiaowei, Xing Shuli, Mao Guojun. Research on lightweight Transformer medical image segmentation with multi-scale feature fusion [J]. Chinese Journal of Biomedical Engineering, 2025, 44(2): 165-173.
[1] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
[2] Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation [C]//Lecture Notes in Computer Science. Munich: Springer, 2015: 234-241.
[3] Milletari F, Navab N, Ahmadi SA. V-net: fully convolutional neural networks for volumetric medical image segmentation [C]//2016 Fourth International Conference on 3D Vision (3DV). Stanford: IEEE, 2016: 565-571.
[4] Hu Han, Zhang Zheng, Xie Zhenda, et al. Local relation networks for image recognition [C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul: IEEE, 2019: 3463-3472.
[5] Ramachandran P, Parmar N, Vaswani A, et al. Stand-alone self-attention in vision models [C]//The 33rd Conference on Neural Information Processing Systems. Vancouver: Neural Information Processing Systems, 2019: 68-80.
[6] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [C]//The 31st Conference on Neural Information Processing Systems. Long Beach: Neural Information Processing Systems, 2017: 1-11.
[7] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale [C]//The 9th International Conference on Learning Representations. Vienna, 2021: 1-21.
[8] Xie Enze, Wang Wenhai, Yu Zhiding, et al. SegFormer: simple and efficient design for semantic segmentation with transformers [J]. Advances in Neural Information Processing Systems, 2021, 34(1): 12077-12090.
[9] Chen Jieneng, Lu Yongyi, Yu Qihang, et al. TransUNet: transformers make strong encoders for medical image segmentation [J/OL]. arXiv preprint arXiv:2102.04306v1, 2021-02-08/2023-08-22.
[10] Hatamizadeh A, Tang Yucheng, Nath V, et al. UNETR: transformers for 3D medical image segmentation [C]//2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2022: 574-584.
[11] Landman B, Xu Zhoubing, Iglesias J, et al. MICCAI multi-atlas labeling beyond the cranial vault - workshop and challenge [C]//The 18th International Conference on Medical Image Computing and Computer Assisted Intervention. Munich: Springer, 2015, 5: 12-12.
[12] Codella NCF, Gutman D, Celebi ME, et al. Skin lesion analysis toward melanoma detection: a challenge at the 2017 international symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC) [C]//2018 IEEE 15th International Symposium on Biomedical Imaging. Washington: IEEE, 2018: 168-172.
[13] Vázquez D, Bernal J, Sánchez FJ, et al. A benchmark for endoluminal scene segmentation of colonoscopy images [J]. Journal of Healthcare Engineering, 2017, 2017(1): 4037190.
[14] Bernal J, Sánchez FJ, Fernández EG, et al. WM-DOVA maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians [J]. Computerized Medical Imaging and Graphics, 2015, 43(1): 99-111.
[15] Tajbakhsh N, Gurudu SR, Liang Jianming. Automated polyp detection in colonoscopy videos using shape and context information [J]. IEEE Transactions on Medical Imaging, 2016, 35(2): 630-644.
[16] Silva J, Histace A, Romain O, et al. Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer [J]. International Journal of Computer Assisted Radiology and Surgery, 2014, 9(2): 283-293.
[17] Jha D, Smedsrud PH, Riegler MA, et al. Kvasir-SEG: a segmented polyp dataset [C]//The 26th International Conference on MultiMedia Modeling. Daejeon: Springer, 2020: 451-462.
[18] Fan Dengping, Ji Gepeng, Zhou Tao, et al. PraNet: parallel reverse attention network for polyp segmentation [C]//International Conference on Medical Image Computing and Computer Assisted Intervention. Cham: Springer, 2020: 263-273.
[19] Zeiler MD, Fergus R. Visualizing and understanding convolutional networks [C]//ECCV 2014 - 13th European Conference on Computer Vision. Zurich: Springer, 2014: 818-833.
[20] Hu Jie, Shen Li, Sun Gang. Squeeze-and-excitation networks [C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[21] Chu Xiangxiang, Tian Zhi, Zhang Bo, et al. Conditional positional encodings for vision transformers [J/OL]. https://arxiv.org/abs/2102.10882, 2023-02-13/2023-08-22.
[22] Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library [C]//The 33rd Conference on Neural Information Processing Systems. Vancouver: Neural Information Processing Systems, 2019: 8026-8037.
[23] Wang Wenxuan, Chen Chen, Ding Meng, et al. TransBTS: multimodal brain tumor segmentation using transformer [C]//The 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Strasbourg: Springer, 2021: 109-119.
[24] Zhou Hongyu, Guo Jiansen, Zhang Yinghao, et al. nnFormer: volumetric medical image segmentation via a 3D transformer [J]. IEEE Transactions on Image Processing, 2023, 32(1): 4036-4045.
[25] Tang Yucheng, Yang Dong, Li Wenqi, et al. Self-supervised pre-training of Swin transformers for 3D medical image analysis [C]//Proceedings of the 22nd IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 20730-20740.
[26] Shaker AM, Maaz M, Rasheed H, et al. UNETR++: delving into efficient and accurate 3D medical image segmentation [J]. IEEE Transactions on Medical Imaging, 2024, 43(9): 3377-3390.
[27] Li Hang, He Xinzi, Zhou Feng, et al. Dense deconvolutional network for skin lesion segmentation [J]. IEEE Journal of Biomedical and Health Informatics, 2018, 23(2): 527-537.
[28] AlMasni MA, AlAntari MA, Choi MT, et al. Skin lesion segmentation in dermoscopy images via deep full resolution convolutional networks [J]. Computer Methods and Programs in Biomedicine, 2018, 162(1): 221-231.
[29] Bi Lei, Kim J, Ahn E, et al. Step-wise integration of deep class-specific learning for dermoscopic image segmentation [J]. Pattern Recognition, 2019, 85(1): 78-89.
[30] Sarker MMK, Rashwan HA, Akram F, et al. SLSDeep: skin lesion segmentation based on dilated residual and pyramid pooling networks [C]//The 21st International Conference on Medical Image Computing and Computer Assisted Intervention. Granada: Springer, 2018: 21-29.
[31] Zhou Zongwei, Siddiquee MMR, Tajbakhsh N, et al. UNet++: redesigning skip connections to exploit multiscale features in image segmentation [J]. IEEE Transactions on Medical Imaging, 2019, 39(6): 1856-1867.
[32] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation [C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3431-3440.
[33] Zhang Zhengxin, Liu Qingjie, Wang Yunhong. Road extraction by deep residual U-Net [J]. IEEE Geoscience and Remote Sensing Letters, 2018, 15(5): 749-753.
[34] Jha D, Smedsrud PH, Riegler MA, et al. ResUNet++: an advanced architecture for medical image segmentation [C]//2019 IEEE International Symposium on Multimedia (ISM). San Diego: IEEE, 2019: 225-230.
[35] Fang Yuqi, Chen Cheng, Yuan Yixuan, et al. Selective feature aggregation network with area-boundary constraints for polyp segmentation [C]//The 22nd International Conference on Medical Image Computing and Computer Assisted Intervention. Shenzhen: Springer, 2019: 302-310.
[36] He Kaiming, Girshick R, Dollár P. Rethinking ImageNet pre-training [C]//Proceedings of the 19th IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4918-4927.
[37] Shen Zhuoran, Zhang Mingyuan, Zhao Haiyu, et al. Efficient attention: attention with linear complexities [C]//Proceedings of the 21st IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3531-3539.
[38] Ho J, Kalchbrenner N, Weissenborn D, et al. Axial attention in multidimensional transformers [J/OL]. https://arxiv.org/abs/1912.12180, 2019-12-20/2023-08-22.
[39] Murugan P, Durairaj S. Regularization and optimization strategies in deep convolutional neural network [J/OL]. https://arxiv.org/abs/1712.04711, 2017-12-13/2023-08-22.
[40] Deng Shijun, Tang Hongzhong, Zeng Li, et al. Segmentation of organs at risk in thoracic images based on multi-scale feature perception [J]. Chinese Journal of Biomedical Engineering, 2021, 40(6): 701-711.