Abstract: To address the insufficient interaction between local and global information, and the weak correlation between features of neighboring layers at different depths, found in most polyp segmentation methods, this paper proposes a new network model (PVT-SMCD) based on the Pyramid Vision Transformer and a self-attention cascaded decoder. First, PVTv2 is used as the backbone network to extract image features, and an efficient additive attention module captures long-range dependencies to obtain key information. Second, multiple-kernel convolution enhancement blocks are introduced to locate high-level semantic features of polyps, and the resulting features are fed into the cascaded decoder to achieve information interaction between the local and global layers. Finally, a feature fusion module progressively fuses the features of neighboring layers from top to bottom, reducing the information gap between the fused high-dimensional features and the low-dimensional features. The proposed model was compared with eight other medical image segmentation networks on five polyp segmentation datasets. On Kvasir and CVC-ClinicDB, mDice reached 92.3% and 94.5%, mIoU reached 87.1% and 89.9%, and MAE was 0.021 and 0.006, respectively; on CVC-300, mDice and mIoU were 90.0% and 83.3%, with an MAE of 0.007; on CVC-ColonDB, mDice was 81.5%, mIoU was 73.5%, and MAE was 0.028; and on ETIS, mDice was 78.9%, mIoU was 71.3%, and MAE was 0.019. The experimental results indicate that PVT-SMCD outperforms state-of-the-art methods on most evaluation metrics, demonstrating stronger learning ability and generalization capacity and yielding more precise polyp segmentation.