Abstract: To address the insufficient interaction between local and global information, and the weak correlation between features of neighboring layers at different depths, found in most polyp segmentation methods, this paper proposes a new network model (PVT-SMCD) based on the Pyramid Vision Transformer and a self-attention cascaded decoder. First, PVTv2 is used as the backbone network to extract image features, and an efficient additive attention module captures long-range dependencies to obtain key information. Second, multiple-kernel convolution enhancement blocks are introduced to locate high-level semantic features of polyps, and the resulting features are fed into the cascaded decoder to achieve information interaction between the local and global layers. Finally, a feature fusion module progressively fuses the features of neighboring layers from top to bottom, reducing the information gap between the fused high-dimensional features and the low-dimensional features. The proposed model was compared with eight other medical image segmentation networks on five polyp segmentation datasets. On Kvasir and CVC-ClinicDB, mDice reached 92.3% and 94.5%, mIoU reached 87.1% and 89.9%, and MAE was 0.021 and 0.006, respectively; on CVC-300, mDice and mIoU were 90.0% and 83.3%, with an MAE of 0.007; on CVC-ColonDB, mDice was 81.5%, mIoU was 73.5%, and MAE was 0.028; and on ETIS, mDice was 78.9%, mIoU was 71.3%, and MAE was 0.019. The experimental results indicate that PVT-SMCD outperforms state-of-the-art methods on most evaluation metrics, demonstrating stronger learning ability and generalization capacity and yielding more precise polyp segmentation.