Nasopharyngeal Carcinoma Diagnosis Method Based on Lightweight Multi-Scale CNN-Transformer Network
Ren Yu1, Yang Peng1, Fan Xiaoqin3, Wang Tianfu1, Nie Guohui2, Lei Baiying1*
1(Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, Department of Biomedical Engineering, School of Medicine, Shenzhen University, Shenzhen 518060, Guangdong, China) 2(Department of Otolaryngology, Shenzhen Second People's Hospital, Shenzhen 518035, Guangdong, China) 3(Department of Biobank, Shenzhen Second People's Hospital, Shenzhen 518035, Guangdong, China)
Abstract: Deep learning (DL) is an important technology for assisting clinicians in diagnosing nasopharyngeal carcinoma (NPC) from endoscopic images. However, it still faces two challenges: 1) the visual information in local regions of an image is similar and redundant, which lowers computational efficiency; 2) the long-range dynamic interaction between global context information and local features is often learned ineffectively and adds redundant computation. To address these problems, we proposed a lightweight multi-scale CNN-Transformer hybrid network, named L-MTransNet, whose hybrid feature extraction backbone consists of a multi-scale CNN (MCNN) block and a multi-scale Transformer (MTrans) block. First, the MCNN block extracted multi-scale local features from endoscopic images and reduced the redundancy of local information. Second, to obtain both fine and coarse multi-scale feature representations at the same feature level and to reconstruct the global relationships among the multi-scale local features, we constructed the MTrans module, composed of a multi-path vision Transformer (MPViT) and a Transformer with dynamic convolution (TransNet). It gave the network a strong inductive bias and global information interaction capability, alleviated feature representation differences, and improved fusion efficiency. Extensive experiments on a clinical endoscopy dataset of 300 patients collected from Shenzhen Second People's Hospital demonstrated the effectiveness of L-MTransNet: the accuracy was 94.53%±0.35%, the F1 score was 94.17%±0.34%, and the AUC reached 98.61%±0.07%, at a low computational cost of 5.9 M parameters and 7.6 G FLOPs. The proposed method exhibited excellent performance and is expected to be applicable to early-stage screening of NPC tumors in endoscopic images.
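The two-stage pipeline the abstract describes — multi-scale local feature extraction followed by Transformer-style global interaction across the scales — can be illustrated with a minimal NumPy sketch. Everything here is illustrative: the pooling-based "multi-scale" extractor, the weight-free single-head attention, and all function names are assumptions for exposition, not the paper's actual L-MTransNet implementation.

```python
import numpy as np

def multi_scale_local_features(x, scales=(1, 2, 4)):
    """Toy multi-scale local feature extraction: average-pool the map
    over windows of several sizes and stack the results, standing in
    for the MCNN block's fine-to-coarse local features.
    x: (H, W) single-channel map -> (len(scales), H, W)."""
    H, W = x.shape
    outs = []
    for s in scales:
        if s == 1:
            outs.append(x.copy())
            continue
        Hs, Ws = H // s, W // s
        # block-average at stride s, then nearest-upsample back to H x W
        blocks = x[:Hs * s, :Ws * s].reshape(Hs, s, Ws, s).mean(axis=(1, 3))
        outs.append(np.repeat(np.repeat(blocks, s, axis=0), s, axis=1))
    return np.stack(outs)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with no learned
    weights, standing in for the MTrans block's global interaction
    between multi-scale local tokens. tokens: (N, d) -> (N, d)."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # rows sum to 1
    return attn @ tokens

# toy 8x8 single-channel feature map in place of an endoscopic image
rng = np.random.default_rng(0)
x = rng.random((8, 8))
feats = multi_scale_local_features(x)   # (3, 8, 8): one map per scale
tokens = feats.reshape(len(feats), -1)  # one token per scale, d = 64
fused = self_attention(tokens)          # globally fused representation
print(feats.shape, fused.shape)
```

A real hybrid backbone would of course use learned convolutions and projection matrices (Q, K, V) and tokenize patches rather than whole scale maps; the sketch only shows the data flow the abstract describes.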
Ren Yu, Yang Peng, Fan Xiaoqin, Wang Tianfu, Nie Guohui, Lei Baiying. Nasopharyngeal Carcinoma Diagnosis Method Based on Lightweight Multi-Scale CNN-Transformer Network[J]. Chinese Journal of Biomedical Engineering, 2025, 44(3): 279-290.