Head Pose Estimation of Patients with Monocular Vision for Surgery Robot Based on Deep Learning
Feng Pengfei1&, Li Liang2&#, Ding Hui1, Wang Guangzhi1#*
1(Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China) 2(School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211100, China)
Abstract: Patient head pose estimation is one of the key technologies for the autonomous and intelligent perception of neurosurgery robots. This paper aimed to use a data-driven deep learning method to help a neurosurgery robot estimate the patient's head pose, laying the foundation for intelligent neurosurgery. We first established the basic mathematical relationship underlying the patient head pose estimation task. Next, an efficient and robust head pose labeling method was proposed to solve the problem of annotating the pose of 2D head images in the absence of facial features. Then, by collecting photographs of neurosurgical scenes from the robot's viewpoint, a patient head pose estimation dataset containing 79 surgical scenes and 4,301 photographs was constructed. Finally, the applicability of the HopeNet deep neural network to the patient head pose estimation problem was studied, and the model performance was improved by cropping and rotation data augmentation together with our newly proposed rotation rate loss function. After network training and evaluation, on the homologous test set 1 (10 surgical scenes, 386 photographs), pose estimation from a single viewpoint reached an average error of ±12.76° over the yaw, pitch, and roll angles; on the heterogeneous test set 2 (8 surgical scenes, 229 photographs), an average prediction error of ±13.41° was achieved in the three directions. The results show that the proposed model can accurately estimate the patient's head pose, and that the proposed optimization methods effectively improve the accuracy and generalization performance of the algorithm.
Feng Pengfei&, Li Liang&#, Ding Hui, Wang Guangzhi#*. Head Pose Estimation of Patients with Monocular Vision for Surgery Robot Based on Deep Learning. Chinese Journal of Biomedical Engineering, 2022, 41(5): 537-546.
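The abstract describes a HopeNet-based formulation, in which each Euler angle (yaw, pitch, roll) is predicted by classification over discrete angle bins combined with a continuous regression term. The minimal PyTorch sketch below illustrates that binned-regression idea (Ruiz et al., CVPR Workshops 2018); the bin range, ResNet-50 backbone, and loss weight are illustrative assumptions rather than the configuration used in this paper, and the paper's proposed rotation rate loss is not reproduced here.

# Minimal sketch of a HopeNet-style head pose estimator.
# Bin range, backbone, and loss weight are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision

NUM_BINS = 66  # e.g. 66 bins of 3 degrees covering roughly -99 to +99 degrees
bin_centers = torch.arange(NUM_BINS, dtype=torch.float32) * 3 - 99 + 1.5

class HeadPoseNet(nn.Module):
    def __init__(self, num_bins: int = NUM_BINS):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        backbone.fc = nn.Identity()  # reuse the ResNet-50 trunk as a feature extractor
        self.backbone = backbone
        # One classification head over angle bins per Euler angle.
        self.fc_yaw = nn.Linear(2048, num_bins)
        self.fc_pitch = nn.Linear(2048, num_bins)
        self.fc_roll = nn.Linear(2048, num_bins)

    def forward(self, x):
        feat = self.backbone(x)
        return self.fc_yaw(feat), self.fc_pitch(feat), self.fc_roll(feat)

def expected_angle(logits):
    """Soft-argmax over angle bins: expectation of the bin centers."""
    probs = torch.softmax(logits, dim=1)
    return (probs * bin_centers.to(logits.device)).sum(dim=1)

def pose_loss(logits, bin_labels, angle_labels, alpha: float = 1.0):
    """Per-angle loss: cross-entropy over bins plus MSE on the continuous angle."""
    ce = nn.functional.cross_entropy(logits, bin_labels)
    mse = nn.functional.mse_loss(expected_angle(logits), angle_labels)
    return ce + alpha * mse

In this formulation, training minimizes pose_loss separately for the yaw, pitch, and roll heads, and the accuracy figures reported in the abstract correspond to the mean absolute difference between the expected angles and the labeled head pose in each of the three directions.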