Head Pose Estimation of Patients with Monocular Vision for Surgery Robot Based on Deep Learning
Feng Pengfei1&, Li Liang2&#, Ding Hui1, Wang Guangzhi1#*
1(Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China) 2(School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211100, China)
Abstract: Patient head pose estimation is one of the key technologies for the autonomous and intelligent perception of neurosurgery robots. This paper aimed to use a data-driven deep learning method to help a neurosurgery robot estimate the patient's head pose, laying the foundation for intelligent neurosurgery. We first established the basic mathematical relationship underlying the patient head pose estimation task. Next, an efficient and robust head pose labeling method was proposed to solve the problem of annotating the pose of 2D head images in the absence of facial features. Then, by collecting photographs of neurosurgical scenes from the robot's viewpoint, a patient head pose estimation dataset containing 79 surgical scenes and 4,301 photographs was constructed. Finally, the applicability of the HopeNet deep neural network to the patient head pose estimation problem was studied, and the model performance was improved by cropping and rotation data augmentation together with our newly proposed rotation rate loss function. After network training and evaluation, on the homologous test set 1 (10 surgical scenes, 386 photographs), pose estimation from a single viewpoint reached an average error of ±12.76° over the yaw, pitch, and roll angles; on the heterogeneous test set 2 (8 surgical scenes, 229 photographs), an average prediction error of ±13.41° was achieved in the three directions. The results show that the proposed model can accurately estimate the patient's head pose, and that the proposed optimization methods effectively improve the accuracy and generalization performance of the algorithm.
Feng Pengfei&, Li Liang&#, Ding Hui, Wang Guangzhi#*. Head Pose Estimation of Patients with Monocular Vision for Surgery Robot Based on Deep Learning. Chinese Journal of Biomedical Engineering, 2022, 41(5): 537-546.
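The abstract describes a HopeNet-based formulation, in which each Euler angle (yaw, pitch, roll) is predicted by classification over discrete angle bins combined with a continuous regression term. The minimal PyTorch sketch below illustrates that binned-regression idea (Ruiz et al., CVPR Workshops 2018); the bin range, ResNet-50 backbone, and loss weight are illustrative assumptions rather than the configuration used in this paper, and the paper's proposed rotation rate loss is not reproduced here.

# Minimal sketch of a HopeNet-style head pose estimator.
# Bin range, backbone, and loss weight are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision

NUM_BINS = 66  # e.g. 66 bins of 3 degrees covering roughly -99 to +99 degrees
bin_centers = torch.arange(NUM_BINS, dtype=torch.float32) * 3 - 99 + 1.5

class HeadPoseNet(nn.Module):
    def __init__(self, num_bins: int = NUM_BINS):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        backbone.fc = nn.Identity()  # reuse the ResNet-50 trunk as a feature extractor
        self.backbone = backbone
        # One classification head over angle bins per Euler angle.
        self.fc_yaw = nn.Linear(2048, num_bins)
        self.fc_pitch = nn.Linear(2048, num_bins)
        self.fc_roll = nn.Linear(2048, num_bins)

    def forward(self, x):
        feat = self.backbone(x)
        return self.fc_yaw(feat), self.fc_pitch(feat), self.fc_roll(feat)

def expected_angle(logits):
    """Soft-argmax over angle bins: expectation of the bin centers."""
    probs = torch.softmax(logits, dim=1)
    return (probs * bin_centers.to(logits.device)).sum(dim=1)

def pose_loss(logits, bin_labels, angle_labels, alpha: float = 1.0):
    """Per-angle loss: cross-entropy over bins plus MSE on the continuous angle."""
    ce = nn.functional.cross_entropy(logits, bin_labels)
    mse = nn.functional.mse_loss(expected_angle(logits), angle_labels)
    return ce + alpha * mse

In this formulation, training minimizes pose_loss separately for the yaw, pitch, and roll heads, and the accuracy figures reported in the abstract correspond to the mean absolute difference between the expected angles and the labeled head pose in each of the three directions.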