Knowledge Graph Powered Human Proteome Knowledge Annotation and Knowledge Exploration Study
Yuan Yize, Wang Zhigang, Wang Zhe, Shi Furen, Yang Sheng, Yang Xiaolin*
(Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, China)
Abstract: Proteome knowledge annotation helps researchers derive scientific hypotheses from existing knowledge. However, traditional annotation approaches are often incomplete and poorly integrated, as they are largely limited to knowledge retrieval and aggregation. This paper proposes a knowledge graph-based method that integrates biomedical knowledge from 13 biomedical ontologies and databases. The resulting Biomedical Knowledge Graph (BMKG) was built with the graph database Neo4j. Metapaths were designed to construct knowledge annotation schemes that combine prior knowledge with graph algorithms such as centrality measures, and knowledge exploration analysis was supported by similarity calculations, link prediction algorithms, and node2vec graph embedding. BMKG contains 2,508,348 nodes of 9 types and 25,362,594 relationships of 20 types. The BMKG annotation scheme enables multi-perspective, multi-level annotation: applied to renal cell carcinoma tissue proteome data, it comprehensively annotated diverse biological aspects such as pathways, drugs, and phenotypes. BMKG also supports knowledge exploration tasks such as drug-disease association prediction, and clustering of disease knowledge shows strong concordance with the structure of the Mondo ontology. In addition, an online platform (http://bmkg.bmicc.org) has been established with three analysis modules: knowledge retrieval, knowledge annotation, and knowledge analysis. Collectively, this study demonstrates the potential of knowledge graph approaches to enhance human proteome knowledge annotation and knowledge exploration.
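The abstract names metapath-based annotation and link prediction without showing how a metapath is evaluated. The Python sketch below illustrates the general idea on a hypothetical toy graph. It counts metapath instances from an input protein set to pathways, giving a simple degree-like ranking rather than the centrality measures used for BMKG, and it scores unseen drug-disease pairs by Drug-Protein-Disease path counts, a basic path-count heuristic standing in for the link prediction algorithms mentioned above. All node names, relation types (`PARTICIPATES_IN`, `TARGETS`, `ASSOCIATED_WITH`, `TREATS`) and function names are assumptions for illustration only; BMKG itself is stored in Neo4j and would be queried differently.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge graph: (source, relation, target), where each node
# is a (node_type, name) pair. Types, relations and facts are illustrative only
# and are NOT taken from BMKG.
TOY_EDGES = [
    (("Protein", "P1"), "PARTICIPATES_IN", ("Pathway", "PathwayA")),
    (("Protein", "P2"), "PARTICIPATES_IN", ("Pathway", "PathwayA")),
    (("Protein", "P2"), "PARTICIPATES_IN", ("Pathway", "PathwayB")),
    (("Protein", "P3"), "PARTICIPATES_IN", ("Pathway", "PathwayB")),
    (("Protein", "P4"), "PARTICIPATES_IN", ("Pathway", "PathwayC")),
    (("Drug", "DrugX"), "TARGETS", ("Protein", "P1")),
    (("Drug", "DrugX"), "TARGETS", ("Protein", "P2")),
    (("Drug", "DrugY"), "TARGETS", ("Protein", "P4")),
    (("Protein", "P1"), "ASSOCIATED_WITH", ("Disease", "DiseaseA")),
    (("Protein", "P2"), "ASSOCIATED_WITH", ("Disease", "DiseaseA")),
    (("Protein", "P4"), "ASSOCIATED_WITH", ("Disease", "DiseaseB")),
    (("Drug", "DrugY"), "TREATS", ("Disease", "DiseaseB")),
]


def build_adjacency(edges):
    """Index outgoing edges by (source node, relation)."""
    adj = defaultdict(list)
    for src, rel, dst in edges:
        adj[(src, rel)].append(dst)
    return adj


def follow_metapath(adj, start_nodes, metapath):
    """Count metapath instances from start_nodes to each reachable end node.

    metapath is a sequence of (relation, target_type) steps.
    """
    frontier = Counter(start_nodes)
    for rel, target_type in metapath:
        nxt = Counter()
        for node, n_paths in frontier.items():
            for dst in adj.get((node, rel), []):
                if dst[0] == target_type:
                    nxt[dst] += n_paths  # propagate path counts one hop
        frontier = nxt
    return frontier


def annotate(adj, proteins, metapath):
    """Rank end nodes by how many metapath instances reach them from the input proteins."""
    counts = follow_metapath(adj, [("Protein", p) for p in proteins], metapath)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0][1]))


def predict_drug_disease(adj, drug, known_pairs):
    """Score unseen Drug-Disease pairs by Drug-TARGETS-Protein-ASSOCIATED_WITH-Disease path counts."""
    counts = follow_metapath(
        adj,
        [("Drug", drug)],
        [("TARGETS", "Protein"), ("ASSOCIATED_WITH", "Disease")],
    )
    # Drop pairs that are already known, keeping only candidate new links.
    return {d[1]: c for d, c in counts.items() if (drug, d[1]) not in known_pairs}


if __name__ == "__main__":
    adj = build_adjacency(TOY_EDGES)
    print(annotate(adj, ["P1", "P2", "P3"], [("PARTICIPATES_IN", "Pathway")]))
    known = {(s[1], d[1]) for s, r, d in TOY_EDGES if r == "TREATS"}
    print(predict_drug_disease(adj, "DrugX", known))
```

Raw path counts favour highly connected nodes; a real scheme would normally normalise them (for example by node degree) or replace them with the centrality, similarity and embedding methods the abstract lists.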
Yuan Yize, Wang Zhigang, Wang Zhe, Shi Furen, Yang Sheng, Yang Xiaolin. Knowledge Graph Powered Human Proteome Knowledge Annotation and Knowledge Exploration Study. Chinese Journal of Biomedical Engineering, 2024, 43(3): 315-326.