Influence of Protein Databases in Proteomic Identification

doi:10.3969/j.issn.0258-8021.2013.02.001


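Abstract (translated from the Chinese): In proteomics research, proteins are usually identified with database search algorithms. Searching a database that is highly complete but less accurately annotated may identify more proteins, at the risk of inaccurate results; searching a database that is accurately annotated but less complete may miss proteins that the database does not contain. Balancing the completeness and accuracy of protein identification is therefore an important problem. Taking the human proteome as an example, this study used proteomic data produced by different mass spectrometers and from different samples to compare the search results of the commonly used IPI, UniProt, and Swiss-Prot databases. The three databases each performed better or worse on different proteomic datasets, but overall the differences were small: the peptides identified only with a given database made up no more than 5% of the total, and the differences in protein numbers were 1% to 5%. This indicates that all three databases cover the common human protein sequences and are highly complete. We therefore recommend searching against Swiss-Prot, which is manually annotated and continuously updated. When the aim of a study is to identify or quantify protein sequences not included in Swiss-Prot (such as particular protein isoforms or mutants), the target sequences can be added to that database before searching, or another, more complete database can be considered.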
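The "unique to one database" percentages in the abstract can be illustrated with a small set computation. This is a minimal sketch, not the authors' pipeline: it assumes that "total" means the union of identifications across all three searches, and the function name `unique_fraction` and the peptide strings are made up for illustration.

```python
def unique_fraction(ids_by_db):
    """For each database, return the share of all identifications
    (union over every database) that were found only with that database."""
    total = set().union(*ids_by_db.values())
    result = {}
    for name, ids in ids_by_db.items():
        # Everything identified with any of the other databases
        others = set().union(*(v for k, v in ids_by_db.items() if k != name))
        result[name] = len(ids - others) / len(total) if total else 0.0
    return result


# Hypothetical peptide identifications (not data from the study)
example = {
    "IPI": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"},
    "UniProt": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDEE"},
    "Swiss-Prot": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
}
fractions = unique_fraction(example)
```

With real search results, each set would hold the peptide (or protein) identifications that passed the chosen confidence filter for one database; a different definition of "total" would change the denominator but not the set logic.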

Authors: SHAO Chen, SUN Wei^*

Keywords: protein database, proteomics, database searching

Abstract: Database searching is a common strategy for identifying proteins in current proteomic studies. Searching against a highly comprehensive database may produce more protein identifications, but carries the risk of errors from inaccurate database annotations. In contrast, searching against a more accurate but less complete database may miss correct protein identifications that are not included in the database. Achieving both completeness and accuracy in protein identification is therefore an important problem. Taking human proteomic studies as an example, this study compared the database search results of three commonly used protein databases (IPI, UniProt, and Swiss-Prot) on three proteomic datasets obtained from different biological samples and mass spectrometers. Although the databases performed differently on the various datasets, the differences among them were small overall. For each database, no more than 5% of the total peptide identifications were missed by the other two databases, and the differences in protein identifications ranged from 1% to 5%. This result indicates that all three databases are highly complete and cover most of the commonly identified proteins in human samples. We therefore recommend Swiss-Prot, a manually curated and continuously updated database, for routine human proteomic analysis. If the aim of a study is to identify or quantify sequences that are not included in Swiss-Prot, such as particular protein isoforms or mutants, researchers can add the target protein sequences to Swiss-Prot or use a more complete database instead.
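The last recommendation, adding target sequences to the Swiss-Prot database before searching, amounts to appending FASTA records to the database file. The following is a minimal sketch under stated assumptions: it works on FASTA text held in memory, skips any target whose sequence is already present, and uses made-up headers and sequences; the helper names `parse_fasta` and `append_targets` are hypothetical and are not taken from the paper or from any search engine.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def append_targets(base_text, target_text):
    """Return FASTA text holding the base database plus every target
    record whose sequence is not already in the database."""
    merged = parse_fasta(base_text)
    seen = {seq for _, seq in merged}
    for header, seq in parse_fasta(target_text):
        if seq not in seen:
            merged.append((header, seq))
            seen.add(seq)
    return "".join(f">{h}\n{s}\n" for h, s in merged)


# Made-up records for illustration only
base = ">demo|BASE1\nMKTAYIAK\n>demo|BASE2\nMSLLTEVE\n"
targets = ">custom|ISOFORM_X\nMKTAYIAKQR\n>custom|DUPLICATE\nMSLLTEVE\n"
combined = append_targets(base, targets)
```

Skipping exact duplicates is a design choice in this sketch rather than a requirement; it keeps the combined file from holding two entries with the same sequence.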

Key words: protein database, proteomics, database searching

Funding: National Natural Science Foundation of China, Young Scientists Fund (31200614)

Cite this article:

SHAO Chen, SUN Wei^*. Influence of Protein Databases in Proteomic Identification [J]. Chinese Journal of Biomedical Engineering, 2013, 32(2): 129-134.

Link to this article:

http://cjbme.csbme.org/CN/10.3969/j.issn.0258-8021.2013.02.001 or http://cjbme.csbme.org/CN/Y2013/V32/I2/129

［1］Eng JK, Searle BC, Clauser KR, et al. A face in the crowd: recognizing peptides through database search ［J］. Mol Cell Proteomics, 2011, 10(11):R111 009522.
［2］Kersey PJ, Duarte J, Williams A, et al. The International Protein Index: an integrated database for proteomics experiments ［J］. Proteomics, 2004, 4(7):1985-1988.
［3］UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt) ［J］. Nucleic Acids Res, 2012, 40(Database Issue):D71-D75.
［4］Nakamura Y, Cochrane G, KarschMizrachi I. The international nucleotide sequence database collaboration ［J］. Nucleic Acids Res, 2013, 41(D1):D21-D24.
［5］Flicek P, Amode MR, Barrell D, et al. Ensembl 2012 ［J］. Nucleic Acids Res, 2012, 40(Database Issue):D84-D90.
［6］Pruitt KD, Tatusova T, Brown GR, et al. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy ［J］. Nucleic Acids Res, 2012, 40(Database Issue):D130-135.
［7］Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics ［J］. J Proteomics, 2010,73(11):2092-2123.
［8］Perkins DN, Pappin DJ, Creasy DM, et al. Probabilitybased protein identification by searching sequence databases using mass spectrometry data ［J］. Electrophoresis, 1999, 20(18):3551-3567.
［9］Nesvizhskii AI, Aebersold R. Interpretation of shotgun proteomic data: the protein inference problem ［J］. Mol Cell Proteomics, 2005, 4(10):1419-1440.
［10］Liu Xuejiao, Shao Chen, Wei Lilong, et al. An individual urinary proteome analysis in normal human beings to define the minimal sample number to represent the normal urinary proteome ［J］. Proteome Sci, 2012, 10(1):70.
［11］Elias JE, Gygi SP. Targetdecoy search strategy for increased confidence in largescale protein identifications by mass spectrometry ［J］. Nat Methods, 2007, 4(3):207-214.