Abstract: Protein sequence features and the machine learning algorithm are two important factors that determine the results of protein structural class prediction. In this study, we established 17-D and 57-D feature sets by fusing sequence information and physicochemical information with secondary structure information, based on the k-word statistical frequency and k-fragment distribution feature extraction methods. By introducing the MultiAgent idea into the AdaBoost.M1 algorithm, we proposed a novel method for protein structural class prediction, called the Ma-Ada multi-classifier fusion algorithm, which makes full use of the information in the metric layer of each single classifier and fuses information among the individual classifiers. Four protein datasets, Z277, Z498, 1189 and D640, were used to validate the performance of the Ma-Ada algorithm. The classification accuracies on datasets Z277, Z498, 1189 and D640 are 91.3%, 96.8%, 85.3% and 87.2% with the 57-D features, and 90.6%, 95.8%, 84.8% and 88.3% with the 17-D features, respectively. The experimental results show that the proposed method achieves better prediction performance.
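The abstract names two standard ingredients without showing them: k-word frequency features and AdaBoost.M1. The Python sketch below illustrates only those ingredients under stated assumptions. It does not reproduce the MultiAgent fusion layer of Ma-Ada, the k-fragment distribution features, or the 17-D/57-D feature sets. The function names (`kword_frequencies`, `train_stump`, `adaboost_m1`, `predict`), the use of decision stumps as weak learners, the H/E/C secondary-structure alphabet and the toy data are illustrative assumptions, not the authors' choices. The boosting loop follows the AdaBoost.M1 scheme of Freund and Schapire. Each weak learner with weighted error ε_t < 1/2 gets vote weight log(1/β_t), where β_t = ε_t/(1 − ε_t). Correctly classified samples are down-weighted by β_t.

```python
import math
from collections import Counter
from itertools import product


def kword_frequencies(seq, k=1, alphabet="HEC"):
    """Relative frequency of every length-k word (overlapping windows) in seq.

    The default H/E/C (helix/strand/coil) alphabet is an assumption for
    illustration; pass the 20 amino-acid letters for sequence-level words.
    Returns len(alphabet) ** k values, ordered as itertools.product.
    """
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    n = len(seq) - k + 1
    if n <= 0:
        return [0.0] * len(words)
    counts = Counter(seq[i:i + k] for i in range(n))
    return [counts[w] / n for w in words]


def stump_predict(stump, x):
    """A stump is (feature index, threshold, class if <= threshold, class otherwise)."""
    j, t, cl, cr = stump
    return cl if x[j] <= t else cr


def train_stump(X, y, w):
    """Pick the single-feature threshold split with the lowest weighted error.

    Decision stumps are an assumed weak learner; the abstract does not say
    which base classifiers Ma-Ada uses.
    """
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left, right = Counter(), Counter()
            for xi, yi, wi in zip(X, y, w):
                (left if xi[j] <= t else right)[yi] += wi
            cl = left.most_common(1)[0][0]  # weighted-majority class on each side
            cr = right.most_common(1)[0][0] if right else cl
            stump = (j, t, cl, cr)
            err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
            if err < best_err:
                best_err, best = err, stump
    return best_err, best


def adaboost_m1(X, y, rounds=10):
    """Plain AdaBoost.M1: returns a list of (vote weight log(1/beta), stump)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        if err >= 0.5:  # M1 stops once the weak learner is no better than 1/2
            break
        err = max(err, 1e-10)  # avoid log(1/0) for a perfect stump
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), stump))
        if err <= 1e-10:  # no misclassified samples left to emphasise
            break
        # Multiply weights of correctly classified samples by beta, then renormalise.
        w = [wi * (beta if stump_predict(stump, xi) == yi else 1.0)
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote: each stump adds log(1/beta) to the class it predicts."""
    votes = Counter()
    for alpha, stump in ensemble:
        votes[stump_predict(stump, x)] += alpha
    return votes.most_common(1)[0][0]
```

For example, `X = [kword_frequencies(s) for s in ss_strings]` followed by `adaboost_m1(X, labels)` gives a boosted stump ensemble, and `predict(model, x)` classifies a new feature vector. With four structural classes, a single stump can easily fail the ε_t < 1/2 requirement of AdaBoost.M1. In that case the loop stops early and a stronger base classifier would be needed.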
ZHENG Bin, LI Lihua*. Protein Structural Class Prediction Based on Multi-Feature and Ma-Ada Multi-Classifier Fusion[J]. Chinese Journal of Biomedical Engineering, 2013, 32(5): 580-587.