Research on Protein Classification Based on Support Vector Machines
Abstract
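With the smooth progress of the Human Genome Project, more and more protein sequences have been determined, whereas relatively few protein sequences have had their structure and function established by experiment, and the gap between the two is widening rapidly. Because determining protein structure and function experimentally costs time, labor, and money, and may run into difficulties that cannot currently be solved, exploring theoretical and computational methods for studying protein structure and function is of great importance. Starting from the primary sequence of proteins, this dissertation studies the classification and prediction of protein structure and function. Its main contributions are as follows:
     1. A new composite classification idea is proposed: two feature extraction methods, amino acid composition and the auto-correlation function, are suitably combined with the support vector machine to classify protein homodimers and non-homodimers for the first time, and the approach is compared with the existing Garian method from the international literature. Under the 10CV test, the total classification accuracy of the proposed method is up to 17.1 percentage points higher than that of the Garian method.
     2. Two new feature extraction methods are proposed and two previously published ones are introduced; these are suitably combined with the support vector machine and different classification strategies to form classification systems, which are used to classify protein homodimers, homotrimers, homotetramers, and homohexamers for the first time. The results show that the three feature extraction methods that integrate the sequence-order information of amino acid residues all classify better than the amino acid composition method. The weighted auto-correlation function method proposed here performs best: its total classification accuracy is up to 6.39 percentage points higher than that of the amino acid composition method and 2.41 percentage points higher than that of Chou's feature extraction method. The 'one-versus-one' strategy classifies markedly better than the 'one-versus-rest' strategy, raising the total classification accuracy by up to 17.69 percentage points.
     3. A new composite classification method, which suitably combines the auto-correlation function feature extraction method, the support vector machine, and the 'improved unique one-versus-rest' classification strategy proposed in this dissertation, is applied to protein fold classification. The results show that, on independent test samples, the total classification accuracy of the auto-correlation function method is about 7 percentage points higher than that of the amino acid composition method. The 'improved unique one-versus-rest' strategy outperforms the 'one-versus-rest' strategy: its total classification accuracy in the independent test and the 5CV test is higher by up to about 18 and 12 percentage points, respectively.
     4. A weighting idea is introduced: a new feature extraction method, the weighted auto-correlation function, is used to represent protein sequences, and the 'one-versus-rest' and 'one-versus-one' classification strategies are used to classify membrane proteins and to predict subcellular location. The results are clearly improved:
     1) For membrane protein classification, using the support vector machine algorithm and the 'one-versus-rest' strategy, the weighted auto-correlation function feature extraction method achieves a total classification accuracy of 87.98%, compared with the amino acid composition feature extraction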
With the success of the Human Genome Project, the number of protein sequences entering the data banks is increasing rapidly. The structures and functions of these proteins can in principle be determined by experiment, but doing so is very time-consuming and, in practice, almost impossible. Scientists have therefore been seeking theoretical or computational methods for predicting the structures and functions of proteins. This dissertation investigates several methods for classifying or predicting protein structures and functions from protein primary sequences. The main contributions are summarized as follows:
     1. A new idea of composite classification is proposed: the support vector machine (SVM) algorithm is suitably combined with two feature extraction methods, amino acid composition and auto-correlation functions based on the amino acid index, to classify homodimers and non-homodimers from protein primary sequences. In the 10-fold cross-validation (10CV) test, the total classification accuracy of our method is up to 17.1 percentage points higher than that of Garian's method.
     2. Two new feature extraction methods are put forward, and two previous feature extraction methods are also introduced. These four methods are suitably combined with SVM and two classification strategies to classify homodimers, homotrimers, homotetramers, and homohexamers from protein primary sequences. The simulation results show that the three feature extraction methods that incorporate sequence-order information all perform better than the conventional amino acid composition method. Among them, our weighted auto-correlation function method is the best: its total accuracy is up to 6.39 percentage points higher than that of amino acid composition and 2.41 percentage points higher than that of Chou's feature extraction method. The 'one-versus-one' strategy is superior to the 'one-versus-rest' strategy, with a total accuracy up to 17.69 percentage points higher.
     3. A new method of composite classification, in which the auto-correlation function feature extraction method is suitably combined with SVM and the 'improved unique one-versus-rest' strategy, is used to classify 27 fold classes. The results show that, in the independent test, the total classification accuracy of the auto-correlation function method is about 7 percentage points higher than that of amino acid composition. The 'improved unique one-versus-rest' strategy is superior to the 'one-versus-rest' strategy: its total accuracies in the independent test and the 5CV test are higher by up to about 18 and 12 percentage points, respectively.
     4. A weighting idea is introduced to form a new feature extraction method, the weighted auto-correlation function method, for representing protein sequences. Two classification strategies ('one-versus-rest' and 'one-versus-one') are used to classify membrane proteins and to predict protein subcellular locations. The results are significantly improved:
     1) For membrane proteins, the total accuracy of our new feature extraction method is 87.98% in the jackknife test, which is 3.38 percentage points higher than that of amino acid composition with the same 'one-versus-rest' strategy and SVM. The total accuracy of the 'one-versus-one' strategy reaches 94.88% in the jackknife test, which is 6.9 percentage points higher than that of the 'one-versus-rest' strategy.
     2) For protein subcellular location, the total predictive accuracies for prokaryotic and eukaryotic subcellular location are 92.38% and 95.22%, respectively, in the jackknife test; the eukaryotic accuracy is far higher than Hua's result of 79.4%. In the jackknife test, the total predictive accuracy for eukaryotic proteins with the 'one-versus-one' strategy is 12.19 percentage points higher than with the 'one-versus-rest' strategy, and the total predictive accuracy for eukaryotic proteins with the new feature extraction method is 2.96 percentage points higher than with the amino acid composition feature extraction method.
     5. Finally, the kernel functions and their parameters are briefly discussed.
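As a rough illustration of the two sequence-derived feature sets named in the abstract, the Python sketch below computes an amino acid composition vector and a set of auto-correlation values over an amino acid index. Several things here are assumptions rather than the dissertation's definitions: the auto-correlation form r_n = (1/(L-n)) * sum_i h_i * h_(i+n) is one common definition and the abstract does not spell out the exact formula (nor the weighting scheme of the weighted variant); the Kyte-Doolittle hydropathy scale stands in for whichever amino acid index was actually used; and all function names are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the composition vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as an illustrative
# amino acid index (assumption: the dissertation's index may differ).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Return the 20 residue frequencies of seq (order follows AMINO_ACIDS)."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    length = len(seq)
    return [counts[aa] / length for aa in AMINO_ACIDS]


def auto_correlation(seq, index, max_lag):
    """Return [r_1, ..., r_max_lag] with r_n = mean of h_i * h_(i+n).

    h_i is the index value of residue i; this captures sequence-order
    information that plain composition discards.
    """
    values = [index[aa] for aa in seq]  # KeyError on non-standard residues
    length = len(values)
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    return [
        sum(values[i] * values[i + lag] for i in range(length - lag))
        / (length - lag)
        for lag in range(1, max_lag + 1)
    ]


def feature_vector(seq, max_lag=5):
    """Concatenate composition (20 values) and auto-correlation (max_lag values)."""
    return amino_acid_composition(seq) + auto_correlation(
        seq, KYTE_DOOLITTLE, max_lag
    )
```

A vector built this way has 20 + max_lag entries and could be passed to any SVM implementation; the choice of max_lag and of the index are tuning decisions that this sketch leaves open.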
References
[1] Tao Jiang, Ying Xu and Michael Q. Zhang ed., Current Topics in Computational Molecular Biology, Tsinghua University Press, The MIT press, 2002.
    [2] 陈润生.生物信息学,生物物理学报,1999,1:5-11.
    [3] 孙啸.生物信息学—揭示生物分子数据的内涵,电子科技导报,1998,11:10-16.
    [4] Gilbert, W. Towards a paradigm shift in biology. Nature, 1991, 349: 99.
    [5] 郝柏林,刘寄星主编.理论物理与生命科学.上海:上海科学技术出版社,1999.
    [6] Shapiro, L. and Lima, C. D. The Argonne Structural Genomics Workshop: Lamaze class for the birth of a new science. Structure. 1998, 6(3):265-267.
    [7] Clore, G. M. and Gronenborn, A. M. Two-, three-, and four-dimensional NMR methods for obtaining larger and more precise three-dimensional structures of proteins in solution. Annu. Rev. Biophys. Biochem. , 1991, 20:29-63.
    [8] 来鲁华,蛋白质的结构预测与分子设计.北京:北京大学出版社,1993.
    [9] Anfinsen, C. B., Haber, E., Sela, M. and White, F. H. The kinetics of the formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc. Natl. Acad. Sci. U.S.A. 1961, 47: 1309-1314.
    [10] Anfinsen, C. B. Principles that govern the folding of protein chains. Science, 1973, 181: 223-230.
    [11] Linderstrom-Lang, K. U. and Schellman, J. A. Protein structure and enzyme activity. In: Enzymes, Boyer, P. D., ed. New York: Academic Press. 1959, pp 443-510.
    [12] Ibarra, B. Bernal, P. G., Melendez, M. and Leyton, G. R. Physico-chemical findings on the products of protein decomposition; chromatographic analysis of various types of peptones. Rev. Med. Cordoba., 1958, 46:351-358.
    [13] Buehner, M., Ford, G. C., Moras, D., Olsen, K. W. and Rossmann, M. G. D-glyceraldehyde-3-phosphate dehydrogenase: three-dimensional structure and evolutionary significance. Proc. Natl. Acad. Sci. U.S.A., 1973, 70(11): 3052-3054.
    [14] 阎隆飞,孙之荣主编.蛋白质分子结构.北京:清华大学出版社,1999.
    [15] 王镜岩,文重,陆德培,文镜和刘志华译.生物化学.北京:科学出版社,2000. Hames, B. D., Hooper, N. M. and Houghton, J. D. Instant notes in biochemistry. United Kingdom, BIOS Scientific Publishers Limited, 1997.
    [16] 陶慰孙,李惟,姜涌明.蛋白质分子基础.北京:高等教育出版社,1995.
    [17] Bairoch, A. PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res., 1992, Suppl 20:2013-2018.
    [18] Henikoff, S. and Henikoff, J. G. Automated assembly of protein blocks for database searching. Nucl. Acids Res., 1991, 19: 6565-6572.
    [19] Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. Basic local alignment search tool. J. Mol. Biol., 1990, 215: 403-410.
    [20] Pearson, W.R. and Lipman, D.J. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A., 1988, 85: 2444-2448.
    [21] Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res., 1997, 25: 3389-3402.
    [22] Jones, T. A. and Thirup, S. Using known substructures in protein model building and crystallography. EMBO J., 1986, 5: 816-822.
    [23] Jones, T. A. and Kleywegt, G. J. CASP3 comparative modeling evaluation. Proteins. 1999, S3: 30-46.
    [24] Bryant, S. H. and Lawrence, C. E. Statistics of sequence-structure threading. Curr. Opin. Struct. Biol. , 1995, 5: 236-244.
    [25] Fetrow, J. S. and Bryant, S. H. New programs for protein tertiary structure prediction. Bio/Technology, 1993, 11: 479-484.
    [26] Jones, D. T. and Thornton, J. M. Potential energy functions for threading. Curr. Opin. Struct. Biol., 1996, 6: 210-216.
    [27] Finkelstein, A. V. and Reva, B. A. Search for the most stable folds of protein chains: Ⅰ. Application of a self-consistent molecular field theory to a problem of protein three-dimensional structure prediction. Protein Eng. 1996, 9(5):387-97.
    [28] Sippl, M. J. and Weitckus, S. Detection of native-like models for amino acid sequences of unknown three-dimensional structure in a data base of known protein conformations. Proteins: Struct. Funct. Genet., 1992, 13: 258-271.
    [29] Ortiz, A. R., Kolinski, A., Rotkiewicz, P., Ilkowski, B. and Skolnick, J. Ab initio folding of proteins using restraints derived from evolutionary information. Proteins, 1999, Suppl 3:177-185.
    [30] Ofran, Y. and Rost, B. Analysing Six Types of Protein-Protein Interfaces. J. Mol. Biol., 2003, 325: 377-387.
    [31] Jones, S. and Thornton, J. M. Analysis of Protein-Protein Interaction Sites Using Surface Patches. J. Mol. Biol., 1997, 272: 121-132.
    [32] Garian, R. Prediction of Quaternary Structure from Primary Structure, Bioinformatics, 2001, 17: 551-556.
    [33] Shao-Wu Zhang, Quan Pan, Hong-Cai Zhang, Yun-Long Zhang and Hai-Yu Wang, Classification of protein quaternary structure with support vector machine. Bioinformatics, 2003, 19(18): 2390-2396.
    [34] Shao-Wu Zhang, Quan Pan, Hong-Cai Zhang, Yong-Hong Wu and Jian-Yu Shi, Support Vector Machine for Predicting Protein Homo-oligomers by Incorporating Pseudo-amino acid Composition. Internet Electron. J. Mol. Des., 2003, 2: 392-402, http://www.biochempress.com.
    [35] 张绍武,潘泉,陈润生,张洪才,基于支持向量机的蛋白质同源寡聚体分类研究.生物化学与生物物理进展,2003,30(6):879-883.
    [36] 张绍武,潘泉,张洪才,张云龙,王海瑜,基于支持向量机和贝叶斯方法的蛋白质四级结构分类研究.生物物理学报,2003,19(2):171-175.
    [37] Lim, V. I. Algorithms for prediction of α-helices and β-structural regions in globular proteins. J. Mol. Biol., 1974, 88: 873-894.
    [38] Chou, P. Y. and Fasman, G. D. Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry, 1974, 13(2): 211-222.
    [39] Chou, P. Y. and Fasman, G. D. Prediction of protein conformation. Biochemistry, 1974, 13(2): 222-245.
    [40] Chou, P. Y. and Fasman, G. D. Empirical predictions of protein conformation. Annu. Rev. Biochem., 1978, 47: 251-276.
    [41] Ptitsyn, O. B. and Finkelstein, A. V. Theory of protein secondary structure and algorithm of its prediction. Biopolymers. 1983, 22: 15-22.
    [42] Nagano, K. Logical analysis of the mechanism of protein folding. Ⅰ. Prediction of helices, loops and beta-structures from primary structure. J. Mol. Biol., 1973, 75: 401-420.
    [43] Garnier, J., Osguthorpe, D. and Robson, B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol., 1978, 120: 97-120.
    [44] Gibrat, J. F., Garnier, J. and Robson, B. Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. J. Mol. Biol., 1987, 198: 425-443.
    [45] Cohen, F. E., Abarbanel, R. M., Kuntz, I. D. and Fletterick, R. J. Secondary structure assignment for alpha/beta proteins by a combinatorial approach. Biochemistry, 1983, 22(21): 4894-4904.
    [46] Cohen, F. E., Abarbanel, R. M., Kuntz, I. D. and Fletterick, R. J. Turn prediction in proteins using a pattern-matching approach. Biochemistry, 1986, 25(1): 266-275.
    [47] Solovyev, V. V. and Salamov, A. A. Method of calculation of discrete secondary structures in globular proteins. J. Mol. Biol., 1991, 25(3): 810-824.
    [48] Solovyev, V. V. and Salamov, A. A. Predicting alpha-helix and beta-strand segments of globular proteins. Comput. Appl. Biosci., 1994, 10(6): 661-669.
    [49] Qian, N. and Sejnowski, T. J. Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol., 1988, 202: 865-884.
    [50] Holley, H. L. and Karplus, M. Protein secondary structure prediction with a neural network. Proc. Natl. Acad. Sci. U.S.A., 1989, 86: 152-156.
    [51] Zhang, X., Mesirov, J. P. and Waltz, D. L. Hybrid system for protein secondary structure prediction. J. Mol. Biol., 1992, 225: 1049-1063.
    [52] Zvelebil, M. J., Barton, G. J., Taylor, W. R. and Sternberg, M. J. Prediction of protein secondary structure and active sites using the alignment of homologous sequences. J. Mol. Biol., 1987, 195(4): 957-961.
    [53] Salzberg, S. and Cost, S. Predicting protein secondary structure with nearest-neighbor algorithm. J. Mol. Biol., 1992, 227: 371-374.
    [54] Viswanadhan, V. N., Denckla, B. and Weinstein, J. N. New joint prediction algorithm (Q7-JASEP) improves the prediction of protein secondary structure. Biochemistry, 1991, 30(46): 11164-11172.
    [55] Hua S. J. and Sun, Z. R. A Novel Method of Protein Secondary Structure Prediction with High Segment Overlap Measure: Support Vector Machine Approach, J. Mol. Biol., 2001, 308:397-407.
    [56] Kabsch, W. and Sander, C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 1983, 22: 2577-2637.
    [57] Levitt, M. and Chothia, C. Structural patterns in globular proteins. Nature, 1976, 261: 552-558.
    [58] Richardson, J. S. and Richardson, D. C. Principles and patterns of protein conformation. In "Prediction of protein structure and the principles of protein conformation." Fasman, G. D., ed. 1989, New York: Plenum Press, pp1-98.
    [59] Murzin, A. G., Brenner, S. E., Hubbard, T. and Chothia, C. SCOP: A structural classification of protein database for the investigation of sequence and structures. J. Mol. Biol., 1995, 247: 536-540.
    [60] Michie, A. D., Orengo, C. A. and Thornton, J. M. Analysis of domain structural class using an automated class assignment protocol. J. Mol. Biol., 1996, 262: 168-185.
    [61] Deleage, G. and Roux, B. Use of class prediction to improve protein secondary structure prediction. In "Prediction of protein structure and the principles of protein conformation." Fasman, G. D., ed. 1989, New York: Plenum Press, pp 417-465.
    [62] Kneller, D. G., Cohen, F. E. and Langridge, R. Improvements in protein secondary-structure prediction by enhanced neural networks. J. Mol. Biol., 1990, 214:171-182.
    [63] Klein, P. and DeLisi, C. Prediction of protein structural class from amino acid sequence. Biopolymers, 1986, 25: 1659-1672.
    [64] Klein, P. Prediction of protein structural class by discriminant analysis. Biochim. Biophys. Acta., 1986, 874: 205-215.
    [65] Chou, P. Y. Amino acid composition of four classes of proteins. In: "Abstracts of Papers, Part Ⅰ, Second Chemical Congress of the North American Continent," Las Vegas. 1980.
    [66] Chou, P. Y. Prediction of protein structural classes from amino acid composition. In "Prediction of protein structure and the principles of protein conformation." Fasman, G. D., ed. 1989, New York: Plenum Press, pp 549-586.
    [67] Nakashima, H., Nishikawa, K. and Ooi, T. The folding type of a protein is relevant to the amino acid composition. J. Biochem, 1986, 99: 153-162.
    [68] Zhang, C. T. and Chou, K. C. An optimization approach to predicting protein structural class from amino acid composition. Protein Sci., 1992, 1: 401-408.
    [69] Chou, K. C. and Zhang, C. T. Predicting protein folding types by distance functions that make allowances for amino acid interactions. J. Biol. Chem., 1994, 269: 22014-22020.
    [70] Chou, K. C. and Maggiora, G. M. Domain structural class prediction. Protein Eng., 1998, 11: 523-538.
    [71] Dubchak, I., Muchnik, I., Mayor, C., Dralyuk, I. and Kim, S. H. Recognition of a protein fold in the context of the SCOP classification. Proteins: Struct. Funct. Genet., 1999, 35: 401-407.
    [72] Cai, Y. D., Liu, X. J., Xu, X. B. and Chou, K. C. Prediction of Protein Structural Classes by Support Vector Machines. Comput. Chem., 2002, 26: 293-296.
    [73] Scheraga, H. A. Calculations of conformations of polypeptides. Adv. Phys. Org. Chem., 1968, 6: 103-184.
    [74] Weiner, P. K. and Kollman, P. A. Assisted model building with energy refinement. A general program for modeling molecules and their interaction. J. Comp. Chem., 1981, 2: 287-303.
    [75] Levitt, M. Protein folding by restrained energy minimization and molecular dynamics. J. Mol. Biol., 1983, 104: 59-107.
    [76] McCammon, J. A., Wong, C. F. and Lybrand, T. P. Protein stability and function, In Prediction of protein structure and the principles of protein conformation, Fasman, G. D., ed., Plenum Press, New York, 1989, pp 149-159.
    [77] Mackay, D. H. J., Cross, A. J. and Hagler, A. T. The role of energy minimization in simulation strategies of biomolecular systems, In Prediction of protein structure and the principles of protein conformation, Fasman, G. D., ed., Plenum Press, New York, 1989, pp 317-358.
    [78] Chou, K. C., Nemethy, G. and Scheraga, H. A. Energy of stabilization of the regular structural elements in proteins. Acc. Chem. Res., 1990, 23: 134-141.
    [79] Karplus, M. and Shakhnovich, E. Theoretical studies of the tertiary structures of peptide by the Monte Carlo simulated annealing method. Protein Eng., 1992, 3:515-523.
    [80] Lazaridis, T. and Karplus, M. Effective energy function for protein structure prediction. Current Opinion in Structural Biology, 2000, 10:139-145.
    [81] 靳利霞.蛋白质结构预测方法研究.大连理工大学博士论文.2002.
    [82] Mosimann, S., Meleshko, R. and James, M. N. A critical assessment of comparative molecular modeling of tertiary structures of proteins. Proteins, 1995, 23(3):301-17.
    [83] 丁达夫,汤海旭,张保红.基于结构比较的蛋白质模建系统及其评估.Ⅰ.主链的模建.生物物理学报,1995,11(3):416.
    [84] 汤海旭,丁达夫.基于结构比较的蛋白质模建系统及其评估.Ⅱ.侧链的安装.生物物理学报,1996,12(1):125.
    [85] 张保红,丁达夫.基于结构比较的蛋白质模建系统及其评估.Ⅲ.灵敏的蛋白质结构的评估方法.生物化学与生物物理学报,1996,28(4):335.
    [86] Finkelstein, A. V. and Ptitsyn, O. B. Why do globular proteins fit the limited set of folding patterns. Prog. Biophys. Mol. Biol., 1987, 50: 171-190.
    [87] Chothia, C. and Finkelstein. A. V. The classification and origins of protein folding patterns. Ann. Rev. Biochem., 1990, 59: 1007-1039.
    [88] Hubbard, T. J. P., Murzin, A.G., Brenner, S. E. and Chothia, C. SCOP: A structural classification of proteins database. Nucleic Acids Res., 1997, 25(1): 236-239.
    [89] 黄积涛,蛋白质结构、运动与功能,天津大学博士论文,2003.
    [90] Chothia, C. One thousand families for the molecular biologist. Nature, 1992, 357: 543-544.
    [91] Blundell, T. L. and Johnson, M. S. Catching a common fold. Protein Sci., 1993, 2(6):877-883.
    [92] Wang, Z. X. A re-estimation for the total numbers of protein folds and superfamilies. Protein Eng., 1998, 11(8): 621-626.
    [93] Alexandrov, N. N. and Go, N. Biological meaning, statistical significance, and classification of local spatial similarities in nonhomologous proteins. Protein Sci., 1994, 3(6):866-875.
    [94] Harrison, A., Pearl, F., Mott, R., Thornton, J. and Orengo, C. Quantifying the similarities within fold space. J Mol Biol., 2002, 323(5):909-26.
    [95] Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B. and Thornton, J. M. CATH-A Hierarchic Classification of Protein Domain Structures. Structure, 1997, 5(8): 1093-1108.
    [96] Pearl, F. M. G., et al., Assigning genomic sequences to CATH. Nucleic Acids Res., 2000, 28: 277-282.
    [97] Marchler-Bauer, A. and Bryant, S. H. A measure of success in fold recognition. Trends Biochem. Sci., 1997, 22: 236-240.
    [98] Bowie, J. U., Luthy, R. and Eisenberg, D. A method to identify protein sequences that fold into a known three-dimensional structure. Science, 1991, 253: 164-170.
    [99] Luthy, R., Bowie, J. U. and Eisenberg, D. Assessment of protein models with three-dimensional profiles. Nature, 1992, 356: 83-85.
    [100] Jones, D. T., Taylor, W. R. and Thornton, J. M. A new approach to protein fold recognition. Nature. 1992, 358: 86-89.
    [101] Godzik, A., Skolnick, J. and Kolinski, A. A topology fingerprint approach to the inverse folding problem. J. Mol. Biol., 1992, 227: 227-228.
    [102] Fischer, D., Rice, D., Bowie, J. U. and Eisenberg, D. Assigning amino acid sequences to 3-dimensional protein folds. FASEB J, 1996, 10:126-136.
    [103] Alexandrov, N. N., Nussinov, R. and Zimmer, R. M. Fast protein fold recognition via sequence to structure alignment and contact capacity potentials. In Biocomputing: Proceedings of the 1996 Pacific Symposium, Hunter, L. and Klein, T., ed., 1996, pp53-72. Singapore: World Scientific Publishing Co.
    [104] Xu, Y., Xu, D. and Uberbacher, E. C. An efficient computational method for globally optimal threading. J. Comp. Biol., 1998, 5(3): 597-614.
    [105] Dubchak, I., Muchnik, I., Holbrook, S.R., Kim, S-H. Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad Sci. U.S.A., 1995, 92: 8700-8704.
    [106] Ding, C. H.Q. and Dubchak, I., Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 2001, 17(4): 349-358.
    [107] Klotz, I. M., Darnall, D. W. and Langerman, N. R. Quaternary structure of protein. In: The protein, Neurath, H. and Hill, R. L. eds. 3rd. New York: Academic Press, 1975, 1:293-411.
    [108] Schulz, G. E. and Schirmer. Principle of protein structure. New York: Springer-Verlag, 1979, pp98-107.
    [109] Svedberg, T. Mass and size of protein molecules. Nature, 1929, 123: 871.
    [110] Svedberg, T. and Fahraeus, R. A New Direct Method for the Determination of the Molecular Weight of the Proteins. J. Am. Chem. Soc., 1926, 48: 430-438.
    [111] Jones, S. and Thornton, J. M. Protein-protein interactions: A review of protein dimer structures. Prog. Biophys. Mol. Biol., 1995, 63: 31-65.
    [112] Matthews, B. W. and Bernhard, S. A. Structure and symmetry of oligomeric enzymes. Annu. Rev. Biophys. Bioeng., 1973, 2:257-317.
    [113] Dove, A. Proteomics: translating genes into products? Nat. Biotechnol., 1999, 17: 233-236.
    [114] Bartel, P. L. and Fields, S. (eds). The yeast two-hybrid system. In Advances in Molecular Biology. Oxford University Press, New York, 1997.
    [115] Fields, S. and Song, O. K. A novel genetic system to detect protein-protein interactions. Nature, 1989, 340: 245-246.
    [116] Uetz, P., Giot, L., Cagney, G., Mansfield, T. A., Judson, R. S., Knight, J. R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., Qureshi-Emili, A., Li, Y., Godwin, B., Conover, D., Kalbfleisch, T., Vijayadamodar, G., Yang, M., Johnston, M., Fields, S. and Rothberg, J.M. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 2000, 403: 623-627.
    [117] Enright, A. J., Iliopoulos, I., Kyrpides, N. C. and Ouzounis, C. A. Protein interaction maps for complete genomes based on gene fusion events. Nature, 1999, 402: 86-90.
    [118] Pazos, F., Helmer-Citterich, M., Ausiello, G. and Valencia, A. Correlated mutations contain information about protein-protein interaction. J. Mol. Biol., 1997, 1:511-523.
    [119] Huynen, M., Snel, B., Lathe, W. and Bork, P. Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res., 2000, 10: 1204-1210.
    [120] Marcotte, E., Pellegrini, M., Ng. H.L., Rice, D. W., Yeates, T. O. and Eisenberg, D. Detecting protein function and protein-protein interactions from genome sequences. Science, 1999, 285: 751-753.
    [121] Jones, S. and Thornton, J. M. Prediction of protein-protein interaction sites using patch analysis. J. Mol. Biol., 1997, 272: 133-143.
    [122] Kini, R. M. and Evans, J. H. Prediction of potential protein-protein interaction sites from amino acid sequence. Identification of a fibrin polymerization site. FEBS Lett., 1996, 385: 81-86.
    [123] Nissinka, J. W., Verdonk, M. L. and Klebe, G. Knowledge-based descriptors to predict protein-ligand interactions. Proceedings of the 13th European Symposium on Quantitative Structure-Activity Relationships. Heinrich-Heine Universitat, Dusseldorf, Germany, 2000.
    [124] Hopp, T. P. and Woods, K. R. Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. U.S.A., 1981, 78: 3824-3828.
    [125] Welling, G.W., Weijer, W. J., van der Zee, R. and Welling-Wester, S. Prediction of sequential antigenic regions in proteins. FEBS Lett., 1985, 188: 215-218.
    [126] Bock, J. R. and Gough, D. A. Predicting protein-protein interactions from primary structure. Bioinformatics, 2001, 17(5): 455-460.
    [127] Dietmann, S. and Frommel, C. Prediction of 3D neighbors of molecular surface patches in proteins by artificial neural networks. Bioinformatics, 2002, 18(1): 167-174.
    [128] Jones, S. and Thornton, J. M. Principles of protein-protein interactions. Proc. Natl. Acad. Sci. U.S.A., 1996, 93: 13-20.
    [129] Keskin, O., Bahar, I., Badretdinov, A. Y., Ptitsyn, O. B. and Jernigan, R. L. Empirical solvent-mediated potentials hold for both intra-molecular and inter-molecular inter-residue interactions. Protein Sci., 1998, 7: 2578-2586.
    [130] Glaser, F., Steinberg, D. M., Vakser, I. A. and Ben-Tal, N. Residue frequencies and pairing preferences at protein-protein interfaces. Proteins: Struct. Funct. Genet., 2001, 43: 89-102.
    [131] McCoy, A. J., Chandana Epa, V. and Colman, P. M. Electrostatic complementarity at protein/protein interfaces. J. Mol. Biol., 1997, 268: 570-584.
    [132] Lo Conte, L., Chothia, C. and Janin, J. The atomic structure of protein-protein recognition sites. J. Mol. Biol., 1999, 285: 2177-2198.
    [133] Sheinerman, F. B., Norel, R. and Honig, B. Electrostatic aspects of protein-protein interactions. Curr. Opin. Struct. Biol., 2000, 10: 153-159.
    [134] Chou, K. C. and Elrod, D. W. Prediction of membrane protein types and subcellular locations. Proteins: Struct. Funct. Genet., 1999, 34: 137-153.
    [135] Rost, B. Casadio, R., Fariselli, P. and Sander, C. Prediction of the helical transmembrane segments at 95% accuracy. Protein Sci., 1995, 4:521-533.
    [136] Stryer, L. Introduction to biological membranes. Biochemistry, W. H. Freeman, and Company, New York.
    [137] Cserzo, M., Wallin, E., Simon, I., Heijne, G. von and Elofsson, A. Prediction of transmembrane α-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng., 1997, 10(6): 673-676.
    [138] Resh, M. D. Myristylation and palmitylation of Src family members: the fats of the matter. Cell, 1994, 76: 411-413.
    [139] Casey, P. J. Protein lipidation in cell signaling. Science, 1995, 268: 221-225.
    [140] Chou, K. C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet., 2001, 43:246-255.
    [141] Feng, Z. P. and Zhang, C. T. Prediction of the membrane protein types based on the hydrophobic indices. J Protein Chem., 2000, 19: 269-275.
    [142] Cai, Y. D., Liu, X. J. and Chou, K. C. Artificial neural network model for predicting membrane protein types. J. Biomol. Struct. Dyn., 2001, 18: 607-610.
    [143] Cai, Y. D., Liu, X. J., Xu, X. B. and Chou, K. C. Support vector machines for prediction membrane protein types by incorporating quasi-sequence-order effect. Internet Electron. J. Mol. Des., 2002, 1:219-226. http://www.biochempress.com.
    [144] 冯志萍,从蛋白质的一级结构预测蛋白质的亚细胞位置和结构类.天津大学博士论文,2001.
    [145] Andrade, M.A., O'Donoghue, S. L. and Rost, B. Adaptation of protein surfaces to subcellular location. J. Mol. Biol., 1998, 276: 517-525.
    [146] Nakai, K. and Kanehisa, M. Expert system for predicting protein localization sites in Gram-negative bacteria. Proteins, 1991, 11: 95-110.
    [147] Nakai, K. and Kanehisa, M. A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics, 1992, 14: 897-911.
    [148] Nielsen, H., Brunak, S. and von Heijne, G. Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng., 1999, 12: 3-9.
    [149] Emanuelsson, O., Nielsen, H., Brunak, S. and von Heijne, G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence, J. Mol. Biol., 2000, 300: 1005-1016.
    [150] Nakashima, H. and Nishikawa, K. Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies. J. Mol. Biol., 1994, 238: 54-61.
    [151] Cedano, J., Aloy, P., Perez-Pons, J. A. and Querol, E. Relation between amino acid composition and cellular location of proteins. J. Mol. Biol., 1997, 266: 594-600.
    [152] Reinhardt, A. and Hubbard, T. Using neural network for prediction of the subcellular location of proteins. Nucleic Acids Res., 1998, 26: 2230-2236.
    [153] Chou, K. C. and Elrod, D. W. Using discriminant function for prediction of subcellular location of prokaryotic proteins. Biochem. Biophys. Res. Commun., 1998, 252: 63-68.
    [154] Chou, K. C. and Elrod, D. W. Protein subcellular location prediction. Protein Eng., 1999, 12: 107-108.
    [155] Hua, S. J. and Sun, Z. R. Support vector machine approach for protein subcellular localization prediction. Bioinformatics, 2001, 17(8): 721-728.
    [156] Cai, Y. D., Liu, X. J. and Chou, K. C., Support vector machines for prediction of protein subcellular location. Mol. Cell Biol. Res. Commun., 2000, 4: 230-233.
    [157] Yuan, Z., Prediction of protein subcellular locations using Markov chain models. FEBS Lett., 1999, 451: 23-26.
    [158] 崔岩,蛋白质结构预测与结构模拟新方法的研究,中国科学院生物物理研究所博士学位论文,1998.
    [159] Chou, K. C. Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem. Biophys. Res. Commun., 2000, 278: 477-483.
    [160] Chou, K. C. and Cai, Y. D. Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location. J. Biol. Chem., 2002, 277(48): 45765-45769.
    [161] Feng, Z. P. and Zhang, C. T. Prediction of the subcellular location of prokaryotic proteins based on the hydrophobicity index of amino acids. International J. Biol. Macromol., 2001, 28: 255-261. http://www.elsevier.com/locate/ijbiomac.
    [162] Feng, Z. P and Zhang, C. T. A graphic representation of protein sequence and predicting the subcellular locations of prokaryotic proteins. The International J. Biochem. Cell Biol., 2002, 34: 298-307. http://www.elsevier.com/locate/ijbeb.
    [163] Holm, L. and Sander, C. Mapping the protein universe. Science, 1996, 273: 595-602.
    [164] Nishikawa, K., and Ooi, T. Correlation of the amino acid composition of a protein to its structural and biological characters. J. Biochem., 1982, 91:1821-1824.
    [165] Nishikawa, K., Kubota, Y. and Ooi, T. Classification of the protein into groups based on amino acid composition and other characters. Ⅰ. Angular distribution. J. Biochem., 1983, 94:981-995.
    [166] Nishikawa, K., Kubota, Y. and Ooi, T. Classification of the protein into groups based on amino acid composition and other characters. Ⅱ. Grouping into four types. J. Biochem., 1983, 94:997-1007.
    [167] Nishikawa, K. and Ooi, T. Radial locations of amino acid residues in a globular protein: correlation with the sequence. J Biochem (Tokyo). 1986, 100(4): 1043-1047.
    [168] Chou, K. C. and Zhang, C. T. Prediction of protein structural classes. Crit. Rev. Biochem. Mol. Biol., 1995, 30:275-349.
    [169] Zhu, H. Q., She, Z. S. and Wang, J. An EDP-based description of DNA sequences and its application in identification of exons in Human genome. Proceedings of the 2nd Chinese Conference on Bioinformatics, Beijing, 2002, pp. 23-24.
    [170] Zhu, X. L. Fundamentals of Applied Information Theory. Beijing: Tsinghua University Press, 2001 (in Chinese).
    [171] Cornette, J. L., Cease, K. B., Margalit, H., Spouge, J. L., Berzofsky, J. A. and DeLisi, C. Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J. Mol. Biol., 1987, 195(3): 659-685.
    [172] Zhang, C. T. and Zhang, R. A new quantitative criterion to distinguish between α/β and α+β proteins. FEBS Letters, 1998, 440:153-157.
    [173] Kawashima, S., Ogata, H. and Kanehisa, M. AAindex: Amino Acid Index Database. Nucleic Acids Res., 1999, 27(1): 368-369.
    [174] Schneider, G. and Wrede, P. The rational design of amino acid sequences by artificial neural networks and simulated molecular evolution: de novo design of an idealized leader peptidase cleavage site. Biophys. J., 1994, 66(2): 335-344.
    [175] Vapnik, V. (Ed.) The Nature of Statistical Learning Theory. Springer, New York, 1995.
    [176] Vapnik, V. (Ed.) Statistical Learning Theory. Wiley, New York, 1998.
    [177] Brown, M., Grundy, W., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S., Ares, M. Jr. and Haussler, D. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. U.S.A., 2000, 97: 262-267.
    [178] Zien, A., Ratsch, G., Mika, S., Scholkopf, B., Lengauer, T. and Muller, K. R. Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics, 2000, 16:799-807.
    [179] Jaakkola, T., Diekhans, M. and Haussler, D. Using the Fisher kernel method to detect remote protein homologies. Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA. 1999, 149-158.
    [180] Joachims, T. Making large-scale SVM learning practical. In: Advances in Kernel Methods-Support Vector Learning. Scholkopf, B., Burges, C. and Smola, A., Eds. MIT Press, Cambridge, MA, 1999.
    [181] Burges, C. J. C. A Tutorial on Support Vector Machines for Pattern Recognition. Bell Laboratories, Lucent Technologies. 1997.
    [182] Courant, R. and Hilbert, D. Methods of Mathematical Physics. New York: Wiley-Interscience, 1953.
    [183] Cortes, C. and Vapnik, V. Support-vector networks. Machine Learning, 1995, 20: 273-297.
    [184] Duda, R. O. and Hart, P. E. Pattern Classification and Scene Analysis. John Wiley & Sons, New York, 1973.
    [185] Liu, W. and Chou, K. C. Prediction of protein structure classes by modified Mahalanobis discriminant algorithm. J. Protein Chem., 1998, 17:209-217.
    [186] Weston, J. and Watkins, C. Multi-class support vector machines. Technical report, Royal Holloway, University of London, 1998.
    [187] Kressel, U. H. G. Pairwise classification and support vector machines. In Scholkopf, B., Burges, C. J. C. and Smola, A. J. editors, Advances in Kernel Methods-Support Vector Learning, pp 255-268, Cambridge, MA, 1999. MIT Press.
    [188] Fasman, G. D. (Ed.) Handbook of Biochemistry and Molecular Biology, 3rd ed., Proteins, Volume 1, CRC Press, Cleveland, 1976.
    [189] Bairoch, A. and Apweiler, R. The SWISS-PROT Protein Data Bank and Its New Supplement TrEMBL. Nucleic Acids Res., 1996, 24:21-25.
    [190] Wold, S., Eriksson, L., Hellberg, S., Jonsson, J., Sjostrom, M., Skagerberg, B. and Wikstrom, C. Principal property values for six non-natural amino acids and their application to a structure-activity relationship for oxytocin peptide analogues. Can. J. Chem., 1987, 65:1814-1820.
    [191] Kyte, J. and Doolittle, R. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 1982, 157:105-132.
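    [192] Zhu, H. L., Qu, L. S. and Zhang, H. J. Face detection system based on wavelet transform and support vector machines. Journal of Xi'an Jiaotong University, 2002, 36(9): 947-950 (in Chinese).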
    [193] Kawashima, S. and Kanehisa, M. AAindex: amino acid index database. Nucleic Acids Res., 2000, 28(1):374.
    [194] Tomii, K. and Kanehisa, M. Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins. Protein Eng., 1996, 9:27-36.
    [195] Jones, S. and Thornton, J. Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA, 1996, 93:13-20.
    [196] Rost, B. and Sander, C. Prediction of secondary structure at better than 70% accuracy. J. Mol. Biol., 1993, 232(2): 584-599.
    [197] Pliska, V., Schmidt, M., and Fauchere, J.L. Partition coefficients of amino acids and hydrophobic parameters pi of their side-chains as measured by thin-layer chromatography. J. Chromatogr. 1981,216:79-92.
    [198] Fauchere, J.L., Charton, M., Kier, L.B., Verloop, A. and Pliska, V. Amino acid side chain parameters for correlation studies in biology and pharmacology. Int. J. Pept. Protein Res., 1988, 32(4):269-278.
    [199] Maxfield, F.R. and Scheraga, H.A. Status of empirical methods for the prediction of protein backbone topography. Biochemistry, 1976,15(23):5138-5153.
    [200] Meek, J.L. and Rossetti, Z.L. Factors affecting retention and resolution of peptides in HPLC. J. Chromatogr., 1981, 211: 15-28.
    [201] Robson, B. and Osguthorpe, D.J. Refined models for computer simulation of protein folding. Applications to the study of conserved secondary structure and flexible hinge points during the folding of pancreatic trypsin inhibitor. J. Mol. Biol., 1979,132(1):19-51.
    [202] Sneath, P. H. Relations between chemical structure and biological activity in peptides. J. Theor. Biol., 1966, 12(2):157-195.
    [203] Pascarella, S. and Argos, P. A databank merging related protein structures and sequences. Protein Eng., 1992,5:121-137.
    [204] Reczko, M. and Bohr, H., The DEF database of sequence based protein fold class predictions. Nucleic Acids Res., 1994, 22(17):3616-3619.
    [205] Hobohm, U., Scharf, M., Schneider, R. and Sander, C. Selection of a representative set of structures from the Brookhaven Protein Bank. Protein Sci., 1992,1:409-417.
    [206] Hobohm, U. and Sander, C. Enlarged representative set of proteins. Protein Sci., 1994, 3:522-524.
    [207] Lo Conte, L., Ailey, B., Hubbard, T.J.P., Brenner, S.E., Murzin, A.G. and Chothia, C. SCOP: a structural classification of proteins database. Nucleic Acids Res., 2000, 28:257-259.
    [208] Parker, J.M., Guo, D. and Hodges, R.S. New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochem., 1986, 25(19):5425-5432.
    [209] Ponnuswamy, P.K., Prabhakaran, M. and Manavalan, P. Hydrophobic packing and spatial arrangement of amino acid residues in globular proteins. Biochim. Biophys. Acta., 1980, 623(2):301-316.
    [210] Kawashima, S., Ogata, H. and Kanehisa, M. AAindex: Amino Acid Index Database. Nucleic Acids Res., 1999, 27(1): 368-369.
    [211] Cornette, J.L., Cease, K.B., Margalit, H., et al. Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J. Mol. Biol., 1987, 195: 659-685.
    [212] Jacobson, G. R. and Saier, Jr. M. H. Part Ⅲ. Lipids and membrane. In: Biochemistry. Zubay, G. R. ed., Addison-Wesley Publishing Company, Inc., 1983.
    [213] Eisenberg, D. and McLachlan, A. D. Solvation energy in protein folding and binding. Nature, 1986, 319:199-203.
    [214] Klein, P. and DeLisi, C. Prediction of protein structural classes from amino acid sequence. Biopolymers, 1986, 25: 1659-1672.
    [215] Bairoch, A. and Boeckmann, B. The SWISS-PROT protein sequence data bank and its new supplement TrEMBL. Nucleic Acids Res., 1992, 25:31-36.
    [216] Rost, B., Casadio, R. and Fariselli, P., Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Sci., 1996, 5: 1704-1718.
    [217] Boyd, D., Schierle, C. and Beckwith, J. How many membrane proteins are there? Protein Sci., 1998, 7: 210-215.
    [218] Nielsen, H., Engelbrecht, J., Brunak, S. and von Heijne, G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng., 1997, 10: 1-6.
    [219] Drawid, A. and Gerstein, M. A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome. J. Mol. Biol., 2000, 301: 1059-1075.
    [220] Murphy, R. F., Boland, M. V. and Velliste, M. Towards a systematics for protein subcellular location: quantitative description of protein localization patterns and automated analysis of fluorescence microscope images. Proc. Int. Conf. Intell. Syst. Mol. Biol., 2000, 251-259.
    [221] Hardy, J. S., Holmgren, J., Johnsson, S., Sanchez, J. and Hirst, T. R. Coordinated assembly of multisubunit proteins: oligomerization of bacterial enterotoxins in vivo and in vitro. Proc. Natl. Acad. Sci. U.S.A., 1988, 85:7109-7113.
    [222] Rost, B., Casadio, R., Fariselli, P. and Sander, C. Prediction of the membrane segments at 95% accuracy. Protein Sci., 1995, 4: 521-533.
    [223] Sander, C. and Schneider, R. Database of homology-derived protein structures and structural meaning of sequence alignment. Proteins, 1991, 9: 56-58.
    [224] Flores, T. P., Orengo, C. A., Moss, D. and Thornton, J. M. Comparison of conformational characteristics in structurally similar protein pairs. Protein Sci., 1993, 2:1811-1826.
    [225] Hilbert, M., Bohm, G. and Jaenicke, R. Structural relationships of homologous proteins as a fundamental principle in homology modeling. Proteins, 1993, 17:138-151.
    [226] Karchin, R., Karplus, K. and Haussler, D. Classifying G-protein coupled receptors with support vector machines. Bioinformatics, 2002, 18(1): 147-159.
    [227] Leslie, C., Eskin, E. and Noble, W. The Spectrum Kernel: A String Kernel for SVM Protein Classification. To appear: Pacific Symposium on Biocomputing, 2002.
    [228] Karakoulas, G. and Shawe-Taylor, J. Optimizing classifiers for imbalanced training sets. In Kearns, M., Solla, S. and Cohn, D. editors, Advances in Neural Information Processing Systems 11, Cambridge, MA, The MIT Press, 1999.
    [229] Veropoulos, K., Campbell, C. and Cristianini, N. Controlling the sensitivity of support vector machines. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI99), Stockholm, Sweden, 1999.