Research on Ensemble Learning Based on Support Vector Machines
Abstract
The support vector machine (SVM) is a machine learning method built on statistical learning theory. Owing to its good generalization ability, it has been applied successfully in a wide range of fields. In practice, however, it still has several shortcomings. First, approximation algorithms have to be adopted to reduce the time and space complexity of solving the underlying optimization problem. Second, the kernel function is usually chosen by experience and the classifier parameters are set by cross-validation; neither guarantees optimality, yet no better solution is available so far. Third, the SVM is inherently a binary classifier and must be extended to handle multi-class problems, but whether several binary SVMs are combined or all classes are treated in a single optimization problem, the classification performance is not as good as in the binary case, and some of the methods are overly complex to implement. These shortcomings degrade the stability and generalization ability of the SVM.
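     As an illustration of the multi-class extension by combining binary SVMs mentioned above, the following minimal Python sketch trains one binary SVM per pair of classes (one-against-one) and predicts by majority voting. It is only a sketch: scikit-learn's SVC, the RBF kernel, and integer class labels are assumptions of the example, not details taken from the dissertation.

    # One-against-one multi-class classification by combining binary SVMs (illustrative only).
    from itertools import combinations
    import numpy as np
    from sklearn.svm import SVC

    def train_one_against_one(X, y):
        """Train one binary SVM for every pair of classes."""
        classes = np.unique(y)
        machines = {}
        for a, b in combinations(classes, 2):
            mask = np.isin(y, [a, b])
            machines[(a, b)] = SVC(kernel="rbf").fit(X[mask], y[mask])
        return classes, machines

    def predict_one_against_one(classes, machines, X):
        """Each pairwise SVM casts one vote; the class with the most votes wins."""
        index = {c: i for i, c in enumerate(classes)}
        votes = np.zeros((len(X), len(classes)), dtype=int)
        for clf in machines.values():
            for row, label in enumerate(clf.predict(X)):
                votes[row, index[label]] += 1
        return classes[np.argmax(votes, axis=1)]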
     By training and combining multiple accurate yet diverse classifiers, ensemble learning offers a new route to improving the generalization ability of classification systems, and it has become one of the main research topics in machine learning over the past decade. Research on ensembles built from neural networks, decision trees, and other base classifiers has made great progress both in China and abroad, whereas research on SVM ensembles started relatively late and still requires much further work. Starting from this situation, this dissertation studies effective ensemble learning methods for SVMs; the main contributions are as follows:
     1) The principle and algorithms of the SVM, together with its extensions to multi-class classification, are introduced. The general methods of ensemble learning are summarized in detail from the two aspects of constructing and combining base classifiers, and the current state of research on SVM ensembles, both in China and abroad, is reviewed.
     2) An ensemble learning method based on attribute reduction is proposed. Attribute reduction in rough set theory can serve as a preprocessing step that removes redundant data for a learning algorithm, but because of data noise and discretization it often degrades the classification performance of learning algorithms such as the SVM. Reducing a decision table that contains redundant attributes can yield several different reducts; these reducts usually retain good classification ability and differ from one another, so they can be used to construct an SVM ensemble. The reduction-based ensemble effectively exploits the complementary and redundant information in the training data for fusion classification and overcomes the adverse effect that data noise and discretization otherwise have on the classification performance of the SVM.
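     A minimal sketch of how several reducts could be turned into an SVM ensemble is given below. Here find_reducts is a hypothetical placeholder for a rough-set attribute reduction routine (for example one provided by ROSETTA); scikit-learn's SVC, majority voting, and integer class labels are assumptions of the example rather than the dissertation's exact procedure.

    # Sketch: one SVM per attribute reduct, combined by majority vote.
    import numpy as np
    from sklearn.svm import SVC

    def reduct_ensemble(X, y, reducts):
        """Train one SVM per reduct; `reducts` is a list of attribute-index subsets."""
        return [(attrs, SVC(kernel="rbf").fit(X[:, attrs], y)) for attrs in reducts]

    def majority_vote(members, X):
        """Column-wise majority vote; assumes integer class labels 0..k-1."""
        preds = np.array([clf.predict(X[:, attrs]) for attrs, clf in members])
        return np.array([np.bincount(col).argmax() for col in preds.T])

    # reducts = find_reducts(discretized_table, y)   # hypothetical rough-set reduction step
    # members = reduct_ensemble(X_train, y_train, reducts)
    # y_pred  = majority_vote(members, X_test)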
     3) A method for constructing base classifiers based on attribute discretization is proposed, with three possible implementation strategies: choosing cut points randomly; using a single discretization algorithm with different numbers of cuts; or using several discretization algorithms to obtain different cut sets. This dissertation adopts the first strategy and first constructs SVM ensembles based on the RSBRA discretization algorithm. Because RSBRA discretization may substantially degrade SVM classification performance, the level of consistency from rough set theory is introduced to improve RSBRA so that the discretized data preserve enough information for classification. An ensemble learning method based on the modified RSBRA algorithm is then proposed.
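     The sketch below illustrates only the first strategy (random cut selection); it is not the RSBRA algorithm or its modification. Each base SVM is trained on a copy of the data discretized with its own randomly drawn cut points, so the members differ from one another; the numbers of members and cuts are arbitrary example values.

    # Sketch of strategy one: each base classifier gets its own random cut points.
    import numpy as np
    from sklearn.svm import SVC

    def random_cuts(X, n_cuts, rng):
        """Draw n_cuts random cut points per attribute, within the attribute's observed range."""
        lo, hi = X.min(axis=0), X.max(axis=0)
        return [np.sort(rng.uniform(lo[j], hi[j], size=n_cuts)) for j in range(X.shape[1])]

    def discretize(X, cuts):
        """Replace every value by the index of the interval it falls into."""
        return np.column_stack([np.searchsorted(cuts[j], X[:, j]) for j in range(X.shape[1])])

    def discretization_ensemble(X, y, n_members=10, n_cuts=5, seed=0):
        rng = np.random.default_rng(seed)
        members = []
        for _ in range(n_members):
            cuts = random_cuts(X, n_cuts, rng)
            clf = SVC(kernel="rbf").fit(discretize(X, cuts), y)
            members.append((cuts, clf))   # keep the cuts so test data is discretized identically
        return members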
     4) Existing search-based ensemble learning methods need measures to evaluate the base classifiers, but such measures are either hard to tune to a reasonable trade-off between accuracy and diversity, or do not directly reflect the generalization performance of the ensemble. To address this problem, a direct genetic ensemble method is proposed, which uses a genetic algorithm to search the space of ensembles directly for one with good classification performance. The method readily produces selective ensembles of classifiers, and the study shows that it achieves better classification results than the traditional ensemble methods Bagging and AdaBoost while combining fewer classifiers.
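     The following minimal sketch shows the idea of searching the ensemble space directly with a genetic algorithm: a chromosome is a 0/1 mask over a pool of already trained base classifiers, and fitness is the majority-vote accuracy of the selected members on a validation set. The operators (tournament selection, one-point crossover, bit-flip mutation) and all parameter values are illustrative assumptions, not the dissertation's settings.

    # Sketch: searching the ensemble space directly with a genetic algorithm.
    # preds is a (n_classifiers, n_validation) array of each trained base SVM's predictions
    # on a validation set; class labels are assumed to be integers 0..k-1.
    import numpy as np

    def vote_accuracy(preds, mask, y_val):
        """Majority-vote accuracy of the base classifiers selected by the 0/1 mask."""
        if mask.sum() == 0:
            return 0.0
        selected = preds[mask.astype(bool)]
        vote = np.array([np.bincount(col).argmax() for col in selected.T])
        return float(np.mean(vote == y_val))

    def genetic_ensemble(preds, y_val, pop_size=30, generations=50, p_mut=0.05, seed=0):
        rng = np.random.default_rng(seed)
        n = preds.shape[0]
        pop = rng.integers(0, 2, size=(pop_size, n))
        for _ in range(generations):
            fitness = np.array([vote_accuracy(preds, ind, y_val) for ind in pop])
            chosen = []                                   # tournament selection
            for _ in range(pop_size):
                a, b = rng.choice(pop_size, size=2, replace=False)
                chosen.append(a if fitness[a] >= fitness[b] else b)
            parents = pop[np.array(chosen)]
            children = parents.copy()                     # one-point crossover on consecutive pairs
            for k in range(0, pop_size - 1, 2):
                c = rng.integers(1, n)
                children[k, c:], children[k + 1, c:] = parents[k + 1, c:], parents[k, c:]
            flips = rng.random(children.shape) < p_mut    # bit-flip mutation
            pop = np.where(flips, 1 - children, children)
        fitness = np.array([vote_accuracy(preds, ind, y_val) for ind in pop])
        return pop[int(np.argmax(fitness))]               # 0/1 mask of the selected classifiers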
     5) Classifier combination architectures for multi-class classification are studied. To overcome the shortcomings of existing architectures, a simplified architecture is proposed that avoids unnecessary information loss during combination. Under this architecture, measurement-level combination based on evidence theory is investigated: basic probability assignment functions are defined from the posterior probability outputs and classification accuracies of the SVMs and then combined with a given rule. In particular, when the one-against-one multi-class extension is used, the evidence may conflict heavily and the classical Dempster combination rule is no longer applicable. A new evidence combination method is therefore proposed, based on the idea that conflicting information is partly usable: the usable portion of the conflict is determined from the global effectiveness of all the evidence and then distributed among the focal elements according to the weighted average of the basic probability assignments. This effectively resolves the problem of conflicting evidence.
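     As a rough illustration of the measurement-level combination described above, the sketch below builds a basic probability assignment from an SVM's posterior output discounted by its classification accuracy (the remaining mass going to the whole frame Theta) and combines two such assignments with the classical Dempster rule. The dissertation's weighted redistribution of the conflict is only hinted at in a comment, not implemented.

    # Sketch: BPAs with singleton focal elements plus the frame Theta, combined by Dempster's rule.
    import numpy as np

    def bpa_from_svm(posterior, accuracy):
        """Discount the posterior by the classifier's accuracy; the remainder is m(Theta)."""
        m = accuracy * np.asarray(posterior, dtype=float)
        return np.append(m, 1.0 - m.sum())                 # [m({c1}), ..., m({ck}), m(Theta)]

    def dempster(m1, m2):
        """Classical Dempster combination for two BPAs of the above form."""
        k = len(m1) - 1
        combined = np.zeros_like(m1)
        for i in range(k):                                  # singleton focal elements
            combined[i] = m1[i] * m2[i] + m1[i] * m2[k] + m1[k] * m2[i]
        combined[k] = m1[k] * m2[k]                         # Theta with Theta
        conflict = 1.0 - combined.sum()                     # mass of the empty intersections
        if conflict >= 1.0:
            raise ValueError("total conflict: Dempster's rule is not applicable")
        return combined / (1.0 - conflict), conflict        # classical normalization

    # When the conflict is heavy (e.g. with one-against-one outputs), the dissertation instead
    # keeps only the usable part of the conflict and redistributes it over the focal elements
    # according to a weighted average of the BPAs, rather than normalizing as above.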