Research on Domain Adaptation Learning Algorithms and Their Applications
Abstract
Traditional machine learning assumes that the training domain and the test domain are independent and identically distributed, so that a model learned from the training set can be applied directly to the test set. In practice this assumption does not always hold: when the training and test distributions differ, the performance of traditional machine learning degrades sharply. Domain adaptation learning was proposed to address this problem. Its goal is to build a bridge between the domains so as to improve predictive performance on the test domain, and it is widely used to solve real-world machine learning problems such as classification, regression, and probability density estimation. Many researchers at home and abroad have studied domain adaptation in depth and obtained important results that are widely applied in practice, but many problems still call for further exploration. This dissertation investigates domain adaptation learning from four aspects: probability density estimation, support vector domain description, classification, and regression. The main contributions are as follows:
     1. Domain adaptation learning based on the minimum enclosing ball. Within the same application area, data collected at different times or places, or by different devices, may be incomplete. To address how knowledge is transferred from a source domain to a target domain, and building on the fact that support vector domain description, classification, and regression can all be equivalently formulated as center-constrained minimum enclosing ball (CC-MEB) problems, this dissertation first establishes the theorem that the difference between the probability densities of two similar domains can be expressed through the centers of the two domains' minimum enclosing balls, with an upper bound that is independent of the radii. Based on this theorem, a novel domain adaptation algorithm is proposed: each learning problem is first converted into its equivalent minimum enclosing ball model, and the center of the source domain's ball is then used to calibrate the center of the target domain's ball, thereby improving learning performance on the target domain. Because only the center point, i.e. the source-domain knowledge, is transferred, the algorithm preserves the privacy of the source-domain data; moreover, the new algorithm is proved to remain equivalent to a CC-MEB problem, so core-set techniques can be used to scale it to large datasets. Experimental results show that the algorithm compensates for missing data in the target domain and greatly improves performance; a minimal sketch of the center-calibration idea follows.
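The sketch below illustrates only the calibration step, under simplifying assumptions of mine: a plain input-space MEB approximated with the Badoiu-Clarkson core-set iteration, and a hypothetical blending weight lam. The dissertation's algorithms instead work with the kernelized CC-MEB formulations.

```python
import numpy as np

def meb_center(X, n_iter=200):
    """Approximate the minimum enclosing ball center of the rows of X with
    the Badoiu-Clarkson core-set iteration: repeatedly step the current
    center toward the farthest point, c <- c + (p - c) / (k + 2)."""
    c = X[0].astype(float)
    for k in range(n_iter):
        p = X[np.argmax(((X - c) ** 2).sum(axis=1))]  # farthest point from c
        c += (p - c) / (k + 2)
    return c

def calibrated_target_center(X_src, X_tgt, lam=0.5):
    """Blend the target-domain MEB center (unreliable when the target data
    are incomplete) with the source-domain MEB center; lam is a
    hypothetical adaptation weight, not a value from the dissertation."""
    return (1.0 - lam) * meb_center(X_tgt) + lam * meb_center(X_src)

# Toy usage: a small, incomplete target sample and a related, richer source.
rng = np.random.default_rng(0)
X_src = rng.normal(0.0, 1.0, size=(500, 2))
X_tgt = rng.normal(0.3, 1.0, size=(20, 2))
print(calibrated_target_center(X_src, X_tgt, lam=0.7))
```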
     2. A cross-domain transfer learning algorithm based on SVM. When a new domain related to an existing one appears, labeling samples from the new domain can be expensive, while discarding all the old domain's data would be wasteful. A new transfer learning algorithm based on SVM, TL-SVM, is therefore proposed. Its main idea is that an SVM classifier is determined by the pair (w, b); if two domains are related, the w values of their respective classifiers should be close. By training on a small amount of labeled target-domain data while learning the source-domain knowledge w_s, a high-quality classification model is built for the target domain, achieving knowledge transfer between the domains. The method inherits the advantages of the maximum-margin SVM based on empirical risk minimization and remedies the traditional SVM's inability to transfer knowledge; a hedged formulation of the idea is given below.
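One natural way to encode "the target hyperplane should stay close to the source hyperplane" is to regularize the distance to w_s instead of the norm of w. The abstract does not state TL-SVM's exact objective, so the soft-margin form below is an illustrative assumption:

\[
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\,\lVert w - w_s\rVert^{2} + C\sum_{i=1}^{n}\xi_i\\
\text{s.t.}\quad & y_i\bigl(w^{\top}x_i + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n,
\end{aligned}
\]

where (x_i, y_i) are the few labeled target-domain samples, w_s is the hyperplane learned on the source domain, and C trades margin violations against staying near w_s; setting w_s = 0 recovers the standard SVM.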
     The above theoretical results are further applied to the L2 kernel classifier, which is built on the difference-of-density (DOD) idea. The L2 kernel classifier has good classification performance and sparsity, but its assumption that the training and test domains are independent and identically distributed limits its range of application. To overcome this shortcoming, and given the theoretical premise that the mathematical model of the L2 kernel classifier is equivalent to a modified SVM, knowledge transfer is carried out through that equivalent modified SVM, yielding an L2 kernel classifier with cross-domain transfer learning ability (TL-L2KC). The new classifier retains the good classification performance of the L2 kernel classifier while handling the mismatch between the training set and future test sets caused by slowly drifting data or by training data collected under particular constraints. The DOD setup it builds on is recalled below.
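For context, the difference-of-density classifier labels a point by the sign of the difference between the two class-conditional densities, and the L2 criterion fits a kernel model f to that difference under integrated squared error; this is the standard DOD/ISE setup, stated here for orientation rather than in the dissertation's exact notation:

\[
\hat{y}(x) = \operatorname{sign}\bigl(f(x)\bigr),\qquad
f = \arg\min_{f\in\mathcal{F}} \int \Bigl(f(x) - \bigl(p_{+}(x) - p_{-}(x)\bigr)\Bigr)^{2}\,dx,
\]

where p_+ and p_- are the class-conditional densities and \(\mathcal{F}\) is a family of kernel expansions; the cross terms involving the unknown densities are estimated from the training samples, which is what makes the criterion computable.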
     3. Domain adaptation learning based on the reduced set density estimation (RSDE) algorithm. RSDE is a kernel-based density estimator that expresses the density estimate as a linear combination of only a small fraction of the data samples and is optimal in the L2 sense. Compared with the classical Parzen window estimator it greatly reduces the computational complexity while condensing the data, but it requires the training and test sets to be independent and identically distributed. This dissertation proposes A-RSDE, a novel domain-adaptive density estimation method based on RSDE: by learning the source-domain (training-set) density p(x; θ1), the target-domain (test-set) estimate q(x; θ2) is made to optimally approach the true density q(x) while also optimally approaching p(x; θ1), thus achieving domain adaptation. A fast core-set algorithm based on the approximate minimum enclosing ball is used to solve A-RSDE (giving the fast variant A-FRSDE), so the method can be applied to density estimation on large datasets. A hedged form of the adaptation objective follows.
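The abstract describes A-RSDE as balancing fidelity to the true target density against closeness to the learned source density. A natural ISE-style objective with a hypothetical trade-off weight λ (the exact functional is not given in the abstract) would be

\[
\min_{\theta_2}\; \int \bigl(q(x;\theta_2) - q(x)\bigr)^{2}\,dx \;+\; \lambda \int \bigl(q(x;\theta_2) - p(x;\theta_1)\bigr)^{2}\,dx,
\]

where the unknown-density terms are handled in the usual ISE way: \(\int q(x)^2\,dx\) is a constant that can be dropped, and the cross term \(\int q(x;\theta_2)\,q(x)\,dx\) is estimated by averaging q(x; θ2) over the target samples.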
     The probability density functions above can all be viewed as density estimates lying in a linear combination space of densities. On this basis the dissertation further introduces the concept of a linear combination space of density estimates, shows that a density function in this space can be approximated, under the integrated squared error (ISE) criterion, by a linear combination of Gaussian basis functions, and develops an approximation framework for the space. The framework has several advantages. It estimates the combined density directly, without estimating each domain's density in turn, and achieves accuracy at least comparable to, and often better than, traditional density estimation methods. The amount of data involved in the computation is l, which is far smaller than the total sample size, so the framework suits large datasets. It can be applied to classification, data condensation, testing independence between random variables, variable selection for regression models, and conditional density estimation. Moreover, if the linear combination space is made to approximate a known space, the framework can estimate the similarity between source and target domains, which is useful for multi-source domain adaptation learning. A minimal sketch of the direct ISE fit is given below.
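A minimal one-dimensional sketch of the direct-fit idea, under assumptions of mine rather than the dissertation's exact construction: Gaussian basis functions at fixed centers, an unconstrained least-squares solution, and the density difference p1 − p2 as the target combination. The closed forms used are standard: the product integral of two Gaussians with variance s2 is a Gaussian with variance 2·s2, and each cross term against an unknown density is replaced by a sample average.

```python
import numpy as np

def gauss(x, c, s2):
    """Gaussian density N(x; c, s2), broadcasting over x and c."""
    return np.exp(-(x - c) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

def fit_combination(samples, alphas, centers, s2=0.05, ridge=1e-8):
    """Directly fit f(x) = sum_j beta_j N(x; c_j, s2) to the combination
    g(x) = sum_k alpha_k p_k(x) under the ISE criterion, using only
    samples from each p_k.  Minimizing beta'Q beta - 2 beta'b gives
    beta = Q^{-1} b, with
      Q_jk = integral N(x;c_j,s2) N(x;c_k,s2) dx = N(c_j; c_k, 2*s2),
      b_j  = sum_k alpha_k * mean_i N(x_ki; c_j, s2)   (sample estimate)."""
    C = np.asarray(centers)
    Q = gauss(C[:, None], C[None, :], 2 * s2)
    b = sum(a * gauss(X[:, None], C[None, :], s2).mean(axis=0)
            for a, X in zip(alphas, samples))
    beta = np.linalg.solve(Q + ridge * np.eye(len(C)), b)
    return lambda x: gauss(np.asarray(x)[:, None], C[None, :], s2) @ beta

# Toy usage: approximate the density difference p1 - p2 (alphas = (1, -1)),
# the combination used in difference-of-density classification.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(-1, 0.7, 400), rng.normal(1, 0.7, 400)
l = 25                                   # small basis, l << sample size
centers = np.linspace(-3, 3, l)
f = fit_combination([X1, X2], (1.0, -1.0), centers)
print(f(np.array([-1.0, 0.0, 1.0])))     # positive near -1, negative near 1
```

Note that only the l basis centers enter the solve, which is what makes the framework attractive for large samples; the unconstrained solve is an illustrative simplification.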