Research on Cross-Domain Learning to Rank Based on Transfer Learning
Abstract
With the rapid growth of the Internet and the continuing expansion of its content, finding effective ways to retrieve the information one needs has become increasingly important. Search engines largely solve this problem and have become the main entry point to the Internet. How to rank the results returned by a search engine has become a research hotspot in recent years; moreover, the quality of result ranking directly determines the user experience and, in turn, a search engine's market share. The work in this dissertation is carried out against this search-engine background.
     Mainstream search engines typically return the pages matching a query as a ranked list, with the pages the system judges most relevant, and thus most likely to satisfy the user's information need, placed at the top. In recent years, training optimal ranking models with large-scale data processing and machine learning techniques has become an active research topic. Researchers at home and abroad have proposed a series of classic methods, some of which, such as Ranking SVM, have already been applied successfully in industry. The vast majority of these methods fall under supervised learning: to obtain a reliable ranking model, a large amount of training data must be labeled and fed into a specific learner, which, after a period of automatic training, outputs the ranking model.
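The supervised pipeline just described can be illustrated with the pairwise reduction that underlies Ranking SVM. The data below are hypothetical, and a plain perceptron stands in for the SVM solver; the reduction itself, however, is the standard one.

```python
from itertools import combinations

# Hypothetical labeled data: for each query, (feature_vector, relevance_grade).
training_data = {
    "q1": [([0.9, 0.2], 2), ([0.4, 0.7], 1), ([0.1, 0.1], 0)],
    "q2": [([0.8, 0.5], 1), ([0.3, 0.9], 0)],
}

# Pairwise reduction: each pair of documents under the same query with
# different grades yields a binary example on the feature difference.
pairs = []
for docs in training_data.values():
    for (f1, g1), (f2, g2) in combinations(docs, 2):
        if g1 == g2:
            continue  # ties carry no preference information
        diff = [a - b for a, b in zip(f1, f2)]
        label = 1 if g1 > g2 else -1
        pairs.append((diff, label))
        pairs.append(([-d for d in diff], -label))  # mirrored example

# A perceptron stands in for the SVM solver; Ranking SVM would instead
# minimize a regularized hinge loss over the same pairwise examples.
w = [0.0, 0.0]
for _ in range(100):
    for x, y in pairs:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + y * xi for wi, xi in zip(w, x)]

def score(f):
    return sum(wi * xi for wi, xi in zip(w, f))

# At query time, documents are ranked by descending linear score.
ranked = sorted([[0.1, 0.1], [0.9, 0.2], [0.4, 0.7]], key=score, reverse=True)
```

Because the learned model is a single linear scoring function, ranking a new result list costs one dot product per document.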
     In practical applications of learning to rank, labeled data are often scarce or entirely absent. Existing supervised methods require a certain amount of labeled data to guarantee the reliability of the learned ranking model, so they cannot be applied when labeled data are insufficient. Fortunately, in practice we also find that although labeled data in the target domain are insufficient, a larger amount of labeled data may exist in a different but related domain, which we call the "source domain". How to exploit these source-domain data to assist learning to rank in the target domain, and thereby obtain an improved ranking model, is the focus of this dissertation.
     To address the shortage of labeled data in practical learning-to-rank applications, this dissertation makes full use of labeled data from the source domain, introduces the concept of transfer learning, proposes novel transfer-learning-based cross-domain learning-to-rank algorithms, and studies their applications. After systematically analyzing the basic assumptions, loss functions, optimization formulations, and learning algorithms of learning to rank, we perform transfer learning at both the instance level and the feature level, giving the corresponding assumptions, optimization formulations, and learning algorithms for each. Finally, we study the application of our methods in document retrieval and vertical search.
     For instance-level transfer ranking, we first propose a heuristic method, TransRank, which applies a two-step preprocessing to the labeled source-domain data and then feeds the processed data, together with the small amount of target-domain training data, into Ranking SVM to train a ranking model. We then propose an improved, probability-distribution-based algorithm, CLRankins. For the feature-level case, based on our assumptions we formulate a unified optimization problem and convert it into an iterative procedure that optimizes two variables in turn. We also study the relationship between this optimization problem and the classic Ranking SVM, and prove that it can be solved using Ranking SVM as the base learner. Solving this optimization problem yields the feature-level transfer ranking algorithm CLRankfeat.
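The overall shape of the TransRank pipeline (filter the source data, then pool it with the scarce target data and train a supervised ranker) can be sketched as follows. The abstract does not spell out the two-step preprocessing, so the centroid-distance filter below is a hypothetical stand-in, not the dissertation's actual procedure.

```python
# Hypothetical cross-domain data: each example is (feature_vector, grade).
source = [([0.9, 0.1], 2), ([0.2, 0.8], 0), ([2.0, 2.0], 1), ([0.7, 0.3], 1)]
target = [([0.8, 0.2], 1), ([0.3, 0.7], 0)]  # scarce target-domain labels

def centroid(data):
    """Mean feature vector of a labeled dataset."""
    dims = len(data[0][0])
    return [sum(f[i] for f, _ in data) / len(data) for i in range(dims)]

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical stand-in for TransRank's two-step preprocessing: keep only
# source examples that lie near the target-domain centroid, discarding
# source instances that look unlike the target distribution.
c = centroid(target)
kept = [(f, g) for f, g in source if distance(f, c) <= 0.8]

# The filtered source data are pooled with the scarce target data; the
# union is then fed to a supervised ranker (Ranking SVM in the thesis).
combined = kept + target
```

The design intuition is that source instances far from the target distribution would otherwise pull the learned model away from target-domain preferences.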
     Cross-domain transfer ranking has broad application prospects in document retrieval. Using several public document-retrieval datasets, we simulate the shortage of labeled data and experimentally validate transfer ranking in document retrieval. Experiments on these large-scale public datasets show that the three proposed methods improve the target-domain ranking model to varying degrees: CLRankfeat achieves a 5-15% performance gain on all experimental datasets, while TransRank and CLRankins achieve smaller gains on only some of them. We also compare the methods in terms of sensitivity and robustness.
     Vertical search engines are another application scenario for transfer ranking. A newly developed vertical search often lacks the time to label data for training a ranking model, but labeled data from existing vertical searches can be exploited via transfer ranking to obtain a ranking model for the new product. In our experiments, we use query click-through data from a commercial search engine, extract a set of ranking-related features, and construct the experimental datasets. Experiments show that TransRank effectively improves ranking performance on news search, saving roughly 80% of the target-domain labeling effort. We further analyze and discuss the roles different features play during transfer ranking.
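The abstract does not specify how training data are derived from the click-through log, so the sketch below shows one common heuristic, "click > skip above" in the style of Joachims, purely as an illustrative assumption: a clicked result is taken to be preferred over every skipped result ranked above it.

```python
# Hypothetical click log: per query, results in presented order plus the
# positions (0-based) that were clicked.
click_log = [
    {"results": ["d1", "d2", "d3", "d4"], "clicks": [2]},
    {"results": ["a1", "a2", "a3"], "clicks": [0, 2]},
]

# "Click > skip above": a clicked result is preferred over every skipped
# result ranked above it. The resulting (preferred, dispreferred) pairs
# can serve as pairwise training data for a ranker.
preferences = []
for record in click_log:
    clicked = set(record["clicks"])
    for pos in record["clicks"]:
        for above in range(pos):
            if above not in clicked:
                preferences.append(
                    (record["results"][pos], record["results"][above])
                )
```

Such implicit-feedback pairs are noisy compared with editorial labels, which is one reason a feature set robust across verticals matters for transfer.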