Research on Web Entity Event Fusion for Market Intelligence Analysis
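Abstract
With the rapid development of the Internet, the Web has become an open, widely distributed, global information service center. Enterprises hope to obtain valuable market intelligence through big data analysis and thereby gain an edge in fierce market competition. On the Web, enterprises care most about event information describing entities with which they have a business interest (including companies, products, and people). These events describe the activities the entities engage in or their latest status, and provide first-hand material for mining market intelligence. Large amounts of event information flood the Web in the form of news, reviews, and messages; this information is highly redundant, inaccurate, and scattered, which makes market intelligence analysis very inconvenient. Eliminating redundancy and discovering event relations, so that event information can be integrated effectively, is therefore a prerequisite for obtaining accurate market intelligence.
     As an important step in market intelligence analysis, Web entity event fusion supplies high-quality data and comprehensive, truthful, and reliable data support, and has therefore attracted growing attention from researchers. However, because event information on the Web mostly appears in unstructured forms such as news, with free expression, varied formats, and casual publication, Web entity event fusion still faces the following open problems: (1) Different websites describe the same event in very different ways, so event fusion must first identify the different mentions that describe the same event. (2) Because events keep developing, websites differ in the level of detail they provide, sites have their own preferences, and editors make mistakes, event information on the Web can be incomplete, outdated, erroneous, or false; to ensure the accuracy of the data used for market intelligence analysis, Web entity event fusion must resolve event conflicts. (3) The description of a single event rarely reveals the full picture or how the event came about and unfolded; to give market intelligence analysis a comprehensive description of entity events, Web entity event fusion must establish associations among different entity events, laying the foundation for mining implicit associations between events.
     Web entity event fusion underpins data quality and is a prerequisite for market intelligence analysis. This thesis studies several key problems in event fusion; its main work and contributions are summarized as follows:
     (1) To identify the many different event mentions on the Web, this thesis proposes a Web entity coreferent event identification method based on heterogeneous information networks, which effectively improves the accuracy of identifying different mentions of an event.
     The method uses a holistic coreferent event identification algorithm based on hierarchical clustering and exploits the mutual influence among matching decisions to identify coreferent events iteratively. For event similarity measurement, the method draws on the relations among entities, events, documents, and data sources, and applies multiple features to obtain a fairly accurate combined similarity between event mentions. Experiments on an enterprise event dataset, a person event dataset, and a product event dataset show that the proposed algorithm performs Web entity coreferent event identification effectively, with good recall and precision.
     (2) Because the event information provided by different mentions can be incomplete, outdated, contradictory, or erroneous, this thesis proposes an event conflict resolution method based on D-S evidence theory, which effectively resolves conflicts among event mentions.
     The method applies a resolution strategy targeted at each type of event conflict and uses the combination rule of D-S evidence theory, which effectively improves the accuracy of event conflict resolution. To compute the credibility of event attributes, it uses factors such as the frequency with which an attribute fact appears, its position in the document, and the quality of the data source, and applies semi-supervised learning to compute the credibility of each event attribute fact. To address the combination-rule paradox of classical D-S evidence theory, the theory is extended, which improves the accuracy of conflict resolution and allows new strategies and features to be added, so the method is highly adaptable.
     (3) Because the cause, development, and direction of an event cannot be understood from a single event description, this thesis proposes a method that builds an entity event association graph from event relations and entity relations, which effectively establishes associations among entity events.
     The method uses the five most basic relation patterns between events to reconstruct how events occur and develop and, with the help of entity relation discovery, represents the complex associations among entity events as a graph, laying the foundation for mining implicit relations between events. For event relations, a method for constructing an event association graph is proposed on the basis of existing event relation patterns. Experiments verify that the proposed method establishes associations among entity events effectively and with high accuracy.
     By studying Web entity event fusion, this work addresses the data quality problems facing market intelligence analysis and lays the foundation for large-scale intelligence analysis, so the research has practical significance. Event relation detection, event pattern discovery, and new representation mechanisms for event relations are directions for future work.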
With the rapid progress of the Internet, the Web has become an open, global information center. Companies want to obtain valuable market intelligence through big data analysis and to seize opportunities in fierce market competition. On the Web, companies are concerned with events involving entities related to them (including companies, products, and people). These events describe the entities' activities and latest status, and provide first-hand information for mining market intelligence. A large amount of event information exists on the Web in the form of news, reviews, and messages. It is highly redundant, inaccurate, and scattered, which greatly inconveniences market intelligence analysis. Eliminating redundancy, discovering associations between events, and integrating event information are therefore preconditions for accurate access to market intelligence.
     As an important step in market intelligence analysis, Web entity event fusion can provide high-quality, comprehensive, truthful, and reliable data for market intelligence analysis, and has therefore attracted more and more researchers. However, event information on the Web appears in forms such as news, characterized by free expression, varied formats, and unconstrained publication. Web entity event fusion has to solve the following problems: (1) Different websites describe the same event very differently, so the first problem to solve is event coreference resolution. (2) Because events progress, different sites provide different event mentions, websites have preferences, and editors make errors, information on the Web can be incomplete, outdated, erroneous, or false. Therefore, to ensure that market intelligence analysis has accurate data, Web entity event fusion needs to resolve event conflicts. (3) It is difficult to see the whole picture of an event, or to know its ins and outs, from a single event mention. So, to provide a panorama of entities and events, Web entity event fusion needs to establish the correlations between entities and events.
     Research on Web entity event fusion is a prerequisite for high-quality data and for market intelligence analysis. The main work and contributions of this thesis are summarized as follows:
     (1) To identify the many different event mentions on the Web, we present a method for Web entity event coreference resolution based on heterogeneous information networks, which effectively improves the accuracy of event coreference resolution.
     The method adopts a hierarchical clustering algorithm for event coreference resolution and uses the interaction between matching decisions to carry out coreference resolution iteratively. For event similarity measurement, the method uses the relations among entities, events, documents, and data sources, measuring similarity from several angles to obtain a reasonable similarity between event mentions. Experiments on an enterprise event data set, a person event data set, and a product event data set show that the proposed algorithm can accomplish the task of event coreference resolution with good recall and precision.
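As a rough illustration of clustering-based coreference resolution, the following Python sketch groups event mentions by average-link agglomerative clustering over a weighted similarity. It is not the thesis's algorithm: the features (shared entities and shared words), the weights, the threshold, the sample mentions, and names such as `mention_similarity` and `cluster_mentions` are all assumptions made for the example, and the document and data-source relations and the iterative interaction between decisions are left out.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def mention_similarity(m1, m2, w_entity=0.6, w_text=0.4):
    """Weighted mix of entity overlap and word overlap (weights are assumed)."""
    return (w_entity * jaccard(m1["entities"], m2["entities"])
            + w_text * jaccard(m1["words"], m2["words"]))


def cluster_mentions(mentions, threshold=0.5):
    """Average-link agglomerative clustering; returns lists of mention indices."""
    clusters = [[i] for i in range(len(mentions))]

    def link(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        total = sum(mention_similarity(mentions[i], mentions[j]) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, bi, bj = max(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        merged = clusters.pop(bj)  # bj > bi, so index bi stays valid
        clusters[bi].extend(merged)
    return clusters


# Hypothetical mentions: two reports of one acquisition, one product launch.
mentions = [
    {"entities": {"AcmeCorp", "BetaInc"}, "words": {"acquire", "deal", "billion"}},
    {"entities": {"AcmeCorp", "BetaInc"}, "words": {"acquire", "purchase", "deal"}},
    {"entities": {"AcmeCorp"}, "words": {"launch", "phone"}},
]
groups = cluster_mentions(mentions)
print(groups)
```

With these sample mentions the two acquisition reports end up in one cluster and the launch stays separate; changing the threshold or weights changes how aggressively mentions are merged.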
     (2) Since different event mentions provide incomplete, outdated, and contradictory data, we put forward an event conflict resolution method based on D-S evidence theory, which can effectively resolve event conflicts.
     According to the type of event conflict, the method adopts a corresponding resolution strategy and uses the combination rule of D-S evidence theory, which effectively improves the accuracy of event conflict resolution. To calculate the credibility of event attributes, it uses the frequency of event attribute facts, their location in the document, the quality of the data source, and other factors, and adopts a semi-supervised machine learning method to calculate the credibility of each event attribute fact. Because the combination rule of traditional D-S evidence theory suffers from a paradox problem, we extend the theory, which increases the accuracy of event conflict resolution and allows new features to be added; the method therefore has strong adaptability.
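The classical combination rule can be sketched in a few lines of Python. The block below implements Dempster's rule for two mass functions and runs it on Zadeh's well-known example, which is the kind of counterintuitive result usually meant by the combination-rule paradox. The function name `combine_dempster` and the candidate values "A", "B", "C" are assumptions for illustration; the thesis's extension of the rule is not described in this abstract and is not shown.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Returns (combined mass function, total conflicting mass).
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to the empty set
    total = sum(combined.values())
    if total == 0.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalize by the non-conflicting mass.
    return {k: v / total for k, v in combined.items()}, conflict


# Zadeh's example: each source is almost sure of a different value,
# and both give the same unlikely value B a tiny mass.
source_1 = {frozenset({"A"}): 0.99, frozenset({"B"}): 0.01}
source_2 = {frozenset({"C"}): 0.99, frozenset({"B"}): 0.01}

fused, conflict = combine_dempster(source_1, source_2)
print(fused, conflict)
```

After normalization all of the belief lands on "B", the value both sources considered very unlikely, while nearly all of the original mass was in conflict. Results like this are why highly conflicting evidence is usually handled by discounting sources or modifying the rule.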
     (3) Since the cause and progress of an event cannot be described from a single event mention, we present a method to construct a panorama based on entities and events.
     The method uses five basic event relations together with entity relations to describe the complex relations among entities and events, and lays the foundation for mining implicit relationships between events. For event relations, we put forward a method to construct an event relation graph according to the event relation types; we then use entity relationships to link event relation graphs into a panorama. According to the experimental results, the proposed method can effectively establish correlations between entities and events with high accuracy.
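One simple way to picture such a structure is a directed graph whose edges carry a relation label, plus an index from entities to the events they take part in. The Python sketch below is only a data-structure illustration under assumptions: the abstract does not name the five relation types, so the labels "causes" and "precedes", the class `EventGraph`, and the sample events are all hypothetical.

```python
from collections import defaultdict, deque


class EventGraph:
    """Directed, labeled graph of events with an entity-to-event index."""

    def __init__(self):
        self.relations = defaultdict(list)        # event -> [(label, event)]
        self.events_by_entity = defaultdict(set)  # entity -> {event}

    def add_event(self, event_id, entities):
        """Register an event and the entities that take part in it."""
        for entity in entities:
            self.events_by_entity[entity].add(event_id)

    def relate(self, source, label, target):
        """Add a labeled relation from one event to another."""
        self.relations[source].append((label, target))

    def storyline(self, start):
        """Events reachable from `start` through event relations, in BFS order."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            event = queue.popleft()
            order.append(event)
            for _label, nxt in self.relations[event]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return order

    def events_of(self, entity):
        """All events an entity takes part in, sorted for stable output."""
        return sorted(self.events_by_entity[entity])


# Hypothetical events and placeholder relation labels.
graph = EventGraph()
graph.add_event("bid", {"AcmeCorp", "BetaInc"})
graph.add_event("approval", {"AcmeCorp", "Regulator"})
graph.add_event("merger", {"AcmeCorp", "BetaInc"})
graph.add_event("launch", {"BetaInc"})
graph.relate("bid", "causes", "approval")
graph.relate("approval", "precedes", "merger")

print(graph.storyline("bid"))
print(graph.events_of("BetaInc"))
```

Following event relations from "bid" recovers one storyline, while the entity index connects the otherwise unrelated "launch" event to the same company, which is the sort of link an entity relation contributes.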
     Research on Web entity event fusion solves the data quality problems of market intelligence analysis and lays the foundation for large-scale intelligence analysis. Therefore, the research in this thesis is significant. In addition, event relation detection, event pattern discovery, and new event representation mechanisms are directions for future research.