Review on Scientific Summary Based on the Recognition of Cited Spans (基于被引片段识别的科技文摘综述)

English title: Review on Scientific Summary Based on the Recognition of Cited Spans
Authors: Li Gang (李纲); Xu Jian (徐健); Yu Hui (余辉); Ma Yaxue (马亚雪)
Keywords: cited spans; scientific summary; citation context
Journal code: XDQB
Journal: Journal of Modern Information
Affiliations: Center for Studies of Information Resources, Wuhan University; School of Economy and Management, Hubei University of Technology
Publication date: 2017-09-15
Publisher: 现代情报 (Journal of Modern Information)
Year: 2017
Volume/issue: v.37; No.315
Funding: National Natural Science Foundation of China project "Research on the Dynamic Evolution Patterns of Research Teams" (Grant No. 71273196)
Language: Chinese
File name: XDQB201709029
Page count: 6
Issue: 09
CN: 22-1182/G3
Pages: 173-178

Abstract

[Purpose/Significance] Scientific summarization based on cited span recognition is a research problem of shared interest to bibliometrics, information retrieval, and natural language processing. Reviewing the related work can provide a reference for subsequent research. [Method/Process] This paper first introduces the concept of cited spans, then reviews related research along the steps of cited span recognition and classification, and summary generation and evaluation. [Result/Conclusion] Current methods for cited span recognition fall broadly into two types: machine learning and retrieval. Discourse facet classification still suffers from inconsistent standards, and research on summary generation and evaluation methods remains relatively scarce.

References

[1]ELKISS A,SHEN S,FADER A,et al.Blind Men and Elephants:What do Citation Summaries Tell us About a Research Article?[J].Journal of the Association for Information Science and Technology,2008,59(1):51-62.
[2]LUHN H P.The Automatic Creation of Literature Abstracts[J].IBM Journal of Research and Development,1958,2(2):159-165.
[3]HU P,GUO Y,JI D,et al.Leveraging Hybrid Citation Context for Impact Summarization[C]//Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining.Heidelberg:Springer,2013:354-365.
[4]刘天祎,步一,赵丹群,等.自动引文摘要研究述评[J].现代图书情报技术,2016,32(5):1-8.
[5]LE M H,HO T B,NAKAMORI Y.Detecting Emerging Trends from Scientific Corpora[J].International Journal of Knowledge and Systems Sciences,2005,2(2):53-59.
[6]祝清松,冷伏海.基于引文内容分析的高被引论文主题识别研究[J].中国图书馆学报,2014,40(1):39-49.
[7]RITCHIE A.Citation Context Analysis for Information Retrieval[D].University of Cambridge,2009.
[8]ALJABER B,STOKES N,BAILEY J,et al.Document Clustering of Scientific Texts Using Citation Contexts[J].Information Retrieval,2010,13(2):101-131.
[9]QAZVINIAN V,RADEV D R.Scientific Paper Summarization Using Citation Summary Networks[C]//Proceedings of International Conference on Computational Linguistics.Manchester:the Association for Computational Linguistics,2008:689-696.
[10]KAPLAN D,IIDA R,TOKUNAGA T.Automatic Extraction of Citation Contexts for Research Paper Summarization:A Coreference-chain Based Approach[C]//Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries.Suntec,Singapore:Association for Computational Linguistics,2009:88-95.
[11]TEUFEL S,SIDDHARTHAN A,TIDHAR D.Automatic Classification of Citation Function[C]//Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing.Sydney:DBLP,2006:103-110.
[12]MEI Q,ZHAI C X.Generating Impact-Based Summaries for Scientific Literature[C]//Proceedings of the Meeting of the Association for Computational Linguistics.Columbus:DBLP,2008:816-824.
[13]COHAN A,GOHARIAN N.Scientific Article Summarization Using Citation-Context and Article's Discourse Structure[C]//Proceedings of Empirical Methods in Natural Language Processing.Lisbon:the Association for Computational Linguistics,2015:390-400.
[14]JAIDKA K,CHANDRASEKARAN M K,ELIZALDE B F,et al.The Computational Linguistics Summarization Pilot Task[C]//Proceedings of the Text Analysis Conference(TAC 2014).Maryland,2014.
[15]JAIDKA K,CHANDRASEKARAN M K,RUSTAGI S,et al.Overview of the CL-SciSumm 2016 Shared Task[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:93-102.
[16]TEXT ANALYSIS CONFERENCE(TAC)[EB/OL].https://tac.nist.gov//2014/BiomedSumm/index.html.
[17]MOLLA D,JONES C,SARKER A.Impact of Citing Papers for Summarisation of Clinical Documents[C]//Proceedings of the Australasian Language Technology Workshop.Melbourne,2014.
[18]COHAN A,SOLDAINI L,GOHARIAN N.Towards Citation-Based Summarization of Biomedical Literature[C]//Proceedings of the Text Analysis Conference(TAC 2014).Maryland,2014.
[19]NOMOTO T.NEAL:A Neurally Enhanced Approach to Linking Citation and Reference[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:168-174.
[20]KLAMPFL S,REXHA A,KERN R.Identifying Referenced Text in Scientific Publications by Summarisation and Classification Techniques[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:122-131.
[21]MORAES L,BAKI S,VERMA R,et al.University of Houston at CL-SciSumm 2016:SVMs with Tree Kernels and Sentence Similarity[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:113-121.
[22]SAGGION H,ABURAED A,RONZANO F.Trainable Citation-Enhanced Summarization of Scientific Articles[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:175-186.
[23]CAO Z,LI W,WU D.PolyU at CL-SciSumm 2016[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:132-138.
[24]LU K,MAO J,LI G,et al.Recognizing Reference Spans and Classifying their Discourse Facets[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:139-145.
[25]AGGARWAL P,SHARMA R.Lexical and Syntactic Cues to Identify Reference Scope of Citance[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:103-112.
[26]叶干军.文摘编写中分面分析的应用[J].现代情报,1987(1):33-33.
[27]U.S.National Library of Medicine.Structured Abstracts[EB/OL].[2008-5-8].http://www.nlm.nih.gov/bsd/policy/structured abstracts.html.
[28]COUNCIL OF NATIONAL LIBRARY ASSOCIATIONS.American National Standard for Writing Abstracts[M].New York:American National Standards Institute,1979.
[29]SOLDATOVA L N,LIAKATA M.An Ontology Methodology and CISP-the Proposed Core Information about Scientific Papers[EB/OL].JISC Project Report,[2007-12-21].http://repository.jisc.ac.uk/137
[30]GUO Y,KORHONEN A,LIAKATA M,et al.Identifying the Information Structure of Scientific Abstracts:An Investigation of Three Different Schemes[C]//Proceedings of the 2010 Workshop on Biomedical Natural Language Processing.Uppsala:Association for Computational Linguistics,2010:99-107.
[31]YAMAMOTO Y,TAKAGI T.A Sentence Classification System for Multi-document Summarization in the Biomedical Domain[C]//Proceedings of International Workshop on Biomedical Data Engineering,2005:90-95.
[32]白光祖,何远标,马建霞,等.利用小样本量机器学习实现学术文摘结构的自动识别[J].现代图书情报技术,2014,30(7):34-40.
[33]MALENFANT B,LAPALME G.RALI System Description for CL-SciSumm 2016 Shared Task[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:146-155.
[34]LI L,MAO L,ZHANG Y,et al.CIST System for CL-SciSumm 2016 Shared Task[C]//Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language.Newark:DBLP,2016:156-167.
[35]陈海华,黄永,张炯,等.基于引文上下文的学术文本自动摘要技术研究[J].数字图书馆论坛,2016,08:43-49.
[36]段雪莹,王阳.基于遗传算法的中文多文档自动摘要方法研究[J].科技信息,2010,35:79,45
[37]CHRISTENSEN J,MAUSAM S S,SODERLAND S,et al.Towards Coherent Multi-Document Summarization[C]//Proceedings of North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2013:1163-1173.
[38]LIN C Y.ROUGE:A Package for Automatic Evaluation of Summaries[C]//Proceeding of the Text Summarization Branches Out:Proceedings of the ACL-04 Workshop.Barcelona:ACL,2004:49-52.