Automatic Extraction Summarization Based on Event Elements (基于事件要素的自动文摘抽取)
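  • English title: Automatic Extraction Summarization Based on Event Elements
  • Authors: SUN Peipei (孙佩佩); LIAO Tao (廖涛); LIU Zongtian (刘宗田)
  • Authors and affiliations (English): SUN Peipei; LIAO Tao; LIU Zongtian; School of Computer Science and Engineering, Anhui University of Science and Technology; School of Computer Engineering and Science, Shanghai University
  • Keywords: event element; Chinese Emergency Corpus (中文突发事件语料库); undirected graph; weight; automatic summarization
  • English keywords: event element, CEC corpus, undirected graph, weight, automatic summarization
  • Journal code: JSSG
  • English journal title: Computer & Digital Engineering
  • Affiliations: School of Computer Science and Engineering, Anhui University of Science and Technology; School of Computer Engineering and Science, Shanghai University
  • Publication date: 2015-10-20
  • Publisher: Computer & Digital Engineering (计算机与数字工程)
  • Year: 2015
  • Volume and issue: v.43; No.312
  • Funding: supported by the General Program of the National Natural Science Foundation of China (grant No. 61273328)
  • Language: Chinese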
  • File name: JSSG201510025
  • Page count: 5
  • Issue: 10
  • CN: 42-1372/TP
  • Pages: 121-125
Abstract
Traditional automatic summarization techniques tend to produce redundant information and incomplete content coverage, and the current mainstream techniques are mostly word-oriented. This paper studies how effective event elements, taken at the knowledge granularity of events, are in addressing this problem. First, event elements are obtained from the annotated CEC corpus. Next, an undirected graph of event elements is built, and weights are computed for its nodes and edges. Finally, concise summary sentences are selected and output in the order in which they appear in the original text. Experiments are conducted mainly on the CEC corpus. Compared with other methods, the approach achieves better recall and precision, with an average F-value of up to 0.62, and it summarizes the text content more effectively.
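The pipeline outlined in the abstract (event elements, undirected graph, node and edge weights, sentence selection, output in original order) can be illustrated with a minimal Python sketch. The abstract does not give the paper's weighting formulas, so everything below is an assumption made for illustration: node weight is taken to be the number of sentences containing an element, edge weight is the number of sentences in which two elements co-occur, and a sentence's score is the sum of its node and edge weights. The function names `build_element_graph`, `score_sentence`, and `summarize` are hypothetical, and the input is assumed to be already annotated, with one set of event elements per sentence.

```python
from collections import defaultdict
from itertools import combinations


def build_element_graph(sentences):
    """Build an undirected co-occurrence graph over event elements.

    `sentences` is a list of sets of event-element strings, one per sentence.
    Assumed weighting (not taken from the paper):
      node weight = number of sentences containing the element
      edge weight = number of sentences in which both elements co-occur
    """
    node_weight = defaultdict(int)
    edge_weight = defaultdict(int)
    for elements in sentences:
        for element in elements:
            node_weight[element] += 1
        # frozenset keys make the edge (a, b) identical to (b, a).
        for a, b in combinations(sorted(elements), 2):
            edge_weight[frozenset((a, b))] += 1
    return node_weight, edge_weight


def score_sentence(elements, node_weight, edge_weight):
    """Score a sentence as the sum of its node and edge weights (assumed)."""
    node_score = sum(node_weight[e] for e in elements)
    edge_score = sum(
        edge_weight[frozenset(pair)]
        for pair in combinations(sorted(elements), 2)
    )
    return node_score + edge_score


def summarize(sentences, k):
    """Return the indices of the k best-scoring sentences in original order."""
    node_weight, edge_weight = build_element_graph(sentences)
    scores = [score_sentence(s, node_weight, edge_weight) for s in sentences]
    # Rank by score (descending); ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Restore the original text order for the selected sentences.
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy input: made-up event elements, not data from the CEC corpus.
    example = [
        {"earthquake", "Sichuan", "2008"},
        {"rescue", "soldiers"},
        {"earthquake", "Sichuan", "casualties"},
        {"weather"},
    ]
    print(summarize(example, 2))  # [0, 2]
```

In this toy input, sentences 0 and 2 share the elements "earthquake" and "Sichuan", so they receive the highest scores and are returned in text order. The sketch does not handle redundancy, which the abstract names as a goal; one simple option would be to skip a candidate sentence that adds no new event elements.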