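Research on Chinese Multi-Document Summarization Based on Determinantal Point Processes
First published: 2017-06-09
Abstract: With the rapid growth of the Internet, the web is flooded with information, and users urgently need better ways to obtain "effective" information. Multi-document summarization is one of the important tools for meeting this need. This paper applies determinantal point processes (DPPs) to unsupervised Chinese multi-document summarization for the first time. It focuses on the global negative correlation and the diversity of DPPs, and tries several methods for constructing the DPP kernel matrix, including sentence position, sentence length, title similarity, sentence coverage, hierarchical topic models, and word vectors. The candidate set of summary sentences is then obtained by DPP sampling. Experiments on the Multiling-2015 Chinese multi-document summarization evaluation data demonstrate the diversity of DPPs and the feasibility of applying them to Chinese multi-document summarization.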
Research on Chinese Multi-Document Summarization Based on DPPs
Abstract: With the rapid development of the Internet, users are flooded with information and urgently need better access to the information that is actually useful to them. Multi-document summarization is an effective way to meet this demand. In this paper, determinantal point processes (DPPs) are applied to the unsupervised Chinese multi-document summarization task for the first time. First, the global negative correlation and the diversity of DPPs are studied. Then, several features are tried for constructing the DPP kernel matrix: sentence position, sentence length, title similarity, sentence coverage, a hierarchical topic model, and word2vec. Finally, DPP sampling is used to extract the candidate sentence set for the final summary. Experiments are run on the data of the Multiling-2015 Chinese multi-document summarization task. The results show the diversity of DPPs and the feasibility of applying them to Chinese multi-document summarization.
Keywords: natural language processing; multi-document summarization; determinantal point processes; multi-feature
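As a rough illustration of the pipeline the abstract outlines, the sketch below builds a DPP kernel from per-sentence quality scores and feature vectors, then selects a diverse subset of sentences. It is not the authors' implementation. The decomposition L = diag(q) S diag(q), the use of cosine similarity for S, the function names `build_kernel` and `greedy_select`, and the toy data are all assumptions made for illustration. Greedy log-determinant selection is also swapped in for the DPP sampling the paper uses, so that the result is deterministic.

```python
import numpy as np


def build_kernel(quality, features):
    """Assumed kernel form: L = diag(q) @ S @ diag(q), with S = cosine similarity."""
    q = np.asarray(quality, dtype=float)
    X = np.asarray(features, dtype=float)
    # Normalize rows so that X @ X.T gives cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = X @ X.T
    return q[:, None] * S * q[None, :]


def greedy_select(L, k):
    """Greedily add the item that yields the largest log det of the selected submatrix.

    This is a deterministic stand-in for DPP sampling, not the sampling itself.
    """
    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            # Skip candidates that make the submatrix singular.
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:
            break
        selected.append(best)
    return selected


# Toy data (hypothetical): sentences 0 and 1 are near-duplicates, sentence 2 differs.
quality = [1.0, 0.9, 0.8]
features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]

L = build_kernel(quality, features)
print(greedy_select(L, 2))  # → [0, 2]
```

In this toy case sentence 1 has a higher quality score than sentence 2, but it is nearly identical to the already chosen sentence 0, so the determinant of the pair {0, 1} is close to zero and sentence 2 is picked instead. This is the negative-correlation behaviour that makes DPPs attractive for avoiding redundant summary sentences.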