Found 10 results for this scholar

Upload date: March 24, 2008

[Journal article] Single Document Summarization with Document Expansion

Xiaojun Wan (万小军) and Jianwu Yang

Abstract

Existing methods for single document summarization usually make use of only the information contained in the specified document. This paper proposes the technique of document expansion to provide more knowledge to help single document summarization. A specified document is expanded to a small document set by adding a few neighbor documents close to the document, and then a graph-ranking-based algorithm is applied to the expanded document set to extract sentences from the single document, making use of both the within-document relationships between sentences of the specified document and the cross-document relationships between sentences of all documents in the document set. The experimental results on the DUC2002 dataset demonstrate the effectiveness of the proposed approach based on document expansion. The cross-document relationships between sentences in the expanded document set are validated to be very important for single document summarization.
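The expand-then-rank idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the cosine similarity, the `cross_weight` parameter, and the PageRank-style iteration with `damping=0.85` are all assumptions made for the example, and the neighbor documents are assumed to have been retrieved already.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(docs, target=0, cross_weight=1.0, damping=0.85, iters=50):
    """Rank the sentences of docs[target] using the whole expanded set.

    docs is a list of documents, each a list of sentence strings.
    cross_weight (hypothetical knob) scales cross-document edges.
    """
    nodes = [(d, s) for d, doc in enumerate(docs) for s in doc]
    bags = [Counter(s.lower().split()) for _, s in nodes]
    n = len(nodes)

    # Affinity matrix with within-document and cross-document edges, no self-loops.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = cosine(bags[i], bags[j])
                same_doc = nodes[i][0] == nodes[j][0]
                w[i][j] = sim if same_doc else cross_weight * sim

    # Row-normalise on the fly and run a PageRank-style power iteration.
    row_sums = [sum(row) for row in w]
    score = [1.0 / n] * n
    for _ in range(iters):
        score = [
            (1 - damping) / n
            + damping * sum(score[i] * w[i][j] / row_sums[i]
                            for i in range(n) if row_sums[i] > 0)
            for j in range(n)
        ]

    # Only sentences of the target document are candidates for the summary.
    ranked = [(score[k], nodes[k][1]) for k in range(n) if nodes[k][0] == target]
    return [s for _, s in sorted(ranked, key=lambda p: -p[0])]
```

Calling `rank_sentences(docs, target=0)` returns the target document's sentences in descending score order; a summary would take the first few. Setting `cross_weight=0.0` removes the cross-document edges, which is one way to see how much the neighbor documents contribute.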

Upload date: March 24, 2008

[Journal article] Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction

Xiaojun Wan (万小军), Jianwu Yang, and Jianguo Xiao

Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 552–559, Prague, Czech Republic, June 2007

Abstract

Though both document summarization and keyword extraction aim to extract concise representations from documents, these two tasks have usually been investigated independently. This paper proposes a novel iterative reinforcement approach to simultaneously extracting a summary and keywords from a single document, under the assumption that the summary and keywords of a document can be mutually boosted. The approach can naturally make full use of the reinforcement between sentences and keywords by fusing three kinds of relationships between sentences and words, either homogeneous or heterogeneous. Experimental results show the effectiveness of the proposed approach for both tasks. The corpus-based approach is validated to work almost as well as the knowledge-based approach for computing word semantics.
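One way to picture the mutual reinforcement between sentences and words is the sketch below. It is an assumption-laden illustration rather than the paper's algorithm: the three affinity matrices (`ss`, `ww`, `sw`) are taken as given, the mixing weights `alpha` and `beta` are hypothetical, and scores are renormalised to sum to 1 after each pass.

```python
def normalise_rows(m):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else [0.0] * len(row))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def reinforce(ss, ww, sw, alpha=0.5, beta=0.5, iters=100):
    """Jointly score sentences and words.

    ss: sentence-sentence affinities (n x n, homogeneous)
    ww: word-word affinities (m x m, homogeneous)
    sw: sentence-word affinities (n x m, heterogeneous)
    Returns (sentence_scores, word_scores).
    """
    n, m = len(ss), len(ww)
    ss_n, ww_n = normalise_rows(ss), normalise_rows(ww)
    sw_n = normalise_rows(sw)             # sentence -> word
    ws_n = normalise_rows(transpose(sw))  # word -> sentence
    u = [1.0] * n  # sentence saliency
    v = [1.0] * m  # word saliency
    for _ in range(iters):
        # A sentence is salient if linked to salient sentences and salient words.
        new_u = [alpha * sum(ss_n[i][j] * u[i] for i in range(n))
                 + beta * sum(ws_n[k][j] * v[k] for k in range(m))
                 for j in range(n)]
        # A word is salient if linked to salient words and salient sentences.
        new_v = [alpha * sum(ww_n[k][l] * v[k] for k in range(m))
                 + beta * sum(sw_n[i][l] * u[i] for i in range(n))
                 for l in range(m)]
        su = sum(new_u) or 1.0
        sv = sum(new_v) or 1.0
        u = [x / su for x in new_u]
        v = [x / sv for x in new_v]
    return u, v
```

The highest-scoring sentences would form the summary and the highest-scoring words the keyword list. How the affinities are computed (for example, corpus-based versus knowledge-based word similarity) is left outside this sketch.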

Upload date: March 24, 2008

[Journal article] Person Resolution in Person Search Results: WebHawk

Xiaojun Wan (万小军), Jianfeng Gao, Mu Li, Binggong Ding

CIKM’05, October 31-November 5, 2005, Bremen, Germany

Abstract

Finding information about people on the Web using a search engine is difficult because there is a many-to-many mapping between person names and specific persons (i.e. referents). This paper describes a person resolution system called WebHawk. Given a list of pages obtained by submitting a person query to a search engine, WebHawk facilitates person search in three steps. First, a filter removes those pages that contain no information about any person. Second, a cluster groups the remaining pages into different clusters, each for one specific person. To make the resulting clusters more meaningful, an extractor is used to induce query-oriented personal information from each page. Finally, a namer generates an informative description for each cluster so that users can find any specific person easily. The architecture of WebHawk is presented, and the four components are discussed in detail, with a separate evaluation of each component presented where appropriate. A user study shows that WebHawk complements most existing search engines and successfully improves users’ experience of person search on the Web.

Keywords: Person Resolution, Person Search, Clustering, Junk Filtering

Upload date: March 24, 2008

[Journal article] Using Proportional Transportation Distances for Measuring Document Similarity

Xiaojun Wan (万小军) and Jianwu Yang

M. Lalmas et al. (Eds.): ECIR 2006, LNCS 3936, pp. 25–36, 2006.

Abstract

A novel document similarity measure based on the Proportional Transportation Distance (PTD) is proposed in this paper. The proposed measure improves on the previously proposed similarity measure based on optimal matching by allowing many-to-many matching between subtopics of documents. After documents are decomposed into sets of subtopics, the Proportional Transportation Distance is employed to evaluate the similarity between the sets of subtopics of two documents by solving a transportation problem. Experiments on TDT-3 data demonstrate its good ability to measure document similarity and also its high robustness, i.e. it relies far less on the underlying document decomposition algorithm than the optimal-matching-based measure does.
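For orientation, the Proportional Transportation Distance is commonly stated as the transportation problem below. The notation is an assumption made here and is not taken from the abstract: document A is a set of m subtopics with weights w_i, document B is a set of n subtopics with weights u_j, and d_ij is a ground distance between subtopic i of A and subtopic j of B. How the weights and the ground distance are chosen, and how the distance is turned into a similarity, is not specified in the abstract.

```latex
\[
\mathrm{PTD}(A,B) \;=\; \min_{F=[f_{ij}]} \; \frac{1}{W}\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij}
\]
\[
\text{subject to}\quad f_{ij}\ge 0,\qquad
\sum_{j=1}^{n} f_{ij}=w_i,\qquad
\sum_{i=1}^{m} f_{ij}=\frac{u_j\,W}{U},
\]
\[
\text{where}\quad W=\sum_{i=1}^{m} w_i,\qquad U=\sum_{j=1}^{n} u_j .
\]
```

Because each subtopic's weight may be split across several flows f_ij, one subtopic can be matched to many subtopics of the other document, which is the many-to-many matching the abstract contrasts with optimal matching.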

Upload date: March 24, 2008

[Journal article] Manifold-Ranking Based Topic-Focused Multi-Document Summarization

Xiaojun Wan (万小军), Jianwu Yang and Jianguo Xiao

Abstract

Topic-focused multi-document summarization aims to produce a summary biased to a given topic or user profile. This paper presents a novel extractive approach to this summarization task based on manifold-ranking of sentences. The manifold-ranking process can naturally make full use of both the relationships among all the sentences in the documents and the relationships between the given topic and the sentences. A ranking score is obtained for each sentence in the manifold-ranking process to denote the biased information richness of the sentence. Then a greedy algorithm is employed to impose a diversity penalty on each sentence. The summary is produced by choosing the sentences with both high biased information richness and high information novelty. Experiments on DUC2003 and DUC2005 are performed, and the ROUGE evaluation results show that the proposed approach can significantly outperform the top performing systems in the DUC tasks as well as baseline approaches.
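The two stages named in this abstract, manifold-ranking followed by a greedy diversity penalty, can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' system: the affinity matrix `w` is taken as given with node 0 standing for the topic, and `alpha`, `penalty`, and the exact form of the penalty term are hypothetical choices for the example.

```python
import math


def manifold_rank(w, alpha=0.6, iters=200):
    """Manifold-ranking scores for a symmetric affinity matrix w.

    Node 0 is the topic (the only labelled node); the rest are sentences.
    Iterates f <- alpha * S f + (1 - alpha) * y with S = D^-1/2 W D^-1/2.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    y = [1.0] + [0.0] * (n - 1)
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def select_diverse(w, scores, k, penalty=1.0):
    """Greedily pick k sentence nodes, penalising near-duplicates of picks."""
    n = len(w)
    deg = [sum(row) for row in w]
    remaining = {i: scores[i] for i in range(1, n)}  # skip the topic node
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=remaining.get)
        chosen.append(best)
        del remaining[best]
        for j in remaining:
            # Lower the score of sentences similar to the one just chosen.
            if deg[j] > 0:
                remaining[j] -= penalty * (w[j][best] / deg[j]) * scores[best]
    return chosen
```

With `penalty=0.0` the selection reduces to taking the top-k scores, so two near-duplicate sentences can both be chosen; a positive penalty pushes the second pick toward a sentence that adds new information.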

Collaborating scholars

  • Xiaojun Wan (万小军)

    Peking University, Beijing