Sciencepaper Online (中国科技论文在线)

Uploaded: March 24, 2008

[Journal article] Arnetminer: expertise oriented search using social networks

Juanzi Li, Jie Tang (唐杰), Jing Zhang, Qiong Luo, Yunhao Liu, Mingcai Hong

Front. Comput. Sci. China

Expertise Oriented Search (EOS) aims to provide comprehensive expertise analysis on data from distributed sources. It is useful in many application domains, for example, finding experts on a given topic, detecting conflicts of interest between researchers, and assigning reviewers to proposals. In this paper, we present the design and implementation of our expertise oriented search system, Arnetminer (http://www.arnetminer.net). Arnetminer has gathered and integrated information about half a million computer science researchers from the Web, including their profiles and publications. Moreover, Arnetminer constructs a social network among these researchers from their co-authorship, and uses this network information together with the individual profiles to support expertise oriented search tasks. In particular, the co-authorship information is used both in ranking the expertise of individual researchers for a given topic and in searching for associations between researchers. We have conducted initial experiments on Arnetminer. Our results demonstrate that the proposed relevancy propagation expert finding method outperforms the method that uses only a person's local information, and that the proposed two-stage association search on a large-scale social network is orders of magnitude faster than the baseline method.

social network, expertise search, association search
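
The abstract above says that Arnetminer builds a social network from co-authorship and searches it for associations between researchers. As a minimal sketch of that idea, the Python below builds a co-authorship graph from author lists and finds a shortest chain of co-authors with breadth-first search. The function names and the input format are hypothetical, and plain breadth-first search is used only for illustration: it is not the paper's two-stage method, which this listing does not describe.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_coauthor_graph(papers):
    """Build an undirected graph linking every pair of co-authors.

    `papers` is an iterable of author-name lists (an assumed input format).
    """
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_association(graph, source, target):
    """Return a shortest co-author chain from source to target, or None."""
    if source == target:
        return [source]
    parents = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        person = queue.popleft()
        # Sort neighbors so the result is deterministic.
        for neighbor in sorted(graph.get(person, ())):
            if neighbor in parents:
                continue
            parents[neighbor] = person
            if neighbor == target:
                # Walk the parent links back to the source.
                path = [target]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

Breadth-first search visits researchers in order of distance from the source, so the first chain that reaches the target is a shortest one. On a network of half a million researchers, an exhaustive search of this kind can become expensive, which is presumably why the paper reports a faster method.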

Uploaded: March 24, 2008

[Journal article] A Unified Tagging Approach to Text Normalization

Conghui Zhu, Jie Tang (唐杰), Hang Li, Hwee Tou Ng, Tie-Jun Zhao

Abstract

This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting ‘informally inputted’ text into the canonical form by eliminating ‘noise’ in the text and detecting paragraph and sentence boundaries. Previously, text normalization issues were often handled in an ad hoc fashion or studied separately. This paper first gives a formalization of the entire problem. It then proposes a unified tagging approach to perform the task using Conditional Random Fields (CRF). The paper shows that with the introduction of a small set of tags, most of the text normalization tasks can be performed within the approach. The accuracy of the proposed method is high because the subtasks of normalization are interdependent and should be performed together. Experimental results on email data cleaning show that the proposed method significantly outperforms both the approach of using cascaded models and that of employing independent models.
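
To illustrate how a small set of tags can cover noise removal and boundary detection at once, the sketch below applies per-token tags to produce normalized text. The tag names (`KEEP`, `DEL`, `SENT_END`, `PARA_END`) and the function are hypothetical, since the abstract does not list the paper's tag set, and the tags are assumed to have already been predicted by a sequence model such as a CRF.

```python
VALID_TAGS = {"KEEP", "DEL", "SENT_END", "PARA_END"}  # hypothetical tag set


def apply_tags(tokens, tags):
    """Turn a tagged token sequence into paragraphs of sentences.

    DEL drops a noisy token, SENT_END closes a sentence, and PARA_END
    closes both the current sentence and the current paragraph.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    paragraphs, sentences, words = [], [], []
    for token, tag in zip(tokens, tags):
        if tag not in VALID_TAGS:
            raise ValueError(f"unknown tag: {tag}")
        if tag == "DEL":
            continue
        words.append(token)
        if tag in ("SENT_END", "PARA_END"):
            sentences.append(" ".join(words))
            words = []
        if tag == "PARA_END":
            paragraphs.append(sentences)
            sentences = []
    # Flush any text that was not explicitly closed by a boundary tag.
    if words:
        sentences.append(" ".join(words))
    if sentences:
        paragraphs.append(sentences)
    return paragraphs
```

Because one tag per token encodes every decision, a single model can predict all of them jointly, which matches the abstract's argument that the subtasks are interdependent.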

Uploaded: March 24, 2008

[Journal article] A Constraint-Based Probabilistic Framework for Name Disambiguation

Duo Zhang, Jie Tang (唐杰), Juanzi Li, and Kehong Wang

Abstract

This paper is concerned with the problem of name disambiguation. By name disambiguation, we mean distinguishing persons with the same name. It is a critical problem in many knowledge management applications. Although much research has been conducted, the problem is still unresolved and has become even more serious, in particular with the popularity of Web 2.0. Previously, name disambiguation was often undertaken in either a supervised or an unsupervised fashion. This paper first gives a constraint-based probabilistic model for semi-supervised name disambiguation. Specifically, we focus on investigating the problem in an academic researcher social network (http://arnetminer.org). The framework combines constraints and Euclidean distance learning, and allows the user to refine the disambiguation results. Experimental results on the researcher social network show that the proposed framework significantly outperforms the baseline method, which uses an unsupervised hierarchical clustering algorithm.

Name Disambiguation, Social Network Analysis, Digital Library, Semi-supervised Clustering
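
One generic way to combine constraints with Euclidean distance in semi-supervised clustering is to add a penalty for each violated pairwise constraint to the usual distance cost. The sketch below shows that generic formulation only. It is not the paper's probabilistic model, which the abstract does not spell out, and the function name, the `penalty` weight, and the must-link/cannot-link representation are all assumptions made for illustration.

```python
import math


def constrained_cost(points, labels, centroids, must_link, cannot_link, penalty=1.0):
    """Clustering cost: squared Euclidean distance plus constraint penalties.

    points      -- feature vectors, e.g. one per paper (assumed representation)
    labels      -- cluster index per point, e.g. one cluster per real person
    must_link   -- index pairs that should share a cluster
    cannot_link -- index pairs that should be in different clusters
    """
    cost = sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))
    cost += penalty * sum(1 for i, j in must_link if labels[i] != labels[j])
    cost += penalty * sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return cost
```

A user refining the results, as the abstract mentions, could be modeled here as adding pairs to `must_link` or `cannot_link` and re-optimizing the assignment against this cost.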

Uploaded: March 24, 2008

[Journal article] iASA: Learning to Annotate the Semantic Web

Jie Tang (唐杰), Juanzi Li, Hongjun Lu, Bangyong Liang, Xiaotong Huang, Kehong Wang

Abstract

With the advent of the Semantic Web, there is a great need to upgrade existing web content to semantic web content. This can be accomplished through semantic annotations. Unfortunately, manual annotation is tedious, time-consuming, and error-prone. In this paper, we propose a tool, called iASA, that learns to automatically annotate web documents according to an ontology. iASA is based on the combination of information extraction (specifically, the Similarity-based Rule Learner, SRL) and machine learning techniques. Using linguistic knowledge and an optimal dynamic window size, SRL produces annotation rules of better quality than comparable semantic annotation systems. Similarity-based learning efficiently reduces the search space by avoiding pseudo rule generalization. In the annotation phase, iASA exploits ontology knowledge to refine the annotations it proposes. Moreover, our annotation algorithm exploits machine learning methods to correctly select instances and to predict missing instances. Finally, iASA provides an explanation component that explains the nature of the learner and annotator to the user. Explanations can greatly help users understand the rule induction and annotation process, so that they can focus on correcting rules and annotations quickly. Experimental results show that iASA can reach high accuracy quickly.

Uploaded: March 24, 2008

[Journal article] A Mixture Model for Expert Finding

Jing Zhang, Jie Tang (唐杰), Liu Liu, and Juanzi Li

Abstract

This paper addresses the issue of identifying persons with expertise on a given topic. Traditional methods usually estimate the relevance between the query and the support documents of candidate experts using, for example, a language model. However, the language model cannot identify semantic knowledge, so some relevant experts cannot be found because the query terms do not occur in their support documents. In this paper, we propose a mixture model based on Probabilistic Latent Semantic Analysis (PLSA) to estimate a hidden semantic theme layer between the terms and the support documents. The hidden themes are used to capture the semantic relevance between the query and the experts. We evaluate our mixture model in a real-world system, ArnetMiner. Experimental results indicate that the proposed model outperforms the language models.
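
The hidden theme layer can be illustrated with the standard PLSA decomposition, in which the probability of a term given a document is a sum over themes: P(t|d) = Σ_z P(t|z) P(z|d). The sketch below scores a query against one support document this way. It assumes the two probability tables have already been estimated (for example with EM), and the function names and dictionary layout are hypothetical; how the paper aggregates document scores into expert scores is not stated in the abstract and is not shown.

```python
import math


def term_given_doc(term, doc, p_term_given_theme, p_theme_given_doc):
    """P(term | doc) = sum over themes z of P(term | z) * P(z | doc)."""
    return sum(
        p_term_given_theme[theme].get(term, 0.0) * p_theme
        for theme, p_theme in p_theme_given_doc[doc].items()
    )


def query_log_likelihood(query_terms, doc, p_term_given_theme, p_theme_given_doc):
    """Log-probability of the query under the theme mixture of one document."""
    total = 0.0
    for term in query_terms:
        p = term_given_doc(term, doc, p_term_given_theme, p_theme_given_doc)
        if p == 0.0:
            return float("-inf")  # term unreachable through any theme
        total += math.log(p)
    return total
```

The point of the decomposition is that a query term can receive a nonzero probability even if it never occurs in the document, as long as the document leans toward a theme that generates that term. This addresses the term-mismatch problem the abstract attributes to plain language models.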
