Short Text Query Expansion Algorithm Based on a Topic Model
First published: 2014-01-03
Abstract: In recent years, the demand for information retrieval over short-text microblog corpora has grown steadily. Query expansion, a key technique in information retrieval, plays an important role in improving retrieval results. This paper proposes a Bayes-LDA based method for modeling microblog corpora; the model covers microblog short texts completely while maintaining modeling quality. Building on this model, we design a query expansion algorithm for microblog corpora. Its core idea is to apply the Bayes-LDA modeling results to the generation and selection of expansion feature terms and to the re-ranking of query results, thereby improving short-text retrieval effectiveness. Experiments show that, on the TREC 2011 Microblog evaluation dataset, the algorithm outperforms the BM25 pseudo-relevance feedback method on several major performance metrics.
Keywords: Natural Language Processing; Query Expansion; LDA Model; Short Texts; Bayesian Theory; Pseudo-relevance Feedback