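A Query Expansion Method Based on Constrained Semi-Supervised Clustering
First published: 2013-11-29
Abstract: To address the poor quality of feedback documents in pseudo-relevance feedback models and the query drift caused by unsuitable expansion-term selection, we propose a query expansion method based on constrained semi-supervised clustering. The method manually labels the top k documents of the initial retrieval results as relevant or irrelevant, then applies a semi-supervised clustering algorithm to the top n documents of the initial results and extracts those related to the query as feedback documents. By learning the query relevance of a small number of labelled documents, the method can estimate fairly accurately the query relevance of a large number of unlabelled documents. This improves the quality of the feedback documents and thereby raises both the recall and the precision of retrieval. Experimental results show that the method achieves better retrieval performance than traditional pseudo-relevance feedback and pseudo-relevance feedback based on unsupervised clustering.
Keywords: information retrieval; query expansion; constrained clustering; semi-supervised clustering; pseudo-relevance feedback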
A query expansion method based on constrained semi-supervised clustering
Abstract: Because the feedback documents used by pseudo-relevance feedback models are of poor quality and expansion terms are often selected inappropriately, the expanded query tends to drift from the original query. We propose a query expansion method based on constrained semi-supervised clustering. The top k documents of the initial retrieval set are labelled in advance as relevant or irrelevant; a semi-supervised clustering algorithm then analyzes the top n documents to find relevant documents, which are used as feedback documents. By learning from a small number of documents whose relevance is known, the algorithm can estimate more accurately the relevance to the query of a large number of unlabelled documents, thus improving the quality of the feedback information. The experimental results show that the proposed method outperforms both pseudo-relevance feedback and the query-likelihood language model.
Keywords: information retrieval; query expansion; constrained clustering; semi-supervised clustering; pseudo-relevance feedback
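The abstract does not say which semi-supervised clustering algorithm is used, so the following is only a minimal sketch of the general idea, assuming a seeded two-cluster k-means over bag-of-words vectors with cosine similarity. The function names (`tf_vector`, `cosine`, `centroid`, `select_feedback_docs`) and the toy documents are hypothetical and are not taken from the paper. Labelled top-k documents are pinned to their cluster (the constraint), and the remaining top-n documents are assigned to the nearer centroid.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def select_feedback_docs(docs, labels, max_iter=20):
    """Return indices of the top-n docs placed in the relevant cluster.

    docs   -- texts of the top n initially retrieved documents
    labels -- {index: True/False} manual judgments for the top k documents
    """
    if True not in labels.values() or False not in labels.values():
        raise ValueError("need at least one relevant and one irrelevant label")
    vecs = [tf_vector(d) for d in docs]
    assign = dict(labels)  # clusters are seeded by the labelled documents
    for _ in range(max_iter):
        rel = centroid([vecs[i] for i, r in assign.items() if r])
        irr = centroid([vecs[i] for i, r in assign.items() if not r])
        new_assign = {}
        for i, v in enumerate(vecs):
            if i in labels:
                new_assign[i] = labels[i]  # constraint: labels never change
            else:
                new_assign[i] = cosine(v, rel) >= cosine(v, irr)
        if new_assign == assign:  # assignments stable, stop iterating
            break
        assign = new_assign
    return sorted(i for i, r in assign.items() if r)


# Toy example: query "jaguar car"; the first two documents are labelled by hand.
docs = [
    "jaguar car engine speed",
    "jaguar cat jungle animal",
    "car engine repair",
    "jungle animal habitat",
    "jaguar car dealer",
]
feedback = select_feedback_docs(docs, {0: True, 1: False})
```

In this sketch, expansion terms would then be drawn only from the documents listed in `feedback`, rather than from all top-ranked documents as in plain pseudo-relevance feedback.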