Re-ranking Answers by Discarding Biases in cQA Sites
First published: 2018-01-18
Abstract: The voting mechanism used to rank answers on community-based question answering (cQA) websites is not very accurate, because users do not vote on answers based entirely on their quality. Both the position and the appearance of an answer affect the probability that users vote for it. Besides position bias and appearance bias, the follow relationships between users also influence voting results. As a result, the top answers obtained through voting are not reliable, especially when votes are scarce. To rank answers by their quality, this paper discusses the influence of follow relationships between users on the voting mechanism and proposes a vote process model. First, some assumptions about users' voting behavior are made; the vote process model is then built on these assumptions to model the user's voting process. Through model inference, the final equation for computing an answer's quality is derived. Finally, an expectation-maximization algorithm is used to estimate the parameters in the final equation. By modeling the user voting process, the vote process model can eliminate the influence of the biases mentioned above and obtain an evaluation of the true quality of answers. Experiments on a real dataset demonstrate the effectiveness of the proposed model. In particular, when 30 percent of the training data is used, the vote process model achieves a 10.1 percent improvement in precision and a 7.5 percent improvement in MRR compared with the joint click model, which is the state-of-the-art click model.
Keywords: data mining, cQA sites, social bias, answer ranking
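The abstract reports its gains in precision and MRR (mean reciprocal rank). As a rough illustration of how these two ranking metrics are commonly computed, here is a minimal Python sketch. It is not the paper's evaluation code: the function names and the toy data are hypothetical, it assumes each question has exactly one known best answer, and it reads "precision" as precision at rank 1, which the abstract does not confirm.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], best_id: str) -> float:
    """Return 1/rank of best_id in ranked_ids (1-based), or 0.0 if it is absent."""
    for position, answer_id in enumerate(ranked_ids, start=1):
        if answer_id == best_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         best_ids: Sequence[str]) -> float:
    """Average the reciprocal rank of the best answer over all questions."""
    if not rankings:
        raise ValueError("at least one ranking is required")
    total = sum(reciprocal_rank(r, b) for r, b in zip(rankings, best_ids))
    return total / len(rankings)


def precision_at_1(rankings: Sequence[Sequence[str]],
                   best_ids: Sequence[str]) -> float:
    """Fraction of questions whose top-ranked answer is the best answer.

    Assumption: the paper's "precision" is treated here as precision at rank 1.
    """
    if not rankings:
        raise ValueError("at least one ranking is required")
    hits = sum(1 for r, b in zip(rankings, best_ids) if r and r[0] == b)
    return hits / len(rankings)


# Hypothetical toy data: three questions, each with a model-produced ranking
# of answer ids and one known best answer.
rankings = [["a1", "a2", "a3"], ["b2", "b1"], ["c3", "c2", "c1"]]
best_ids = ["a1", "b1", "c1"]

print(mean_reciprocal_rank(rankings, best_ids))
print(precision_at_1(rankings, best_ids))
```

In the toy data the best answers sit at ranks 1, 2 and 3, so MRR averages 1, 1/2 and 1/3, while precision at 1 counts only the first question as a hit. MRR therefore gives partial credit for near misses, which is why the two metrics can move by different amounts.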
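Chinese title: Research on Social Bias and Answer Ranking Methods in Question Answering Communities
Chinese abstract: Because users of question answering communities do not vote on answers based entirely on answer quality, the answer rankings produced by voting in these communities are not accurate. The position of an answer and the way its content is presented both affect users' votes. In addition, the follow relationships between users in a question answering community also affect the number of votes an answer receives. Consequently, the quality of the best answer obtained by voting is not very reliable, especially when the answers to a question have received few votes. To make answer rankings better reflect answer quality, this paper discusses the influence of follow relationships between users on answer vote counts and proposes a vote process model for question answering communities. First, the paper makes some assumptions about users' voting behavior and, based on them, proposes the vote process model to model the user's voting process. Then, through model inference, it derives the final formula for computing answer quality. Finally, an expectation-maximization algorithm is used to compute the parameters in the formula, yielding the final ranking. The proposed vote process model can eliminate the influence of follow relationships between users on answer vote counts and thus obtain a reliable evaluation of answer quality. Comparative experiments on a real dataset confirm the effectiveness of the proposed answer ranking method. In particular, compared with the existing state-of-the-art click model, when training data accounts for 30 percent of the dataset, the proposed method achieves a 10.1 percent improvement in precision and a 7.5 percent improvement in mean reciprocal rank (MRR).
Chinese keywords: data mining, question answering communities, social bias, answer ranking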