于戈

Journal paper

MMPClust: A Skew Prevention Algorithm for Model-Based Document Clustering*

Xiaoguang Li, Ge Yu, and Daling Wang

DASFAA 2005, LNCS 3453, pp. 536-547, 2005.

Abstract

To support very high dimensionality, model-based clustering is an intuitive choice for document clustering. However, current model-based algorithms are prone to generating skewed clusters, which seriously degrade clustering quality. In this paper, the causes of skew are examined and identified as an inappropriate initial model, a poorly fitting cluster model, and the interaction between the decentralization of estimation samples and an over-generalized cluster model. This paper proposes a skew-prevention document-clustering algorithm (MMPClust), which has two features: (1) a content-based cluster model is used to model each cluster better; (2) at the re-estimation step, the subset of documents most relevant to the corresponding class is selected automatically for each cluster as the estimation samples, in order to break this interaction. MMPClust has fewer restrictions and wider applicability in document clustering than previous methods.
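
The re-estimation idea in feature (2) can be illustrated with a minimal sketch. This is not the MMPClust algorithm itself: the abstract does not specify the content-based cluster model, the relevance measure, or how the selected share of documents is determined automatically. The sketch assumes a smoothed multinomial per cluster, a length-normalized log-likelihood as the relevance score, and a fixed `keep_fraction` parameter; all function and parameter names are hypothetical.

```python
import math


def fit_multinomial(docs, vocab_size, smoothing=1.0):
    """Return smoothed log word probabilities estimated from term-count dicts."""
    counts = [smoothing] * vocab_size
    for doc in docs:
        for term, count in doc.items():
            counts[term] += count
    total = sum(counts)
    return [math.log(c / total) for c in counts]


def log_likelihood(doc, model):
    """Log-likelihood of one document (term id -> count) under a multinomial."""
    return sum(count * model[term] for term, count in doc.items())


def per_token_score(doc, model):
    """Length-normalized log-likelihood, an assumed stand-in for relevance."""
    length = sum(doc.values())
    return log_likelihood(doc, model) / max(1, length)


def cluster_with_selective_reestimation(docs, vocab_size, init_assign,
                                        keep_fraction=0.5, iterations=10):
    """Hard-assignment model-based clustering with selective re-estimation."""
    k = max(init_assign) + 1
    assign = list(init_assign)
    models = [
        fit_multinomial([d for d, a in zip(docs, assign) if a == j], vocab_size)
        for j in range(k)
    ]
    for _ in range(iterations):
        # Assignment step: each document goes to its most likely cluster.
        assign = [
            max(range(k), key=lambda j: log_likelihood(doc, models[j]))
            for doc in docs
        ]
        # Re-estimation step: refit each model on only its top-ranked members
        # (fixed fraction here; an assumption, not the paper's selection rule).
        new_models = []
        for j in range(k):
            members = [docs[i] for i, a in enumerate(assign) if a == j]
            if not members:
                new_models.append(models[j])  # keep the old model if empty
                continue
            members.sort(key=lambda d: per_token_score(d, models[j]),
                         reverse=True)
            keep = max(1, math.ceil(keep_fraction * len(members)))
            new_models.append(fit_multinomial(members[:keep], vocab_size))
        models = new_models
    return assign, models
```

Documents are passed as dictionaries mapping integer term ids to counts, and `init_assign` gives a starting cluster index per document. Restricting each refit to the best-matching members is meant to keep a cluster model from drifting toward loosely related documents, which is the kind of interaction the abstract describes.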

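[Disclaimer] All content below was uploaded by [于戈] on [October 31, 2005, 23:08:38]; copyright belongs to the original author. This article represents only the author's own views and is unrelated to this website. This website remains neutral toward the statements and opinions in the article and provides no express or implied warranty as to the accuracy, reliability, or completeness of the content. Readers should treat it as reference only and assume full responsibility themselves.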
