周傲英

Journal article

Distributed Data Stream Clustering: A Fast EM-based Approach

周傲英 (Aoying Zhou), Feng Cao, Ying Yan, Chaofeng Sha, Xiaofeng He

Abstract

Clustering data streams has attracted considerable research effort recently. However, the problem has received little attention when the data streams are generated in a distributed fashion, even though this scenario is very common in real-life applications. Clustering distributed data streams is constrained by several factors: the records generated are noisy or incomplete because the distributed system is unreliable; the system must process a huge volume of data online; and communication is potentially the system's bottleneck. Together, these factors make clustering distributed data streams highly challenging. In this paper, we propose an EM-based (Expectation-Maximization) framework that clusters distributed data streams effectively while addressing these fundamental challenges. In the presence of noisy or incomplete records, our algorithms learn the distribution of the underlying data streams by maximizing the likelihood of the data clusters. We also propose a test-and-cluster strategy that reduces the average processing cost and is especially effective for online clustering over large data streams. Extensive experiments show that the proposed algorithms achieve high accuracy with lower communication cost, memory consumption, and CPU time.
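The abstract combines two ideas: EM that tolerates incomplete records, and a "test" step that skips re-clustering when the current model already explains new data. The sketch below illustrates both on a single stream, under assumptions the abstract does not state: the model is a diagonal Gaussian mixture, missing coordinates are marked `None` and treated as missing at random, and the test compares a chunk's average log-likelihood against a hypothetical `threshold`. All function names (`e_step`, `m_step`, `fits_model`, `process_chunk`, `make_chunk`) are illustrative, not the authors' algorithm, and the distributed and communication aspects are not modeled.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal Gaussian over the observed dims only (None = missing)."""
    total = 0.0
    for xi, m, v in zip(x, mean, var):
        if xi is None:
            continue  # for a diagonal Gaussian, marginalizing a dim means dropping it
        total += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
    return total

def e_step(data, weights, means, variances):
    """Responsibilities per record and the total log-likelihood (log-sum-exp for stability)."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        s = sum(math.exp(l - top) for l in logs)
        loglik += top + math.log(s)
        resp.append([math.exp(l - top) / s for l in logs])
    return resp, loglik

def m_step(data, resp, means, variances, min_var=1e-6):
    """EM update; a missing x_t contributes its expected value and variance under the old component."""
    n, d = len(data), len(means[0])
    new_w, new_m, new_v = [], [], []
    for j in range(len(means)):
        nk = max(sum(r[j] for r in resp), 1e-12)
        mu = [sum(r[j] * (x[t] if x[t] is not None else means[j][t])
                  for x, r in zip(data, resp)) / nk for t in range(d)]
        var = []
        for t in range(d):
            s = 0.0
            for x, r in zip(data, resp):
                if x[t] is not None:
                    s += r[j] * (x[t] - mu[t]) ** 2
                else:  # E[(x_t - mu_new)^2] = old variance + (old mean - new mean)^2
                    s += r[j] * (variances[j][t] + (means[j][t] - mu[t]) ** 2)
            var.append(max(s / nk, min_var))
        new_w.append(nk / n)
        new_m.append(mu)
        new_v.append(var)
    return new_w, new_m, new_v

def fits_model(chunk, model, threshold):
    """Hypothetical 'test' step: is the chunk's average log-likelihood high enough?"""
    _, ll = e_step(chunk, *model)
    return ll / len(chunk) >= threshold

def process_chunk(chunk, model, threshold, iters=20):
    """Test-and-cluster: reuse the model if it fits, otherwise run EM on the chunk."""
    if fits_model(chunk, model, threshold):
        return model, False  # cheap path, no re-clustering
    for _ in range(iters):
        resp, _ = e_step(chunk, *model)
        model = m_step(chunk, resp, model[1], model[2])
    return model, True

def make_chunk(rng, n_per_cluster=100, p_missing=0.1):
    """Synthetic chunk: two 2-D clusters with some coordinates dropped to None."""
    chunk = []
    for center in ((0.0, 0.0), (10.0, 10.0)):
        for _ in range(n_per_cluster):
            x = [rng.gauss(c, 1.0) for c in center]
            chunk.append([None if rng.random() < p_missing else v for v in x])
    return chunk

rng = random.Random(0)
model = ([0.5, 0.5], [[2.0, 2.0], [7.0, 7.0]], [[1.0, 1.0], [1.0, 1.0]])
model, first = process_chunk(make_chunk(rng), model, threshold=-5.0)   # poor fit: EM runs
model, second = process_chunk(make_chunk(rng), model, threshold=-5.0)  # good fit: EM skipped
print(first, second, [[round(v, 1) for v in m] for m in model[1]])
```

The design choice worth noting is that the test costs one E-step pass, while re-clustering costs many; when consecutive chunks come from a stable distribution, most chunks take the cheap path, which is the kind of average-cost saving the abstract attributes to its test-and-cluster strategy.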

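[Disclaimer] All of the following content was uploaded by [周傲英] on 14 January 2011 at 17:50:25; copyright belongs to the original authors. This paper represents only the author's own views and is unrelated to this website. This website remains neutral regarding the statements and opinions in the text and makes no express or implied guarantee of the accuracy, reliability, or completeness of its contents. Readers should use it for reference only and bear full responsibility themselves.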
