K-Means Clustering Analysis with Different Similarity Measures
First published: 2012-11-22
Abstract: In recent years, the ubiquity of massive data sets has drawn wide attention to data mining. Clustering, an unsupervised learning technique, is an important research topic in pattern recognition, machine learning, and data mining. K-Means is a partition-based clustering algorithm, and many classical clustering tasks adopt it as their object of study. This paper runs clustering experiments with the K-Means algorithm in Mahout on Iris, the well-known UCI data set, using different similarity measures, and compares and discusses the results in terms of clustering error rate and running efficiency. The findings provide a useful reference for research on cluster analysis.
Keywords: clustering analysis; K-Means; similarity; Mahout
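
The kind of comparison the abstract describes can be sketched in a few lines of Python. This is a minimal illustration and not the paper's Mahout setup: the three distance functions are illustrative choices (the abstract does not list which measures were compared), the eight labelled points in `POINTS` and `LABELS` are a hypothetical toy data set standing in for Iris, and `kmeans` and `error_rate` are helper names introduced here. The error rate is computed by mapping each cluster to its majority label, which is one common convention and an assumption on our part.

```python
import math
import random
import time
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; defined as 1.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def kmeans(points, k, distance, max_iter=100, seed=0):
    """Lloyd-style K-Means with a pluggable distance function."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: distance(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the run has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments


def error_rate(assignments, labels):
    """Fraction of points that disagree with their cluster's majority label."""
    correct = 0
    for c in set(assignments):
        cluster_labels = [l for a, l in zip(assignments, labels) if a == c]
        correct += Counter(cluster_labels).most_common(1)[0][1]
    return 1.0 - correct / len(labels)


# Hypothetical toy data: two groups that differ in both position and direction.
POINTS = [(1.0, 5.0), (1.2, 4.8), (0.8, 5.1), (1.1, 5.3),
          (5.0, 1.0), (5.2, 0.9), (4.8, 1.3), (5.1, 1.1)]
LABELS = ["a", "a", "a", "a", "b", "b", "b", "b"]

if __name__ == "__main__":
    for name, dist in [("euclidean", euclidean),
                       ("manhattan", manhattan),
                       ("cosine", cosine_distance)]:
        start = time.perf_counter()
        result = kmeans(POINTS, k=2, distance=dist)
        elapsed = time.perf_counter() - start
        print(f"{name}: error rate {error_rate(result, LABELS):.3f}, {elapsed:.6f} s")
```

Note that averaging the members is only guaranteed to reduce the clustering objective for squared Euclidean distance; with the other measures the same update is a heuristic, so differences in error rate between measures partly reflect that mismatch.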