An Improved K-means Algorithm Based on the Spark Platform

YAN Meng; ZOU Junwei

First published: 2017-12-05

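YAN Meng ¹
YAN Meng (born 1993), female, master's student; main research interests: smart cards and information security, data mining
ZOU Junwei ¹
ZOU Junwei (born 1975), male, lecturer; main research interests: smart cards and information security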

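1. Communication and Network Research Center, School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876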

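Abstract: K-means is a classic clustering algorithm. The classical algorithm has two shortcomings: the number of clusters K and the initial cluster centers must be specified manually, and the serial implementation performs poorly on massive data sets. To address both, a canopy-Kmeans algorithm is proposed. It runs the canopy algorithm as a preprocessing step for K-means to obtain the initial cluster centers and the value of K, and it is parallelized with the Spark programming framework so that Spark's in-memory computing improves clustering efficiency. Experiments show that canopy-Kmeans improves on both the traditional serial K-means algorithm and the unimproved parallel algorithm in accuracy and efficiency.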

Keywords: clustering algorithm; K-means algorithm; parallelization; Spark


The advanced K-means based on spark

YAN Meng ¹
ZOU Junwei ¹

1. Communication and Network Research Center, School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876

Abstract: The classical K-means algorithm requires the number of clusters K and the initial cluster centers to be specified manually, and the classical serial K-means algorithm performs poorly on massive data. To address these problems, a canopy-Kmeans algorithm is proposed. The algorithm introduces the canopy algorithm as a pre-algorithm of K-means to obtain the initial cluster centers and the value of K, and uses the Spark framework to parallelize the algorithm. It takes full advantage of Spark's in-memory computing and improves clustering efficiency. Experiments show that the canopy-Kmeans algorithm achieves higher accuracy and efficiency than the traditional serial K-means algorithm and the unmodified parallel algorithm.

Keywords: clustering algorithm, K-means algorithm, parallelization, Spark
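
The idea in the abstract, using canopy as a pre-step that supplies both K and the initial centers for K-means, can be illustrated with a minimal single-machine sketch. This is not the authors' implementation and does not use Spark. The function names `canopy_centers` and `kmeans`, the tight threshold `t2`, and the toy data are all assumptions made for illustration. Only the tight threshold is needed to pick centers; the loose canopy threshold affects canopy membership, which this sketch does not compute.

```python
import math
import random


def canopy_centers(points, t2, seed=0):
    """Pick canopy centers (hypothetical helper, not from the paper).

    t2 is the tight distance threshold: once a center is chosen, every
    remaining point within t2 of it is dropped from the candidate list.
    """
    rng = random.Random(seed)
    remaining = list(points)
    centers = []
    while remaining:
        # Draw any remaining point as the next canopy center.
        center = remaining.pop(rng.randrange(len(remaining)))
        centers.append(center)
        # Points close to this center can no longer become centers.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def nearest_index(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations, started from the supplied centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: group points by nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_index(p, centers)].append(p)
        # Update step: move each center to the mean of its members.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for members, old in zip(clusters, centers)
        ]
        if updated == centers:
            break
        centers = updated
    labels = [nearest_index(p, centers) for p in points]
    return centers, labels


if __name__ == "__main__":
    # Toy data: three well-separated groups (an assumption for the demo).
    data = [
        (0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
        (10.0, 10.0), (10.4, 9.7), (9.8, 10.3),
        (20.0, 0.0), (20.3, 0.4), (19.7, -0.2),
    ]
    seeds = canopy_centers(data, t2=3.0)
    final_centers, labels = kmeans(data, seeds)
    print("K chosen by canopy:", len(seeds))
    print("labels:", labels)
```

In this sketch K is simply the number of canopy centers, so the result depends on the choice of `t2`. In a Spark version the assignment and update steps would presumably be expressed as distributed map and reduce operations over the data set, but the abstract does not give those details.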

Citation

YAN Meng, ZOU Junwei. An Improved K-means Algorithm Based on the Spark Platform [EB/OL]. Beijing: Sciencepaper Online (中国科技论文在线) [2017-12-05]. https://www.paper.edu.cn/releasepaper/content/201712-50.

Paper No.	201712-50