A Distributed Non-IID Dataset Generation Method for Federated Learning

ZHANG Chi; GAO Yujia; LIU Liang



First published: 2021-02-26

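ZHANG Chi ¹
ZHANG Chi (b. 1995), female, master's student; main research interest: federated learning
GAO Yujia ¹ LIU Liang ¹
LIU Liang (b. 1982), male, professor and doctoral supervisor; main research interest: Internet of Things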

1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876

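Abstract: Federated learning is a form of distributed machine learning. In federated learning, distributed edge devices (for example, mobile phones) collaboratively learn a shared prediction model while keeping all training data on the device. This makes full use of data spread across multiple nodes to train a good model and also protects data privacy. However, data across devices is likely to be Non-IID (not independent and identically distributed), which can make the performance of models trained with federated learning unstable. At present, distributed Non-IID datasets that can support federated learning research are still lacking. This paper therefore proposes a dataset generation method suited to federated learning training frameworks and provides two federated datasets for image classification tasks: the artificially generated V-MNIST dataset and the naturally collected Third-Eye dataset. It also defines parameters for measuring such datasets, including the number of nodes, the amount of data, and the Euclidean distance between node data distributions, and runs experiments with these datasets on a federated learning framework to verify how they perform in federated learning.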

Keywords: machine learning; federated learning; Non-IID dataset; image classification task; deep learning


A Distributed Non-IID Dataset Generation Method for Federated Learning

ZHANG Chi ¹
GAO Yujia ¹ LIU Liang ¹

1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876

Abstract: Federated learning is a kind of distributed machine learning. In federated learning, distributed edge devices (such as mobile phones) collaboratively learn a shared prediction model while keeping all training data on the device, which both makes full use of the data distributed across multiple nodes to train a good model and protects data privacy. However, cross-device data is likely to be Non-IID (not independent and identically distributed), which may make the performance of a model trained through federated learning unstable. Currently, distributed Non-IID datasets that can support federated learning research are still missing. Therefore, this paper proposes a dataset generation method suitable for federated learning training frameworks and provides two federated datasets for image classification tasks: the artificially generated V-MNIST dataset and the naturally collected Third-Eye dataset. It also formulates measurement parameters for these datasets, including the number of nodes, the amount of data, and the Euclidean distance between node data distributions, and carries out experiments with the datasets on a federated learning framework to verify their effectiveness in federated learning.

Keywords: Machine Learning; Federated Learning; Non-IID Dataset; Image Classification Task; Deep Learning
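
The abstract names three measurement parameters for a federated dataset: the number of nodes, the amount of data, and the Euclidean distance between node data distributions. The abstract does not define how a node's distribution is represented, so the following is only a minimal sketch under one assumption: each node's distribution is taken to be its normalized class-label histogram. The function names (`label_distribution`, `dataset_metrics`) and the choice of a mean over node pairs are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def label_distribution(labels, num_classes):
    """Normalized class histogram for one node (assumed representation)."""
    if not labels:
        raise ValueError("a node must hold at least one sample")
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dataset_metrics(node_labels, num_classes):
    """Summarize a federated dataset given {node_id: [class labels]}.

    The mean pairwise distance is an illustrative aggregate, not
    necessarily the statistic used by the authors.
    """
    dists = {
        node: label_distribution(labels, num_classes)
        for node, labels in node_labels.items()
    }
    pairwise = [
        euclidean_distance(dists[a], dists[b])
        for a, b in combinations(sorted(dists), 2)
    ]
    return {
        "num_nodes": len(node_labels),
        "samples_per_node": {n: len(l) for n, l in node_labels.items()},
        "mean_pairwise_distance": (
            sum(pairwise) / len(pairwise) if pairwise else 0.0
        ),
    }
```

Under this assumption, two nodes with identical label proportions are at distance 0, and two nodes that each hold a single, different class are at distance √2, the largest value possible for label histograms. A larger mean distance therefore indicates stronger label skew across nodes.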


Cite as: ZHANG Chi, GAO Yujia, LIU Liang. A Distributed Non-IID Dataset Generation Method for Federated Learning [EB/OL]. Beijing: Sciencepaper Online (中国科技论文在线) [2021-02-26]. https://www.paper.edu.cn/releasepaper/content/202102-102.


Paper number: 202102-102