A Distributed Non-IID Dataset Generation Method for Federated Learning
First published: 2021-02-26
Abstract: Federated learning is a form of distributed machine learning. In federated learning, distributed edge devices (such as mobile phones) collaboratively learn a shared prediction model while keeping all training data on the device. This makes full use of the data spread across multiple nodes to train a good model, and it also protects data privacy. However, data across devices is likely to be Non-IID (not independent and identically distributed), which can make the performance of models trained through federated learning unstable. Distributed Non-IID datasets that can support federated learning research are currently lacking. This paper therefore proposes a dataset generation method suited to federated learning training frameworks and provides two federated datasets for image classification tasks: the artificially generated V-MNIST dataset and the naturally collected Third-Eye dataset. It also defines measurement parameters for these datasets, including the number of nodes, the amount of data, and the Euclidean distance between node data distributions, and runs experiments with the datasets on a federated learning framework to verify how well they work in federated learning.
Keywords: Machine Learning; Federated Learning; Non-IID Dataset; Image Classification Task; Deep Learning
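
The abstract names three dataset measurement parameters: number of nodes, amount of data, and the Euclidean distance between node data distributions. The abstract does not give the exact definitions, so the following is only a minimal sketch under one plausible reading: each node's "data distribution" is assumed to be its normalized class-label histogram, and the distance is averaged over all node pairs. The function names (`label_distribution`, `dataset_metrics`) and the input layout (a dict mapping node id to a list of integer labels) are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def label_distribution(labels, num_classes):
    """Return the normalized class histogram of one node's labels.

    Assumption: labels are integers in range(num_classes).
    """
    if not labels:
        raise ValueError("a node must hold at least one sample")
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def euclidean_distance(p, q):
    """Euclidean (L2) distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dataset_metrics(node_labels, num_classes):
    """Summarize a federated dataset given {node_id: [label, ...]}.

    Reports the node count, per-node sample counts, and the mean pairwise
    Euclidean distance between node label distributions (an assumed
    aggregation; the paper may aggregate differently).
    """
    dists = {
        node: label_distribution(labels, num_classes)
        for node, labels in node_labels.items()
    }
    pairwise = [
        euclidean_distance(dists[a], dists[b])
        for a, b in combinations(sorted(dists), 2)
    ]
    return {
        "num_nodes": len(node_labels),
        "samples_per_node": {n: len(l) for n, l in node_labels.items()},
        "mean_pairwise_distance": sum(pairwise) / len(pairwise) if pairwise else 0.0,
    }
```

Under this reading, nodes with identical label proportions give a distance of 0, and two nodes that each hold a single, different class give the maximum distance of sqrt(2), so a larger value indicates a more strongly Non-IID partition.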