PaddleFL

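PaddleFL is an open-source federated learning framework based on PaddlePaddle. Researchers can use PaddleFL to replicate and compare different federated learning algorithms, and developers can deploy a PaddleFL federated learning system on large-scale distributed clusters. PaddleFL provides several federated learning strategies (horizontal and vertical federated learning) together with their applications in computer vision, natural language processing, recommendation, and other areas. PaddleFL will also provide applications of traditional machine learning training strategies, such as multi-task learning and transfer learning in federated settings. Because it builds on PaddlePaddle's large-scale distributed training and on Kubernetes' elastic scheduling of training jobs, PaddleFL can be deployed on a fully open-source software stack.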

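Overview of PaddleFL

Data is becoming more and more expensive, and sharing raw data across organizations is very difficult. Federated learning aims to solve the problems of data isolation between organizations and of sharing knowledge derived from data securely. The concept of federated learning was proposed by researchers at Google [1,2,3]. PaddleFL extends the PaddlePaddle framework to implement federated learning, and it provides application examples in natural language processing, computer vision, and recommendation. PaddleFL supports the two mainstream categories of federated learning strategy: horizontal federated learning and vertical federated learning [4]. Multi-task learning [7] and transfer learning [8] in federated learning will be developed and supported in the future.

  • Horizontal federated learning strategies: federated averaging [2], differential privacy [6], secure aggregation [11];
  • Vertical federated learning strategies: two-party training based on PrivC [5], three-party training based on ABY3 [10];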
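To make the first horizontal strategy concrete, the following is a minimal sketch of the aggregation step of federated averaging [2], written in plain Python. It is not PaddleFL code: the function name `federated_average` and the dictionary layout of the weights are assumptions chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count."""
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = updates[0]
    averaged = {name: [0.0] * len(vals) for name, vals in first_weights.items()}
    for weights, n in updates:
        for name, vals in weights.items():
            for i, v in enumerate(vals):
                # Each client contributes in proportion to its data size.
                averaged[name][i] += v * n / total
    return averaged


# Two simulated organizations with different amounts of local data.
client_a = ({"w": [1.0, 2.0], "b": [0.0]}, 100)
client_b = ({"w": [3.0, 4.0], "b": [1.0]}, 300)
global_weights = federated_average([client_a, client_b])
print(global_weights)
```

Only model parameters and sample counts reach the aggregator in this sketch; the raw training data stays with each client.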

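PaddleFL Framework Design

PaddleFL provides two main solutions: Data Parallel and Federated Learning with MPC (PFM).

  • With Data Parallel, the data owners can train a model with classical horizontal federated learning strategies such as FedAvg and DPSGD.

  • PFM is a federated learning solution built on secure multi-party computation (MPC). As an important part of PaddleFL, PFM supports several federated learning scenarios, including horizontal, vertical, and federated transfer learning. It provides reliable security while keeping performance at a practical level.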
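As background for the DPSGD strategy named under Data Parallel, here is a small sketch of the gradient step described in [6]: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to the clipping bound, and average. The function names and parameters are illustrative assumptions and do not correspond to PaddleFL's own configuration options.

```python
import math
import random
from typing import List


def clip_by_l2_norm(grad: List[float], clip_norm: float) -> List[float]:
    """Scale a gradient down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(
    per_example_grads: List[List[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip per-example gradients, add Gaussian noise to their sum, then average."""
    clipped = [clip_by_l2_norm(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm  # noise std is tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


grads = [[3.0, 4.0], [0.0, 0.5]]
print(dp_average_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Clipping bounds how much any single example can influence the update, which is what allows the added noise to yield a differential privacy guarantee.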

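Data Parallel

In PaddleFL, model training is split into two stages: compile time and run time. Compile time defines the federated learning task, and run time carries out the federated training. The components of each stage are as follows:

A. Compile Time

  • FL-Strategy: users define the federated learning strategy with FL-Strategy, for example Fed-Avg [2].

  • User-Defined-Program: a PaddlePaddle program that defines the machine learning model structure and the training strategy, such as multi-task learning.

  • Distributed-Config: in federated learning, the system is deployed in a distributed environment. The distributed training config defines the information of the distributed training nodes.

  • FL-Job-Generator: given the FL-Strategy, the User-Defined-Program, and the Distributed-Config, the FL-Job-Generator produces the FL-Jobs for the federated parameter server and for the workers. The FL-Jobs are then sent to the organizations and to the federated parameter server for federated training.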
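The relationship between the four compile-time components can be sketched with a toy model in plain Python. Everything below is hypothetical: `StrategyConfig`, `DistributedConfig`, and `generate_jobs` are stand-ins invented for this illustration and are not the PaddleFL API, which should be looked up in the project's own examples.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StrategyConfig:
    """Stand-in for FL-Strategy: which algorithm, and how many local steps."""
    name: str
    inner_steps: int


@dataclass(frozen=True)
class DistributedConfig:
    """Stand-in for Distributed-Config: where servers live, how many workers."""
    server_endpoints: List[str]
    worker_num: int


def generate_jobs(
    strategy: StrategyConfig, program: str, dist: DistributedConfig
) -> Dict[str, dict]:
    """Stand-in for FL-Job-Generator: emit one job description per node."""
    jobs: Dict[str, dict] = {}
    for i, endpoint in enumerate(dist.server_endpoints):
        jobs[f"server{i}"] = {
            "role": "server",
            "endpoint": endpoint,
            "strategy": strategy.name,
            "program": program,
        }
    for i in range(dist.worker_num):
        jobs[f"trainer{i}"] = {
            "role": "worker",
            "servers": list(dist.server_endpoints),
            "strategy": strategy.name,
            "inner_steps": strategy.inner_steps,
            "program": program,
        }
    return jobs


jobs = generate_jobs(
    StrategyConfig(name="fed_avg", inner_steps=10),
    program="mlp_classifier",  # placeholder for the user-defined program
    dist=DistributedConfig(server_endpoints=["127.0.0.1:8181"], worker_num=2),
)
for name, job in jobs.items():
    print(name, job["role"])
```

The point of the separation is that the same user-defined program can be paired with a different strategy or a different cluster layout without being rewritten.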

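B. Run Time

  • FL-Server: the federated parameter server, which runs in the cloud or in a third-party cluster.

  • FL-Worker: each organization participating in federated learning has one or more workers that communicate with the federated parameter server.

  • FL-Scheduler: schedules the workers during training. Before each update round, it decides which workers may take part in training.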
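The scheduler's role can be illustrated with a short simulation. This is a sketch under stated assumptions, not PaddleFL's scheduler: the function `select_workers` and the uniform random sampling policy are chosen here only to show the idea of picking participants before each round.

```python
import random
from typing import List


def select_workers(registered: List[str], sample_num: int, rng: random.Random) -> List[str]:
    """Pick which registered workers take part in the next update round."""
    if sample_num > len(registered):
        raise ValueError("cannot sample more workers than are registered")
    # Uniform sampling without replacement is an assumed policy.
    return sorted(rng.sample(registered, sample_num))


workers = ["worker0", "worker1", "worker2", "worker3"]
rng = random.Random(42)  # seeded so that a run is reproducible
for round_id in range(3):
    chosen = select_workers(workers, sample_num=2, rng=rng)
    # Only the chosen workers would train locally and report to the server.
    print(f"round {round_id}: {chosen}")
```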

See the examples for more information.

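Federated Learning with MPC

Secure training and inference tasks in PaddleFL MPC are implemented on top of efficient multi-party computation protocols. PaddleFL supports the three-party protocol ABY3 [10] and the two-party protocol PrivC [5]. Two-party federated learning based on PrivC mainly supports linear/logistic regression and DNN models. Three-party federated learning based on ABY3 supports linear/logistic regression, DNN, CNN, FM, and other models.

In PaddleFL MPC, participants fall into three roles: input party (IP), computing party (CP), and result party (RP). Input parties hold the training data and models; they encrypt the data and models and send them to the computing parties (ABY3 uses three computing nodes, PrivC uses two). Computing parties execute the training, completing the task with a specific secure multi-party computation protocol. They only ever see encrypted data and models, which preserves data privacy. When the computation finishes, the result parties receive the results and recover the plaintext. A participant may play more than one role; for example, a data owner can also act as a computing party.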
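The division of labour between the three roles can be illustrated with additive secret sharing over a ring. This is a deliberately simplified sketch: ABY3 uses a replicated sharing scheme and PrivC has its own protocol, so the code below shows only the general idea, and the names `share` and `reconstruct` are assumptions made for this example.

```python
import secrets
from typing import List

MODULUS = 2 ** 64  # all arithmetic is done modulo 2^64


def share(secret: int, parties: int = 3) -> List[int]:
    """Split a value into random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Recombine all shares into the plaintext value."""
    return sum(shares) % MODULUS


# Input parties: two data owners each split a private value into three shares.
value_a, value_b = 5200, 4800
shares_a = share(value_a)
shares_b = share(value_b)

# Computing parties: node i holds shares_a[i] and shares_b[i] and adds them
# locally. A single share is uniformly random and reveals nothing by itself.
result_shares = [(a + b) % MODULUS for a, b in zip(shares_a, shares_b)]

# Result party: collects the three result shares and recovers the sum.
total = reconstruct(result_shares)
print(total)
```

No computing node ever sees `value_a` or `value_b`, yet the result party obtains their sum.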

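The whole training and inference process in PFM consists of three parts: data preparation, training/inference, and result reconstruction.

A. Data Preparation

  • Private data alignment: PFM lets data owners find the set of samples they have in common without revealing their own data. This is essential in vertical federated learning, which requires the data owners to align their data before training while protecting the privacy of user data.
  • Data encryption and distribution: PFM provides both online and offline schemes for encrypting and distributing data. With offline distribution, the data owners encrypt data and models with secret sharing [9] during data preparation and then pass them to the computing parties by direct transmission or through database storage. With online distribution, the data owners encrypt and distribute data and models while training is running. In both cases each computing party receives only a share of the data, so no computing party can recover the original data.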
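One textbook way to align samples privately is a Diffie-Hellman style private set intersection, sketched below. This is an assumption-laden illustration and not the alignment protocol PaddleFL implements: the group parameters are toy values that are not secure, both parties are simulated in one process, and all names are invented for the example.

```python
import hashlib
import secrets
from typing import List, Set

P = 2 ** 127 - 1  # a Mersenne prime; toy parameter, NOT a secure choice


def hash_to_group(sample_id: str) -> int:
    """Map a sample ID to a nonzero element modulo P."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(values: List[int], key: int) -> List[int]:
    """Raise every element to a party's secret exponent."""
    return [pow(v, key, P) for v in values]


def private_intersection(ids_a: List[str], ids_b: List[str]) -> Set[str]:
    """Return the IDs of party A that party B also holds."""
    key_a = secrets.randbelow(P - 3) + 2  # A's secret exponent
    key_b = secrets.randbelow(P - 3) + 2  # B's secret exponent

    # A -> B: H(x)^a for each of A's IDs.
    a_once = blind([hash_to_group(x) for x in ids_a], key_a)
    # B -> A: (H(x)^a)^b in the same order, plus H(y)^b for B's own IDs.
    a_twice = blind(a_once, key_b)
    b_once = blind([hash_to_group(y) for y in ids_b], key_b)
    # A: computes (H(y)^b)^a and compares the doubly blinded values.
    b_twice = set(blind(b_once, key_a))
    return {x for x, t in zip(ids_a, a_twice) if t in b_twice}


print(private_intersection(["u1", "u2", "u3"], ["u2", "u3", "u4"]))
```

Because exponentiation commutes, a shared ID produces the same doubly blinded value on both sides, while neither party sees the other's unblinded IDs.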

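B. Training/Inference

PFM has the same running mode as PaddlePaddle. Before training, the user defines the MPC protocol, the model, and the training strategy. paddle_fl.mpc provides operators that work on encrypted data; at run time, instances of these operators are created and run in sequence by the executor. Ciphertext communication during training supports two network modes, gloo and grpc.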
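To give a feel for what an operator on secret-shared data has to do, the sketch below multiplies two additively shared values with a Beaver triple. This is a generic textbook technique shown under assumptions: ABY3 and PrivC define their own multiplication procedures, the trusted dealer is simulated inline, and none of these function names come from paddle_fl.mpc.

```python
import secrets
from typing import List

MODULUS = 2 ** 64


def share(value: int, parties: int = 2) -> List[int]:
    """Split a value into additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Open a shared value by summing its shares."""
    return sum(shares) % MODULUS


def beaver_multiply(x_shares: List[int], y_shares: List[int]) -> List[int]:
    """Return shares of x*y given shares of x and y."""
    n = len(x_shares)
    # Simulated trusted dealer: random a, b and c = a*b, all handed out as shares.
    a = secrets.randbelow(MODULUS)
    b = secrets.randbelow(MODULUS)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % MODULUS, n)

    # The parties open d = x - a and e = y - b. Both are masked by random
    # values, so opening them does not reveal x or y.
    d = reconstruct([(x - ai) % MODULUS for x, ai in zip(x_shares, a_sh)])
    e = reconstruct([(y - bi) % MODULUS for y, bi in zip(y_shares, b_sh)])

    # x*y = c + d*b + e*a + d*e; the public term d*e is added by one party only.
    z = [(ci + d * bi + e * ai) % MODULUS for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z[0] = (z[0] + d * e) % MODULUS
    return z


product_shares = beaver_multiply(share(6), share(7))
print(reconstruct(product_shares))
```

Addition of shared values needs no communication, whereas multiplication requires the parties to exchange the masked values `d` and `e`, which is why network modes matter for MPC training.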

See the training documentation for more information about the training stage.

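C. Result Reconstruction

When secure training and inference finish, the computing parties output the model (or the prediction results) in encrypted form. The result parties collect the encrypted results, decrypt them with the tools provided in PFM, and deliver the plaintext results to the user. Splitting data into shares and reconstructing it currently support both offline and online modes.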
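Reconstruction of real-valued results can be sketched with fixed-point encoding on top of additive shares. The scaling factor, ring size, and function names below are assumptions for illustration and do not reflect the encoding or the tools that PFM actually ships.

```python
import secrets
from typing import List

MODULUS = 2 ** 64
SCALE = 2 ** 16  # assumed number of fractional bits: 16


def encode(x: float) -> int:
    """Encode a real number as a fixed-point ring element."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Decode a ring element, treating the upper half of the ring as negative."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value: int, parties: int = 3) -> List[int]:
    """Split an encoded value into additive shares."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct_plaintext(shares: List[int]) -> float:
    """What a result party does: sum the collected shares, then decode."""
    return decode(sum(shares) % MODULUS)


# A computing-party output (here a single model weight) held as three shares.
weight_shares = share(encode(-1.5))
print(reconstruct_plaintext(weight_shares))
```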

See the MPC examples for more information.

Installation

Environment

  • CentOS 7 (64 bit)
  • Python 3.5/3.6/3.7/3.8 (64 bit)
  • pip3 9.0.1+ (64 bit)
  • PaddlePaddle 1.8.5
  • Redis 5.0.8 (64 bit)
  • GCC or G++ 8.3.1
  • cmake 3.15+

Installation and Deployment

We provide three ways to install PaddleFL; choose the one that fits your situation:

1. Use PaddleFL in Docker

We strongly recommend running PaddleFL in Docker.

#Pull and run the docker
docker pull paddlepaddle/paddlefl:1.1.2
docker run --name <docker_name> --net=host -it -v $PWD:/paddle <image id> /bin/bash

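The Docker image already contains the environment configuration as well as PaddlePaddle and PaddleFL, so you can run the example code directly and start using PaddleFL.

2. Install from packages

We provide prebuilt PaddlePaddle and PaddleFL packages that you can download and install directly.

First, install PaddlePaddle: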

#Install PaddlePaddle
wget https://paddlefl.bj.bcebos.com/paddlepaddle-1.8.5-cp**-cp**-linux_x86_64.whl
pip3 install paddlepaddle-1.8.5-cp**-cp**-linux_x86_64.whl

When installing, replace ** with the Python version of your environment. For example, if you use Python 3.8, run the following commands:

wget https://paddlefl.bj.bcebos.com/paddlepaddle-1.8.5-cp38-cp38-linux_x86_64.whl
pip3 install paddlepaddle-1.8.5-cp38-cp38-linux_x86_64.whl
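If you prefer not to fill in the version by hand, a short Python helper can build the wheel URL from the running interpreter. The helper name `paddle_wheel_url` is made up for this sketch; it simply applies the substitution rule described above, and it does not check that a wheel actually exists at the resulting URL for your Python version.

```python
import sys

BASE_URL = "https://paddlefl.bj.bcebos.com"


def paddle_wheel_url(version_info=sys.version_info) -> str:
    """Build the PaddlePaddle 1.8.5 wheel URL for a given Python version."""
    # "cp38" for Python 3.8, "cp37" for Python 3.7, and so on.
    tag = f"cp{version_info[0]}{version_info[1]}"
    return f"{BASE_URL}/paddlepaddle-1.8.5-{tag}-{tag}-linux_x86_64.whl"


# URL for the interpreter that runs this script.
print(paddle_wheel_url())
```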

Then install PaddleFL:

#Install PaddleFL
pip3 install paddle_fl

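The command above automatically installs the PaddleFL build for Python 3.8. For other Python 3 environments, download the matching package from https://pypi.org/project/paddle-fl/1.1.2/#files and install it manually.

3. Install from source

If you would like to compile and install PaddleFL from source, please refer to the source-build instructions.

If you use the gloo communication mode, Redis is required. We also provide a stable Redis package for download: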

wget --no-check-certificate https://paddlefl.bj.bcebos.com/redis-stable.tar
tar -xf redis-stable.tar
cd redis-stable &&  make

Easy Deployment with Kubernetes

Horizontal federated learning

kubectl apply -f ./python/paddle_fl/paddle_fl/examples/k8s_deployment/master.yaml

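Please refer to the K8S deployment example.

To set up your own K8S cluster, you can also refer to the guide on applying for a K8S cluster and installing kubectl.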

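PaddleFL Extensions

Federated learning simulator (fl-mobile simulator)

FL-mobile is a framework that integrates algorithm simulation and research, training, and deployment for mobile devices. The simulator is one part of FL-mobile.

The simulator is designed to reproduce the real-world scenario in which multiple mobile devices train together. The idea is to simulate a number of on-device clients on a server so that the effect of an algorithm can be verified quickly. Its advantages are:

  • Supports single-machine and distributed training
  • Supports training on common open-source datasets
  • Supports private and shared parameters in a model; private parameters do not take part in the global update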
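The distinction between private and shared parameters can be sketched as follows. This is an illustration under assumptions, not the simulator's code: the function `aggregate_shared`, the flat dictionary of scalar parameters, and the unweighted mean are all chosen here for brevity.

```python
from typing import Dict, List, Set


def aggregate_shared(
    devices: List[Dict[str, float]], shared_names: Set[str]
) -> List[Dict[str, float]]:
    """Average only the shared parameters; leave private ones untouched."""
    averaged = {
        name: sum(dev[name] for dev in devices) / len(devices)
        for name in shared_names
    }
    # Each device keeps its private values and receives the shared average.
    return [{**dev, **averaged} for dev in devices]


# Two simulated devices: "embedding" is shared, "user_bias" stays private.
devices = [
    {"embedding": 1.0, "user_bias": 0.2},
    {"embedding": 3.0, "user_bias": -0.4},
]
for dev in aggregate_shared(devices, shared_names={"embedding"}):
    print(dev)
```

After the update both devices hold the same `embedding` value, while each `user_bias` is exactly what the device had before.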

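Ongoing Work

  • Support for more models in PFM.
  • Release of a K8S deployment solution for PFM.
  • The federated learning simulator for mobile devices will be open-sourced in the next release.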

References

[1]. Jakub Konečný, H. Brendan McMahan, Daniel Ramage, Peter Richtárik. Federated Optimization: Distributed Machine Learning for On-Device Intelligence. arXiv preprint 2016

[2]. H. Brendan McMahan, Eider Moore, Daniel Ramage, Blaise Agüera y Arcas. Federated Learning of Deep Networks using Model Averaging. arXiv preprint 2016

[3]. Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon. Federated Learning: Strategies for Improving Communication Efficiency. arXiv preprint 2016

[4]. Qiang Yang, Yang Liu, Tianjian Chen, Yongxin Tong. Federated Machine Learning: Concept and Applications. ACM Transactions on Intelligent Systems and Technology 2019

[5]. Kai He, Liu Yang, Jue Hong, Jinghua Jiang, Jieming Wu, Xu Dong et al. PrivC - A framework for efficient Secure Two-Party Computation. In Proc. of SecureComm 2019

[6]. Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang. Deep Learning with Differential Privacy. In Proc. of CCS 2016

[7]. Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, Ameet Talwalkar. Federated Multi-Task Learning. In Proc. of NIPS 2017

[8]. Yang Liu, Tianjian Chen, Qiang Yang. Secure Federated Transfer Learning. IEEE Intelligent Systems 2018

[9]. https://en.wikipedia.org/wiki/Secret_sharing

[10]. Payman Mohassel and Peter Rindal. ABY3: A Mixed Protocol Framework for Machine Learning. In Proc. of CCS 2018

[11]. Aaron Segal, Antonio Marcedone, Benjamin Kreuter, Daniel Ramage, H. Brendan McMahan, Karn Seth, K. A. Bonawitz, Sarvar Patel, Vladimir Ivanov. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proc. of CCS 2017
