Federated Learning Glossary
From Open Risk Manual
A Glossary of Federated Learning terminology. The glossary covers a cross-section of terms that are relevant for privacy-preserving computation, spanning domains such as cryptography, database architectures and common statistical / machine learning algorithms. It does not aim to be exhaustive in any of those contributing domains.
Categories
For easier use of the glossary we classify terms according to the context in which they arise. NB: The boundaries may not always be crystal clear:
- Use Case is a general application domain where some type of federated analysis is used
- Process is a required procedure
- Risk Factor is any aspect that can compromise / invalidate the premise of federated analysis (not necessarily malicious)
- Property is any rigorously defined aspect of a federated system that measures or guarantees e.g. privacy or security features
- Algorithm in this context is a federated algorithm for data analysis, NOT a low-level computation primitive (see Protocol)
- Protocol is any concretely defined pattern of information exchange (that is specifically useful in a federated context)
- Architecture is a scheme or organizational pattern of organizations, computational or storage devices etc. that defines a particular type of federation
- Agent is any entity within an overall architecture
Glossary
Term | Acronym | Meaning and Context | Category | Links / References |
---|---|---|---|---|
Federated Learning | FL | A machine learning paradigm that trains an algorithm across multiple devices or servers holding local data samples, without exchanging them. A centralised model is trained by locally computing updates and merging them to the centralised model without sharing data. | Use Case | Wikipedia |
Privacy by Design | | Privacy by design aims at building privacy and Data Protection up front, into the design specifications and architecture of information and communication systems and technologies, in order to facilitate compliance with Data Privacy and data protection principles | Architecture | |
Privacy Enhancing Technology | PET | Also Privacy Preserving Technology. A coherent system of information and communication technology (ICT) measures that protect privacy by eliminating or reducing personal data or by preventing unnecessary and/or undesired processing of personal data, all without losing the functionality of the information system | Architecture | |
Privacy-Preserving Computation | PPC | Any general IT architecture that allows performing computations on networked computing devices while preserving some aspect of Data Privacy | Use Case | |
Privacy-Preserving Data Mining | PPDM | The extraction of relevant knowledge from large amounts of data (Big Data) while protecting sensitive information. | Use Case | |
Secure Multi-Party Computation | MPC | A subfield of cryptography with the goal of creating methods for parties to jointly compute a function over their inputs while keeping those inputs private. MPC guarantees that the parties reveal nothing about their inputs to each other or to any third party beyond what the output itself implies; it cannot prevent an adversary from learning some individual information from that output | Protocol | Wikipedia |
Federated Database System | FDBS | A type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database | Architecture | Wikipedia |
Differential Privacy | | Also ε-differential privacy. A mathematical definition for the privacy loss associated with any data release drawn from a statistical database. It measures, e.g., to what extent the parameters or predictions of a model reveal information about any individual points in the training dataset. It ensures that the addition or removal of any single individual's data does not substantially affect the outcome of any analysis. | Property | Wikipedia |
Data Federation | | Also Data Sharing. It is the general process of aggregating / sharing data that exist in distributed data sources | Process | Wikipedia |
k-anonymity | | A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from at least k − 1 individuals whose information also appears in the release | Property | Wikipedia |
Data Anonymization | | The process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous | Process | Wikipedia |
Data Re-Identification | | Also de-anonymization. The risk arising from the possibility of matching anonymous data with publicly available information, or auxiliary data, to discover information that was deemed private | Risk Factor | Wikipedia |
Homomorphic Encryption | HE | A form of encryption allowing one to perform calculations on encrypted data without decrypting it first. Homomorphic encryption is typically used as a public key system, where any party can encrypt its data with a known public key and perform calculations with data encrypted by others with the same public key. Arbitrarily complicated functions of the data can be computed this way (“Fully Homomorphic Encryption”), though at greater computational cost. | Protocol | Wikipedia |
Private Set Intersection | PSI | A secure multi-party computation (MPC) technique that allows two parties holding sets to compare encrypted versions of these sets in order to compute the intersection. Neither party reveals anything to the counterparty except for the elements in the intersection. | Protocol | Wikipedia |
Alice and Bob | | Alice and Bob are fictional characters commonly used as placeholders in discussions about cryptographic protocols or systems. They typically represent agents possessing (or seeking) private information | Agent | Wikipedia |
Trusted Third Party | TTP | An entity which facilitates interactions between two parties who both trust the third party. Whether a TTP exists or not has major design implications for privacy-preserving computations. | Agent | Wikipedia |
Horizontally Partitioned Data | | A data distribution design that applies to structured data. In horizontally partitioned data different rows from a common schema are located in distinct databases / devices. For example, distinct sub-samples from a population that are stored separately | Architecture | |
Vertically Partitioned Data | | A data distribution design that applies to structured data. In vertically partitioned data different columns from a common schema are located in distinct databases / devices. For example, distinct features characterising a population that are stored separately | Architecture | |
Private Information Retrieval | PIR | A protocol that allows a user to retrieve an item from a database without revealing which item is retrieved. PIR is a weaker version of 1-out-of-n oblivious transfer, where it is also required that the user should not get information about other database items. Private information retrieval is a functionality for one client and one server. | Protocol | Wikipedia |
Oblivious Transfer | OT | A type of protocol in which a sender transfers one of potentially many pieces of information to a receiver, but remains oblivious as to what piece (if any) has been transferred | Protocol | Wikipedia |
Garbled Circuit | GC | A protocol that enables two-party secure computation in which two mistrusting parties can jointly evaluate a function over their private inputs without the presence of a trusted third party. | Protocol | Wikipedia |
Personally Identifiable Information | PII | Personal data, also known as personal information or personally identifiable information, is any information relating to an identifiable person. It is the subject of various regulations (e.g. HIPAA, GDPR) | Risk Factor | Wikipedia |
Federated Data Analysis | | Also Federated Analysis, Federated Data Mining. A general term denoting the analysis of distributed datasets | Use Case | Wikipedia |
Client/Server Architecture | | An architecture for federated data analysis that shares models, model parameters or other statistical aggregated information rather than individual data / information with a central server that is operated by a trusted third party | Architecture | Wikipedia |
Decentralized Architecture | | A decentralized architecture for federated data analysis does not require a central node (server) to collect and aggregate intermediary results from participating entities but rather exchanges information on a peer-to-peer basis | Architecture | Wikipedia |
Zero-Knowledge Proof | ZKP | A protocol by which one party (the prover) can prove to another party (the verifier) that they know a value x, without conveying any information apart from the fact that they know the value x. | Protocol | Wikipedia |
Federated Data System | | Denotes the overall foundation of shared technology architecture that enables federated data analysis. It extends beyond the specific federated database architecture and includes e.g. operational components such as security, auditing, authentication and access rights. | Architecture | |
Electronic Health Record | EHR | The systematic collection and storage of patient health information in a digital format. Records will include a variety of data formats. Federation of EHR constitutes one of the major use cases of federated analysis. | Use Case | Wikipedia |
Local Differential Privacy | | A model of differential privacy with the added requirement that even if an adversary has access to the personal responses of an individual in a database, that adversary will still be unable to learn too much about the user's personal data. Algorithms with differential privacy necessarily incorporate some amount of randomness or noise, which can be tuned to mask the influence of the user on the output. | Property | Wikipedia |
Non-IID Challenge | | The challenge that data samples available for federated analysis may not satisfy the independent and identically distributed (IID) property that is a precondition for the validity of various algorithms and statistical analyses | Risk Factor | [1] |
Trusted Execution Environment | TEE | A secure area of a main processor (Also Secure Enclave). It guarantees code and data loaded inside to be protected with respect to confidentiality and integrity. TEEs provide the ability to run code on a remote machine, even if not trusting the machine’s owner/administrator. This is achieved by limiting the capabilities of any party, including the administrator. | Architecture | Wikipedia |
Verifiable Computation | | A property enabling one party to prove to another party that it has executed the desired behavior on its data faithfully, without compromising the potential secrecy of the data | Property | Wikipedia |
Patient Similarity Learning | | Patient similarity learning aims to develop computational algorithms for defining and locating clinically similar patients to a query patient under a specific clinical context | Use Case | [2] |
Federated Stochastic Gradient Descent | FedSGD | Federated stochastic gradient descent is the direct transposition of the classic algorithm to the federated setting, by using a random fraction C of the nodes and using all the data on a given node. The gradients are averaged by the server proportionally to the number of training samples on each node, and used to make a gradient descent step | Algorithm | [3] |
Federated Averaging | FedAvg | Federated averaging (FedAvg) is a generalization of FedSGD, which allows local nodes to perform more than one batch update on local data and exchanges the updated weights rather than the gradients | Algorithm | [4] |
Federated Principal Component Analysis | FedPCA | Computing principal components on federated data | Algorithm | [5] |
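The difference between the FedSGD and FedAvg entries above can be illustrated with a minimal sketch. It is not taken from the cited papers: the one-parameter model `y = w * x`, the toy client datasets, the learning rate and the function names are all assumptions chosen for illustration. For simplicity every client participates in each round (i.e. the fraction C is 1) and local updates use the full local dataset.

```python
def local_gradient(w, data):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def fedsgd_round(w, clients, lr):
    """One FedSGD round: clients send gradients; the server averages them
    in proportion to local sample counts and takes a single gradient step."""
    total = sum(len(d) for d in clients)
    avg_grad = sum(len(d) * local_gradient(w, d) for d in clients) / total
    return w - lr * avg_grad


def fedavg_round(w, clients, lr, local_steps):
    """One FedAvg round: each client runs several local gradient steps and
    sends back its weights; the server averages the weights by sample count."""
    total = sum(len(d) for d in clients)
    new_w = 0.0
    for d in clients:
        w_local = w
        for _ in range(local_steps):
            w_local -= lr * local_gradient(w_local, d)
        new_w += len(d) * w_local / total
    return new_w


# Hypothetical horizontally partitioned data: every client holds (x, y)
# rows generated from y = 3 * x; the raw rows never leave the client.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w_sgd = 0.0
w_avg = 0.0
for _ in range(50):
    w_sgd = fedsgd_round(w_sgd, clients, lr=0.05)
    w_avg = fedavg_round(w_avg, clients, lr=0.05, local_steps=5)

print(w_sgd, w_avg)
```

In this sketch a FedAvg round with a single local step coincides with a FedSGD round, which is the sense in which FedAvg generalizes FedSGD. Only gradients or weights are exchanged with the server, never the client rows.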
Disclaimers
- This glossary is not in any way or form attempting to attribute priority for any of the methodologies / algorithms mentioned. Consult the academic literature.
References
- [1] Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons. The Non-IID Data Quagmire of Decentralized Machine Learning.
- [2] Junghye Lee, Jimeng Sun, Fei Wang, Shuang Wang, Chi-Hyuck Jun, Xiaoqian Jiang. Privacy-Preserving Patient Similarity Learning in a Federated Environment: Development and Analysis.
- [3] R. Shokri, V. Shmatikov. Privacy Preserving Deep Learning.
- [4] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Aguera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data.
- [5] Andreas Grammenos, Rodrigo Mendoza-Smith, Jon Crowcroft, Cecilia Mascolo. Federated Principal Component Analysis.