Federated Learning Glossary

A Glossary of Federated Learning terminology. The glossary covers a cross-section of terms that are relevant for privacy-preserving computation, spanning domains such as cryptography, database architectures and common statistical / machine learning algorithms. It does not aim to be exhaustive in any of those contributing domains.

Glossary

Term	Acronym	Meaning and Context	Category	Links / References
Federated Learning	FL	A machine learning paradigm that trains an algorithm across multiple devices or servers holding local data samples, without exchanging them. A centralised model is trained by computing updates locally and merging them into the centralised model without sharing the underlying data.	Use Case	Wikipedia
Privacy by Design		Privacy by design aims at building privacy and Data Protection up front, into the design specifications and architecture of information and communication systems and technologies, in order to facilitate compliance with Data Privacy and data protection principles	Architecture
Privacy-Enhancing Technology	PET	Also Privacy-Preserving Technology. A coherent system of information and communication technology (ICT) measures that protect privacy by eliminating or reducing personal data or by preventing unnecessary and/or undesired processing of personal data, all without losing the functionality of the information system	Architecture
Privacy-Preserving Computation	PPC	Any general IT architecture that allows performing computations on networked computing devices while preserving some aspect of Data Privacy	Use Case
Privacy-Preserving Data Mining	PPDM	The extraction of relevant knowledge from large amounts of data (Big Data) while at the same time protecting sensitive information.	Use Case
Secure Multi-Party Computation	MPC	Also SMC. A subfield of cryptography with the goal of creating methods for parties to jointly compute a function over their inputs while keeping those inputs private. MPC guarantees that none of the parties share their inputs with each other or with any third party; it cannot, however, prevent an adversary from learning some individual information from the output of the computation	Protocol	Wikipedia
Federated Database System	FDBS	A type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database	Architecture	Wikipedia
Differential Privacy		ε-differential privacy. A mathematical definition for the privacy loss associated with any data release drawn from a statistical database. It measures, e.g., to what extent the parameters or predictions of a model reveal information about any individual points in the training dataset. It ensures that the addition or removal of any single individual's record does not substantially affect the outcome of any analysis.	Property	Wikipedia
Data Federation		Also Data Sharing. It is the general process of aggregating / sharing data that exist in distributed data sources	Process	Wikipedia
k-anonymity		A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release	Property	Wikipedia
Data Anonymization		The process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous	Process	Wikipedia
Data Re-Identification		Also de-anonymization. The risk arising from the possibility of matching anonymous data with publicly available information, or auxiliary data, to discover information that was deemed private	Risk Factor	Wikipedia
Homomorphic Encryption	HE	A form of encryption allowing one to perform calculations on encrypted data without decrypting it first. Homomorphic encryption is a public key system, where any party can encrypt its data with a known public key and perform calculations with data encrypted by others with the same public key. Arbitrarily complicated functions of the data can be computed this way (“Fully Homomorphic Encryption”) though at greater computational cost.	Protocol	Wikipedia
Private Set Intersection		A secure multi-party computation (MPC) technique that allows two parties holding sets to compare encrypted versions of these sets in order to compute the intersection. Neither party reveals anything to the counterparty except for the elements in the intersection.	Protocol	Wikipedia
Alice and Bob		Alice and Bob are fictional characters commonly used as placeholders in discussions about cryptographic protocols or systems. They typically represent agents possessing (or seeking) private information	Agent	Wikipedia
Trusted Third Party	TTP	An entity which facilitates interactions between two parties who both trust the third party. Whether a TTP exists or not has major design implications for privacy-preserving computations.	Agent	Wikipedia
Horizontally Partitioned Data		A data distribution design that applies to structured data. In horizontally partitioned data, different rows from a common schema are located in distinct databases / devices. For example, distinct sub-samples from a population that are stored separately	Architecture
Vertically Partitioned Data		A data distribution design that applies to structured data. In vertically partitioned data, different columns from a common schema are located in distinct databases / devices. For example, distinct features characterising a population that are stored separately	Architecture
Private Information Retrieval	PIR	A protocol that allows a user to retrieve an item from a database without revealing which item is retrieved. PIR is a weaker version of 1-out-of-n oblivious transfer, where it is also required that the user should not get information about other database items. Private information retrieval is a functionality for one client and one server.	Protocol	Wikipedia
Oblivious Transfer	OT	A type of protocol in which a sender transfers one of potentially many pieces of information to a receiver, but remains oblivious as to what piece (if any) has been transferred	Protocol	Wikipedia
Garbled Circuit	GC	A protocol that enables two-party secure computation in which two mistrusting parties can jointly evaluate a function over their private inputs without the presence of a trusted third party.	Protocol	Wikipedia
Personally Identifiable Information	PII	Personal data, also known as personal information or personally identifiable information, is any information relating to an identifiable person. It is the subject of various regulations (e.g. HIPAA, GDPR)	Risk Factor	Wikipedia
Federated Data Analysis		Also Federated Analysis, Federated Data Mining. A general term denoting the analysis of distributed datasets	Use Case	Wikipedia
Client/Server Architecture		An architecture for federated data analysis that shares models, model parameters or other aggregated statistical information, rather than individual data / information, with a central server that is operated by a trusted third party	Architecture	Wikipedia
Decentralized Architecture		A decentralized architecture for federated data analysis does not require a central node (server) to collect and aggregate intermediary results from participating entities but rather exchanges information on a peer-to-peer basis	Architecture	Wikipedia
Zero-Knowledge Proof	ZKP	A protocol by which one party (the prover) can prove to another party (the verifier) that they know a value x, without conveying any information apart from the fact that they know the value x.	Protocol	Wikipedia
Federated Data System		Denotes the overall foundation of shared technology architecture that enables federated data analysis. It extends beyond the specific federated database architecture and includes e.g. operational components such as security, auditing, authentication and access rights.	Architecture
Electronic Health Record	EHR	The systematic collection and storage of patient health information in a digital format. Records will include a variety of data formats. Federation of EHRs constitutes one of the major use cases of federated analysis.	Use Case	Wikipedia
Local Differential Privacy		A model of differential privacy with the added requirement that even if an adversary has access to the personal responses of an individual in a database, that adversary will still be unable to learn too much about the user's personal data. Algorithms with differential privacy necessarily incorporate some amount of randomness or noise, which can be tuned to mask the influence of the user on the output.	Property	Wikipedia
Non-IID Challenge		The challenge that data samples available for federated analysis may not satisfy the independent and identically distributed (IID) property that is a precondition for the validity of various algorithms and statistical analyses	Risk Factor	^[1]
Trusted Execution Environment	TEE	A secure area of a main processor (Also Secure Enclave). It guarantees code and data loaded inside to be protected with respect to confidentiality and integrity. TEEs provide the ability to run code on a remote machine, even if not trusting the machine’s owner/administrator. This is achieved by limiting the capabilities of any party, including the administrator.	Architecture	Wikipedia
Verifiable Computation		A property enabling one party to prove to another party that it has executed the desired behavior on its data faithfully, without compromising the potential secrecy of the data	Property	Wikipedia
Patient Similarity Learning		Patient similarity learning aims to develop computational algorithms for defining and locating clinically similar patients to a query patient under a specific clinical context	Use Case	^[2]
Federated Stochastic Gradient Descent	FedSGD	Federated stochastic gradient descent is the direct transposition of the classic algorithm to the federated setting, by using a random fraction C of the nodes and using all the data on a given node. The gradients are averaged by the server proportionally to the number of training samples on each node, and used to make a gradient descent step	Algorithm	^[3]
Federated Averaging	FedAvg	Federated averaging (FedAvg) is a generalization of FedSGD, which allows local nodes to perform more than one batch update on local data and exchanges the updated weights rather than the gradients	Algorithm	^[4]
Federated Principal Component Analysis	FedPCA	Computing principal components on federated data	Algorithm	^[5]
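
The server-side aggregation described in the FedSGD and FedAvg entries above can be illustrated with a minimal sketch. It assumes that model parameters and gradients are flat lists of floats and that each node reports its number of training samples; the function names (weighted_average, fedsgd_step, fedavg_step) and the learning rate parameter lr are hypothetical and chosen for illustration only. Client sampling (the fraction C) and local training are omitted.

```python
from typing import List, Sequence


def weighted_average(vectors: Sequence[Sequence[float]],
                     n_samples: Sequence[int]) -> List[float]:
    """Average the node vectors, weighting each by its number of training samples."""
    if not vectors or len(vectors) != len(n_samples):
        raise ValueError("need one sample count per node vector")
    total = sum(n_samples)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(vectors[0])
    avg = [0.0] * dim
    for vec, n in zip(vectors, n_samples):
        if len(vec) != dim:
            raise ValueError("all node vectors must have the same length")
        for i, value in enumerate(vec):
            avg[i] += (n / total) * value
    return avg


def fedsgd_step(weights: Sequence[float],
                node_gradients: Sequence[Sequence[float]],
                n_samples: Sequence[int],
                lr: float) -> List[float]:
    """FedSGD-style round: nodes send gradients, the server averages them
    proportionally to sample counts and takes one gradient descent step."""
    grad = weighted_average(node_gradients, n_samples)
    return [w - lr * g for w, g in zip(weights, grad)]


def fedavg_step(node_weights: Sequence[Sequence[float]],
                n_samples: Sequence[int]) -> List[float]:
    """FedAvg-style round: nodes send locally updated weights (after one or
    more local batch updates) and the server averages the weights directly."""
    return weighted_average(node_weights, n_samples)
```

The only difference between the two server steps in this sketch is what is exchanged: gradients in fedsgd_step, already-updated weights in fedavg_step. The weighting by sample count is the same in both.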

Disclaimers

This glossary is not in any way or form attempting to attribute priority for any of the methodologies / algorithms mentioned. Consult the academic literature.

References

[1] Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons. The Non-IID Data Quagmire of Decentralized Machine Learning.

[2] Junghye Lee, Jimeng Sun, Fei Wang, Shuang Wang, Chi-Hyuck Jun, Xiaoqian Jiang. Privacy-Preserving Patient Similarity Learning in a Federated Environment: Development and Analysis.

[3] R. Shokri, V. Shmatikov. Privacy Preserving Deep Learning.

[4] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Aguera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data.

[5] Andreas Grammenos, Rodrigo Mendoza-Smith, Jon Crowcroft, Cecilia Mascolo. Federated Principal Component Analysis.
