Federated Learning Glossary

From Open Risk Manual

A Glossary of Federated Learning terminology. The glossary covers a cross-section of terms that are relevant for privacy-preserving computation, spanning domains such as cryptography, database architectures and common statistical / machine learning algorithms. It does not aim to be exhaustive in any of those contributing domains.

Categories

For easier use of the glossary we classify terms according to the context in which they arise. NB: The boundaries may not always be crystal clear:

  • Use Case is a general application domain where some type of federated analysis is used
  • Process is a required procedure
  • Risk Factor is any aspect that can compromise / invalidate the premise of federated analysis (not necessarily malicious)
  • Property is any rigorously defined aspect of a federated system that measures or guarantees e.g. privacy or security features
  • Algorithm in this context refers to federated algorithms for data analysis, NOT the low-level computation primitives (see Protocol)
  • Protocol is any concretely defined pattern of information exchange (that is specifically useful in a federated context)
  • Architecture is a scheme or organizational pattern of organizations, computational or storage devices etc. that defines a particular type of federation
  • Agent is any entity within an overall architecture

Glossary

Term Acronym Meaning and Context Category Links / References
Federated Learning FL A machine learning paradigm that trains an algorithm across multiple devices or servers holding local data samples, without exchanging them. A centralised model is trained by locally computing updates and merging them to the centralised model without sharing data. Use Case Wikipedia
Privacy-Preserving Computation PPC Any general IT architecture that allows performing computations on networked computing devices while preserving some aspect of Data Privacy Use Case
Privacy-Preserving Data Mining PPDM The extraction of relevant knowledge from large amounts of data (Big Data), while at the same time protecting sensitive information. Use Case
Secure Multi-Party Computation MPC A subfield of cryptography with the goal of creating methods for parties to jointly compute a function over their inputs while keeping those inputs private. Although MPC guarantees that none of the parties share anything with each other or with any third party, it cannot prevent an adversary from learning some individual information, for example from the jointly computed result itself. Protocol Wikipedia
Federated Database System FDBS A type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database Architecture Wikipedia
Differential Privacy Also ε-differential privacy. A mathematical definition for the privacy loss associated with any data release drawn from a statistical database. It measures, e.g., to what extent the parameters or predictions of a model reveal information about any individual points in the training dataset. It ensures that the addition or removal of a single record does not substantially affect the outcome of any analysis. Property Wikipedia
Data Federation Also Data Sharing. It is the general process of aggregating / sharing data that exist in distributed data sources Process Wikipedia
k-anonymity A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release Property Wikipedia
Data Anonymization The process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous Process Wikipedia
Data Re-Identification Also de-anonymization. The risk arising from the possibility of matching anonymous data with publicly available information, or auxiliary data, to discover information that was deemed private Risk Factor Wikipedia
Homomorphic Encryption HE A form of encryption allowing one to perform calculations on encrypted data without decrypting it first. Homomorphic encryption is typically deployed as a public key system, where any party can encrypt its data with a known public key and perform calculations with data encrypted by others with the same public key. Arbitrarily complicated functions of the data can be computed this way (“Fully Homomorphic Encryption”), though at greater computational cost. Protocol Wikipedia
Private Set Intersection A secure multi-party computation (MPC) technique that allows two parties holding sets to compare encrypted versions of these sets in order to compute the intersection. Neither party reveals anything to the counterparty except for the elements in the intersection. Protocol Wikipedia
Alice and Bob Alice and Bob are fictional characters commonly used as placeholders in discussions about cryptographic protocols or systems. They typically represent agents possessing (or seeking) private information Agent Wikipedia
Trusted Third Party TTP An entity which facilitates interactions between two parties who both trust the third party. Whether a TTP exists or not has major design implications for privacy-preserving computations. Agent Wikipedia
Horizontally Partitioned Data A data distribution design that applies to structured data. In horizontally partitioned data different rows from a common schema are located in distinct databases / devices. For example distinct sub-samples from a population that are stored separately Architecture
Vertically Partitioned Data A data distribution design that applies to structured data. In vertically partitioned data different columns from a common schema are located in distinct databases / devices. For example distinct features characterising a population and stored separately Architecture
Private Information Retrieval PIR A protocol that allows a user to retrieve an item from a database without revealing which item is retrieved. PIR is a weaker version of 1-out-of-n oblivious transfer, where it is also required that the user should not get information about other database items. Private information retrieval is a functionality for one client and one server. Protocol Wikipedia
Oblivious Transfer OT A type of protocol in which a sender transfers one of potentially many pieces of information to a receiver, but remains oblivious as to what piece (if any) has been transferred Protocol Wikipedia
Garbled Circuit GC A protocol that enables two-party secure computation in which two mistrusting parties can jointly evaluate a function over their private inputs without the presence of a trusted third party. Protocol Wikipedia
Personally Identifiable Information PII Personal data, also known as personal information or personally identifiable information, is any information relating to an identifiable person. It is the subject of various regulations (e.g. HIPAA, GDPR) Risk Factor Wikipedia
Federated Data Analysis Also Federated Analysis, Federated Data Mining. A general term denoting the analysis of distributed datasets Use Case Wikipedia
Client/Server Architecture An architecture for federated data analysis that shares models, model parameters or other aggregated statistical information, rather than individual data / information, with a central server that is operated by a trusted third party Architecture Wikipedia
Decentralized Architecture A decentralized architecture for federated data analysis does not require a central node (server) to collect and aggregate intermediary results from participating entities but rather exchanges information on a peer-to-peer basis Architecture Wikipedia
Zero-Knowledge Proof ZKP A protocol by which one party (the prover) can prove to another party (the verifier) that they know a value x, without conveying any information apart from the fact that they know the value x. Protocol Wikipedia
Federated Data System Denotes the overall foundation of shared technology architecture that enables federated data analysis. It extends beyond the specific federated database architecture and includes e.g. operational components such as security, auditing, authentication and access rights. Architecture
Electronic Health Record EHR The systematic collection and storage of patient health information in a digital format. Records will include a variety of data formats. Federation of EHRs constitutes one of the major use cases of federated analysis. Use Case Wikipedia
Local Differential Privacy A model of differential privacy with the added requirement that even if an adversary has access to the personal responses of an individual in a database, that adversary will still be unable to learn too much about the user's personal data. Algorithms with differential privacy necessarily incorporate some amount of randomness or noise, which can be tuned to mask the influence of the user on the output. Property Wikipedia
Non-IID Challenge The challenge that data samples available for federated analysis may not satisfy the independent and identically distributed (IID) property that is a precondition for the validity of various algorithms and statistical analyses Risk Factor [1]
Trusted Execution Environment TEE A secure area of a main processor (Also Secure Enclave). It guarantees code and data loaded inside to be protected with respect to confidentiality and integrity. TEEs provide the ability to run code on a remote machine, even if not trusting the machine’s owner/administrator. This is achieved by limiting the capabilities of any party, including the administrator. Architecture Wikipedia
Verifiable Computation A property enabling one party to prove to another party that it has executed the desired behavior on its data faithfully, without compromising the potential secrecy of the data Property Wikipedia
Patient Similarity Learning Patient similarity learning aims to develop computational algorithms for defining and locating clinically similar patients to a query patient under a specific clinical context Use Case [2]
Federated Stochastic Gradient Descent FedSGD Federated stochastic gradient descent is the direct transposition of the classic algorithm to the federated setting, by using a random fraction C of the nodes and using all the data on a given node. The gradients are averaged by the server proportionally to the number of training samples on each node, and used to make a gradient descent step Algorithm [3]
Federated Averaging FedAvg Federated averaging (FedAvg) is a generalization of FedSGD, which allows local nodes to perform more than one batch update on local data and exchanges the updated weights rather than the gradients Algorithm [4]
Federated Principal Component Analysis FedPCA Computing principal components on federated data Algorithm [5]
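
As an illustration of the FedSGD and FedAvg entries, the following is a minimal sketch and not a reference implementation. It assumes a one-parameter linear model y ≈ w * x, full-batch local gradient steps, and participation of every node in each round (C = 1). The names local_gradient, local_update and fedavg_round, as well as the toy data, are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

# One node's private data: (x, y) pairs that never leave the node.
Dataset = Sequence[Tuple[float, float]]


def local_gradient(w: float, data: Dataset) -> float:
    """Gradient of the mean squared error of y ≈ w * x on one node's data."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def local_update(w: float, data: Dataset, lr: float, local_steps: int) -> float:
    """Run `local_steps` gradient steps locally, starting from the global weight."""
    for _ in range(local_steps):
        w -= lr * local_gradient(w, data)
    return w


def fedavg_round(w_global: float, nodes: List[Dataset], lr: float, local_steps: int) -> float:
    """One round: each node trains locally, the server averages the returned
    weights in proportion to the number of samples held by each node."""
    total = sum(len(data) for data in nodes)
    local_weights = [local_update(w_global, data, lr, local_steps) for data in nodes]
    return sum(len(data) * w for data, w in zip(nodes, local_weights)) / total


# Toy data following y = 3 * x, split across three nodes (illustrative only).
nodes = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w = 0.0
for _ in range(50):
    w = fedavg_round(w, nodes, lr=0.05, local_steps=5)
print(w)  # should approach 3, the slope used to generate the toy data
```

With local_steps=1 the sample-weighted average of the local weights equals a single gradient step with the sample-weighted average gradient, which is the FedSGD update described in the glossary. Larger values of local_steps give the FedAvg behaviour of several local updates per communication round.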
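
The k-anonymity entry can likewise be made concrete with a small check. The sketch below assumes that a release is a list of records and that the caller names the quasi-identifier columns, i.e. the attributes that could be matched against outside data. This notion of quasi-identifiers, the function names and the sample records are assumptions made for illustration and do not come from the glossary.

```python
from collections import Counter
from typing import Dict, List, Sequence


def anonymity_level(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """Return the largest k for which the release is k-anonymous.

    Records are grouped by their quasi-identifier values; the smallest group
    size is the number of records any individual is hidden among.
    An empty release yields 0.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)


def is_k_anonymous(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every record shares its quasi-identifier values with at least k - 1 others."""
    return anonymity_level(rows, quasi_identifiers) >= k


# Invented sample release with generalized zip codes and age bands.
release = [
    {"zip": "021**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "021**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "946**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "946**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "946**", "age": "30-39", "diagnosis": "flu"},
]

print(anonymity_level(release, ["zip", "age"]))    # 2
print(is_k_anonymous(release, ["zip", "age"], 3))  # False
```

The smallest group here has two records, so the sample release is 2-anonymous but not 3-anonymous with respect to the chosen columns.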

Disclaimers

  • This glossary is not in any way or form attempting to attribute priority for any of the methodologies / algorithms mentioned. Consult the academic literature.

References

  1. The Non-IID Data Quagmire of Decentralized Machine Learning, Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons
  2. Privacy-Preserving Patient Similarity Learning in a Federated Environment: Development and Analysis, Junghye Lee, Jimeng Sun, Fei Wang, Shuang Wang, Chi-Hyuck Jun, Xiaoqian Jiang
  3. Privacy Preserving Deep Learning, R. Shokri and V. Shmatikov
  4. Communication-Efficient Learning of Deep Networks from Decentralized Data, Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Aguera y Arcas
  5. Federated Principal Component Analysis, Andreas Grammenos, Rodrigo Mendoza-Smith, Jon Crowcroft, Cecilia Mascolo

Contributors to this article

» Wiki admin