Difference between revisions of "Missing Data Imputation"

Latest revision as of 20:51, 21 November 2022

Definition

Missing Data Imputation is one of the steps of the Data Cleansing process that aims to remedy data sets that are incomplete (Missing Data). It is based on the substitution of estimated values for missing or inconsistent data items (fields). The imputed values aim to create a data record that does not fail the Data Integrity Validation process.

Methodologies

Depending on the cause and nature of the missing data problem, a number of techniques can be used for deriving missing values:

* using mean values
* matching observations
* regressions
* more advanced methods such as Bayesian analysis or Monte-Carlo Simulation.
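
The first three techniques can be illustrated with a minimal Python sketch. It is not a reference implementation: the function names (`impute_mean`, `impute_matching`, `impute_regression`), the use of `None` to mark a missing field, and the single explanatory variable `xs` are all assumptions made for illustration only.

```python
from statistics import mean

def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def impute_matching(xs, ys):
    """Replace each missing y with the y of the complete record whose x is closest."""
    donors = [(x, y) for x, y in zip(xs, ys) if y is not None]
    return [
        min(donors, key=lambda d: abs(d[0] - x))[1] if y is None else y
        for x, y in zip(xs, ys)
    ]

def impute_regression(xs, ys):
    """Replace each missing y with the prediction of a least-squares line
    fitted on the complete (x, y) pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]
```

Mean substitution ignores any other fields of the record, whereas matching and regression both rely on an auxiliary field (`xs` here) that is assumed to be fully observed and informative about the missing one.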

Issues and Challenges

If the causes behind missing data are correlated with the risk being modelled, missing data imputation risks biasing any estimates derived from the imputed data set.
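
The bias can be made concrete with a small deterministic sketch in Python. The data are entirely hypothetical: 100 loss amounts, where only large losses ever go unreported, so missingness is correlated with the quantity being estimated. The names `true_losses`, `reported` and `imputed` are assumptions for illustration.

```python
from statistics import mean

# Hypothetical complete data: loss amounts 1..100.
true_losses = list(range(1, 101))

# Assumed missingness mechanism: every second loss above 70 is not reported,
# so the chance of a missing value depends on the size of the loss itself.
reported = [None if (v > 70 and v % 2 == 0) else v for v in true_losses]

# Mean imputation fills each gap with the mean of the reported values.
observed_mean = mean(v for v in reported if v is not None)
imputed = [observed_mean if v is None else v for v in reported]

true_mean = mean(true_losses)
imputed_mean = mean(imputed)

# The imputed data set understates the average loss, because the
# values that went missing were systematically the large ones.
print(true_mean, imputed_mean)
```

In this construction the imputed data set is complete and would pass a simple completeness check, yet its mean is lower than the true mean.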

@@ Line 1: / Line 1: @@
 == Definition ==
-'''Missing Data Imputation''' is one of the steps of the [[Data Cleansing]] process that aims to remedy data sets that are incomplete ([[Missing Data]]). It is based on the substitution of estimated values for missing or inconsistent data items (fields). The imputed values aim to create a data record that does not fail the [[Data Integrity Validation]] process.
+'''Missing Data Imputation''' is one of the steps of the [[Data Cleansing]] process that aims to remedy data sets that are incomplete ([[Missing Data]]). It is based on the substitution of [[Estimated Data | estimated values]] for missing or inconsistent data items (fields). The imputed values aim to create a data record that does not fail the [[Data Integrity Validation]] process.
 == Methodologies ==
@@ Line 7: / Line 7: @@
 * matching observations
 * regressions
-* more advanced methods such as Bayesian analysis or simulation.
+* more advanced methods such as Bayesian analysis or [[Monte-Carlo Simulation]].
 == Issues and Challenges ==