Missing Data Imputation

From Open Risk Manual

Definition

Missing Data Imputation is a step of the Data Cleansing process that aims to remedy incomplete data sets (Missing Data). It substitutes estimated values for missing or inconsistent data items (fields). The imputed values are intended to produce a data record that passes the Data Integrity Validation process.

Methodologies

Depending on the cause and nature of the missing data problem, a number of techniques can be used for deriving missing values:

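  • substituting the mean of the observed values
  • matching observations, i.e. borrowing the value from a similar complete record
  • regression on other observed fields
  • more advanced methods such as Bayesian analysis or simulation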
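
As an illustration, the first three techniques listed above can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function names, the use of None to mark a missing item, the single explanatory field x and the sample figures are all assumptions made for the example.

```python
from statistics import mean


def mean_impute(values):
    """Replace missing (None) entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def match_impute(xs, ys):
    """Fill a missing y with the y of the complete record whose x is closest."""
    donors = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if not donors:
        raise ValueError("no complete records to match against")
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            _, y = min(donors, key=lambda d: abs(d[0] - x))
        filled.append(y)
    return filled


def regression_impute(xs, ys):
    """Fill a missing y with the prediction of a least-squares line
    fitted on the complete records only."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if not pairs:
        raise ValueError("no complete records to fit on")
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("x has no variation among complete records")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]


# Hypothetical sample: the third record has a missing y value.
xs = [1.0, 2.0, 3.5, 4.0, 5.0]
ys = [2.0, 4.0, None, 8.0, 10.0]

by_mean = mean_impute(ys)                 # third value becomes 6.0
by_match = match_impute(xs, ys)           # third value becomes 8.0 (donor x = 4.0)
by_regression = regression_impute(xs, ys) # third value becomes 7.0
```

The three methods return different values for the same missing item, which is one reason the choice of technique should follow from the cause and nature of the missing data.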

Issues and Challenges

  • If the causes behind missing data are correlated with the risk being modelled, imputation can bias any estimates derived from the completed data set.
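
The bias can be seen in a small constructed example. The loss figures and the reporting rule below (losses above 5 are never reported) are hypothetical and chosen only to make the effect visible; the sketch assumes mean imputation, but any method fitted only on the reported records would be exposed to the same issue.

```python
from statistics import mean

# Hypothetical true losses, two of which are large.
true_losses = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0]

# Assumed reporting rule: losses above 5 go unreported (marked None),
# so missingness is correlated with the quantity being modelled.
reported = [v if v <= 5.0 else None for v in true_losses]

# Mean imputation using the reported values only.
observed = [v for v in reported if v is not None]
fill = mean(observed)
imputed = [fill if v is None else v for v in reported]

true_mean = mean(true_losses)  # 40 / 6, roughly 6.67
imputed_mean = mean(imputed)   # 2.5
```

Here the estimate from the imputed data set is well below the true mean, because the imputed values inherit the downward selection of the records that were observed.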