Difference between revisions of "Missing Data Imputation"

From Open Risk Manual
 
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
  == Definition ==
- '''Missing Data Imputation''' is one of the steps of the [[Data Cleansing]] process that aims to remedy data sets that are incomplete ([[Missing Data]]). It is based on the substitution of estimated values for missing or inconsistent data items (fields). The imputed values aim to create a data record that does not fail the [[Data Integrity Validation]] process.
+ '''Missing Data Imputation''' is one of the steps of the [[Data Cleansing]] process that aims to remedy data sets that are incomplete ([[Missing Data]]). It is based on the substitution of [[Estimated Data | estimated values]] for missing or inconsistent data items (fields). The imputed values aim to create a data record that does not fail the [[Data Integrity Validation]] process.

  == Methodologies ==
Line 7: Line 7:
  * matching observations
  * regressions
- * more advanced methods such as Bayesian analysis or simulation.
+ * more advanced methods such as Bayesian analysis or [[Monte-Carlo Simulation]].

  == Issues and Challenges ==

Latest revision as of 20:51, 21 November 2022

Definition

Missing Data Imputation is one of the steps of the Data Cleansing process that aims to remedy data sets that are incomplete (Missing Data). It is based on the substitution of estimated values for missing or inconsistent data items (fields). The imputed values aim to create a data record that does not fail the Data Integrity Validation process.

Methodologies

Depending on the cause and nature of the missing data problem, a number of techniques can be used for deriving missing values:

  • using mean values
  • matching observations
  • regressions
  • more advanced methods such as Bayesian analysis or Monte-Carlo Simulation.
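The first three techniques can be illustrated with a minimal Python sketch. It assumes a single numeric field `y` with gaps marked as NaN and one fully observed companion field `x`. The function names and the toy data are hypothetical and chosen only for illustration; the Bayesian and Monte-Carlo approaches are not shown.

```python
import numpy as np


def impute_mean(y):
    """Replace each missing y with the mean of the observed y values."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    out[np.isnan(y)] = np.nanmean(y)
    return out


def impute_match(x, y):
    """Replace each missing y with the y of the observed record whose x is closest."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = y.copy()
    observed = ~np.isnan(y)
    for i in np.flatnonzero(~observed):
        # Index of the donor record among the observed ones (first one wins on ties)
        donor = np.argmin(np.abs(x[observed] - x[i]))
        out[i] = y[observed][donor]
    return out


def impute_regression(x, y):
    """Fit y = intercept + slope * x on observed records, then predict the missing y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = y.copy()
    observed = ~np.isnan(y)
    # np.polyfit returns coefficients from highest degree down: slope, then intercept
    slope, intercept = np.polyfit(x[observed], y[observed], deg=1)
    out[~observed] = intercept + slope * x[~observed]
    return out


# Toy data (illustrative only): the third record is missing its y value
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, np.nan, 8.0, 10.0])

filled_mean = impute_mean(y)
filled_match = impute_match(x, y)
filled_regression = impute_regression(x, y)
```

On this toy data the three methods fill the same gap differently: the mean uses only the field itself, matching borrows the value of a similar record, and the regression uses the fitted relationship between the two fields. Which one is appropriate depends on why the value is missing.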

Issues and Challenges

  • If the causes behind missing data are correlated with the risk being modelled, imputation carries the risk of biasing any estimates derived from the completed data.
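The bias described above can be made concrete with a small simulation. This is a sketch under stated assumptions, not a description of any particular data set: the lognormal "loss" variable, the 30% missing share, and the rule that the largest values go missing are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 100_000

# Hypothetical risk variable (assumption: lognormal losses)
loss = rng.lognormal(mean=0.0, sigma=1.0, size=n)
true_mean = loss.mean()

# Case 1: values go missing at random, independently of the loss itself
missing_random = rng.random(n) < 0.3

# Case 2: missingness is correlated with the risk: the largest 30% of losses are missing
missing_dependent = loss > np.quantile(loss, 0.7)


def mean_after_mean_imputation(values, missing):
    """Fill the missing entries with the observed mean and return the overall mean."""
    filled = values.copy()
    filled[missing] = values[~missing].mean()
    return filled.mean()


estimate_random = mean_after_mean_imputation(loss, missing_random)
estimate_dependent = mean_after_mean_imputation(loss, missing_dependent)

print(f"true mean:                        {true_mean:.3f}")
print(f"imputed, random missingness:      {estimate_random:.3f}")
print(f"imputed, risk-related missingness: {estimate_dependent:.3f}")
```

When missingness is unrelated to the loss, the imputed estimate stays close to the true mean. When the largest losses are the ones missing, mean imputation fills the gaps with values that are too small, so the estimate is biased downwards.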