Garbage In Garbage Out

From Open Risk Manual

Definition

Garbage in, garbage out (GIGO), in the context of Quantitative Risk Management, refers to the fact that mathematical algorithms may process flawed, even nonsensical data as Model Inputs ("Garbage In") and, as a consequence, produce nonsensical, unusable Model Outputs ("Garbage Out").

Types of GIGO

Missing data or incorrect data formats

Different IT systems (databases, programming languages, etc.) strike different compromises between the need to allow flexible handling of data and the need to enforce strict data types. The result may be:

  • Allowing for Missing Data when the downstream algorithms require valid data
  • Misinterpretation of values (e.g. assigning a numerical value to a string)
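Both failure modes can be caught by strict parsing at the point where data enters a model. The following is a minimal sketch in Python, not a reference implementation: the function names `parse_exposure` and `validate_column` and the sample field are hypothetical, and the rule that blanks, non-numeric strings and non-finite numbers are all rejected is an assumption that a real system may relax.

```python
import math
from typing import Optional


def parse_exposure(raw: Optional[str]) -> float:
    """Convert a raw text field to a float, rejecting anything ambiguous."""
    # Missing data: refuse to substitute a default such as 0.
    if raw is None or raw.strip() == "":
        raise ValueError("missing value")
    # Misinterpretation: a string like "N/A" must not become a number.
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"not a number: {raw!r}") from None
    # float() accepts "nan" and "inf"; downstream algorithms usually cannot.
    if math.isnan(value) or math.isinf(value):
        raise ValueError(f"not a finite number: {raw!r}")
    return value


def validate_column(raw_values):
    """Split raw values into parsed numbers and (index, reason) rejections."""
    accepted, rejected = [], []
    for i, raw in enumerate(raw_values):
        try:
            accepted.append(parse_exposure(raw))
        except ValueError as err:
            rejected.append((i, str(err)))
    return accepted, rejected
```

Collecting the rejections with their positions, rather than stopping at the first error, lets a data owner see every problematic entry in one pass.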

Erroneous data values

Erroneous data values are data values that are nominally valid but are nevertheless wrong entries, typically as a symptom of manual data processing. Infamous examples are Fat-finger errors.
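Because such entries pass type and format checks, they can only be caught by plausibility checks on the values themselves. The sketch below illustrates one simple approach, assuming a series of strictly positive quantities (such as prices) that should be of similar magnitude. The function name `flag_fat_finger` and the threshold `max_ratio=10.0` are hypothetical choices for illustration; a suitable threshold depends on the data.

```python
from statistics import median


def flag_fat_finger(values, max_ratio=10.0):
    """Return indices of values more than max_ratio times away from the median."""
    positive = [v for v in values if v > 0]
    if not positive:
        return []
    centre = median(positive)
    flagged = []
    for i, v in enumerate(values):
        # Non-positive entries are implausible under the stated assumption;
        # the ratio tests catch slipped decimals and extra or missing digits.
        if v <= 0 or v / centre > max_ratio or centre / v > max_ratio:
            flagged.append(i)
    return flagged
```

For example, in the series `[101.2, 99.8, 100.5, 10050.0, 100.1]` only the fourth entry is flagged. The median is used as the reference point because, unlike the mean, it is not dragged towards the erroneous value.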

Inaccurate data values

Data Accuracy refers to the degree to which the available data represents the phenomenon that is being modelled. Accuracy may be difficult to establish. Some typical indicators of increased risk are:

  • Stale data. For dynamic (evolving) data sets, Data Timeliness may be critical for accuracy.
  • Extensive use of Data Proxies due to lack of more relevant / representative data
  • Complex domain with many alternative indicators. For example, financial reports contain hundreds of variables with varying degrees of suitability to any given question.
  • Challenging modelling requirement. In some use cases it may be intrinsically difficult to base any quantitative model on observed data. In this instance data accuracy overlaps with Model Risk.
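Of the indicators above, stale data is the most straightforward to test mechanically. The following sketch assumes each record carries a last-update date; the function name `stale_records` and the 30-day tolerance are hypothetical, and the acceptable age depends on how quickly the underlying phenomenon evolves.

```python
from datetime import date, timedelta


def stale_records(last_updated, as_of, max_age_days=30):
    """Return keys whose last update is older than max_age_days before as_of."""
    cutoff = as_of - timedelta(days=max_age_days)
    return sorted(key for key, updated in last_updated.items() if updated < cutoff)


# Illustrative data only: two records observed as of 31 January 2024.
updates = {"A": date(2024, 1, 5), "B": date(2023, 10, 1)}
stale = stale_records(updates, as_of=date(2024, 1, 31))
```

Here only record `"B"` is reported, since its last update precedes the cutoff of 1 January 2024.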

GIGO in Development versus GIGO in Production

The nature of the GIGO pathology becomes more specific when one considers the phases of Model Development and Model Usage (in production). The risk of GIGO may refer to either:

  • The Model Estimation phase, where poor selection of datasets may lead to flawed model selection or model parameter estimation
  • The Model Usage phase (in production), where problematic model inputs may lead to flawed model outcomes (predictions)
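One way to connect the two phases is to record, during development, the range of each input the model was estimated on, and to check production inputs against those ranges before scoring. The sketch below is one possible arrangement under that assumption: the function names `fit_input_ranges` and `check_production_input` and the feature names are hypothetical, and a value outside the observed range is not necessarily wrong, only outside what the model has seen.

```python
def fit_input_ranges(training_rows):
    """Record the min and max of each feature seen during development."""
    ranges = {}
    for row in training_rows:
        for name, value in row.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def check_production_input(row, ranges):
    """Return a list of problems found in one production input row."""
    problems = []
    for name, (lo, hi) in ranges.items():
        if name not in row or row[name] is None:
            problems.append(f"{name}: missing")
        elif not lo <= row[name] <= hi:
            problems.append(f"{name}: {row[name]} outside [{lo}, {hi}]")
    return problems


# Illustrative development data with two hypothetical features.
training = [{"ltv": 0.5, "income": 30000}, {"ltv": 0.9, "income": 80000}]
ranges = fit_input_ranges(training)
```

A production row such as `{"ltv": 7.0}` would then be reported with two problems: a loan-to-value far outside the development range (plausibly a percentage entered as a fraction, or vice versa) and a missing income.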

See Also