Data Outliers

From Open Risk Manual

Definition

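Data Outliers are data values within a given data set that appear to be statistical anomalies. Such anomalies can be understood in both a univariate and a multivariate sense, that is, as an unusual value of a single variable or as an unusual combination of values across several variables.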

Examples

  • Numerical values that indicate a measurement that does not seem to fit the bulk of the Distribution
  • Categorical values that are highly unusual, indicating potentially erroneous records. Given that all categorical values are (presumably) restricted to a range of allowed values, this phenomenon is most likely to occur in the combined realizations of several variables.
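
Both kinds of example can be illustrated with a minimal Python sketch. It assumes Tukey's interquartile-range rule for the numerical case and a simple frequency count for the categorical case; neither technique is prescribed by this entry, and other detection rules are equally plausible. The function names, the threshold parameters and the sample data are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def rare_combinations(records, keys, max_count=1):
    """Return combinations of categorical values seen at most max_count times."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [combo for combo, n in counts.items() if n <= max_count]


# Hypothetical data, made up purely for illustration
amounts = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 55.0]
loans = (
    [{"product": "mortgage", "collateral": "property"}] * 5
    + [{"product": "credit card", "collateral": "none"}] * 5
    + [{"product": "credit card", "collateral": "property"}]
)

flagged_amounts = iqr_outliers(amounts)
flagged_combos = rare_combinations(loans, keys=("product", "collateral"))
print(flagged_amounts)
print(flagged_combos)
```

In the categorical data, every individual value is allowed and common on its own; only the combination of the two variables is unusual, which is the multivariate case described above.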

Issues and Challenges

  • Distinguishing erroneous outliers from genuine phenomena (that happen to produce unusual values) is not always easy. It is conceivable (and not unusual) that several overlapping underlying processes contribute to the measurements, giving rise, for example, to Tail Risk. Tail risk processes that are not sufficiently sampled may look like outliers.
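
The sampling problem can be sketched with a small simulation. It assumes two hypothetical overlapping processes: a bulk process producing many moderate values and a rarely observed tail process producing large but entirely genuine values. The distributions, parameters and sample sizes are illustrative assumptions, and the interquartile-range fence is just one possible detection rule.

```python
import random
from statistics import quantiles

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical bulk process: many moderate observations
bulk = [random.gauss(100, 5) for _ in range(500)]
# Hypothetical tail process: rare, large, but entirely genuine observations
tail = [150 + random.expovariate(1 / 30) for _ in range(3)]

sample = bulk + tail
q1, _, q3 = quantiles(sample, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)  # Tukey upper fence

flagged = [x for x in sample if x > upper_fence]
tail_flagged = [x for x in tail if x > upper_fence]
print(f"upper fence: {upper_fence:.1f}")
print(f"flagged: {len(flagged)}, of which genuine tail draws: {len(tail_flagged)}")
```

With only a handful of tail draws in the sample, a purely statistical rule flags the genuine tail observations, and nothing in the flagged values themselves distinguishes them from recording errors.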

References