# Data Outliers

From Open Risk Manual

Wiki admin (talk | contribs) |


## Latest revision as of 11:05, 3 September 2019

## Definition

**Data Outliers** are data values within a given data set that appear to be statistical anomalies. Such anomalies can be understood in both a univariate and a multivariate sense.

## Examples

- Numerical values indicating a measurement that does not seem to fit the bulk of the Distribution.
- Categorical values that are highly unusual, indicating potentially erroneous records. Given that all categorical values are (presumably) controlled within a range of allowed values, this phenomenon is most likely to occur in the combined realizations of several variables, where each value is individually valid but the combination is rare.
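Both kinds of example can be illustrated with a minimal sketch. It assumes Tukey's fences (the 1.5 × IQR rule) as the test for numerical values and a simple frequency count as the test for combinations of categorical values; neither choice is prescribed by this entry. The function names, the data and the thresholds are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def rare_combinations(records, max_count=1):
    """Return combinations of categorical values seen at most max_count times."""
    counts = Counter(records)
    return [combo for combo, n in counts.items() if n <= max_count]


# Hypothetical numerical measurements: one value sits far from the bulk.
amounts = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 55.0]
numeric_outliers = iqr_outliers(amounts)

# Hypothetical categorical records (customer type, product type).
# Every individual value is within its allowed range, but one
# combination occurs only once.
records = (
    [("individual", "mortgage")] * 5
    + [("corporate", "syndicated loan")] * 4
    + [("corporate", "mortgage")] * 3
    + [("individual", "syndicated loan")]
)
unusual_records = rare_combinations(records)

print(numeric_outliers)
print(unusual_records)
```

Neither test establishes that a flagged record is erroneous; it only selects candidates for further review.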

## Issues and Challenges

- Distinguishing erroneous outliers from genuine phenomena (that happen to produce unusual values) is not always easy. It is conceivable, and not unusual, that several overlapping underlying processes contribute to the measurements, giving rise, for example, to Tail Risk. Tail risk processes that are not sufficiently sampled may look like outliers.
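The difficulty can be made concrete with a small simulation. The sketch below assumes two hypothetical overlapping processes, a frequent "bulk" process and a rare "tail" process, and flags values with a modified z-score based on the median and the median absolute deviation (using the commonly quoted constant 0.6745 and cutoff 3.5). The distributions, sample sizes and cutoff are assumptions made for illustration, not part of this entry.

```python
import random
from statistics import median


def robust_flags(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 1000 draws from the bulk process and only 5 genuine
# draws from the sparsely sampled tail process.
bulk = [("bulk", rng.gauss(0.0, 1.0)) for _ in range(1000)]
tail = [("tail", rng.gauss(50.0, 5.0)) for _ in range(5)]
sample = bulk + tail

flags = robust_flags([x for _, x in sample])
flagged_sources = [src for (src, _), flagged in zip(sample, flags) if flagged]

# The genuine tail observations are flagged just as erroneous records would be.
print(flagged_sources.count("tail"), "of", len(tail), "tail observations flagged")
```

In this setup the statistical test cannot tell genuine tail events from data errors, because both lie far from the bulk. Separating them requires knowledge of the underlying processes, or more samples from the tail.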