Risk Data Standards

From Open Risk Manual

Definition

Risk Data Standards are the documented (public or private) agreements on the standardized representation of Risk Data. Such standards constitute a form of assurance that the set of information supporting data-based Risk Analysis is fit-for-purpose.

Components

The concept of risk data standards is broader than that of Data Quality Standards. It is part of both the Data Governance and the Risk Management functions of an organization.


The specification of risk data standards may include elements such as:

  • Risk data definitions. This covers human-oriented documentation (a Data Dictionary) and/or a Risk Data Schema oriented towards computer systems / databases
  • Risk data types, the classification of Risk Data in categories that share structural similarities such as dimensionality, data models and data types.
  • Risk data formats, the specification of the formats in which data is stored or exchanged
  • Risk data structuring processes, namely the transformation of unstructured forms of data (text, audio, video etc.) into more structured data
  • Risk data tagging, adding a collection of related fields (tags) that represent a vocabulary for classifying data assets
  • Risk data transmission protocols over digital networks
  • Risk data processing APIs, application programming interfaces that enable the exchange of risk data
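
As an illustration of the first and fifth elements (definitions and tagging), the following is a minimal sketch of a machine-readable data dictionary with a simple validator. The record layout, field names and tags are hypothetical and chosen only for illustration; they are not taken from any existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """One entry of a hypothetical risk data dictionary."""
    name: str
    dtype: type
    description: str          # human-oriented documentation
    required: bool = True
    tags: tuple = ()          # vocabulary used to classify the field


# Hypothetical schema for a loan exposure record (illustrative names only)
LOAN_SCHEMA = (
    FieldDefinition("loan_id", str, "Unique loan identifier", tags=("identifier",)),
    FieldDefinition("exposure", float, "Outstanding amount", tags=("credit_risk",)),
    FieldDefinition("currency", str, "ISO 4217 currency code", tags=("reference",)),
    FieldDefinition("rating", str, "Internal rating grade",
                    required=False, tags=("credit_risk",)),
)


def validate(record, schema=LOAN_SCHEMA):
    """Return a list of schema violations for the record (empty if none)."""
    errors = []
    for field in schema:
        if field.name not in record:
            if field.required:
                errors.append(f"missing required field: {field.name}")
            continue
        value = record[field.name]
        if not isinstance(value, field.dtype):
            errors.append(
                f"{field.name}: expected {field.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


def fields_with_tag(tag, schema=LOAN_SCHEMA):
    """List the names of all fields that carry the given tag."""
    return [field.name for field in schema if tag in field.tags]
```

The same definitions serve both audiences: the `description` text documents the field for people, while `dtype` and `required` can be checked by a program.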


Regulatory Context

The regulatory focus on risk data since the financial crisis is best summarized in BCBS 239, "Principles for effective risk data aggregation and risk reporting" (January 2013). As implied by the title, this is a principles document, not a specification or standards document. Early 2016 was the deadline for Global Systemically Important Banks (GSIBs) to comply with these principles.

Current versus Historical Risk Data

There is a natural decomposition of risk data (and hence the corresponding standards) in the following two broad categories:

  • Current State Representation Data (such as current portfolio data, financial position data etc.), i.e., the set of indicators and variables that define (or at least represent to some approximation) the object or entity that is the focus of the Risk Assessment framework as of now. Such datasets will typically be the most relevant, actionable (and possibly most sensitive).
  • Historical Risk Data (such as economic indicators, Market Data, and credit or operational loss data capturing past risk events), i.e., sets of objective empirical evidence that can be used to inform a risk assessment framework. Historical data sets may also contain previous state information (e.g., historical balance sheets).
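
The decomposition can be sketched in code as a split of dated observations around a reporting date. This is only one possible reading, under the assumption that every data point carries an as-of date; the `Observation` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """A single dated data point about an entity (hypothetical layout)."""
    entity_id: str
    as_of: date
    variable: str
    value: float


def split_by_reporting_date(observations, reporting_date):
    """Separate current state data from historical data.

    Observations dated exactly on the reporting date are treated as the
    current state representation; earlier ones are treated as history.
    """
    current = [o for o in observations if o.as_of == reporting_date]
    historical = [o for o in observations if o.as_of < reporting_date]
    return current, historical
```

Note that yesterday's current state data (e.g., a balance sheet) becomes historical data once the reporting date moves forward, which is why the two categories can share a record layout.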

Standards applicable to Historical Data

Vendor Historical Data Formats and Standards

A large segment of historical risk data comes in the form of market data (prices, trading volumes etc.). This segment is typically served by vendors who collect data from exchanges / trading platforms and process / provide such data under proprietary formats and APIs. Such vendor data would typically be the basis for creating market / trading risk frameworks (VaR systems, CVA/PFE systems etc.). Vendor formats are generally proprietary and are not reviewed further here.

OpenMAMA is an open source initiative for standardizing an API for market data exchange. OpenMAMA, the "Open Middleware Agnostic Messaging API", is a lightweight, vendor-neutral integration layer for systems built on top of a variety of message-oriented middleware.

Statistical Data Formats

Data standards pertaining to statistical data exist for the various domains where statistical analysis (in the broadest sense) plays a key role. A non-exhaustive list:

  • SDMX (Statistical Data and Metadata eXchange): an initiative, started in 2001, to foster standards for the exchange of statistical information, with a focus on economic data. Sponsored by the ECB, Eurostat, BIS, IMF, OECD, World Bank and others. There is a JSON-based protocol, but as of end 2014 not all SDMX data providers supported it. At that time the ECB Statistical Data Warehouse supported only XML, although some effort seemed to be underway to also support JSON
  • NCES: education-oriented data standards sponsored by the US National Center for Education Statistics (NCES). There is no global standard
  • Health data: no global standard
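
To give a flavour of how SDMX data is typically requested, the sketch below builds a data query URL following the path pattern of the SDMX RESTful web services (`data/{flow}/{key}`, with the key formed from dot-separated dimension values). The base URL is a placeholder, the dataflow and dimension values are illustrative, and individual providers may deviate from this pattern, so treat it as an assumption rather than a reference.

```python
from urllib.parse import urlencode


def sdmx_data_url(base_url, flow, key_parts, start_period=None, end_period=None):
    """Build an SDMX-style REST data query URL (a sketch, not a client).

    base_url   -- provider endpoint (placeholder in the example below)
    flow       -- dataflow identifier
    key_parts  -- dimension values, in the order defined by the dataflow
    """
    key = ".".join(key_parts)
    url = f"{base_url.rstrip('/')}/data/{flow}/{key}"
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    if params:
        url += "?" + urlencode(params)
    return url


# Illustrative request for a daily exchange rate series
example_url = sdmx_data_url(
    "https://example.org/sdmx",          # hypothetical provider
    "EXR",
    ["D", "USD", "EUR", "SP00", "A"],
    start_period="2014-01-01",
)
```

Whether the response arrives as XML or JSON is usually negotiated through the HTTP `Accept` header, subject to what the provider supports.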

Industry Risk Data Initiatives

Two industry initiatives that collect historical risk data employ (de facto) data standards / formats specific to these initiatives:

  • ORX: The Operational Riskdata eXchange Association (ORX) is a not-for-profit industry association dedicated to advancing the measurement and management of operational risk in the global financial services industry.
  • PECDC: PECDC was created by its member banks to provide them with a collection of historical loss data, analysis and research resources, intended to contribute to a better understanding of credit risk.

State Representation Standards

Vendor Reference Data

Reference data is the term currently used for the state representation of a wide variety of entities:

  • Securities (equity issues, bonds)
  • Corporate entities


The vast majority of reference data are collected and provided by vendors under their (de facto) standards and formats (not further reviewed here).

Financial Product Descriptions

The only extant standard for describing financial products seems to be FpML (Financial products Markup Language), the open source XML standard for electronic dealing and processing of OTC derivatives. It establishes a protocol for sharing information electronically on, and dealing in, swaps, derivatives and structured products. It is sponsored by ISDA (an industry association).

Financial Reporting Standards

There is increasingly a drive to digitize financial reporting across businesses. The main effort in this direction is XBRL (eXtensible Business Reporting Language). The standard is maintained by XBRL International, a global not-for-profit consortium working to improve business reporting for the public good. It is a member-supported organization, with dedicated volunteers from all over the world offering their time and expertise to develop specifications that support the collection, sharing and use of structured data for reporting and analysis.

Regulatory Reporting Data Formats

The intensified data flows between firms and regulators after the financial crisis necessitated a more detailed specification of reporting formats. A prominent example is the EBA Data Point Model (based on XBRL).

Central Bank Collateral Standards

Direct Central Bank funding of securitisation pools has precipitated the need for more specific requirements. A prominent example is the ECB Loan Level Data Requirements.

File Formats

Risk data are frequently transmitted as files. There is a large variety of file formats. Some common formats that are currently relevant (in parentheses, data standards that support the format):

  • XML (XBRL, OData, SDMX)
  • JSON (JavaScript Object Notation)
  • YAML (YAML Ain't Markup Language, originally Yet Another Markup Language)
  • CSV (Comma Separated Values)
  • Spreadsheet (xlsx, ods)
  • PDF (Portable Document Format)


The choice of file format to support a risk data standard is not trivial because modern formats (e.g., XML, JSON) require dedicated parsers and might be unusable without them.

JSON is popular for implementing standards (especially lightweight ones) as it retains a certain amount of human readability, avoids the verbosity of XML, and can be readily consumed in a web browser (it is essentially a JavaScript object definition).
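
One practical difference between formats is how much type information survives a round trip. The sketch below writes the same (made-up) exposure records as JSON and as CSV using only the Python standard library; the field names and values are illustrative assumptions.

```python
import csv
import io
import json

# Made-up sample records, for illustration only
records = [
    {"loan_id": "L-001", "exposure": 250000.0, "currency": "EUR"},
    {"loan_id": "L-002", "exposure": 90000.5, "currency": "USD"},
]


def to_json(rows):
    """Serialize records as a JSON array; numbers stay numbers."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into dictionaries; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))
```

Reading the JSON back with `json.loads` returns the original records, whereas `from_csv` returns every value, including `exposure`, as text. A CSV-based standard therefore has to specify data types separately, for example in a data dictionary.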

Large File Formats

Common file formats may not be able to support very large data volumes (in terms of efficiency of storage, mechanisms to transmit large files, ability of readers to open large files etc.). There is a variety of large file formats available, typically developed for scientific applications that produce very large amounts of data:

  • CDF (Common Data Format)
  • HDF (Hierarchical Data Format)
  • NetCDF (Network Common Data Form)

Internet Protocols & APIs

Data Communication Protocols

HTTP(S) is the low-level protocol for web-based communication (compare with FTP etc.). It is also a building block for higher-level abstractions.

More specialized abstractions useful for risk data exchange could be built on either of the following generic frameworks:

  • Resource Description Framework (RDF), a standard model for data interchange on the Web
  • Open Data Protocol OData, an OASIS standard for building and consuming RESTful APIs.


Note: RDF is a W3C sponsored framework, whereas OData is essentially a Microsoft sponsored framework (Azure). The currently most developed relevant data format (SDMX) has no relation to either.

API Designs

  • REST, an abstraction of the architecture of the World Wide Web
  • SOAP
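
In a REST design, risk data would be exposed as resources addressed by URLs and manipulated with standard HTTP methods. The sketch below only constructs such requests (it does not send them); the base URL and the `/portfolios/{id}/exposures` resource layout are hypothetical.

```python
from urllib.request import Request

BASE_URL = "https://risk.example.org/api"  # hypothetical service


def exposure_request(portfolio_id, method="GET", body=None):
    """Build (without sending) an HTTP request for a portfolio's exposures.

    GET would read the resource; POST with a JSON body (bytes) would add to it.
    """
    url = f"{BASE_URL}/portfolios/{portfolio_id}/exposures"
    headers = {"Accept": "application/json"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return Request(url, data=body, headers=headers, method=method)


read_request = exposure_request("P42")
write_request = exposure_request("P42", method="POST", body=b'{"exposure": 1.0}')
```

A SOAP design would instead wrap each operation in an XML envelope sent to a single endpoint, typically via POST.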

Issues and Challenges

  • There are currently no comprehensive risk data standards. This article collects and summarizes related information to facilitate the development of such standards.