The Open Risk Data functionality of the Open Risk Manual is still in active development!

Data Model

This is a primer about the Open Risk Data model using a concrete example. For a more technical and general specification please check the underlying Wikibase Data Model.

Overview of the Open Risk Data model

The Open Risk Data knowledge base content can be summarized as a collection of Entities. Entities are thus the basic data elements of the knowledge base. There are two predefined kinds of Entities: Items and Properties.

Item Pages

Most pages in Open Risk Data describe one item. Items are the way Open Risk Data refers to anything in scope. Usually items are data points (or metadata) relevant for risk management. For example, Open Risk Data has an Item for every concrete Risk Event recorded, such as "A risk event involving Wonga". Both the abstract concept "Risk Event" and a concrete realization of a risk event are therefore possible Items.

  • For people familiar with standard databases, an Item roughly corresponds to an identifiable record (or a line in a spreadsheet / CSV file). But in our case it is a record that is available online and follows an established schema!
  • For people familiar with graph databases, items are the nodes of a graph, along with a node label and a node description
  • For people familiar with semantic data, items are the subjects of an RDF triple

Item Discussion Pages

Each Item Page also has a Discussion Page where any issues about the Item can be discussed.

Example: Discussing the "incorporate" property

Properties

Properties are special entities (also described in pages) that help construct Statements about items. For example, a relevant statement is: "Risk Event X has date YYYY-MM-DD", which uses the property "has date".

  • Properties may associate basic datatypes to items (as in the date example) in which case they act as column cells of a spreadsheet.
  • Properties may also link two items. For example, the statement "Risk Event X involves Entity Y" uses the "involves" property (P6) to link an event to an entity. In graph terms, such a property expresses an edge between two nodes.

In summary, the core of the Open Risk Data model involves Items with an arbitrary set of Statements that are constructed using Properties, basic data and other Items.
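
As a rough illustration of this summary, statements can be pictured as (item, property, value) triples. The sketch below uses plain Python tuples and is not how Wikibase stores data internally. The identifiers Q1, Q2 and P2 and the date are made up for the example; P6 is the "involves" property mentioned above.

```python
# Hypothetical identifiers: Q1 stands for a risk event, Q2 for an entity,
# P2 for "has date". P6 is the "involves" property.
statements = [
    ("Q1", "P2", "2020-01-31"),  # property with a basic datatype value (a date)
    ("Q1", "P6", "Q2"),          # property linking two items (a graph edge)
]


def edges(triples):
    """Return only the triples whose value is another item (Q-prefixed ID)."""
    return [t for t in triples if isinstance(t[2], str) and t[2].startswith("Q")]


def attributes(triples):
    """Return the triples whose value is basic data rather than an item."""
    linked = edges(triples)
    return [t for t in triples if t not in linked]
```

Telling edges from attributes by the "Q" prefix is a shortcut for this sketch only; in the data model the distinction comes from the datatype of the property.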

The Full Data Model Structure

The full description of Items and Properties is as follows (annotations can be multi-lingual - we ignore this here for simplicity).

  1. Item:
    1. Item identifier (a serial ID number, prefixed with Q). This is a unique ID for an Item in the context of the Open Risk Data instance. This is assigned automatically by the system when an item is first inserted into the database. While numerologists might assign special significance to e.g. Q42, we promote a world with less superstition :-)
    2. Fingerprint, consisting of:
      1. Multilingual label, a human readable name for the Item. The label need not be unique on its own, but the combination of label and description must be.
      2. Multilingual description, a longer description of the item
      3. Multilingual aliases, other possible labels for an item
    3. Statements, associated with the item, each consisting of:
      1. Claim, consisting of:
        1. Property
        2. Value
        3. Qualifiers (additional property-value pairs)
      2. References (each consisting of one or more property-value pairs)
      3. Rank
    4. Site links
  2. Property
    1. Property identifier (number prefixed with P)
    2. Fingerprint, consisting of:
      1. label
      2. description
      3. aliases
    3. Statements, each consisting of:
      1. Claim, consisting of:
        1. Property
        2. Value
        3. Qualifiers (additional property-value pairs)
      2. References (each consisting of one or more property-value pairs)
      3. Rank
    4. Datatype
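
The outline above can be mirrored in a few Python dataclasses. This is a minimal sketch for orientation only: the class and field names are chosen for this example, and the default rank of "normal" is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Rank(Enum):
    PREFERRED = "preferred"
    NORMAL = "normal"
    DEPRECATED = "deprecated"


@dataclass
class Fingerprint:
    labels: dict[str, str] = field(default_factory=dict)         # language -> label
    descriptions: dict[str, str] = field(default_factory=dict)   # language -> description
    aliases: dict[str, list[str]] = field(default_factory=dict)  # language -> aliases


@dataclass
class Claim:
    property_id: str  # e.g. "P6"
    value: Any        # basic data or the ID of another item
    qualifiers: list[tuple[str, Any]] = field(default_factory=list)


@dataclass
class Statement:
    claim: Claim
    # Each reference is a list of one or more property-value pairs.
    references: list[list[tuple[str, Any]]] = field(default_factory=list)
    rank: Rank = Rank.NORMAL  # assumed default for this sketch


@dataclass
class Item:
    id: str  # "Q" followed by a serial number
    fingerprint: Fingerprint
    statements: list[Statement] = field(default_factory=list)
    site_links: list[str] = field(default_factory=list)


@dataclass
class Property:
    id: str  # "P" followed by a serial number
    fingerprint: Fingerprint
    datatype: str
    statements: list[Statement] = field(default_factory=list)


# Usage: an item with one statement, "Q1 involves (P6) Q2".
event = Item(id="Q1", fingerprint=Fingerprint(labels={"en": "Risk Event X"}))
event.statements.append(Statement(Claim("P6", "Q2")))
```

Note that Item and Property differ only in their last element: an Item carries site links, a Property carries a datatype.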

Going Deeper into Items

As part of their fingerprint, every item has a label (a name) and a description in each supported language. Within a language, the combination of label and description must be unique. The description helps disambiguate concepts that express different information using the same label. In addition to labels, items can have aliases, which provide alternative names under which an item can be found. Aliases are meant to offer the user search convenience, much like redirects on Wikipedia, and thus even popular misspellings may be used as aliases.
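
The two rules in this paragraph can be sketched for a single language as follows. The function names, the dictionary layout and the case-insensitive matching are assumptions made for this example; the actual checks are performed by the Wikibase software.

```python
def is_fingerprint_free(existing, label, description):
    """True if no existing item already uses this (label, description) pair."""
    return all(
        (item["label"], item["description"]) != (label, description)
        for item in existing
    )


def search(existing, term):
    """Return IDs of items whose label or any alias matches the term."""
    needle = term.casefold()
    return [
        item["id"]
        for item in existing
        if needle == item["label"].casefold()
        or needle in (alias.casefold() for alias in item["aliases"])
    ]


# Two items share a label but differ in description, which is allowed.
items = [
    {"id": "Q1", "label": "Risk Event", "description": "abstract concept",
     "aliases": ["risk incident", "risk evnet"]},  # a misspelling kept as alias
    {"id": "Q2", "label": "Risk Event", "description": "a concrete recorded event",
     "aliases": []},
]
```

Here a search for the misspelling "risk evnet" finds only Q1, while a search for the shared label finds both items, and the descriptions tell them apart.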

Going Deeper into Statements

In line with a core Wikibase design choice, "Open Risk Data will not be about the truth, but about statements and their references." This means that in Open Risk Data we do not actually model the items themselves, but statements about them. We do not say: Company X went bankrupt in August 1975. We say: there is a statement about Company X going bankrupt in August 1975, according to a reference to a certain Court record.

A statement may consist of

  • one property (in the example, "went bankrupt")
  • one value (a date)
  • optionally one or more references (a Court record)

The property, value, and qualifiers together are also called the claim, which together with any source references forms a statement.

Properties are described on their pages. Properties also have labels and descriptions. In addition, they have a data type associated with them and perhaps additional properties. The data type defines the type of the value used with this property. The set of properties is created and maintained by Open Risk to accommodate the requirements of available data sets.

Values themselves can be either very simple -- another item or just a string -- or quite complex, like a geographic shape, a measurement with a unit and an accuracy, or a time period. We will describe values in more detail in their own page in the future. The set of data types is (mostly) predefined.

References offer a source that supports the given claim. There can be several references given for a statement. We are still working on how to further structure a reference, but in general they will point to a source (which would be a Wikibase item in its own right, e.g. a book, a website, etc.) and have further information, like the page where the claim is supported, etc. A claim without references is not necessarily wrong, nor is a claim with references true. It is still up to the reader of the statement to decide if they want to trust the claim or not. We will describe references in more detail in their own page in the future.

Qualifiers

Qualifiers are used to further describe or refine the value of a property given in a statement. They consist of a property and a value, which are the same as for statements.

While it would be convenient if we could express all the data we need for our use cases with simple property-value pairs, this is unfortunately not the case. Many statements require further qualifiers in order to be expressed. In order to reduce the number of properties to a manageable size, qualifiers are used to further specify the statement in some way. The qualifier is an integral part of the statement: take away the qualifier, and the meaning of the statement is changed. This is far less true for the references.
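
To make the point concrete, here is a small sketch of two statements that share a property and a value and differ only in a qualifier. The qualifier property P20 ("in role") and its values are hypothetical; P6 is the "involves" property used earlier.

```python
# "Risk Event X involves Entity Y", qualified by the role the entity played.
# P20 ("in role") is a hypothetical property ID used only for illustration.
statement_a = {
    "property": "P6",
    "value": "Q2",
    "qualifiers": [("P20", "lender")],
}
statement_b = {
    "property": "P6",
    "value": "Q2",
    "qualifiers": [("P20", "regulator")],
}


def strip_qualifiers(statement):
    """Return a copy of the statement with its qualifiers removed."""
    return {**statement, "qualifiers": []}


# With qualifiers the two statements say different things; without them
# they collapse into the same, less informative statement.
differ_with_qualifiers = statement_a != statement_b
same_without_qualifiers = strip_qualifiers(statement_a) == strip_qualifiers(statement_b)
```

Modelling the role as a qualifier avoids creating a separate property for every role, such as "involves as lender" and "involves as regulator".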

Ranks

As there are potentially many different statements for a given item and property, we need to select which ones to return when the database is queried. To facilitate this, three ranks of statements are introduced. There can be any number of statements in each rank, but within each rank, their order is not significant.

  • Preferred statements: if preferred statements exist, these statements are returned in response to a query.
  • Normal statements: if there are no preferred statements (or the query explicitly says to include normal statements too), these statements are returned.
  • Deprecated statements: for statements that are being discussed, or known to be erroneous, but still listed for the sake of completeness or in order to prevent them from being constantly added and removed.

Within Open Risk Data, the ranks are also used to make the display cleaner. Only the preferred statements are displayed by default, and the reader has to click on a link like "more values" in order to see the normal-ranked statements.
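
The selection rule described by the three ranks can be sketched as a small function. This is an illustration, not the query engine's actual code: the function name and the `include_normal` flag are invented for the example, and never returning deprecated statements is an assumption of this sketch.

```python
def select_statements(statements, include_normal=False):
    """Pick the statements to return for one item and property.

    Preferred statements win when present; normal ones serve as a fallback
    or are added when explicitly requested. Deprecated statements are
    never returned (an assumption of this sketch).
    """
    preferred = [s for s in statements if s["rank"] == "preferred"]
    normal = [s for s in statements if s["rank"] == "normal"]
    if not preferred:
        return normal
    return preferred + normal if include_normal else preferred


# Hypothetical statements for a single item and property.
statements = [
    {"value": "2015-01-01", "rank": "normal"},
    {"value": "2015-02-01", "rank": "preferred"},
    {"value": "1915-01-01", "rank": "deprecated"},
]
```

With this data, the default call returns only the preferred statement, which matches what the display shows before the reader clicks "more values".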

See also