Credit Scorecard Development
Definition
Credit Scorecard Development is the quantitative process of developing a Credit Scorecard.
The specifics of the scorecard development process depend on the type of scorecard. In general, a number of IT tools must be in place:
Information Technology Prerequisites
- A framework for working with data (import and safe storage of data, export)
- A machine learning estimation framework (if applicable). This can be achieved using either a commercial or open source toolkit (library, computational system). Judgemental scorecards that are mostly expert opinion based may also need some form of implementation (e.g. spreadsheets) and formal monitoring of use.
Development Activities
The following is a list of development activities that will generally be required for the most common types of scorecard development. They can be organized around two pillars:
- the practical side that revolves around the procurement and processing of data and which we might term the Data Engineering component and
- the conceptual side, which focuses on model development and which we might term the Data Science component
Data Engineering
The steps in this sequence aim to provide suitable resources and tools for the development of the required scorecard:
- Data Collection. This step establishes links with existing databases or files. Depending on the available systems, it involves writing and testing queries and filters and importing data.
- Data Cleaning. This step involves reviewing and establishing the Data Quality of the collected data.
- Missing Data. In this step (where appropriate) Missing Data may be remedied with Missing Data Imputation.
- Creating a Master Data Table. This table of potential characteristics and outcomes (see Credit Event) is the basic input to the quantitative estimation using common statistical models
The above steps are not necessarily sequential nor do they strictly precede the data science component (for example after pursuing a certain modelling approach it may transpire that there are additional data requirements)
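The data engineering steps above can be illustrated with a minimal sketch. It uses only the Python standard library and in-memory records in place of a real database connection. All field names (loan_id, income, employment, default_flag), the rule that negative income is invalid, and the choice of median imputation are assumptions made for illustration; an actual development would follow the institution's own data dictionary and data quality rules.

```python
from statistics import median


def build_master_table(applications, outcomes):
    """Clean, impute and join raw records into one row per loan."""
    # Data Cleaning: keep the first record per loan_id and treat
    # impossible values (here: negative income) as missing.
    cleaned = {}
    for row in applications:
        if row["loan_id"] in cleaned:
            continue
        row = dict(row)  # copy so the raw input is left untouched
        if row["income"] is not None and row["income"] < 0:
            row["income"] = None
        cleaned[row["loan_id"]] = row

    # Missing Data: flag missing income, then impute the observed median.
    observed = [r["income"] for r in cleaned.values() if r["income"] is not None]
    fill_value = median(observed)
    for row in cleaned.values():
        row["income_missing"] = int(row["income"] is None)
        if row["income"] is None:
            row["income"] = fill_value
        if row["employment"] is None:
            row["employment"] = "unknown"

    # Master Data Table: attach the outcome to each loan;
    # loans without a recorded outcome are dropped.
    master = []
    for loan_id, row in cleaned.items():
        if loan_id in outcomes:
            master.append({**row, "default_flag": outcomes[loan_id]})
    return master


# Hypothetical raw inputs standing in for the result of a Data Collection query.
applications = [
    {"loan_id": 1, "income": 52000.0, "employment": "salaried"},
    {"loan_id": 2, "income": None, "employment": "self-employed"},
    {"loan_id": 2, "income": None, "employment": "self-employed"},  # duplicate
    {"loan_id": 3, "income": -1.0, "employment": None},  # invalid income
    {"loan_id": 4, "income": 38000.0, "employment": "salaried"},
]
outcomes = {1: 0, 2: 1, 3: 0, 4: 0}  # 1 = credit event observed

master = build_master_table(applications, outcomes)
for record in master:
    print(record)
```

Keeping a missing-value indicator alongside the imputed value is one possible design choice: it lets the later modelling stage test whether missingness itself carries information.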
Data Science
The conceptual development aims to identify, fully specify and fit a specific model to underpin the scorecard. There may be legal, regulatory or business (cost) limitations to the available modelling options. The relevant concepts for quantitative (statistical) development are:
- The Historical Sample Selection (the relevant population, temporal period, any exclusions). Both for achieving Model Stability and in a regulatory context, the Representativeness of the data is particularly important.
- The Portfolio Segmentation. It is possible that the scorecard will be applied to distinct sub-segments of the relevant population.
- The precise Credit Event definition. This is essentially what the scorecard aims to predict (infer) when deployed in production. It may have implications for data availability, e.g. relaxing the Default Definition may significantly increase the observed event rate in the historical record.
- The Identification of Characteristics (features, attributes) to include in the model, a stage sometimes denoted Exploratory Data Analysis. There is an enormous variety of possible characteristics depending on the type of credit risk being evaluated:
  - Numerical Variable (either integer or fractional numbers) versus Categorical Variable (boolean or choice list)
  - Static (non-varying) versus Dynamic variables (potentially changing over time)
  - Addressing different aspects of the Five Cs Of Credit Analysis
- Characteristic (Feature) Selection. Narrowing down the list of characteristics (potential Risk Factors), e.g. using Backward Selection or some other well-defined procedure
- Transformation Methodologies. Investigating the application of non-linear transformations to characteristics (see Feature Engineering)
- The selection of a model family (e.g. logistic regression or any of the large catalog of alternative Credit Scoring Models)
- Performing the actual statistical fit, usually with a statistical estimation method such as maximum likelihood
- Performing and reviewing model estimation outcomes (model accuracy, out-of-sample performance etc.)
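The last three steps (model family, statistical fit, review of outcomes) can be illustrated with a minimal sketch, assuming logistic regression as the model family and plain batch gradient ascent on the log-likelihood as the estimation routine. The data are synthetic, the two characteristics (dti, arrears) are hypothetical, and the learning rate, iteration count and the use of AUC as the out-of-sample measure are illustrative choices rather than recommendations. A production development would normally rely on an established statistical library instead of a hand-written optimiser.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # Predicted event probability; w[0] is the intercept.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Maximise the mean Bernoulli log-likelihood by batch gradient ascent."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)  # gradient contribution of one record
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def auc(scores, labels):
    """Share of (event, non-event) pairs ranked correctly; ties count half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic sample: two standardised characteristics and a simulated outcome.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    dti = rng.gauss(0, 1)      # hypothetical debt-to-income characteristic
    arrears = rng.gauss(0, 1)  # hypothetical arrears characteristic
    p_event = sigmoid(-1.0 + 1.5 * dti + 0.8 * arrears)
    X.append([dti, arrears])
    y.append(1 if rng.random() < p_event else 0)

# Hold out the last 100 records for out-of-sample review.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w = fit_logistic(X_train, y_train)
test_auc = auc([predict(w, x) for x in X_test], y_test)
print("coefficients:", [round(c, 2) for c in w])
print("out-of-sample AUC:", round(test_auc, 3))
```

Evaluating on records that were not used in the fit is what distinguishes out-of-sample performance from in-sample accuracy; in practice the holdout is often chosen from a later time period rather than at random.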