Stratified Sampling

From Open Risk Manual

Approach To Selecting A Stratified Sample

The AQR Manual approach to selecting a portfolio sample consists of five steps. These steps are not necessarily consecutive.

Step 1. Define Perimeter Of Selectable Debtors

Some parts of each portfolio will be excluded from sampling (and therefore projection of findings). The exclusions are:

  • Retail exposures other than retail mortgages (i.e. retail SMEs and retail others). These exposures will be reviewed through the collective provisioning review (see the AQR Manual section on the collective provisioning review). Retail mortgages will also be assessed through the collective provisioning review; however, critical inputs for the calibration of the collective provisioning parameters will be sourced through the review of files and collateral;
  • Portfolios that have not been selected for Phase 2;
  • Individual debtors from selected portfolios that are externally rated with a rating better than ECAI Credit Quality Step 4, as defined in the loan tape descriptive Excel, because the risk of material misstatement is negligible;
  • Corporates with both Debt/EBITDA < 1 and Equity/Assets > 50% based on audited accounts that are less than 12 months old;
  • Debtors that have been 95% provisioned or more.


1.1 Calculation Approach

Loan tape data is provided in three different views: debtor view, facility view and collateral view, as described in Section 0. This subsection outlines how these three views have to be combined to prepare the sampling dataset, which is defined at the debtor level and aggregates past due status and LTV up to that level. For the avoidance of doubt, each debtor represents one line in the sampling database, except for retail exposures, for which each facility represents one line in the sampling database.

The first task is to prepare the sampling dataset, which contains the fields described in the following Table for each debtor (or facility for RRE). As the loan tape for RRE is collected at the facility level, throughout the description of the sampling process in this Chapter, “debtor” should be read as “facility” for RRE.

The third task is to exclude from the collated dataset the portfolios and debtors that are not subject to credit file review:

  • Portfolio is not among the portfolios selected during Phase 1;
  • Portfolio = Retail SME;
  • Portfolio = Other retail;
  • CQS better than 4;
  • Both Debt/EBITDA < 1 and Equity/Assets > 50%;
  • Provisions > 95% of Debtor exposure.


The general convention about how to treat missing values applies to this dataset: “not applicable” will be designated as “N/A” for text and “11111111111” for numeric fields; whereas “missing information” will be designated as “MISS” for text and “99999999999” for numeric fields.
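The exclusion criteria and the missing-value convention above can be sketched in Python as follows. This is a minimal illustration, not the Manual's implementation: the `Debtor` field names are hypothetical, a lower CQS number is taken to mean a better rating, the provisioning threshold follows the "95% provisioned or more" wording of Step 1, and the treatment of sentinel values (a criterion with an unknown input does not trigger an exclusion) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

NOT_APPLICABLE = 11111111111  # numeric code for "not applicable"
MISSING = 99999999999         # numeric code for "missing information"


def known(value: float) -> Optional[float]:
    """Return the value, or None when it carries one of the two sentinel codes."""
    return None if value in (NOT_APPLICABLE, MISSING) else value


@dataclass
class Debtor:  # hypothetical field names, for illustration only
    portfolio: str
    selected_for_review: bool  # portfolio was selected during Phase 1
    cqs: float                 # ECAI Credit Quality Step
    debt_to_ebitda: float
    equity_to_assets: float    # expressed as a fraction, e.g. 0.6 for 60%
    provisions: float
    exposure: float


def is_excluded(d: Debtor) -> bool:
    """Apply the exclusion criteria of the calculation approach to one debtor."""
    if not d.selected_for_review:
        return True
    if d.portfolio in ("Retail SME", "Other retail"):
        return True
    cqs = known(d.cqs)
    if cqs is not None and cqs < 4:  # assumption: lower step = better rating
        return True
    leverage, equity = known(d.debt_to_ebitda), known(d.equity_to_assets)
    if leverage is not None and equity is not None and leverage < 1 and equity > 0.5:
        return True
    provisions, exposure = known(d.provisions), known(d.exposure)
    if provisions is not None and exposure is not None and exposure > 0:
        if provisions >= 0.95 * exposure:  # "95% provisioned or more"
            return True
    return False
```

The check on the age of the audited accounts mentioned in Step 1 is left out of this sketch because the corresponding field is not described here.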

Step 2. Stratify Portfolio

Every portfolio will be split into strata. This stratification enables a manageable sample size, while maintaining high standards of accuracy and representativeness of the sample. Stratification will be based upon the criteria of exposure size and riskiness. Figure 5 below illustrates how each portfolio is divided into strata and how the stratified sample is selected. Matrix numbers represent the percentage of observations selected from each bucket, from an example large corporate portfolio.

2.1 Stratify By Riskiness Buckets

Riskiness buckets (the vertical axis of Figure 5 above) are defined using basic definitions that all significant banks should be able to provide in their loan tape (see Section 2.4), such as past due status. To simplify this distinction, forward-looking criteria, such as PD, have been avoided. The specific definitions are:

  • Default more than 12 months: Is and has been non-performing with days past due of more than 12 months (internal or EBA definition);
  • Default more than six months but less than 12 months: Is and has been non-performing with days past due of more than six months but less than 12 (internal or EBA definition);
  • Default less than six months: Is and has been non-performing with days past due of less than six months (internal or EBA definition);
  • High-risk cured: Was NPE less than 12 months ago (internal or EBA definition), and currently shows any of the potential deterioration signs referred to below;
  • High risk: Has not been non-performing for the last 12 months, but currently shows one of the signs of potential deterioration defined in Table 28;
  • Normal cured: Currently has none of the high risk signs, but has been non-performing less than 12 months ago (internal or EBA definition);
  • Normal: Currently has none of the high risk signs, and has not been non-performing for at least the last 12 months.


Note: Past due definitions should respect local definition of materiality as per Article 178 of CRR.

Data required

The basis for the stratification is the sampling dataset, as per the section above. The fields required are listed in the table below.

Parameters required

Riskiness buckets will be defined through the combination of three flags: Current status flag, Time in default and Cured.

Calculation approach

To calculate the riskiness buckets, the parameters above are combined as follows:

  • Default more than 12 months when:
    • Current status flag = Default;
    • And Time in default = More than 12 months;
    • And Cured = N/A.
  • Default more than six months but less than 12 months when:
    • Current status flag = Default;
    • And Time in default = Six to 12 months;
    • And Cured = N/A.
  • Default less than six months when:
    • Current status flag = Default;
    • And Time in default = Less than six months;
    • And Cured = N/A.
  • High-risk cured when:
    • Current status flag = High Risk;
    • And Time in default = N/A;
    • And Cured = 1.
  • High risk when:
    • Current status flag = High Risk;
    • And Time in default = N/A;
    • And Cured = 0.
  • Normal cured when:
    • Current status flag = Normal;
    • And Time in default = N/A;
    • And Cured = 1.
  • Normal when:
    • Current status flag = Normal;
    • And Time in default = N/A;
    • And Cured = 0.
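The combination of the three flags amounts to a lookup table. The Python sketch below shows one way to encode it; the string labels and the use of `None` for "N/A" are illustrative assumptions, not codes prescribed by the loan tape.

```python
from typing import Optional

# (Current status flag, Time in default, Cured) -> riskiness bucket.
# None stands for "N/A"; the label spellings are illustrative assumptions.
RISK_BUCKETS = {
    ("Default", "More than 12 months", None): "Default more than 12 months",
    ("Default", "Six to 12 months", None): "Default more than six months but less than 12 months",
    ("Default", "Less than six months", None): "Default less than six months",
    ("High Risk", None, 1): "High-risk cured",
    ("High Risk", None, 0): "High risk",
    ("Normal", None, 1): "Normal cured",
    ("Normal", None, 0): "Normal",
}


def riskiness_bucket(status: str, time_in_default: Optional[str], cured: Optional[int]) -> str:
    """Return the riskiness bucket for one debtor, or raise if the flags are inconsistent."""
    key = (status, time_in_default, cured)
    if key not in RISK_BUCKETS:
        raise ValueError(f"inconsistent flag combination: {key}")
    return RISK_BUCKETS[key]
```

Rejecting unknown combinations, rather than defaulting to a bucket, makes data quality problems in the flags visible before stratification starts.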

2.2 Stratify By Exposure Size Buckets

Exposure size buckets (the horizontal axis of Figure 5 above) are defined in three steps:

  • Top ten debtors by exposure size of each portfolio and risk bucket are sampled;
  • Smallest exposures (i.e. those below the 5th percentile, meaning the 5% smallest exposures ordered by exposure size, based on the total number of debtors in the portfolio) are excluded from the analysis on the basis of the immateriality of the potential adjustment;
  • The range between the tenth debtor by exposure size and the 5th percentile is split into five buckets of the same absolute difference in exposure.


Data required

The basis for the stratification is the sampling dataset, as per the sections above. The fields required are listed in the table below.


Parameters required

For clarity:

  • A Stratum is a sub-segment of the portfolio with similar exposure size and risk classification, e.g. normal risk, exposure size bucket 1. Strata is the plural of Stratum;
  • A Common Risk Strata is a group of strata with different levels of exposure but the same risk characteristics, e.g. normal risk, exposure size bucket 1 and normal risk, exposure size bucket 2 are both in the same Common Risk Strata;
  • A Common Exposure Strata is a group of strata with different levels of risk but the same exposure characteristics, e.g. normal risk, exposure size bucket 1 and normal cured, exposure size bucket 1 are both in the same Common Exposure Strata.

Exposure size buckets will be defined through the comparison of the Exposure for each debtor and a number of exposure cut-off points:

  • 5th Percentile;
  • Cut-off1;
  • Cut-off2;
  • Cut-off3;
  • Cut-off4;
  • Top10th Exposure.


These cut-offs are specific to each portfolio and riskiness buckets, meaning that, for instance, cut-off points for retail mortgages normal will be different from cut-off points for retail mortgages defaulted >12 months and different from large corporates defaulted >12 months. The steps to calculate them are explained below and illustrated in the Figure 6:

  • Calculate the 5th Percentile of exposure (by debtor) for each portfolio and riskiness bucket, i.e. determine the exposure of the debtor whose exposure is smaller than that of 95% of the other debtors in the same Common Risk Strata;
  • Identify the exposure size of the Top 10th debtor by exposure size in each Common Risk Strata;
  • Calculate the auxiliary variable “Step” as: Step = (Top10th Exposure - 5th Percentile) / 5;
  • For i = 1 to 4, calculate Cut-off_i as: Cut-off_i = 5th Percentile + (Step × i).
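The cut-off calculation can be sketched in Python as below. The function name is hypothetical, and the nearest-rank percentile definition is an assumption: the text does not prescribe a particular percentile method, and other methods may give slightly different values on small populations.

```python
from typing import List, Sequence, Tuple


def exposure_cutoffs(exposures: Sequence[float]) -> Tuple[float, List[float], float]:
    """Return (5th Percentile, [Cut-off1..Cut-off4], Top10th Exposure).

    `exposures` holds the exposure of every debtor in one portfolio and
    riskiness bucket (one Common Risk Strata), after exclusions.
    """
    n = len(exposures)
    if n < 10:
        raise ValueError("need at least ten debtors to identify the Top 10th exposure")
    ordered = sorted(exposures)
    # Nearest-rank 5th percentile (assumption): rank = ceil(5% of n), at least 1.
    rank = (5 * n + 99) // 100
    p5 = ordered[rank - 1]
    top10th = ordered[-10]  # exposure of the tenth-largest debtor
    step = (top10th - p5) / 5
    cutoffs = [p5 + step * i for i in range(1, 5)]
    return p5, cutoffs, top10th
```

For example, with 100 debtors whose exposures are 10, 20, ..., 1000, this sketch gives a 5th Percentile of 50, a Top10th Exposure of 910, a Step of 172 and cut-offs of 222, 394, 566 and 738.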


Calculation approach

Once the parameters are calculated, each debtor is allocated to the corresponding exposure size bucket:

  • Exposure size bucket = Top10 when Top10th Exposure ≤ Exposure;
  • Exposure size bucket = 5 when Cut-off4 ≤ Exposure < Top10th Exposure;
  • Exposure size bucket = 4 when Cut-off3 ≤ Exposure < Cut-off4;
  • ...
  • Exposure size bucket = 1 when 5th Percentile < Exposure < Cut-off1;
  • Exposure size bucket = 5th Percentile when Exposure ≤ 5th Percentile.
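The allocation rules above translate into a short function. The sketch below assumes the cut-offs have already been calculated for the relevant portfolio and riskiness bucket; the function name and the string labels it returns are illustrative.

```python
from typing import Sequence


def exposure_size_bucket(exposure: float, p5: float,
                         cutoffs: Sequence[float], top10th: float) -> str:
    """Allocate one debtor to an exposure size bucket.

    `cutoffs` is [Cut-off1, Cut-off2, Cut-off3, Cut-off4] in ascending order.
    """
    if exposure >= top10th:
        return "Top10"
    if exposure <= p5:
        return "5th Percentile"
    # Buckets 1 to 5: one plus the number of cut-offs the exposure has reached,
    # so that Cut-off_i <= Exposure < Cut-off_(i+1) lands in bucket i + 1.
    return str(1 + sum(exposure >= c for c in cutoffs))
```

A usage example with the hypothetical cut-offs 222, 394, 566 and 738, a 5th Percentile of 50 and a Top10th Exposure of 910: an exposure of 400 falls in bucket 3, and an exposure of exactly 738 falls in bucket 5.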

Step 3. Select The Priority Debtors

To allow the credit file review to start early, the “priority debtors” will be selected first. These are the top 10 debtors (top 5 for small granular non-retail portfolios) by exposure size per portfolio and riskiness bucket. Picking these files should be relatively straightforward, allowing credit file review to begin swiftly on completion of the loan tape. If the 10th and 11th debtors are strictly identical by exposure, the lowest allocated value of collateral can be used to decide which debtor goes into the priority debtors. If allocated collateral is also equal, a random choice should be made.

At NCA discretion, in addition to the top 10 debtors, all debtors within the top 20 groups of connected clients (across all selected portfolios, not by portfolio/riskiness bucket) can be selected as an additional priority group, to the extent they have not already been analysed. NCAs will decide at the beginning of this step if they wish to pursue this option.

Data Required

The basis for the selection of the priority debtors is the sampling dataset, as per the sections above. The fields required are listed in the table below.

Calculation Approach

The selection of the priority debtors consists of picking the debtors that have been allocated to the Top10 exposure size bucket for all the portfolios and riskiness buckets. For the avoidance of doubt, this means that 70 debtors will be selected per portfolio (10 per riskiness bucket), though some debtors may belong to the same group of connected clients, and therefore be analysed together. In these circumstances, no extra priority debtors should be selected.
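A minimal Python sketch of the priority selection, including the tie-break described in this step, is shown below. The dictionary keys are hypothetical field names, and the sketch assumes that, among debtors with identical exposure, the one with the lowest allocated collateral is preferred. It applies the tie-break ordering throughout the ranking, which only affects the outcome when the tie straddles the 10th and 11th positions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def select_priority_debtors(debtors: Iterable[dict], top_n: int = 10,
                            seed: Optional[int] = None) -> Dict[Tuple[str, str], List[str]]:
    """Pick the top_n debtors by exposure per (portfolio, riskiness bucket).

    Each debtor is a dict with the hypothetical keys 'id', 'portfolio',
    'risk_bucket', 'exposure' and 'collateral'.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for d in debtors:
        groups[(d["portfolio"], d["risk_bucket"])].append(d)

    selected = {}
    for key, members in groups.items():
        # Largest exposure first; ties broken by lowest allocated collateral
        # (assumption), then by a random draw.
        ranked = sorted(members,
                        key=lambda d: (-d["exposure"], d["collateral"], rng.random()))
        selected[key] = [d["id"] for d in ranked[:top_n]]
    return selected
```

For small granular non-retail portfolios, `top_n=5` would be passed instead of the default. The optional groups of connected clients selected at NCA discretion are not covered by this sketch.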

Step 4. Select Random Stratified Sample

The stratification of the portfolios enables sufficient audit evidence with only a few observations per stratum. This section outlines how the number of observations per stratum is defined and how individual debtors will be picked once the sample size has been calculated.

4.1 Calculate Sample Size

Not all of the strata will be sampled. In general, small exposures will not be reviewed and in the case of retail mortgage portfolios, for those debtors that do not show any evidence of current or past reasons for potential impairment, only the largest exposures will be reviewed.

The number of files sampled per stratum is defined based on the following factors:

  • The risk category of the stratum;
  • The AQR asset segment (residential real estate (RRE) vs. non-retail);
  • Whether the portfolio is granular or not (i.e. has more than 1,000 individual debtors);
  • The size of the portfolio;
  • The number of debtors in the stratum.


Data required

The basis for the calculation of the sample size is the sampling dataset, as per the sections above. The fields required are listed in the table below.

Parameters required

The parameters required to determine the statistical sufficiency of the sample are provided by the CPMO. The parameters are shown in the Table below.

NCA bank teams may apply the parameters for small concentrated non-retail portfolios when the total RWA of the portfolio is less than 5% of the total credit RWA of the bank and the top 50 debtors account for at least 40% of the total exposure in the portfolio. NCA bank teams may petition to apply these parameters when the total RWA of the portfolio is between 5% and 10% of the total credit RWA of the bank, the top 50 debtors account for at least 40% of the total exposure in the portfolio, and the number of files selected for the bank is greater than the expected number of files communicated by the CPMO at the end of Phase 1. The following subsection explains how these parameters are applied.
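The eligibility conditions for the small concentrated non-retail parameters can be expressed as a simple decision function. The sketch below is illustrative only: the function name, argument names and return labels are hypothetical, and the handling of the exact 5% and 10% boundaries is an assumption.

```python
from typing import Sequence


def small_concentrated_status(portfolio_rwa: float, total_credit_rwa: float,
                              debtor_exposures: Sequence[float],
                              files_selected: int, files_expected: int) -> str:
    """Return 'apply', 'petition' or 'not eligible' for one portfolio."""
    total_exposure = sum(debtor_exposures)
    top50 = sum(sorted(debtor_exposures, reverse=True)[:50])
    if total_exposure <= 0 or top50 / total_exposure < 0.40:
        return "not eligible"  # concentration condition not met
    rwa_share = portfolio_rwa / total_credit_rwa
    if rwa_share < 0.05:
        return "apply"
    # Assumption: the 5% and 10% boundaries both fall in the petition range.
    if rwa_share <= 0.10 and files_selected > files_expected:
        return "petition"
    return "not eligible"
```

A result of 'petition' only means that the NCA bank team may ask to use the parameters; the outcome of that request is outside the scope of this sketch.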

Calculation approach

The first step in the calculation is to allocate exposure and number of debtors (after exclusions) by stratum, as illustrated in the following figure.

The number of observations is then looked up for each stratum from the table above. In doing so, the correct set of corporate parameters (granular, non-granular or small and granular) should be looked up, depending on the number of observations in the portfolio after exclusions.

If forbearance information is not available to determine the high risk segment and no conservative proxy is available (as described in section on DIV), the sample size for normal cured and normal should be increased by a factor of 4 (up to the total population of the stratum). For instance, if forbearance/restructuring information is not available for the above example, the revised sample size will be:


4.2 Select Specific Debtors

To ensure that the sample is representative and unbiased, random sampling will be applied to select specific debtors.

Data required

The basis for the selection of specific debtors is the sampling dataset, as per the sections above. The fields required are listed in the Table below.

Calculation approach

The approach to select specific debtors is:

  • Ensure that the portfolio follows a random order by assigning a randomly generated number (e.g. SAS’ ranuni(seed)) to each debtor and sorting in descending order. ISA 530, Appendix 4, Paragraph a, describes this as “Random selection (applied through random number generators, for example, random number tables)”;
  • Starting with the first debtor in the randomly sorted list, select the first “n” debtors, for each stratum where “n” is the total sample size for each stratum described in the previous section.
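The two steps above can be sketched in Python as follows. This is an illustration of the approach rather than a prescribed tool: it uses Python's `random` module in place of SAS' ranuni, and the function and argument names are hypothetical. Recording the seed makes the selection reproducible for quality assurance.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def select_sample(debtors: Sequence[Tuple[Hashable, str]],
                  sample_sizes: Dict[str, int], seed: int) -> Dict[str, List[Hashable]]:
    """Randomly select debtors per stratum.

    `debtors` is a sequence of (debtor_id, stratum) pairs and `sample_sizes`
    maps each stratum to its sample size "n". Strata missing from
    `sample_sizes` are treated as not sampled.
    """
    rng = random.Random(seed)
    # Assign a random number to each debtor and sort in descending order.
    keyed = [(rng.random(), debtor_id, stratum) for debtor_id, stratum in debtors]
    keyed.sort(key=lambda item: item[0], reverse=True)

    sample: Dict[str, List[Hashable]] = {}
    for _, debtor_id, stratum in keyed:
        chosen = sample.setdefault(stratum, [])
        # Take the first "n" debtors of each stratum in the random order.
        if len(chosen) < sample_sizes.get(stratum, 0):
            chosen.append(debtor_id)
    return sample
```

If a stratum contains fewer debtors than its sample size, the whole stratum is selected.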

Alternatively, typical data management software offers solutions to run stratified samples easily (e.g. SAS’ PROC SURVEYSELECT combined with the statement “strata”). The NCA bank team may use these solutions as long as the randomness of the selection is ensured.

Experience suggests that some parties can struggle to select samples randomly. Therefore, following selection of the sample, the party responsible for selecting it should sign a declaration that appropriate measures have been taken to ensure the sample is random, and the NCA should ensure the sample selection process has been quality assured.

Step 5. Select The Reserve Sample

Together with the main sample, the NCA bank team will select a reserve sample. Its purpose is to allow the replacement of files under very precise circumstances, explained in Section 4.4 and Chapter 6, and to check anomalies in the projection of findings phase. This section outlines how the reserve sample is selected while preserving all the attributes defined for the main sample, such as representativeness, non-bias and sufficiency.

5.1 Calculate The Sample Size For The Reserve Sample

The calculation of the reserve sample size is a parallel step to the calculation of the main sample size. The data required is the same as for the main sample, and the reserve sample size will be calculated right after the main sample size has been calculated.

Calculation approach

The reserve sample, when combined with the main sample, can never be more than the total number of debtors in the stratum. Given “N” debtors per stratum and a main sample size of “n*”, the reserve sample size is calculated using the following expression:

  • R = min(n*, N – n*)
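As a quick illustration of the expression above, the following Python sketch computes the reserve sample size for one stratum; the function name is hypothetical.

```python
def reserve_sample_size(population: int, main_sample: int) -> int:
    """Return R = min(n*, N - n*) for a stratum of N debtors and main sample n*."""
    if not 0 <= main_sample <= population:
        raise ValueError("main sample size must lie between 0 and the stratum population")
    return min(main_sample, population - main_sample)
```

For a stratum of 100 debtors with a main sample of 10, the reserve sample is 10. For a stratum of 15 debtors with a main sample of 10, only 5 debtors remain, so the reserve sample is 5.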

5.2 Designate Specific Debtors For The Reserve Sample

The selection of the specific reserve sample debtors will be carried out after the selection of the main sample. The required dataset is therefore the same, excluding those files that have been already selected, and the approach is also the same as described above.