A fresh take on risk and valuation




Non-regulatory credit models – Retention and collection models

Written by Nemanja Djajic, Senior Consultant

Collection Models

Increasing write-offs, delinquencies and bankruptcies caused by the economic crisis and inflation have put collection departments in the spotlight in recent years. To overcome these challenges and protect financial institutions from risk, collection departments have had to incorporate advanced analytics and predictive statistical modelling into their working methods. 
Developing and applying scorecards for collection purposes enables banks and other financial institutions to make better decisions on delinquent accounts and to balance the cost of collection against recoverable revenue.
Using scorecards as an additional tool helps financial institutions rank customers into risk categories and monitor the observed portfolio more closely. This approach improves risk-based decisions on delinquent accounts and can be used to split the portfolio into: 

  • accounts that are kept for in-house recovery,
  • accounts that are sent to a collection agency,
  • accounts that should be written off,
  • etc.

For these reasons, collection scorecards can be an important and powerful banking tool for improving portfolio quality, reducing collection costs and increasing effectiveness. 

Model Framework

Collection scorecards can be built at different levels (account, client and product level) and for different purposes (depending on the internal strategy and target group), but they all share the same motivation and follow the same development process. To maximize the impact of these models and monitor them more effectively, a financial institution should also introduce "ageing buckets" segmentation into its collection portfolio. This segmentation is based on customers' current delinquency and has the following buckets: 

  1. Performing customers, 0 days past due (bucket zero)
  2. Customers 1–30 days past due (bucket one)
  3. Customers 31–60 days past due (bucket two)
  4. Customers 61–90 days past due (bucket three)
  5. Defaulted customers (bucket four)
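The bucket definitions above can be expressed as a small mapping function. This is a minimal sketch that assumes bucket four applies when the customer is flagged as defaulted or exceeds 90 DPD. The 90-day threshold and the function name `ageing_bucket` are illustrative assumptions, not part of the article.

```python
def ageing_bucket(dpd: int, defaulted: bool = False) -> int:
    """Map current days past due (DPD) to an ageing bucket from 0 to 4.

    Assumption (not from the article): a customer is placed in bucket four
    if flagged as defaulted or if DPD exceeds 90.
    """
    if dpd < 0:
        raise ValueError("days past due cannot be negative")
    if defaulted or dpd > 90:
        return 4  # defaulted customers
    if dpd == 0:
        return 0  # performing customers
    if dpd <= 30:
        return 1  # 1-30 DPD
    if dpd <= 60:
        return 2  # 31-60 DPD
    return 3      # 61-90 DPD


if __name__ == "__main__":
    for days in (0, 15, 45, 75, 120):
        print(days, "DPD -> bucket", ageing_bucket(days))
```

Keeping the default flag separate from DPD reflects that an institution's default definition may include criteria other than days past due.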

Once these buckets are established, several collection scorecards can be built to maximize the impact on the internal portfolio and reduce costs:

  1. Customer-level scorecard that prevents regular (performing) customers from entering the first bucket (1–30 days past due (DPD))
  2. Customer-level scorecard that prevents already delinquent customers from moving from the first bucket (1–30 DPD) to the second bucket (31–60 DPD)
  3. Customer-level scorecard that prevents already delinquent customers from moving from the second bucket (31–60 DPD) to the third bucket (61–90 DPD)
  4. Customer-level scorecard that prevents already delinquent customers from defaulting (bucket four)

Depending on the purpose of development, different target variables are used for different types of collection scorecard (e.g., customer past due in the observation month, increase in DPD, etc.). Unlike the target variables, the output of these scorecards always takes the same form: a set of risk categories that the collection department uses to segment customers:

  • Very low risk – the best customers in the portfolio, who pay regularly without any collection action
  • Low risk – customers who are expected to pay regularly without any collection action
  • Medium risk – customers who are expected to pay regularly but occasionally require some collection action
  • High risk – customers for whom a specific action (call, letter, message) is required
  • Very high risk – customers for whom collection actions (call, letter, message) have the highest priority, since otherwise they will probably become delinquent

Based on this output, collection managers can design a separate strategy for each segment that is as effective as possible and, consequently, delivers the best results in terms of recoveries and profit.

Data for Modelling

Depending on which databases are available, five types of features/variables can be created and used for scorecard development. Not all of these databases are mandatory, and a scorecard can be built without some of them:

  • Collection data – Data stored in the collection database on specific customer behaviour, customer responses, and collection actions. The most predictive variables from this database are usually the number of calls in the last 3/6/12 months, the number of PTPs (promises-to-pay) in the last 3/6/12 months, the number of letters/reminders in the last 3/6/12 months, etc.
  • Application data – Information collected from the customer at the moment of application: sociodemographic data, income data and product-specific data. 
  • Credit Bureau data and Block List data – Data gathered in Credit Bureau and Block List databases (if they exist), such as delinquency and exposure at other financial institutions.
  • Behavior data – Internal data about customers' behaviour: DPD data, delinquency data, amount due data, credit exposure-related data and other information related to customers' internal behaviour.
  • PSD2 data (if available) – A database containing customers' transactional information is useful for this kind of modelling. Variables that can be created from these data include the ratio of income within the last 3/6 months, the ratio between positive and negative transactions, the number of months with an increasing trend in positive/negative transactions, and the amount spent on specific types of transactions (loan payments, insurance payments), etc.
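Most of the collection variables listed above are counts over rolling 3/6/12-month windows. The following minimal sketch derives such features from one customer's event log, such as the dates of collection calls. The function names and the `n_<event>_<months>m` naming scheme are hypothetical, and each window is assumed to cover whole calendar months ending at the observation date.

```python
import calendar
from datetime import date
from typing import Dict, Iterable, Sequence, Tuple


def months_back(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clipping the day to month end."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def count_in_window(event_dates: Iterable[date], obs_date: date, months: int) -> int:
    """Count events in the window (obs_date - months, obs_date]."""
    start = months_back(obs_date, months)
    return sum(1 for e in event_dates if start < e <= obs_date)


def window_features(name: str, event_dates: Sequence[date], obs_date: date,
                    windows: Tuple[int, ...] = (3, 6, 12)) -> Dict[str, int]:
    """Build features such as n_calls_3m, n_calls_6m and n_calls_12m for one customer."""
    return {f"n_{name}_{m}m": count_in_window(event_dates, obs_date, m) for m in windows}


if __name__ == "__main__":
    calls = [date(2024, 3, 10), date(2024, 1, 5), date(2023, 11, 20), date(2023, 6, 1)]
    print(window_features("calls", calls, date(2024, 3, 31)))
```

The same windowing applies to promises-to-pay, letters/reminders or PSD2 transaction amounts. Only the event list changes.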


The collection scorecard development process follows the standard methodology used for regulatory models (mainly the behaviour and application scorecards used for PD modelling), with a few differences and less restrictive rules:

  • Data preparation phase – Depending on the selected scorecard type, several events/target variables can be predicted. Examples include the probability that the client will not move into the second/third collection bucket, the probability that the client will not increase DPD in the selected outcome period, and the probability that the client will not reach 30+/90+ DPD. The outcome period used to create the target variable is shorter than for regulatory models, usually only one month. This is because the goal is to direct meaningful and accurate collection activities at potentially delinquent customers as quickly as possible. 
  • Development sample definition – The development sample for these scorecards should contain daily-level data as well as information on all customers present in the collection portfolio at the selected observation points.
  • Variable modification – Two types of variables are created: numerical and character variables. Character variables are replaced with the weight of evidence (WoE) of each subcategory within the variable. Numerical variables can be handled in two ways: either used without modification (no grouping/bucketing) or transformed into buckets using fine/coarse classing. 
  • Univariate/Multivariate analysis – This step aims to reduce the number of candidate variables, which is usually too large for meaningful model development. In the univariate analysis step, variables that are not representative for the model are excluded. The criteria, applied to each individual variable, are discriminatory power, information value (IV), outliers and/or correlation with the target. For the multivariate analysis, clustering analysis is suggested for shortening the list of potential model predictors. 
  • Model development – Since these are non-regulatory models, financial institutions may use any statistical algorithm for the development process. Both machine learning methods and traditional statistical modelling techniques, such as decision trees and linear or logistic regression, can be used for this purpose. More advanced techniques, usually XGBoost or neural networks, can also be considered.
  • Model validation – Validation of collection models is the same as validation of application/behaviour scorecards. The tests performed depend on data availability and internal validation guidelines. The suggested minimum set of tests is KS (Kolmogorov–Smirnov statistic), the Gini coefficient, PSI (population stability index), VIF (variance inflation factor) and the binomial test.
  • Model monitoring – There is one key difference between implementing these models and other risk-related scorecards: collection scorecards should never be applied to the whole collection portfolio, only to part of the population (usually 70% or 80% of active customers). Random actions should still be performed on the remaining part of the collection portfolio. This sample is also used for testing purposes: measuring collection model effectiveness, monthly validation, and future redevelopments (champion/challenger strategy).
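The data preparation step can be sketched for the DPD-based targets. This is a hypothetical example that assumes one snapshot per customer and observation point. Each snapshot holds the customer's DPD at observation and the worst DPD reached in the one-month outcome window. Target = 1 codes the "bad" event, which is the complement of the "will not…" probabilities phrased above. The `Snapshot` layout and `build_target` name are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Snapshot:
    """One customer at one observation point (hypothetical layout)."""
    customer_id: str
    dpd_at_observation: int   # days past due at the observation point
    max_dpd_next_month: int   # worst DPD reached during the one-month outcome window


def build_target(rows: Iterable[Snapshot], pop_min_dpd: int, pop_max_dpd: int,
                 bad_threshold: int) -> Dict[str, int]:
    """Return {customer_id: target} for customers whose current DPD lies in the chosen range.

    target = 1 (bad) if the customer reaches `bad_threshold` DPD or more
    within the outcome window, otherwise 0.
    """
    return {
        r.customer_id: int(r.max_dpd_next_month >= bad_threshold)
        for r in rows
        if pop_min_dpd <= r.dpd_at_observation <= pop_max_dpd
    }


if __name__ == "__main__":
    rows = [Snapshot("a", 0, 0), Snapshot("b", 0, 5), Snapshot("c", 12, 35), Snapshot("d", 20, 25)]
    # Scorecard 1: performing customers, bad = at least 1 DPD next month.
    print(build_target(rows, 0, 0, 1))
    # Scorecard 2: bucket one (1-30 DPD), bad = rolling into bucket two (31+ DPD).
    print(build_target(rows, 1, 30, 31))
```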
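For the variable modification and univariate analysis steps, weight of evidence and information value can be computed directly from category counts. The sketch below assumes target = 1 marks the bad event and follows the common convention WoE = ln(% of goods / % of bads), with IV = Σ (% goods − % bads) · WoE. The small additive smoothing `eps` for categories with no goods or no bads is an assumption, not part of the article.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def woe_iv(categories: Sequence[str], target: Sequence[int],
           eps: float = 0.5) -> Tuple[Dict[str, float], float]:
    """Return (WoE per category, information value) for one character variable.

    target: 1 = bad (event), 0 = good.
    eps: additive smoothing so that empty cells do not cause division by zero.
    """
    goods: Counter = Counter()
    bads: Counter = Counter()
    for category, y in zip(categories, target):
        if y == 1:
            bads[category] += 1
        else:
            goods[category] += 1

    levels = set(goods) | set(bads)
    k = len(levels)
    total_good = sum(goods.values())
    total_bad = sum(bads.values())

    woe: Dict[str, float] = {}
    iv = 0.0
    for category in levels:
        pct_good = (goods[category] + eps) / (total_good + eps * k)
        pct_bad = (bads[category] + eps) / (total_bad + eps * k)
        woe[category] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[category]
    return woe, iv


if __name__ == "__main__":
    cats = ["A"] * 8 + ["B"] * 8
    y = [0] * 6 + [1] * 2 + [0] * 2 + [1] * 6
    print(woe_iv(cats, y))
```

Under this convention, categories with a higher share of good customers get a positive WoE. The resulting IV can then be used as one of the univariate exclusion criteria.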
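Three of the suggested validation statistics can be computed with the Python standard library alone, as sketched below. The sketch assumes a score where higher values mean higher risk and target = 1 for the bad event. Gini is derived from the area under the ROC curve as 2·AUC − 1, and PSI is computed from counts in pre-defined score bins. VIF and the binomial test are not shown.

```python
import math
from typing import Sequence


def auc(scores: Sequence[float], target: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney rank formula (ties get average ranks)."""
    pairs = sorted(zip(scores, target))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the tied group
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_bad = sum(t for _, t in pairs)
    n_good = n - n_bad
    rank_sum_bad = sum(r for r, (_, t) in zip(ranks, pairs) if t == 1)
    return (rank_sum_bad - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)


def gini(scores: Sequence[float], target: Sequence[int]) -> float:
    """Gini coefficient expressed through the AUC."""
    return 2 * auc(scores, target) - 1


def ks(scores: Sequence[float], target: Sequence[int]) -> float:
    """Maximum distance between the cumulative score distributions of bads and goods."""
    n_bad = sum(target)
    n_good = len(target) - n_bad
    pairs = sorted(zip(scores, target))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # consume all observations with the same score before comparing the distributions
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best


def psi(expected_counts: Sequence[int], actual_counts: Sequence[int], eps: float = 1e-6) -> float:
    """Population stability index over pre-defined score bins."""
    total_expected = sum(expected_counts)
    total_actual = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / total_expected, eps)  # floor avoids log(0) for empty bins
        pa = max(a / total_actual, eps)
        value += (pa - pe) * math.log(pa / pe)
    return value
```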
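The monitoring rule (score only part of the portfolio and keep a random control group) can be sketched as a reproducible random split. The 80% share, the fixed seed and the function name are illustrative assumptions.

```python
import random
from typing import Hashable, Iterable, List, Tuple


def split_scored_control(customer_ids: Iterable[Hashable], scored_share: float = 0.8,
                         seed: int = 42) -> Tuple[List[Hashable], List[Hashable]]:
    """Randomly split customers into a scored group and a control group."""
    if not 0.0 < scored_share < 1.0:
        raise ValueError("scored_share must be between 0 and 1")
    ids = list(customer_ids)
    rng = random.Random(seed)  # fixed seed keeps the allocation reproducible and auditable
    rng.shuffle(ids)
    cut = int(len(ids) * scored_share)
    return ids[:cut], ids[cut:]


if __name__ == "__main__":
    scored, control = split_scored_control(range(10))
    print("scored:", scored)
    print("control:", control)
```

Collection actions for the scored group follow the risk categories, while the control group keeps receiving random actions. Comparing outcomes between the two groups supports effectiveness measurement and champion/challenger testing.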

Use Case - Example

An example of a collection scorecard used to prevent customers from becoming at least 1 day delinquent ("Customer-level scorecard that prevents regular customers from entering the first bucket (1–30 DPD)") is presented below:

  • Maximal amount due that the customer had during the last 3 months
  • Maximal number of days past due that the customer had during the last 6 months
  • Number of promises-to-pay that the customer made in the last 3 months
  • Number of calls to the customer during the last 6 months
  • Dummy indicator for revolving products
  • Dummy indicator for missed payments
  • Total inflow on the customer's CA during the last 3 months

Since the model in this example was developed using standard logistic regression, the model function (the linear predictor, i.e. the log-odds of the modelled event) can be written as:

$$ f = 1.5029 + \beta_1 F_1 + \beta_2 F_2 + \cdots + \beta_7 F_7 $$

where $\beta_i$ are the estimated model coefficients and $F_i$ are the variable values.
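Applying the model function means computing the linear predictor and passing it through the logistic function to obtain a probability. In the minimal sketch below, only the intercept 1.5029 comes from the example. The coefficients and feature values are hypothetical placeholders, since the estimates are not reproduced here. Whether the resulting probability refers to the good or the bad outcome depends on how the target was coded.

```python
import math
from typing import Sequence

INTERCEPT = 1.5029  # intercept from the collection scorecard example


def linear_predictor(intercept: float, betas: Sequence[float], features: Sequence[float]) -> float:
    """f = intercept + beta_1*F_1 + ... + beta_k*F_k."""
    if len(betas) != len(features):
        raise ValueError("betas and features must have the same length")
    return intercept + sum(b * x for b, x in zip(betas, features))


def logistic(f: float) -> float:
    """Convert log-odds f into a probability in a numerically stable way."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    z = math.exp(f)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Hypothetical coefficients and (already transformed, e.g. WoE) feature values for F1..F7.
    betas = [0.4, -0.3, 0.2, -0.1, 0.5, -0.6, 0.05]
    features = [0.8, -0.2, 0.0, 1.1, 0.3, -0.5, 0.9]
    f = linear_predictor(INTERCEPT, betas, features)
    print(f"f = {f:.4f}, p = {logistic(f):.4f}")
```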

Based on the output of this function, all customers in the performing portfolio (in this case, "bucket zero") are separated into five risk categories (very low to very high risk), and a different collection strategy is defined for each of them.

  • Very low risk (15% of customers): no action performed
  • Low risk (25% of customers): no action performed
  • Medium risk (40% of customers): no regular action, occasional collection activities
  • High risk (15% of customers): standard approach (calls, letters, reminders)
  • Very high risk (5% of customers): aggressive approach (calls, letters, reminders)
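The risk categories above can be reproduced by fixing score cut-offs on the development sample so that the category shares match the 15/25/40/15/5% split. The same cut-offs are then applied to newly scored customers. The sketch assumes the score is the predicted probability of the bad event, so higher means riskier. The function names are hypothetical.

```python
import bisect
from typing import List, Sequence

LABELS = ["Very low risk", "Low risk", "Medium risk", "High risk", "Very high risk"]
SHARES = [0.15, 0.25, 0.40, 0.15, 0.05]  # category shares from the example
ACTIONS = dict(zip(LABELS, [
    "no action",
    "no action",
    "occasional collection activities",
    "standard approach: calls, letters, reminders",
    "aggressive approach: calls, letters, reminders",
]))


def derive_cutoffs(dev_scores: Sequence[float], shares: Sequence[float] = SHARES) -> List[float]:
    """Return the upper score bound of every category except the last one."""
    ordered = sorted(dev_scores)
    n = len(ordered)
    cutoffs: List[float] = []
    cumulative = 0.0
    for share in shares[:-1]:
        cumulative += share
        index = min(n - 1, max(0, round(cumulative * n) - 1))
        cutoffs.append(ordered[index])
    return cutoffs


def categorize(score: float, cutoffs: Sequence[float], labels: Sequence[str] = LABELS) -> str:
    """Assign the first category whose upper bound is greater than or equal to the score."""
    return labels[bisect.bisect_left(cutoffs, score)]


if __name__ == "__main__":
    dev = [i / 100 for i in range(100)]  # stand-in for development-sample scores
    cutoffs = derive_cutoffs(dev)
    for p in (0.05, 0.5, 0.97):
        label = categorize(p, cutoffs)
        print(p, label, "->", ACTIONS[label])
```

Fixing the cut-offs once, rather than re-ranking every month, keeps the categories comparable over time. Monitoring then shows whether the observed shares drift from the development split.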

Key Benefits

Banks that incorporate scoring into their collection activities gain several advantages:

  • Portfolio improvement – A consistent approach to assessing customers, so that objective decisions are made consistently for the same group of clients. This helps banks make better-informed decisions about clients, which results in fewer risky and unreliable customers. 
  • Collection cost reduction – More efficient collection operations, decreased write-offs, increased collections and recoveries, and identification of self-cured accounts, all of which lower collection expenses.
  • Collection effectiveness – Increased income from good customers, more efficient debt management decisions, more effective debt collection strategies, etc.

Retention models

The second type of non-regulatory model introduced in this article is the retention model. Retention models have become popular in the recent (COVID/post-COVID) period, mainly because competition in the financial market became extremely aggressive and banks started bidding for customers to enlarge their portfolios. Following these changes, banks and other financial institutions started to investigate new methods that can be applied to their portfolios to retain customers. 

In general, two approaches can be used:

  1. Reactive retention – applying specific actions when the customer has already come into the branch to close their account. Unfortunately, by this point the client has usually already decided to move the account to a competing bank, so the success rate of this approach is typically very low.
  2. Proactive retention – applying different actions to customers identified as having a higher probability of leaving the bank. This approach yields a significantly higher success rate than the reactive one, but it can suffer from "false positive" cases: customers who are wrongly marked as potential retention candidates.

The second retention strategy, the proactive approach, uses data analytics and retention scorecards as its basic instrument for recognising customers who are dissatisfied with the financial institution. Besides helping banks retain existing customers, these scorecards also reduce costs, because retaining existing customers usually costs less than acquiring new ones. 

Model framework

Since retention models follow a similar framework to the collection models described in the previous section, the procedure will not be repeated here. Instead, we will focus on the differences between the two types of scorecards.

The most significant difference between the scorecards is the target variable used for development. In the collection scorecard, the goal was to predict and recognise delinquent customers. Here, we focus on customers who are going to leave the bank in the near future. Accordingly, the target variable used for model development is a binary variable that flags customers who left the bank within 3 or 6 months after the observation point. 
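The retention target can be sketched as a simple flag over a 3- or 6-month horizon after the observation date. This is a minimal sketch, assuming that "leaving the bank" is represented by a hypothetical `exit_date` (for example, the date the customer closed their last account, or None if still active). It also assumes that customers who had already left before the observation date were removed from the sample beforehand.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clipping the day to the month end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def churn_target(obs_date: date, exit_date: Optional[date], horizon_months: int = 6) -> int:
    """1 if the customer left within `horizon_months` after the observation date, else 0.

    Assumption: `exit_date` is None for customers who are still active, and
    customers who left on or before `obs_date` are excluded before calling this.
    """
    if exit_date is None:
        return 0
    return int(obs_date < exit_date <= add_months(obs_date, horizon_months))


if __name__ == "__main__":
    obs = date(2024, 1, 31)
    print(churn_target(obs, date(2024, 4, 15), horizon_months=3))
    print(churn_target(obs, None))
```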

The output of this scorecard separates customers into five categories based on their propensity to leave the bank. For each of these categories, the financial institution should apply a different retention approach: no action for the low-risk categories and proactive offers for customers in the highest-risk categories.

Data for modelling

The databases used for retention modelling are the same as those used for collection scorecards. However, one additional database can add value to the retention model if it is available in the internal data warehouse:

  • Internal digital data – data on customers' behaviour on digital channels, i.e. electronic and mobile banking (e.g., channel activity, RFM (recency, frequency, monetary) indicators, number of logins, etc.)
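RFM indicators from digital-channel data could be derived along the following lines. The sketch assumes a hypothetical event log in which each digital session has a date and the value of transactions made in it, and a 90-day window. Recency is measured in days since the last session. The names `DigitalEvent` and `rfm` are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional, Sequence, Union


@dataclass(frozen=True)
class DigitalEvent:
    """One e-banking or m-banking session (hypothetical layout)."""
    day: date
    amount: float  # value of transactions made in the session, 0.0 for a login only


def rfm(events: Sequence[DigitalEvent], obs_date: date,
        window_days: int = 90) -> Dict[str, Optional[Union[int, float]]]:
    """Recency (days since last session), frequency (sessions) and monetary value in the window."""
    start = obs_date - timedelta(days=window_days)
    in_window = [e for e in events if start < e.day <= obs_date]
    past = [e for e in events if e.day <= obs_date]
    recency = (obs_date - max(e.day for e in past)).days if past else None
    return {
        "recency_days": recency,
        "frequency": len(in_window),
        "monetary": sum((e.amount for e in in_window), 0.0),
    }


if __name__ == "__main__":
    log = [DigitalEvent(date(2024, 3, 30), 100.0), DigitalEvent(date(2024, 2, 1), 0.0),
           DigitalEvent(date(2023, 10, 1), 50.0)]
    print(rfm(log, date(2024, 3, 31)))
```

A long recency combined with falling frequency is a plausible early signal of disengagement. Whether it is predictive for a given portfolio has to be confirmed in the univariate analysis.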


The development process follows the standard modelling algorithm described in the collection scorecard section. For this scorecard, we can also use advanced algorithms and methods currently not in the scope of regulatory models.

Use case example

The example below shows one of the models developed for retention purposes using logistic regression:

  • Dummy indicator for an account in another bank
  • Exposure at default (EAD) calculated in the observation month
  • Maximal number of days past due that the customer had during the last 6 months
  • Total inflow on the customer's CA during the last 3 months
  • Total outflow on the customer's CA during the last 3 months
  • Number of different products that the customer had during the last 12 months

In this case, the model function can be written as:

$$ f = 0.9854 + \beta_1 F_1 + \beta_2 F_2 + \cdots + \beta_6 F_6 $$

where $\beta_i$ are the estimated model coefficients and $F_i$ are the variable values.


As with the collection scorecard, the output of this model separates customers into five buckets based on their propensity to leave the bank in the next 3/6 months.

Key benefits

Banks that implement scoring principles in their retention strategies gain several advantages:

  • Increasing retention rate – reducing the number of customers who leave the bank
  • Improving reputation and preventing negative reviews – customers leaving the bank often leave negative reviews and impressions. Recognising these customers and reducing their number helps financial institutions maintain a good reputation in the market.
  • Reducing costs – Retaining customers usually requires less effort and lower costs than acquiring new ones. 
  • Better customer loyalty program – recognising the moment when customers become dissatisfied helps financial institutions approach them proactively and build a relationship based on trust.


In this article, we have presented the development of non-regulatory models and explained why this work matters for risk-related problems in the financial industry. Although regulators do not require them, these scorecards can significantly improve portfolio quality and cost control within institutions. And although they are not directly connected with rating estimation or ECL calculation, they can support risk estimation in many other ways:

• improving relationships with customers,
• increasing customer satisfaction,
• increasing business effectiveness,
• reducing costs.

Consequently, by using these scorecards, banks can improve their internal portfolios and decrease the number of delinquent customers. As an additional benefit, internal departments become more effective, since they can pursue more precise business strategies and run more efficient collection and retention processes.
