The purpose of feature engineering is to create more meaningful and explainable variables from existing ones. Engineering features thoughtfully and effectively is important for improving the predictive power of a model.
Features should be normalized or standardized to improve the interpretability and performance of learning algorithms. This step puts all features on a common scale and prevents differences in their ranges from distorting the model.
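The two scaling approaches can be sketched in R as follows; the data frame and column names here are hypothetical, purely for illustration:

```r
# Minimal sketch: z-score standardization and min-max normalization
# of an illustrative feature (column names are assumptions).
library(dplyr)

df <- data.frame(gdp = c(100, 105, 98, 110, 120))

df <- df %>%
  mutate(
    gdp_std  = as.numeric(scale(gdp)),                   # standardized: mean 0, sd 1
    gdp_norm = (gdp - min(gdp)) / (max(gdp) - min(gdp))  # normalized to [0, 1]
  )
```

Standardization preserves the shape of the distribution, while min-max normalization bounds every feature to the same fixed range.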
Lagging is the simplest and most popular way of transforming existing variables into new ones. Multiple lag features can be created from a single variable; depending on expert knowledge, it can be useful to include lag features from the last 6 months, 1 year, 2 years, etc.
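A minimal sketch of creating several lag features with dplyr, assuming a monthly series (the data here is randomly generated for illustration):

```r
# Hypothetical monthly series with 6-, 12-, and 24-month lag features.
library(dplyr)

df <- data.frame(month = 1:36, rate = rnorm(36))

df <- df %>%
  mutate(
    rate_lag6  = lag(rate, 6),    # value 6 months ago
    rate_lag12 = lag(rate, 12),   # value 1 year ago
    rate_lag24 = lag(rate, 24)    # value 2 years ago
  )
```

Note that the first n rows of an n-period lag column are necessarily NA.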
Other feature engineering techniques include Rolling Mean, Difference, and Quotient. These techniques will inevitably produce missing values in the first few periods of the dataset. If the number of observations is sufficient, the first few rows that have missing values can be removed to have a complete dataset.
- Rolling Mean: the mean of a certain number of previous periods in a time series.
- Difference: the difference between the values of two periods.
- Quotient: the ratio between the values of two periods.
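The three transformations above, and the removal of the resulting incomplete rows, can be sketched as follows. This assumes the zoo package for the rolling mean and tidyr for dropping missing rows; the window lengths are illustrative:

```r
# Rolling mean, difference, and quotient features, then dropping the
# first rows where these transformations produce NA values.
library(dplyr)
library(zoo)    # rollmean(); assumed available
library(tidyr)  # drop_na(); assumed available

df <- data.frame(x = c(10, 12, 11, 15, 14, 16))

df <- df %>%
  mutate(
    roll_mean3 = rollmean(x, k = 3, fill = NA, align = "right"),  # 3-period rolling mean
    diff1      = x - lag(x, 1),   # difference between two consecutive periods
    quot1      = x / lag(x, 1)    # ratio between two consecutive periods
  ) %>%
  drop_na()  # remove the first rows with missing values, if data permits
```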
More advanced feature engineering techniques include:
- Creating a combination of two or more input variables, such as ratios, e.g. HPI to CPI ratio, which could be useful for modelling PIT PD for Mortgage portfolios;
- Creating a combination of two or more already transformed variables, e.g. the ratio of current GDP to its long-period rolling mean, which illustrates how the current state of the economy compares to its long-run average, analogous to the relationship between PIT and TTC PD.
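Both kinds of combination feature can be sketched in R; the data values and the 4-period "long-run" window are assumptions for illustration only:

```r
# Combination features: HPI-to-CPI ratio, and GDP relative to its
# long-period rolling mean (window length is an assumption).
library(dplyr)
library(zoo)  # rollmean(); assumed available

df <- data.frame(
  hpi = c(100, 102, 105, 103, 108, 110, 112, 111),
  cpi = c(100, 101, 101, 102, 103, 103, 104, 105),
  gdp = c(500, 505, 510, 508, 515, 520, 518, 525)
)

df <- df %>%
  mutate(
    hpi_cpi_ratio = hpi / cpi,                                          # ratio of two inputs
    gdp_vs_trend  = gdp / rollmean(gdp, k = 4, fill = NA, align = "right")  # raw vs transformed
  )
```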
Example: feature engineering of the EU inflation rate with Lag2, Difference2, Quotient3, and RollingMean2, using the dplyr package in R:
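A minimal sketch of this example; the inflation values below are illustrative placeholders, not actual EU data, and the rolling mean uses the zoo package as an assumption:

```r
# Illustrative EU inflation series (values are made up) with the four
# named features: Lag2, Difference2, Quotient3, RollingMean2.
library(dplyr)
library(zoo)  # rollmean(); assumed available

eu <- data.frame(
  date      = seq(as.Date("2020-01-01"), by = "month", length.out = 12),
  inflation = c(1.4, 1.2, 0.7, 0.3, 0.1, 0.3, 0.4, 0.2, 0.3, 0.5, 0.6, 0.9)
)

eu <- eu %>%
  mutate(
    lag2         = lag(inflation, 2),              # value two periods back
    difference2  = inflation - lag(inflation, 2),  # change over two periods
    quotient3    = inflation / lag(inflation, 3),  # ratio to three periods back
    rollingmean2 = rollmean(inflation, k = 2, fill = NA, align = "right")  # 2-period mean
  )
```

As noted above, the first few rows contain NA values and can be removed if enough observations remain.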