Retiree mortality forecasting: A partial age-range or a full age-range model?

An essential input to annuity pricing is future retiree mortality. From observed age-specific mortality data, modeling and forecasting can proceed along two routes. On the one hand, we can first truncate the available data to retiree ages and then produce mortality forecasts based on a partial age-range model. On the other hand, with all available data, we can first apply a full age-range model to produce forecasts and then truncate the mortality forecasts to retiree ages. We investigate the difference in modeling the logarithmic transformation of the central mortality rates between a partial age-range and a full age-range model, using data from mainly developed countries in the Human Mortality Database (2020). Based on a comparison of the short-term point and interval forecast accuracies, we recommend the first strategy: truncate all available data to retiree ages and then produce mortality forecasts. For long-term forecasts, however, it is unclear which strategy is better, since it is more difficult to find a model and parameters that are optimal. This is a disadvantage of using methods based on time-series extrapolation for long-term forecasting. Instead, an expectation approach, in which experts set a future target, could be considered, noting that this method has also had limited success in the past.


Introduction
Improving human survival probability contributes greatly to an aging population. To secure a financial income in retirement, an individual may purchase a fixed-term or lifetime annuity, a contract offered by insurers that guarantees regular payments in exchange for an initial premium. Since the value of an annuity depends on survival probabilities and interest rates, pension funds and insurance companies are exposed to longevity risk. Longevity risk is a potential systematic risk attached to the increasing life expectancy of policyholders, which can eventually result in a higher payout ratio than expected (Crawford et al. 2008). Concerns about longevity risk have led to a surge of interest in modeling and forecasting age-specific mortality rates.
Many models for forecasting age-specific mortality indicators have been proposed in the demographic literature (see Booth & Tickle 2008, for reviews). Of these, Lee & Carter (1992) implemented a principal component method to model the logarithm of age-specific mortality rates ($m_{x,t}$) and extracted a single time-varying index representing the trend in the level of mortality, from which the forecasts are obtained by a random walk with drift. Since then, the Lee-Carter (LC) method has been extended and modified (see Booth & Tickle 2008, Pitacco et al. 2009, Shang et al. 2011, Shang & Haberman 2018, for reviews). The LC method has been applied to many countries, including Belgium (Brouhns et al. 2002), Austria (Carter & Prskawetz 2001), England and Wales (Renshaw & Haberman 2003), and Spain (Guillen & Vidiella-i-Anguera 2005, Debón et al. 2008). Many statistical models focus on time-series extrapolation of past trends exhibited in age-specific mortality rates (see, e.g., Booth & Tickle 2008). We consider two modeling strategies. On the one hand, we can first truncate all available data to retiree ages and then produce mortality forecasts (see, e.g., Cairns et al. 2006, 2009). On the other hand, we can first use all available data to produce forecasts and then truncate the mortality forecasts to retiree ages (see, e.g., Shang & Haberman 2017). In this paper, our contribution is to investigate the difference in modeling the logarithmic transformation of the central mortality rates between a partial age-range and a full age-range model, using data from mainly developed countries in the Human Mortality Database (2020). Based on a comparison of the short-term point and interval forecast accuracies, we recommend the first strategy: truncate all available data to retiree ages and then produce mortality forecasts.
However, when we consider the long-term forecasts, it is unclear which strategy is better since it is more difficult to find a model and parameters that are optimal. This is a disadvantage of using methods based on time series extrapolation for long-term forecasting. Instead, an expectation approach, in which experts set a future target, could be considered, noting that this method has also had limited success in the past (Booth & Tickle 2008). Our recommendations could be useful to actuaries for choosing a better modeling strategy and more accurately pricing a range of annuity products.
The article is organized as follows. In Section 2, we describe the mortality data sets of 19 mainly developed countries. We revisit five time-series extrapolation models for forecasting age-specific mortality rates, which have been shown in the literature to work well across the full age range for some data sets (for more details, consult Shang 2012, Shang & Haberman 2018). Using these models as a testbed, we compare point and interval forecast accuracies between the two modeling strategies and provide our recommendations in Section 3. Conclusions are presented in Section 4.

Data sets
The data sets used in this study were taken from the Human Mortality Database (2020). For each sex in a given calendar year, the mortality rates, obtained as the ratio of the "number of deaths" to the "exposure to risk", are arranged in a matrix by age and calendar year. Nineteen mainly developed countries were selected, giving 38 sub-populations of age- and sex-specific mortality rates for all analyses. The 19 countries selected all have reliable data series commencing in or before 1950. Due to possible structural breaks (i.e., the two world wars), we restrict all data series to the period from 1950 onwards. Germany is omitted because the Human Mortality Database (2020) series for a reunited Germany only dates back to 1990. The selected countries and their abbreviations are shown in Table 1, along with their last year of available data (recorded in April 2019). To avoid fluctuations at older ages, we consider ages from 0 to 99 in single years of age, with the last age group covering ages 100 and over. Were we to consider all ages from 0 to 110+, we could encounter missing values and observe mortality rates outside the range [0, 1] for some years.
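The construction of the rate matrix described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `central_death_rates`, the array layout (rows are ages, columns are calendar years), and the choice to pool deaths and exposures for ages 100 and over before dividing are all assumptions, and the input values are simulated.

```python
import numpy as np

def central_death_rates(deaths, exposure, last_age=100):
    """Return m[x, t] = deaths / exposure, pooling ages >= last_age into one open group.

    deaths, exposure: arrays of shape (n_ages, n_years); row i holds age i.
    Exposures are assumed to be strictly positive.
    """
    deaths = np.asarray(deaths, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    # Pool the open age group first, so its rate is total deaths over total exposure.
    d = np.vstack([deaths[:last_age], deaths[last_age:].sum(axis=0, keepdims=True)])
    e = np.vstack([exposure[:last_age], exposure[last_age:].sum(axis=0, keepdims=True)])
    return d / e

# Simulated inputs for ages 0, 1, ..., 110+ over three calendar years (hypothetical values).
rng = np.random.default_rng(1)
exposure = rng.uniform(50.0, 1000.0, size=(111, 3))
deaths = rng.poisson(0.01 * exposure).astype(float)

m = central_death_rates(deaths, exposure)
print(m.shape)  # (101, 3): ages 0 to 99 plus the 100+ group
```

Pooling before dividing, rather than averaging the single-age rates above 99, keeps the open-group rate a genuine deaths-to-exposure ratio even when some of the oldest ages have very small exposures.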

Forecast evaluation
The data series for the 19 countries begin in 1950 and end in the last year listed in Table 1. We keep the last 30 observations for forecast evaluation, while the remaining observations are treated as the initial fitting period, from which we produce the one-step-ahead to 30-step-ahead forecasts. Via an expanding window approach (see also Zivot & Wang 2006), we re-estimate the parameters of the time-series forecasting models after increasing the fitting period by one year and produce the one-step-ahead to 29-step-ahead forecasts. We iterate this process, increasing the sample size by one year each time, until the end of the data period. The process produces 30 one-step-ahead forecasts, 29 two-step-ahead forecasts, ..., and one 30-step-ahead forecast. We compare these forecasts with the holdout samples to determine the out-of-sample forecast accuracy.
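The bookkeeping of the expanding window scheme can be made concrete with a short sketch. The helper name `expanding_window_horizons` and the series length of 68 years are assumptions chosen for illustration; only the holdout length of 30 comes from the text.

```python
from collections import Counter

def expanding_window_horizons(n_years, n_holdout=30):
    """Yield (fit_end, target, h) for every forecast made in the expanding window scheme.

    The model is fitted to observations 0, ..., fit_end - 1 and used to forecast
    observation `target`, which lies h steps beyond the end of the fitting period.
    """
    first_fit_end = n_years - n_holdout
    for fit_end in range(first_fit_end, n_years):
        for target in range(fit_end, n_years):
            yield fit_end, target, target - fit_end + 1

# Count how many forecasts are produced at each horizon for a 68-year series.
counts = Counter(h for _, _, h in expanding_window_horizons(n_years=68))
print(counts[1], counts[2], counts[30])  # 30 29 1
```

Because each refit shortens the remaining holdout period by one year, horizon $h$ is evaluated on $31 - h$ forecasts, so the long-horizon averages rest on far fewer forecasts than the short-horizon ones.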

Forecast error criteria
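To evaluate the point forecast accuracy, we consider the mean absolute percentage error (MAPE) and root mean squared percentage error (RMSPE). Both criteria measure how close the forecasts are to the actual values of the variable being forecast, regardless of the sign of the error. For forecast horizon $h = 1, \dots, 30$, the MAPE and RMSPE criteria can be expressed as:
$$\text{MAPE}_h = \frac{1}{p \times (31-h)} \sum_{j=h}^{30} \sum_{x=1}^{p} \left| \frac{m_{x,j} - \widehat{m}_{x,j}}{m_{x,j}} \right| \times 100,$$
$$\text{RMSPE}_h = \sqrt{\frac{1}{p \times (31-h)} \sum_{j=h}^{30} \sum_{x=1}^{p} \left( \frac{m_{x,j} - \widehat{m}_{x,j}}{m_{x,j}} \right)^{2}} \times 100,$$
where $m_{x,j}$ represents the actual holdout sample for age $x$ in the forecasting year $j$, $p$ denotes the total number of ages, $\widehat{m}_{x,j}$ represents the forecasts for the holdout sample, and $31-h$ is the number of $h$-step-ahead forecasts available in the 30-year holdout period.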
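As a small numerical illustration of the two point forecast criteria, the sketch below computes them for a toy array of actual and forecast rates. The function names `mape` and `rmspe` and the input values are assumptions made for the example; the averaging runs over all entries of the arrays, which corresponds to averaging over ages and forecasting years at a fixed horizon.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error over all ages and forecasting years."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def rmspe(actual, forecast):
    """Root mean squared percentage error over all ages and forecasting years."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.sqrt(np.mean(((actual - forecast) / actual) ** 2))

# Toy rates for two ages (rows) and two forecasting years (columns); hypothetical values.
actual = np.array([[0.010, 0.020], [0.040, 0.050]])
forecast = np.array([[0.011, 0.018], [0.040, 0.055]])

print(round(mape(actual, forecast), 2))   # 7.5
print(round(rmspe(actual, forecast), 2))  # 8.66
```

The RMSPE exceeds the MAPE here because squaring gives more weight to the larger relative errors, which is why the two criteria can rank methods differently.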
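To evaluate the pointwise interval forecast accuracy, we consider the interval score criterion of Gneiting & Raftery (2007). We consider the common case of the symmetric $100(1-\alpha)\%$ prediction intervals, with lower and upper bounds that are predictive quantiles at $\alpha/2$ and $1-\alpha/2$, denoted by $\widehat{m}^{\text{lb}}_{x,j}$ and $\widehat{m}^{\text{ub}}_{x,j}$. As defined by Gneiting & Raftery (2007), a scoring rule for evaluating the pointwise interval forecast accuracy at time point $j$ is
$$S_{\alpha}\left(\widehat{m}^{\text{lb}}_{x,j}, \widehat{m}^{\text{ub}}_{x,j}; m_{x,j}\right) = \left(\widehat{m}^{\text{ub}}_{x,j} - \widehat{m}^{\text{lb}}_{x,j}\right) + \frac{2}{\alpha}\left(\widehat{m}^{\text{lb}}_{x,j} - m_{x,j}\right) 1\left\{m_{x,j} < \widehat{m}^{\text{lb}}_{x,j}\right\} + \frac{2}{\alpha}\left(m_{x,j} - \widehat{m}^{\text{ub}}_{x,j}\right) 1\left\{m_{x,j} > \widehat{m}^{\text{ub}}_{x,j}\right\},$$
where $1\{\cdot\}$ denotes the binary indicator function. The optimal interval score is achieved when $m_{x,j}$ lies between $\widehat{m}^{\text{lb}}_{x,j}$ and $\widehat{m}^{\text{ub}}_{x,j}$, with the distance between the upper and lower bounds being minimal. To obtain summary statistics of the interval score, we take the mean interval score across different ages and forecasting years. For forecast horizon $h$, with $p$ ages and $31-h$ available $h$-step-ahead forecasts, the mean interval score can be expressed as:
$$\overline{S}_{\alpha}(h) = \frac{1}{p \times (31-h)} \sum_{j=h}^{30} \sum_{x=1}^{p} S_{\alpha}\left(\widehat{m}^{\text{lb}}_{x,j}, \widehat{m}^{\text{ub}}_{x,j}; m_{x,j}\right).$$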
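The interval score of Gneiting & Raftery (2007) is straightforward to compute, as the following sketch shows. The function names `interval_score` and `mean_interval_score` and the toy inputs are assumptions introduced for illustration; the level $\alpha = 0.2$ matches the 80% prediction intervals evaluated later in the paper.

```python
import numpy as np

def interval_score(actual, lower, upper, alpha=0.2):
    """Interval score for 100(1 - alpha)% prediction intervals, evaluated elementwise."""
    actual = np.asarray(actual, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    width = upper - lower
    # Penalties apply only when the actual value falls outside the interval.
    below = (2.0 / alpha) * (lower - actual) * (actual < lower)
    above = (2.0 / alpha) * (actual - upper) * (actual > upper)
    return width + below + above

def mean_interval_score(actual, lower, upper, alpha=0.2):
    """Average the interval score over all ages and forecasting years supplied."""
    return float(np.mean(interval_score(actual, lower, upper, alpha)))

# Three toy cases (hypothetical values): covered, below the lower bound, above the upper bound.
actual = np.array([0.5, 0.1, 0.9])
lower = np.array([0.4, 0.2, 0.3])
upper = np.array([0.6, 0.4, 0.7])

print(np.round(interval_score(actual, lower, upper), 2))  # [0.2 1.2 2.4]
```

In the first case the score equals the interval width alone; in the other two, each unit of miss is penalized by $2/\alpha = 10$, so narrow intervals that fail to cover are scored worse than wider intervals that do.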

Comparison of point and interval forecast errors
Modeling and forecasting mortality can proceed along two routes. On the one hand, we can first truncate the available data to certain ages, such as ages 60 to 99 in single years of age with 100+ as the last age group, and then produce mortality forecasts for these retiree ages. On the other hand, we can first use all available data, i.e., age-specific mortality for ages 0 to 99 in single years of age with 100+ as the last age group, to produce forecasts for these 101 ages and then truncate the mortality forecasts to certain ages, such as 60 to 99 in single years of age and 100+ as the last age group.
We study five time-series extrapolation models for forecasting age-specific mortality, which have been shown in the literature to work well across the full age range for some data sets.
Note that the Cairns-Blake-Dowd suite of models is not included in this paper because it is designed just for ages 55 and over. The selection of models considered is subjective and far from exhaustive, but it suffices to serve as a testbed for comparing the forecast accuracy.
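To make the two modeling strategies concrete, the sketch below applies both to simulated data, using a basic Lee-Carter fit by singular value decomposition with a random walk with drift as a stand-in for the five models. It is an illustration under stated assumptions, not the implementation used in the study: the function names, the simulated log rates, and the rates of mortality improvement are all hypothetical, and the fit omits the refinements (such as Poisson errors) used by the models compared in this paper.

```python
import numpy as np

def lee_carter_forecast(log_m, horizon):
    """Fit log m[x, t] = a[x] + b[x] * k[t] by SVD; forecast k by a random walk with drift.

    log_m: array of shape (n_ages, n_years). Returns an array of shape (n_ages, horizon).
    """
    log_m = np.asarray(log_m, dtype=float)
    a = log_m.mean(axis=1, keepdims=True)            # average age pattern a[x]
    u, s, vt = np.linalg.svd(log_m - a, full_matrices=False)
    scale = u[:, 0].sum()                            # assumed nonzero for this sketch
    b = u[:, 0] / scale                              # normalized so that sum of b[x] is 1
    k = s[0] * vt[0] * scale                         # mortality index k[t]
    drift = (k[-1] - k[0]) / (len(k) - 1)            # mean of the first differences of k
    k_future = k[-1] + drift * np.arange(1, horizon + 1)
    return a + np.outer(b, k_future)

def partial_age_range_forecast(log_m, ages, horizon, min_age=60):
    """Strategy 1: truncate the data to retiree ages, then fit and forecast."""
    keep = ages >= min_age
    return lee_carter_forecast(log_m[keep], horizon)

def full_age_range_forecast(log_m, ages, horizon, min_age=60):
    """Strategy 2: fit and forecast all ages, then truncate the forecasts."""
    keep = ages >= min_age
    return lee_carter_forecast(log_m, horizon)[keep]

# Simulated log rates for ages 0..100 over 40 years (hypothetical values).
rng = np.random.default_rng(0)
ages = np.arange(101)
years = np.arange(40)
improvement = np.where(ages < 60, 0.03, 0.01)        # faster improvement at younger ages
log_m = (-9.0 + 0.08 * ages)[:, None] - improvement[:, None] * years[None, :]
log_m = log_m + rng.normal(scale=0.02, size=log_m.shape)

partial = partial_age_range_forecast(log_m, ages, horizon=5)
full = full_age_range_forecast(log_m, ages, horizon=5)
print(partial.shape, full.shape)                     # (41, 5) (41, 5)
print(np.max(np.abs(partial - full)))                # size of the gap between the two strategies
```

The two strategies differ only in which ages inform the estimated age pattern and mortality index: in the full age-range fit, the single time index must also track mortality at younger ages, which can pull the retiree-age forecasts away from those of the partial age-range fit.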
For the short-term forecast horizon (i.e., the one-step-ahead forecast horizon), we compute the mean of the MAPEs and RMSPEs to evaluate the point forecast accuracy. Table 2 shows an advantage in directly modeling and forecasting the truncated series for female mortality. For male mortality, this advantage is more pronounced. By comparing the mean errors of the 19 countries,

For the short-term forecast horizon (i.e., the one-step-ahead forecast horizon), we compute the mean interval score $\overline{S}_{\alpha=0.2}$ to evaluate the interval forecast accuracy. Table 3 shows an advantage in directly modeling and forecasting the truncated series for both female and male mortality. Comparing the mean errors of the 19 countries, Table 3 shows that the Plat model provides the most accurate interval forecasts of both female and male mortality rates.

For the long-term forecast horizon (i.e., the 30-step-ahead forecast horizon), we also compute the mean of the MAPEs and RMSPEs to evaluate the point forecast accuracy. From the comparison of the point forecast errors in Table 4, it is unclear whether there is an advantage in modeling and forecasting the truncated series for female mortality. In contrast, for male mortality, there is an advantage in modeling and forecasting the whole series and then truncating the mortality forecasts. Comparing the mean errors of the 19 countries, Table 4 shows that the Lee-Carter model with Poisson errors provides the most accurate point forecasts of female mortality, while the APC model provides the most accurate point forecasts of male mortality.
The Lee-Carter model with Poisson errors produces smaller MAPEs and RMSPEs than the Lee-Carter model with Gaussian errors.
For the long-term forecast horizon (i.e., the 30-step-ahead forecast horizon), we also compute the mean interval score $\overline{S}_{\alpha=0.2}$ to evaluate the interval forecast accuracy. Table 5 shows a slight advantage in directly modeling and forecasting the truncated series for female mortality. For male mortality, there is an advantage in modeling and forecasting the whole series and then truncating the mortality forecasts. By comparing the mean errors of the 19 countries, Table 5 shows that the most accurate forecasting method is the Lee-
For the long-term mortality forecasts, we recommend the first strategy for modeling female mortality but the second strategy for modeling male mortality. It is difficult to recommend a strategy when the model and its parameters may not be optimal for the long-term forecasts. This is a disadvantage of using methods based on time-series extrapolation for long-term forecasting.
Instead, an expectation approach, in which experts set a future target, could be considered, noting that this method has also had limited success in the past (Booth & Tickle 2008).
There are several ways in which the present study can be further extended, and we briefly mention two: 1) We could consider machine learning methods for producing mortality forecasts (see, e.g., Perla et al. 2020). 2) Since the results depend on the age range considered as well as the selected countries, other age ranges and countries could be examined. The R code for reproducing the results is available upon request from the corresponding author.