City Research Online

Predictive Models for Medical Costs in Private Healthcare

Lopes, L R (2021). Predictive Models for Medical Costs in Private Healthcare. (Unpublished Doctoral thesis, City, University of London)


Crucial insurance operations such as pricing, policy renewal, reserving and underwriting rely heavily upon estimations of future claim amounts. In the healthcare field specifically, there is also considerable interest in identifying potential high-cost individuals for inclusion in preventive care programs in order to avoid or reduce catastrophic future medical expenditures.

A valuable source of information that could be used as input to predictive models aiming to estimate future medical claim costs is administrative claims data. The highly detailed information contained in these data comes from the invoices that healthcare providers send to insurers for the medical procedures and services provided to policyholders. So far, however, such data have remained relatively unexplored.

In this thesis, we use a large administrative claims data set (over 795,000 policyholders) from a Brazilian healthcare insurance company to build predictive models that extract the most relevant information for claim cost prediction at the policyholder level. We compare traditional models, such as multiple linear regression, two-part and frequency-severity models, with alternative statistical learning methods that make use of a linear combination of predictors, such as ridge regression, lasso and Cubist. We use 10-fold cross-validation to find models that provide a better balance between prediction accuracy and model complexity, making it easier to interpret the outcomes without compromising the accuracy of estimations.
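As a rough illustration of the model-selection idea described above, the following Python sketch picks a lasso penalty by 10-fold cross-validation. It is not the thesis's code or data: the covariates and costs are synthetic, the penalty grid is arbitrary, and the helper names (`soft_threshold`, `fit_lasso`, `cv_mse`) are hypothetical. The lasso is fitted by plain coordinate descent, which is one common way to fit it and is assumed here only for the sake of a self-contained example.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def fit_lasso(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]  # current residuals y - b0 - X b
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            zj = sum(v * v for v in col) / n
            if zj == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / zj
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid = [r - v * delta for v, r in zip(col, resid)]
                beta[j] = new_bj
        # Re-centre the (unpenalised) intercept.
        shift = sum(resid) / n
        intercept += shift
        resid = [r - shift for r in resid]
    return intercept, beta

def cv_mse(X, y, lam, k=10, seed=0):
    """Mean squared prediction error over k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    total_sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        for i in held_out:
            pred = b0 + sum(x * c for x, c in zip(X[i], b))
            total_sq_err += (y[i] - pred) ** 2
    return total_sq_err / len(X)

# Synthetic data (an assumption for illustration): four standardised covariates,
# of which only the first two actually drive the simulated cost.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(150)]
y = [5.0 + 3.0 * row[0] - 1.0 * row[1] + rng.gauss(0, 0.5) for row in X]

lambdas = [0.01, 0.1, 1.0, 5.0]  # arbitrary penalty grid
scores = {lam: cv_mse(X, y, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
intercept, beta = fit_lasso(X, y, best_lam)
print("CV error by penalty:", scores)
print("selected penalty:", best_lam, "coefficients:", beta)
```

Larger penalties shrink more coefficients to exactly zero, giving a simpler model at the price of some accuracy; cross-validation is used here to choose the point on that trade-off with the lowest held-out error.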

Both lasso and Cubist offer significant improvements over traditional models in terms of accuracy of predictions and in terms of making better use of the detailed medical information present in the claims data. In general, all models agree on the relevance of predictors, with previous claim amount being the most important covariate. Other covariates that suggest higher future costs are chemotherapy sessions and hospitalisations related to cancer, diabetes and kidney diseases. Among the variables that suggest lower future costs are physiotherapy sessions and hospitalisations related to pregnancy.

Publication Type: Thesis (Doctoral)
Subjects: H Social Sciences > HG Finance
R Medicine
Departments: Bayes Business School
Bayes Business School > Actuarial Science & Insurance
Doctoral Theses
Text - Accepted Version
This document is not freely accessible until 31 October 2024 due to copyright restrictions.
