Financial prediction vs. clinical engagement - which is better?

Ken Yale


Background: Several statistical models are used to predict future healthcare costs and to engage patients in order to improve care and avoid deterioration in their health. These models were created in the 1990s and, aside from occasional updates, have not changed much. Recent research using other sources of data and advanced predictive analytics shows a large increase in the ability to predict who is at risk of future clinical problems. Such an approach may be a better, scalable way to determine appropriate patient engagement services and to improve both population health management and personal health management.

Predictive models used to estimate future healthcare needs are usually financially focused and based on the age, gender, diagnoses, and sometimes medications of the individuals in the populations used to develop them. These models have achieved no greater than 30% predictive accuracy, as measured by the coefficient of determination (R²), which indicates how well actual observations are predicted or replicated by the model (Winkelman, R., Mehmud, S., Wachenheim, L., A Comparative Analysis of Claims-Based Tools for Health Risk Assessment, Society of Actuaries, Schaumburg, IL, 2007).
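As a reminder of what the coefficient of determination measures, the following is a minimal sketch of its standard definition (one minus the ratio of residual to total sum of squares). The function name `r_squared` and the cost figures are illustrative assumptions and do not come from the study or the cited report.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation around the mean of the observations.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical annual costs per person (illustrative numbers only).
actual_costs = [1200.0, 300.0, 8500.0, 450.0, 2300.0]
predicted_costs = [1500.0, 900.0, 5000.0, 1000.0, 2500.0]
print(round(r_squared(actual_costs, predicted_costs), 3))
```

A model that always predicts the mean cost scores 0, and a perfect model scores 1, so a value near 0.3 means roughly 30% of the variation in observed costs is explained.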

One argument for low predictive accuracy is that health and medicine are too complex, with too many variables and confounding forces, for predictions to be any more accurate. As a result, for the past twenty years our ability to find persons with a potential health problem, contact them, engage them to improve their health, maintain their motivation, and obtain good results has been limited.

We developed a different approach, based on clinical intervention as measured by how well an individual patient complies with current evidence-based medicine (EBM). To develop current EBM, we took the most recent nationally recognized, evidence-based guidelines and applied a rigorous process of reviewing the latest medical literature (large-scale, randomized clinical trials meeting specific criteria) as well as government healthcare agency pronouncements (e.g. from the federal Food and Drug Administration). We used the latest medical research findings and other new information to update the guidelines, turning them into EBM. This process was performed continuously, over a period of years, in order to update all available evidence-based guidelines and ensure the latest medical protocols were available.

We then applied the resulting protocols to a large set of administrative (medical and pharmacy) claims data to assign "medical findings" to the individuals represented in the dataset and determine whether they complied with current EBM. We recorded their future claims (utilization/costs, used as a proxy for their future health condition) and built a predictive model based on the impact of compliance on future claims. We then tested the model on a separate evaluation dataset that had been held out of the original dataset. Our results showed this approach to be a better predictor of future healthcare needs (utilization) and costs, compared to other models.
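To make the idea of assigning "medical findings" from claims more concrete, here is a minimal sketch of a rule-based compliance check. Everything in it is a hypothetical illustration: the record layout, the placeholder codes (`DX_DIABETES`, `LAB_HBA1C`), the single rule, and the function `assign_findings` are assumptions, not the protocols or data structures used in the study.

```python
from collections import defaultdict

# Hypothetical claim records: (person_id, claim_type, code).
# The codes are made-up placeholders, not real billing codes.
claims = [
    ("p1", "diagnosis", "DX_DIABETES"),
    ("p1", "procedure", "LAB_HBA1C"),
    ("p2", "diagnosis", "DX_DIABETES"),
    ("p3", "diagnosis", "DX_ASTHMA"),
]

# Hypothetical rule: anyone with the trigger diagnosis is expected
# to also have a claim for the expected service.
RULES = [
    {"name": "diabetes_hba1c", "trigger": "DX_DIABETES", "expected": "LAB_HBA1C"},
]


def assign_findings(claims, rules):
    """Return {person_id: {rule_name: complied?}} for rules that apply."""
    codes_by_person = defaultdict(set)
    for person_id, _claim_type, code in claims:
        codes_by_person[person_id].add(code)

    findings = {}
    for person_id, codes in codes_by_person.items():
        person_findings = {}
        for rule in rules:
            # A rule applies only if the person has the trigger code.
            if rule["trigger"] in codes:
                person_findings[rule["name"]] = rule["expected"] in codes
        findings[person_id] = person_findings
    return findings


findings = assign_findings(claims, RULES)
for person_id, result in sorted(findings.items()):
    print(person_id, result)
```

In a sketch like this, the per-person compliance flags would become the input features of the predictive model, with later claims serving as the outcome.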


What we studied – Development of a new clinical intervention-based model.

What we did not study – We did not test a specific hypothesis comparing a financially focused predictive model against a clinical intervention model. Instead, we constructed a predictive analytics model using a development sample and a validation sample. Once the model was developed, we tested it against a holdout sample from the same dataset to evaluate predictive accuracy.

Methodology –

  • Two years of claims data (April 2010 to March 2012) for 2 million persons was obtained.
  • The data was divided randomly into three distinct, but otherwise similar, subsets:
      • estimation/development sample,
      • validation sample,
      • evaluation/holdout sample.

  • We developed a predictive analytics model based on clinical interventions using the development and validation samples, then tested it on the holdout sample.
  • Before fitting any specific model, we investigated the underlying data distribution, considering a broad class of candidates including the normal, generalized normal, logistic, Poisson, and negative binomial distributions.
  • The normal distribution was considered as a possible approximation for certain variables, but its assumptions were found to be violated, so it was ruled out as the best option.
  • All of the models developed were generalized linear models, e.g. a logistic regression model.
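The random three-way split and the kind of distribution check described above can be sketched as follows. This is an illustration under stated assumptions: the split proportions (50/25/25), the seed, and the helper names `three_way_split` and `dispersion_ratio` are hypothetical, and the variance-to-mean ratio is only one common diagnostic for count data, not necessarily the one used in the study.

```python
import random


def three_way_split(person_ids, fractions=(0.5, 0.25, 0.25), seed=0):
    """Randomly partition IDs into development, validation, and holdout lists.

    The default fractions are an assumption; the source does not state them.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(person_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_dev = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    development = shuffled[:n_dev]
    validation = shuffled[n_dev:n_dev + n_val]
    holdout = shuffled[n_dev + n_val:]  # remainder, never used for fitting
    return development, validation, holdout


def dispersion_ratio(counts):
    """Sample variance divided by mean for a list of counts.

    A ratio near 1 is consistent with a Poisson distribution; a ratio well
    above 1 (overdispersion) points toward a negative binomial instead.
    """
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    mean = sum(counts) / len(counts)
    if mean == 0:
        raise ValueError("ratio is undefined when the mean is zero")
    variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return variance / mean


development, validation, holdout = three_way_split(range(1000))
print(len(development), len(validation), len(holdout))

# Hypothetical visit counts per person: most have none, one has many.
print(dispersion_ratio([0, 0, 0, 12]))
```

Splitting by person rather than by claim keeps all of an individual's records in a single subset, so the holdout sample stays independent of the data used to build the model.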

Results: A new predictive model was developed that outperformed industry benchmarks. The new model demonstrated between 40 percent and 45 percent predictive accuracy, as compared to a 27 percent maximum predictive accuracy from a group of 11 commercially available predictive models.

Conclusion: Using clinical information, and specifically how well an individual patient complies with current evidence-based medicine, is a better predictor of future healthcare needs (utilization) and costs, compared to other models. This new model promises greater accuracy in predicting future healthcare needs and gives clinicians a better ability to engage patients who need services before their condition deteriorates.
