Machine Learning in Ratemaking, an Application in Commercial Auto Insurance

Summary

Exploration and modeling of a new large auto insurance data set provided by the CAS.

Data Availability

The data used in this paper were obtained from the Casualty Actuarial Society (CAS). Readers interested in obtaining the data for their own research projects should email Brian Fannin of the CAS a short proposal of the project they hope to conduct.

Abstract

This paper explores the tuning and results of two-part models on rich datasets provided through the Casualty Actuarial Society (CAS). The datasets cover bodily injury (BI), property damage (PD), and collision (COLL) coverages, each documenting policy characteristics and claims across a four-year period. We first explore the datasets, summarizing all variables, and then set forth the modeling methods. We tune the models and display the tuning results, then train the final models and seek to explain select predictions. A private insurance carrier anonymized the data and provided them to the CAS. These data are available to actuarial researchers for well-defined research projects that have universal benefit to the insurance industry and the public. We hope the methods demonstrated here provide a foundation for developing and testing future ratemaking models more efficiently.

Spencer Matthews
PhD Student in Statistics

I enjoy studying survival analysis and applying it to new problems.