ANALYSIS AND PREDICTION OF HEALTH INSURANCE PREMIUM VALUE USING MACHINE LEARNING ALGORITHM

Authors

Danica Recca Danendra, Januponsa Dio Firizqi

DOI:

10.54443/morfai.v5i2.2775

Published:

2025-05-19

Abstract

Rising healthcare costs and administrative complexity in the health insurance sector underscore the need for an efficient model that predicts insurance premium prices. This study explores Machine Learning (ML) techniques for predicting the value of health insurance premiums and aims to give stakeholders further insight for developing premium pricing and risk management strategies. It uses datasets from Kaggle.com and boosting regression algorithms, comparing their accuracy and evaluation metrics when predicting premium values. Feature engineering techniques are applied to improve model performance, reduce overfitting, and support model interpretation so that relevant predictors are included, and the strengths and limitations of each technique are examined. These limitations are addressed with attention to feature selection, model interpretability, scalability, and generalization. The results are intended to give practitioners, researchers, and policymakers useful insights and to support informed decision-making when determining health insurance premium values with ML methodologies.
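The comparison of evaluation metrics mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the metrics used, so MAE, RMSE, and R² are assumed here as common choices for regression. The premium values, the model names `model_a` and `model_b`, and the helper `regression_metrics` are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical premiums and predictions from two hypothetical models.
actual = [100.0, 200.0, 300.0, 400.0]
predictions = {
    "model_a": [110.0, 190.0, 310.0, 390.0],
    "model_b": [150.0, 150.0, 350.0, 350.0],
}

scores = {name: regression_metrics(actual, pred) for name, pred in predictions.items()}

# Pick the model with the lowest RMSE (an assumed selection criterion).
best = min(scores, key=lambda name: scores[name]["rmse"])

for name, metrics in scores.items():
    print(name, metrics)
print("lowest RMSE:", best)
```

In practice the predictions would come from fitted boosting regressors evaluated on held-out data; the selection criterion is a design choice and could equally be MAE or R².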

Keywords:

XGBoost, CatBoost, LGBM, Premiums of Insurance, Grid Search

Author Biographies

Danica Recca Danendra, Department of Informatics, Faculty of Science and Technology, Universitas Pradita, Indonesia

Author Origin: Indonesia

Januponsa Dio Firizqi, Department of Informatics, Faculty of Science and Technology, Universitas Pradita, Indonesia

Author Origin: Indonesia

How to Cite

Danendra, D. R., & Firizqi, J. D. (2025). ANALYSIS AND PREDICTION OF HEALTH INSURANCE PREMIUM VALUE USING MACHINE LEARNING ALGORITHM. Multidiciplinary Output Research For Actual and International Issue (MORFAI), 5(2), 818–832. https://doi.org/10.54443/morfai.v5i2.2775
