ANALYSIS AND PREDICTION OF HEALTH INSURANCE PREMIUM VALUE USING MACHINE LEARNING ALGORITHM
Danica Recca Danendra
Januponsa Dio Firizqi
Rising healthcare costs and administrative complexity in the health insurance sector underscore the need for an efficient predictive model of insurance premium prices. This study explores Machine Learning (ML) techniques for predicting the value of health insurance premiums and aims to give stakeholders further insight for developing premium pricing and risk management strategies. It uses datasets from Kaggle.com and applies boosting regression, comparing accuracy and evaluation metrics in predicting premium values. Feature engineering techniques are applied to improve model performance, reduce overfitting, and make the model interpretable so that relevant predictors are included; the strengths and limitations of each technique are also examined. These limitations are addressed through feature selection, model interpretability, scalability, and generalization. The results aim to provide valuable insights for practitioners, researchers, and policymakers, and to facilitate informed decision-making when determining the value of health insurance premiums with ML methodologies.
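The workflow the abstract outlines (fit a boosting regressor, then judge it by evaluation metrics) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic rather than the Kaggle datasets, the columns `age`, `bmi`, and `smoker` and the premium formula are hypothetical, and the stump-based booster is a simplified stand-in written with the standard library only, not the authors' model or any library's implementation.

```python
import random
from math import sqrt

def fit_stump(X, residuals):
    """Return the one-feature threshold split that best fits the residuals."""
    n, total, best = len(X), sum(residuals), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += residuals[order[k]]
            lo, hi = X[order[k]][j], X[order[k + 1]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            n_left, right_sum = k + 1, total - left_sum
            # Maximizing this gain is equivalent to minimizing squared error.
            gain = left_sum ** 2 / n_left + right_sum ** 2 / (n - n_left)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2,
                        left_sum / n_left, right_sum / (n - n_left))
    return best[1:]

def fit_boosting(X, y, n_rounds=150, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits current residuals."""
    base = sum(y) / len(y)
    preds, stumps = [base] * len(y), []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, thr, left_val, right_val = fit_stump(X, residuals)
        stumps.append((j, thr, left_val, right_val))
        for i, row in enumerate(X):
            preds[i] += learning_rate * (left_val if row[j] <= thr else right_val)
    return base, stumps, learning_rate

def predict(model, X):
    base, stumps, lr = model
    return [base + lr * sum(lv if row[j] <= thr else rv
                            for j, thr, lv, rv in stumps) for row in X]

def rmse(y_true, y_pred):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

# Synthetic stand-in data (hypothetical columns: age, bmi, smoker flag).
random.seed(0)
X, y = [], []
for _ in range(300):
    age, bmi = random.uniform(18, 64), random.uniform(18, 40)
    smoker = 1.0 if random.random() < 0.2 else 0.0
    X.append([age, bmi, smoker])
    y.append(250 * age + 300 * bmi + 20000 * smoker + random.gauss(0, 1000))

# Hold out the last 60 rows to check generalization.
X_train, y_train, X_test, y_test = X[:240], y[:240], X[240:], y[240:]

model = fit_boosting(X_train, y_train)
y_pred = predict(model, X_test)
train_mean = sum(y_train) / len(y_train)

rmse_base = rmse(y_test, [train_mean] * len(y_test))  # predict-the-mean baseline
rmse_boost = rmse(y_test, y_pred)
r2_boost = r2(y_test, y_pred)
print(f"baseline RMSE={rmse_base:.0f}  boosted RMSE={rmse_boost:.0f}  R2={r2_boost:.3f}")
```

Scoring on held-out rows against a predict-the-mean baseline is one simple way to check generalization rather than fit to the training data. In practice a library implementation of gradient boosting would likely replace the hand-written booster.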