Imbalanced Dataset and Optimization Technique Use for Disease Treatment Default Prediction


Owusu-Adjei Michael
James Ben Hayfron-Acquah
Twum Frimpong
Gaddafi Abdul-Salaam
Owusu-Debrah Nicholas
Kofi Fofie

Abstract

Performance evaluation metrics are a critical component of any use of predictive algorithms. These evaluations depend on the outputs obtained after extensive modeling, and much current machine learning research focuses on improving the predictive performance metrics of learning algorithms. Real-world applications, especially in healthcare systems, characteristically have datasets whose classes are extremely skewed towards one side (positive or negative). As a result, machine learning algorithms often overlook the predictive contribution of the minority class. To address this challenge, different optimization techniques have been applied to improve predictive performance metrics. Many of these techniques, such as under-sampling the majority class and over-sampling the minority class, either discard dataset information or introduce additional information. Overcoming the loss or introduction of information caused by under-sampling or over-sampling remains the focus of this research work, which therefore uses a novel optimization technique called class weight optimization. The results showed that the optimum weights for a trade-off between false positives and false negatives were 0.8553 for the minority class and 0.1447 for the majority class. However, in real-world applications such as identifying patients who default on disease treatment, the ultimate goal is to identify patients who are positive for treatment default, and this is achieved by focusing on learning techniques capable of improving recall. Using the class weight optimization technique, we obtained the following performance evaluation scores: a precision of 0.07, a recall of 0.84, and an F1 score of 0.13. Emphasizing a higher recall for our predictive model ensures that more potential treatment default patients are correctly identified for targeted interventions that reduce the risk of treatment default and its consequences.
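To make the reported figures concrete, the following is a minimal sketch, not the authors' implementation. It assumes that class weighting acts by scaling each sample's contribution to a binary cross-entropy loss, which is one common way to apply class weights; the abstract does not state the learning algorithm or the loss used. The function names `weighted_log_loss` and `f1_score` are hypothetical. The weights 0.8553 and 0.1447 and the precision and recall values are taken from the abstract.

```python
import math

# Class weights reported in the abstract (minority class = treatment default).
W_MINORITY = 0.8553
W_MAJORITY = 0.1447


def weighted_log_loss(y_true, p_pred, w_pos=W_MINORITY, w_neg=W_MAJORITY):
    """Class-weighted binary cross-entropy, normalized by the total weight.

    Assumption: label 1 marks the minority (default) class and p_pred holds
    the predicted probability of that class.
    """
    eps = 1e-12  # keeps log() away from 0
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        w = w_pos if y == 1 else w_neg
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Missing a default patient costs more than a false alarm on a non-default one.
missed_default = weighted_log_loss([1, 0], [0.1, 0.1])
false_alarm = weighted_log_loss([1, 0], [0.9, 0.9])
print(missed_default > false_alarm)

# F1 implied by the reported precision and recall, rounded to two decimals.
print(round(f1_score(0.07, 0.84), 2))
```

Under these weights, an error on a minority-class sample is penalized roughly six times as heavily as an error on a majority-class sample (0.8553 / 0.1447), which pushes a model towards higher recall at the expense of precision. This is consistent with the high-recall, low-precision scores reported above.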


How to Cite
Owusu-Adjei Michael, James Ben Hayfron-Acquah, Twum Frimpong, Gaddafi Abdul-Salaam, Owusu-Debrah Nicholas, & Kofi Fofie. (2023). Imbalanced Dataset and Optimization Technique Use for Disease Treatment Default Prediction. The International Journal of Science & Technoledge, 11(5). https://doi.org/10.24940/theijst/2023/v11/i5/ST2305-005 (Original work published June 19, 2023)