A Study of the Logistic Model Tree Method with Imbalanced Data Handling
DOI:
https://doi.org/10.29244/xplore.v11i2.922
Keywords:
imbalanced data handling, logistic model tree, ROSE, SMOTE, undersampling
Abstract
The logistic model tree is a nonparametric modelling method that combines a decision tree with linear logistic regression. It handles multicollinearity well but is not immune to the problems caused by data imbalance. This study was carried out to compare the performance of undersampling, SMOTE, and ROSE in handling imbalanced data when used in tandem with the logistic model tree. The simulation data were obtained by generating random numbers following the Bernoulli distribution for the response variable and the bivariate normal distribution for the explanatory variables, at five different imbalance levels. Comparisons of AUC values showed that, at every tested level of imbalance, logistic model trees built with an imbalance-handling method classified objects better than logistic model trees built without one. Among these methods, ROSE gave better performance than the others. On datasets with a low level of imbalance, the performance of logistic model trees built with ROSE and with undersampling did not differ significantly.
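The simulation design described in the abstract can be illustrated with a minimal Python sketch. It covers only data generation, random undersampling, and an AUC calculation, and it does not fit a logistic model tree. The abstract gives no numeric settings, so every value here is an assumption: the sample size, the minority rate, the correlation `rho`, and the class-dependent mean `shift` are hypothetical, as are the function names `simulate`, `random_undersample`, and `auc`.

```python
import math
import random


def simulate(n, minority_rate, rho=0.5, shift=1.0, seed=0):
    """Draw n rows: y ~ Bernoulli(minority_rate), (x1, x2) ~ bivariate normal.

    Assumption: the class-1 mean is shifted by `shift` on both axes so the
    classes are partly separable; the source does not state this.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < minority_rate else 0
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, rho], [rho, 1]] turns two independent
        # standard normals into a correlated pair.
        x1 = z1 + shift * y
        x2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2 + shift * y
        rows.append((x1, x2, y))
    return rows


def random_undersample(rows, seed=0):
    """Keep every minority row plus an equally sized random subset of the majority."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[2] == 1]
    neg = [r for r in rows if r[2] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


data = simulate(n=2000, minority_rate=0.05)
balanced = random_undersample(data)
# Illustrative score only: x1 + x2 stands in for a fitted model's output.
score_auc = auc([x1 + x2 for x1, x2, _ in data], [y for _, _, y in data])
```

In a fuller experiment, the balanced training set would be passed to a logistic model tree learner and the AUC computed on held-out data; SMOTE and ROSE would replace `random_undersample` with their own resampling steps.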