Study of Bagging Application in the Safe-Level SMOTE Method for Handling Imbalanced Classification
Keywords: imbalanced class, SMOTE, safe-level SMOTE, bagging, support vector machine
Imbalanced class classification problems arise in many real applications, and they tend to cause minority class instances to be classified into the majority class. This study examined the performance of applying bagging to safe-level SMOTE with a Support Vector Machine (SVM) classifier. Three types of data were used, differing in the proportion of observations in the majority and minority classes. Each data type has three variables: two independent variables and one dependent variable. The independent variables were generated from a multivariate normal distribution, while the dependent variable is binary. The results showed that the classifier achieved high accuracy and sensitivity for all data types, both on the imbalanced data and on the balanced data (obtained by safe-level SMOTE and by safe-level SMOTEBagging). Nevertheless, specificity was the main measure for assessing classifier performance, because it measures how accurately minority class observations are classified. Specificity increased when safe-level SMOTE made the numbers of observations in the two classes approximately equal. The SVM predicted minority class observations best when bagging was applied to safe-level SMOTE, with specificity of 77.93 percent, 78.46 percent, and 85.69 percent for the three data types, respectively.
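The following is a minimal pure-Python sketch of the pipeline the abstract describes. It is not the authors' code. It makes three assumptions. First, the safe-level SMOTE rules follow the original method of Bunkhumpornpat et al. (2009): the safe level is the number of minority points among the k nearest neighbours, and the interpolation gap is shifted toward the "safer" endpoint. Second, "safe-level SMOTEBagging" is read SMOTEBagging-style: each bootstrap sample is rebalanced with safe-level SMOTE before a base learner is fitted, and the bagged learners vote by majority. Third, `specificity` is taken to mean the share of minority observations classified correctly, as the abstract uses it. The study used an SVM. Here the hypothetical `fit_centroids`/`predict_centroids` pair is a nearest-centroid stand-in so the sketch runs without external libraries. All function names are illustrative.

```python
import math
import random
from collections import Counter


def _nearest(i, pool, X, k):
    """Indices from `pool` (excluding i) of the k points closest to X[i], Euclidean distance."""
    others = [j for j in pool if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def safe_level_smote(X, y, minority, k=5, passes=1, rng=None):
    """Return synthetic minority points; each pass tries one point per minority instance."""
    rng = rng if rng is not None else random.Random()
    everyone = list(range(len(X)))
    pos = [i for i in everyone if y[i] == minority]
    # Safe level: number of minority points among the k nearest neighbours in the whole data.
    sl = {i: sum(y[j] == minority for j in _nearest(i, everyone, X, k)) for i in pos}
    synthetic = []
    for _ in range(passes):
        for p in pos:
            pos_nn = _nearest(p, pos, X, k)  # k nearest minority neighbours of p
            if not pos_nn:
                continue
            n = rng.choice(pos_nn)
            slp, sln = sl[p], sl[n]
            if slp == 0 and sln == 0:
                continue  # both p and n look like noise: generate nothing
            point = []
            for a in range(len(X[p])):
                if sln == 0:
                    gap = 0.0  # n is noise: duplicate p
                elif slp == sln:
                    gap = rng.uniform(0.0, 1.0)
                elif slp > sln:
                    gap = rng.uniform(0.0, sln / slp)  # [0, 1/ratio]: stay near p
                else:
                    gap = rng.uniform(1.0 - slp / sln, 1.0)  # [1 - ratio, 1]: stay near n
                point.append(X[p][a] + gap * (X[n][a] - X[p][a]))
            synthetic.append(tuple(point))
    return synthetic


def fit_centroids(X, y):
    """Hypothetical stand-in base learner (NOT an SVM): one mean vector per class."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: tuple(sum(col) / len(pts) for col in zip(*pts)) for c, pts in groups.items()}


def predict_centroids(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda c: math.dist(x, model[c]))


def safe_level_smote_bagging(X, y, minority, n_bags=10, k=5, seed=0):
    """Bootstrap, rebalance each bag with safe-level SMOTE, fit one learner per bag."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        counts = Counter(yb)
        n_min = counts.get(minority, 0)
        deficit = max(counts.values()) - n_min
        if n_min and deficit > 0:
            passes = math.ceil(deficit / n_min)
            new = safe_level_smote(Xb, yb, minority, k, passes, rng)[:deficit]
            Xb, yb = Xb + new, yb + [minority] * len(new)
        models.append(fit_centroids(Xb, yb))
    return models


def predict_vote(models, x):
    """Majority vote over the bagged learners."""
    return Counter(predict_centroids(m, x) for m in models).most_common(1)[0][0]


def specificity(y_true, y_pred, minority):
    """Share of minority-class observations that were predicted correctly."""
    hits = [t == p for t, p in zip(y_true, y_pred) if t == minority]
    return sum(hits) / len(hits)
```

Usage on hypothetical training and test lists: `models = safe_level_smote_bagging(X_train, y_train, minority=0)`, then `specificity(y_test, [predict_vote(models, x) for x in X_test], minority=0)`. The base learner is kept as a separate fit/predict pair so that an SVM (for example scikit-learn's `SVC`) could replace the centroid stand-in without touching the resampling or voting logic.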