Implementation of Ensemble Self-Organizing Maps for Missing Values Imputation

Authors

  • Titin Siswantining, Department of Mathematics, Universitas Indonesia
  • Kathan Gerry Vivaldi, Universitas Indonesia
  • Devvi Sarwinda, Department of Mathematics, Universitas Indonesia
  • Saskya Mary Soemartojo, Department of Mathematics, Universitas Indonesia
  • Ika Mattasari, Universitas Indonesia
  • Herley Al-Ash, Universitas Indonesia

DOI:

https://doi.org/10.29244/ijsa.v6i1p1-12

Keywords:

ensemble self-organizing maps, imputation, missing values, self-organizing maps

Abstract

This study implements the ensemble self-organizing maps (E-SOM) method to impute missing values during data preprocessing, an important stage before prediction or classification. E-SOM extends the SOM imputation method by applying an ensemble framework that combines several SOMs to improve generalization. The E-SOM imputation method is applied to the South African heart disease data, with a random forest as the classification model. On the testing data, the random forest model built from E-SOM-imputed data achieves higher accuracy than the model built from SOM-imputed data for 36, 49, 64, and 81 neurons, while for 25 neurons both models achieve the same accuracy. Among the ensemble sizes tested, E-SOM imputation with 81 neurons and 15 ensemble members produced the random forest model with the highest accuracy.
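
The idea of imputing with several SOMs can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (train_som, best_matching_unit, som_impute, esom_impute), the toy data, the learning-rate and neighbourhood schedules, and the choice to combine ensemble members by simple averaging are all assumptions made for illustration. Missing values are represented as None, every column is assumed to have at least one observed value, and only the Python standard library is used.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the neuron closest to x, using observed features only."""
    def dist(w):
        return sum((v - w[j]) ** 2 for j, v in enumerate(x) if v is not None)
    return min(range(len(weights)), key=lambda k: dist(weights[k]))


def train_som(rows, side, epochs=30, seed=0):
    """Train a side x side SOM on rows that may contain None (missing)."""
    rng = random.Random(seed)
    dim = len(rows[0])
    # Initialise each neuron uniformly within the observed range of each column.
    cols = [[r[j] for r in rows if r[j] is not None] for j in range(dim)]
    weights = [[rng.uniform(min(c), max(c)) for c in cols]
               for _ in range(side * side)]
    coords = [(k // side, k % side) for k in range(side * side)]
    order = list(range(len(rows)))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                      # decaying learning rate
        sigma = max(side / 2.0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        rng.shuffle(order)
        for i in order:
            x = rows[i]
            b = best_matching_unit(weights, x)
            for k, w in enumerate(weights):
                d2 = ((coords[k][0] - coords[b][0]) ** 2
                      + (coords[k][1] - coords[b][1]) ** 2)
                h = lr * math.exp(-d2 / (2.0 * sigma ** 2))
                # Update only the features that are observed in this row.
                for j, v in enumerate(x):
                    if v is not None:
                        w[j] += h * (v - w[j])
    return weights


def som_impute(rows, side, seed=0):
    """Fill each missing cell with the matching weight of the row's BMU."""
    weights = train_som(rows, side, seed=seed)
    filled = []
    for x in rows:
        w = weights[best_matching_unit(weights, x)]
        filled.append([w[j] if v is None else v for j, v in enumerate(x)])
    return filled


def esom_impute(rows, side, n_members, seed=0):
    """Average the imputations of n_members differently seeded SOMs.

    Averaging is an assumed combination rule, chosen for illustration.
    """
    members = [som_impute(rows, side, seed=seed + m) for m in range(n_members)]
    return [
        [v if v is not None
         else sum(m[i][j] for m in members) / n_members
         for j, v in enumerate(x)]
        for i, x in enumerate(rows)
    ]


# Hypothetical toy data: two loose groups, with one missing cell in three rows.
data = [
    [1.0, 1.1, 0.9],
    [0.9, None, 1.0],
    [1.1, 1.0, None],
    [5.0, 5.2, 4.9],
    [None, 5.0, 5.1],
    [5.1, 4.9, 5.0],
]
filled = esom_impute(data, side=3, n_members=5)
for row in filled:
    print([round(v, 3) for v in row])
```

Here side=3 gives a 9-neuron map; the neuron counts in the abstract (25, 36, 49, 64, 81) would correspond to side values of 5 through 9 under this square-grid assumption, and n_members plays the role of the number of ensembles. Observed cells are passed through unchanged, so only the missing cells depend on the ensemble.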

Published

2022-05-31

How to Cite

Siswantining, T., Vivaldi, K. G., Sarwinda, D., Soemartojo, S. M., Mattasari, I., & Al-Ash, H. (2022). Implementation of Ensemble Self-Organizing Maps for Missing Values Imputation. Indonesian Journal of Statistics and Its Applications, 6(1), 1–12. https://doi.org/10.29244/ijsa.v6i1p1-12

Section

Articles