Implementation of Ensemble Self-Organizing Maps for Missing Values Imputation

Titin Siswantining; Kathan Gerry Vivaldi; Devvi Sarwinda; Saskya Mary Soemartojo; Ika Mattasari; Herley Al-Ash

doi:10.29244/ijsa.v6i1p1-12

Authors

Titin Siswantining, Department of Mathematics, Universitas Indonesia
Kathan Gerry Vivaldi, Universitas Indonesia
Devvi Sarwinda, Department of Mathematics, Universitas Indonesia
Saskya Mary Soemartojo, Department of Mathematics, Universitas Indonesia
Ika Mattasari, Universitas Indonesia
Herley Al-Ash, Universitas Indonesia

DOI:

https://doi.org/10.29244/ijsa.v6i1p1-12

Keywords:

ensemble self-organizing maps, imputation, missing values, self-organizing maps

Abstract

This study implements the ensemble self-organizing maps (E-SOM) method to impute missing values during data preprocessing, an important stage in building prediction or classification models. E-SOM extends the SOM imputation method by applying an ensemble framework that combines several SOMs to improve generalization. The E-SOM imputation method is applied to South African heart disease data, with a random forest as the classification model. On the testing data, the random forest model built from E-SOM-imputed data achieves higher accuracy than the model built from SOM-imputed data for 36, 49, 64, and 81 neurons, while for 25 neurons both models achieve the same accuracy. Among the ensemble sizes tested, E-SOM imputation with 81 neurons and an ensemble size of 15 produced the random forest model with the highest accuracy.
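
The idea of imputing with an ensemble of SOMs can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (train_som, som_impute, esom_impute), the learning-rate and neighbourhood schedules, the use of different random seeds to diversify ensemble members, and the averaging of member imputations are all assumptions made for illustration. Each SOM finds the best matching unit using only the observed features of a record and fills the missing features from that unit's weights.

```python
import numpy as np

def train_som(X, grid=(5, 5), n_iter=500, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM on X, where NaN marks a missing value (illustrative)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    rows, cols = grid
    # Grid position of every neuron, used by the Gaussian neighbourhood.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Start the weights near the column means of the observed values.
    W = np.nanmean(X, axis=0) + 0.1 * np.nanstd(X, axis=0) * rng.standard_normal((rows * cols, d))
    for t in range(n_iter):
        x = X[rng.integers(n)]
        obs = ~np.isnan(x)
        if not obs.any():
            continue  # a fully missing record carries no information
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # assumed linear decay
        sigma = sigma0 * (1.0 - frac) + 0.5     # assumed shrinking radius
        # Best matching unit, measured on the observed features only.
        bmu = np.argmin(((W[:, obs] - x[obs]) ** 2).sum(axis=1))
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        # Update only the weight components that were observed in x.
        W[:, obs] += lr * h[:, None] * (x[obs] - W[:, obs])
    return W

def som_impute(X, W):
    """Fill each missing entry with the matching weight of the record's BMU."""
    X_imp = X.copy()
    for i, x in enumerate(X):
        miss = np.isnan(x)
        if miss.any() and not miss.all():
            bmu = np.argmin(((W[:, ~miss] - x[~miss]) ** 2).sum(axis=1))
            X_imp[i, miss] = W[bmu, miss]
    return X_imp

def esom_impute(X, n_members=5, **som_kwargs):
    """Average the imputations of several SOMs trained with different seeds."""
    members = [som_impute(X, train_som(X, seed=s, **som_kwargs)) for s in range(n_members)]
    return np.mean(members, axis=0)

# Synthetic demonstration data (not the South African heart disease data).
rng = np.random.default_rng(0)
X_true = rng.normal(size=(60, 4))
X_miss = X_true.copy()
mask = rng.random((60, 3)) < 0.1
X_miss[:, 1:][mask] = np.nan   # column 0 stays fully observed
X_imp = esom_impute(X_miss, n_members=5, grid=(5, 5))
print("remaining NaNs:", int(np.isnan(X_imp).sum()))
```

In this sketch the grid argument plays the role of the neuron count (for example, grid=(9, 9) gives 81 neurons) and n_members plays the role of the ensemble size. The imputed matrix could then be passed to any classifier, such as a random forest, to compare single-SOM and ensemble imputation.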


