COMPARISON OF K-MEANS CLUSTERING METHOD AND K-MEDOIDS ON TWITTER DATA

  • Cahyani Oktarina, Department of Statistics, IPB University, Indonesia
  • Khairil Anwar Notodiputro, Department of Statistics, IPB University, Indonesia
  • Indahwati Indahwati, Department of Statistics, IPB University, Indonesia
Keywords: text mining, clustering, k-means, k-medoids, twitter

Abstract

The presidential election is a political event held in Indonesia once every five years. Public satisfaction and dissatisfaction with political issues have led to an increase in the number of tweets expressing political opinions. The purpose of this study is to examine the performance of the k-means and k-medoids methods on Twitter data consisting of tweets about the 2019 presidential election. The data used in this study are primary data taken from Muhyi's research, to which text mining was then applied. Because Muhyi had processed these data to analyze the electability of the 2019 presidential candidate pairs, a separate preprocessing step was carried out for this article in order to analyze whether tweets tend to side with candidate pair one or candidate pair two. The pre-processing in this study differs from that of the previous research in that duplicate data are removed and the text is normalized. The results indicate that the optimal numbers of clusters produced by the k-means method and the k-medoids method are different.
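
The abstract names two pre-processing steps, duplicate removal and normalization, without giving their details. The following Python sketch shows one plausible reading of those steps and is not the authors' actual pipeline. The function names normalize_tweet and deduplicate, the specific cleaning rules (lowercasing, stripping URLs and mentions), and the sample tweets are all assumptions made for illustration.

```python
import re

def normalize_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, and punctuation.

    These rules are an assumed, simplified form of normalization;
    the study may have used different or additional rules.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = re.sub(r"[^a-z0-9#\s]", " ", text)   # keep letters, digits, hashtags
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

def deduplicate(tweets):
    """Return normalized tweets, keeping only the first copy of each."""
    seen = set()
    unique = []
    for tweet in tweets:
        cleaned = normalize_tweet(tweet)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique

# Hypothetical sample tweets, not taken from the study's data.
sample = [
    "Pilpres 2019! https://t.co/abc",
    "pilpres 2019",
    "@user Debat capres",
]
cleaned_sample = deduplicate(sample)
```

In this sketch, duplicates are detected after normalization, so tweets that differ only in case, links, or punctuation are treated as the same tweet. Whether the study compared raw or normalized text when removing duplicates is not stated in the abstract.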

References

Arora, P., Deepali, & Varshney, S. (2016). Analysis of k-means and k-medoids algorithm for big data. Procedia Computer Science, 78: 507–512. https://doi.org/10.1016/j.procs.2016.02.095

Cebeci, Z., & Yildiz, F. (2015). Comparison of k-means and fuzzy c-means algorithms on different cluster structures. Agrárinformatika/Journal of Agricultural Informatics, 6(3): 13–23. https://doi.org/10.17700/jai.2015.6.3.196

Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.

Hanna, A., Wells, C., Maurer, P., Friedland, L., Shah, D., & Matthes, J. (2013). Partisan alignments and political polarization online: A computational approach to understanding the French and US presidential elections. Proceedings of the 2nd Workshop on Politics, Elections and Data, 15–22.

Java, A., Song, X., Finin, T., & Tseng, B. (2007). Why we twitter: understanding microblogging usage and communities. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, 56–65.

Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate statistical analysis (Vol. 6). New Jersey (US): Pearson Education.

Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons, 53(1): 59–68. https://doi.org/10.1016/j.bushor.2009.09.003

Kaufman, L., & Rousseeuw, P. J. (2009). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

Mattjik, A., & Sumertajaya, I. (2011). Sidik Peubah Ganda: Menggunakan SAS. Bogor (ID): IPB Press.

Muhyi, F. (2019). Penggunaan Twitter sebagai Penyedia Peubah Penyerta dalam Pendugaan Area Kecil [thesis]. Bogor (ID): IPB University.

Munková, D., Munk, M., & Vozár, M. (2013). Data pre-processing evaluation for text mining: transaction/sequence model. Procedia Computer Science, 18: 1198–1207. https://doi.org/10.1016/j.procs.2013.05.286

Simhachalam, B., & Ganesan, G. (2016). Performance comparison of fuzzy and non-fuzzy classification methods. Egyptian Informatics Journal, 17(2): 183–188.

Sivarathri, S., & Govardhan, A. (2014). Experiments on Hypothesis “Fuzzy K-Means is better than K-Means for Clustering.” International Journal of Data Mining & Knowledge Management Process, 4(5): 21–34. https://doi.org/10.5121/ijdkp.2014.4502

Tiwari, M., & Singh, R. (2012). Comparative investigation of k-means and k-medoid algorithm on iris data. International Journal of Engineering Research and Development, 4(8): 69–72.

Published
2020-02-28
Section
Articles