• Cahyani Oktarina, Department of Statistics, IPB University, Indonesia
• Khairil Anwar Notodiputro, Department of Statistics, IPB University, Indonesia
• Indahwati Indahwati, Department of Statistics, IPB University, Indonesia
Keywords: text mining, clustering, k-means, k-medoids, Twitter


The presidential election is a political event that takes place in Indonesia once every five years. Public satisfaction and dissatisfaction with political issues have led to an increase in the number of tweets expressing political opinions. The purpose of this study is to examine the performance of the k-means and k-medoids methods on Twitter data, specifically tweets about the 2019 presidential election. The data used in this study are primary data taken from Muhyi's research, to which text mining was then applied. Because Muhyi had already processed these data to analyze the electability of the 2019 presidential candidate pairs, additional preprocessing was carried out for this study in order to analyze the tendency of tweets to side with candidate pair one or candidate pair two. The preprocessing in this study differs from that of previous research in that duplicate data are removed and the text is normalized. The results indicate that the optimal numbers of clusters obtained by the k-means method and the k-medoids method are different.
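
The two ideas named in the abstract, cleaning duplicate tweets after normalization and the difference between a k-means center and a k-medoids center, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the normalization rules (lowercasing, removing URLs, mentions, and punctuation), and the toy data are all assumptions made for illustration only.

```python
import re

def normalize(tweet):
    """Assumed normalization: lowercase, drop URLs/mentions/punctuation."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # URLs and @mentions
    text = re.sub(r"[^a-z0-9\s]", " ", text)        # punctuation and symbols
    return " ".join(text.split())                   # collapse whitespace

def deduplicate(tweets):
    """Keep the first occurrence of each normalized tweet, in order."""
    seen, unique = set(), []
    for text in map(normalize, tweets):
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    return unique

def distance(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean_center(points):
    """k-means style center: coordinate-wise mean, need not be a data point."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def medoid_center(points):
    """k-medoids style center: the data point with the smallest total
    distance to all other points in the cluster."""
    return min(points, key=lambda p: sum(distance(p, q) for q in points))

# Hypothetical tweets: the first two become identical after normalization.
tweets = ["Vote for 01! http://x.co", "vote for 01", "Go 02 @user"]
clean = deduplicate(tweets)   # ["vote for 01", "go 02"]

# Toy cluster with one outlier: the mean is pulled toward the outlier,
# while the medoid remains an actual, centrally located observation.
cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (40, 0)]
center_mean = mean_center(cluster)      # (9.2, 0.0)
center_medoid = medoid_center(cluster)  # (2, 0)
```

Because a medoid must be an observed point, it is less sensitive to outliers than a mean. This is one reason the two methods can partition the same data differently.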




Arora, P., Deepali, & Varshney, S. (2016). Analysis of k-means and k-medoids algorithm for big data. Procedia Computer Science, 78: 507–512.

Cebeci, Z., & Yildiz, F. (2015). Comparison of k-means and fuzzy c-means algorithms on different cluster structures. Agrárinformatika/Journal of Agricultural Informatics, 6(3): 13–23.

Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.

Hanna, A., Wells, C., Maurer, P., Friedland, L., Shah, D., & Matthes, J. (2013). Partisan alignments and political polarization online: A computational approach to understanding the French and US presidential elections. Proceedings of the 2nd Workshop on Politics, Elections and Data, 15–22.

Java, A., Song, X., Finin, T., & Tseng, B. (2007). Why we twitter: understanding microblogging usage and communities. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, 56–65.

Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate statistical analysis (Vol. 6). New Jersey (US): Pearson Education.

Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons, 53(1): 59–68.

Kaufman, L., & Rousseeuw, P. J. (2009). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

Mattjik, A., & Sumertajaya, I. (2011). Sidik Peubah Ganda: Menggunakan SAS. Bogor (ID): IPB Press.

Muhyi, F. (2019). Penggunaan Twitter sebagai Penyedia Peubah Penyerta dalam Pendugaan Area Kecil [thesis]. Bogor (ID): IPB University.

Munková, D., Munk, M., & Vozár, M. (2013). Data pre-processing evaluation for text mining: transaction/sequence model. Procedia Computer Science, 18: 1198–1207.

Simhachalam, B., & Ganesan, G. (2016). Performance comparison of fuzzy and non-fuzzy classification methods. Egyptian Informatics Journal, 17(2): 183–188.

Sivarathri, S., & Govardhan, A. (2014). Experiments on Hypothesis “Fuzzy K-Means is better than K-Means for Clustering.” International Journal of Data Mining & Knowledge Management Process, 4(5): 21–34.

Tiwari, M., & Singh, R. (2012). Comparative investigation of k-means and k-medoid algorithm on iris data. International Journal of Engineering Research and Development, 4(8): 69–72.

How to Cite
Oktarina, C., Notodiputro, K., & Indahwati, I. (2020). COMPARISON OF K-MEANS CLUSTERING METHOD AND K-MEDOIDS ON TWITTER DATA. Indonesian Journal of Statistics and Its Applications, 4(1), 189-202.