Comparison of Soft and Hard Clustering: A Case Study on Welfare Level in Cities on Java Island

Cluster analysis using hard clustering and soft clustering to group regencies/cities on Java Island by welfare level


  • Nurafiza Thamrin, Politeknik Statistika STIS, Indonesia
  • Arie Wahyu Wijayanto, Politeknik Statistika STIS, Indonesia



The National Medium Term Development Plan 2020-2024 states that one of the visions of national development is to accelerate the distribution of welfare and justice. Cluster analysis groups objects into several smaller groups such that the objects within a group share similar characteristics. This study was conducted to find the best clustering method and to group cities in Java by level of welfare. The methods compared were hard clustering, namely K-Means, K-Medoids (PAM and CLARA), and Hierarchical Agglomerative clustering, and soft clustering, namely Fuzzy C-Means. The elbow method, the silhouette method, and the gap statistic were used to determine the optimal number of clusters. Evaluation with the silhouette coefficient, Dunn index, connectivity coefficient, and Sw/Sb ratio showed that the best method was agglomerative clustering with Ward linkage, which produced three clusters. The first cluster consists of 27 cities with moderate welfare, the second of 16 cities with high welfare, and the third of 76 cities with low welfare. With these clustering results, city governments in Java should be able to make better welfare policies based on the dominant indicators found in each cluster.
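
As an illustration of one of the evaluation measures named above, the silhouette coefficient can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function name, the Euclidean distance, and the toy points and labels are all assumptions made for the example, and the data are not the welfare indicators used in the study.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the two groups

good_score = silhouette_coefficient(toy_points, good_labels)
bad_score = silhouette_coefficient(toy_points, bad_labels)
print(f"good partition: {good_score:.3f}, mixed partition: {bad_score:.3f}")
```

A value close to 1 indicates compact, well-separated clusters, so when several partitions of the same data are compared, the one with the higher mean silhouette is preferred on this criterion.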





How to Cite

Thamrin, N., & Wijayanto, A. W. (2021). Comparison of Soft and Hard Clustering: A Case Study on Welfare Level in Cities on Java Island: Analisis cluster dengan menggunakan hard clustering dan soft clustering untuk pengelompokkan tingkat kesejahteraan kabupaten/kota di pulau Jawa. Indonesian Journal of Statistics and Its Applications, 5(1), 141–160.