• Andrew Donda Munthe, Badan Pusat Statistik (BPS)
• I Made Sumertajaya, Department of Statistics, IPB
• Utami Dyah Syafitri, Department of Statistics, IPB
Keywords: clustering, K-prototype algorithm, two step cluster, villages


Statistics Indonesia (BPS) recorded 3,270 villages in Nusa Tenggara Timur Province in 2014. Most of them have a high poverty rate, so clustering villages by poverty indicators is important. Two clustering algorithms that can handle large data sets with mixed variables are Two Step Cluster (TSC) and K-Prototypes. This research compares the TSC and K-Prototypes algorithms for clustering villages in Nusa Tenggara Timur Province based on poverty indicators. The data were taken from the 2014 village potential data (PODES 2014) collected by BPS. The best clustering is the one with the minimum ratio of within-group variance to between-group variance. The results show that TSC was the better algorithm, with the smallest ratio (2.6963). The best clustering divides the villages of Nusa Tenggara Timur Province into six groups with different characteristics.
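The selection criterion in the abstract, a ratio of within-group to between-group variance where smaller is better, can be illustrated with a minimal sketch. The exact formula used in the study is not given here, so the definitions below are assumptions: within-group variance is taken as the mean squared distance of each observation to its own cluster centroid, and between-group variance as the size-weighted mean squared distance of the cluster centroids to the overall centroid. The function name `within_between_ratio` is hypothetical, and the sketch handles numeric variables only, not the mixed variables used in the study.

```python
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_between_ratio(rows, labels):
    """Ratio of within-group to between-group variance (assumed definitions).

    A smaller ratio means tighter, better separated clusters.
    """
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    grand = centroid(rows)
    centroids = {k: centroid(v) for k, v in clusters.items()}

    # Within: mean squared distance of each point to its own centroid.
    within = fmean([sq_dist(r, centroids[l]) for r, l in zip(rows, labels)])
    # Between: size-weighted mean squared distance of centroids to the grand centroid.
    between = sum(
        len(members) * sq_dist(centroids[k], grand)
        for k, members in clusters.items()
    ) / len(rows)
    if between == 0:
        raise ValueError("cluster centroids coincide; ratio is undefined")
    return within / between


# Toy data with one numeric variable and two candidate clusterings.
rows = [[0.0], [2.0], [10.0], [12.0]]
ratio_good = within_between_ratio(rows, [0, 0, 1, 1])  # groups nearby points
ratio_bad = within_between_ratio(rows, [0, 1, 0, 1])   # mixes distant points
best = "good" if ratio_good < ratio_bad else "bad"
```

Under these assumed definitions, the clustering that groups nearby points yields the smaller ratio and would be selected, which mirrors how the two algorithms are compared in the abstract.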

