AN ALTERNATIVE METHOD FOR CLUSTERING TIME SERIES DATA CONTAINING MISSING VALUES

A Case Study of Clustering Provinces in Indonesia Based on Gini Ratio Time Series Data, 2007–2017

  • Yusma Yanti Department of Computer Sciences, Pakuan University, Indonesia
  • Septian Rahardiantoro Department of Statistics, IPB
Keywords: time series clustering, correlation matrix, Euclidean distance, Gini ratio

Abstract

Panel data consist of many observational units, each measured repeatedly over a period of time. Clustering the units on the basis of such data is known as time series clustering. Many methods have been developed for time series data with fluctuating patterns. Missing data, however, cause problems in this analysis. A value is missing when no information is available for an observation at a given time point. This study proposes an alternative method for clustering time series that contain missing data: the correlation matrix of the series is converted into a Euclidean distance matrix, to which hierarchical clustering is then applied. A simulation study compared the alternative method with the commonly used method on data with 0%, 10%, 20% and 40% missing values. In these simulations, the clustering accuracy of the proposed alternative method was always higher than that of the commonly used method. The method was then applied to the annual Gini ratio of each province in Indonesia from 2007 to 2017, a data set that contains missing values for North Kalimantan Province. Two clusters of provinces with different characteristics were obtained.
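
As a rough illustration of the pipeline described in the abstract (correlation matrix, then Euclidean distance matrix, then hierarchical clustering), the following Python sketch may help. The abstract does not state how correlations are computed when values are missing, how the correlation matrix is converted into distances, or which linkage is used. The sketch therefore assumes Pearson correlations over the time points observed in both series, Euclidean distances between rows of the correlation matrix, and average linkage. All function names and the toy series are invented for illustration and are not taken from the study.

```python
import math

def pairwise_corr(x, y):
    """Pearson correlation using only time points observed in both series
    (assumption: missing values are encoded as None)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(series):
    """Correlation of every series with every other series."""
    return [[pairwise_corr(a, b) for b in series] for a in series]

def euclidean_from_corr(corr):
    """Assumed conversion: Euclidean distance between rows of the
    correlation matrix, so series with similar correlation profiles are close."""
    return [
        [math.sqrt(sum((u - v) ** 2 for u, v in zip(ri, rj))) for rj in corr]
        for ri in corr
    ]

def hierarchical_clusters(dist, k):
    """Agglomerative clustering with average linkage (an assumption),
    stopped when k clusters remain. Returns lists of series indices."""
    clusters = [[i] for i in range(len(dist))]

    def avg_link(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        p, q = min(
            ((p, q) for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda pq: avg_link(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

# Invented toy series (not real Gini ratio data); None marks a missing year.
names = ["A", "B", "C", "D"]
series = [
    [0.30, 0.32, 0.34, 0.36, 0.38, 0.40],  # rising, complete
    [0.31, 0.33, None, 0.37, 0.38, 0.41],  # rising, one missing value
    [0.40, 0.38, 0.37, 0.35, 0.33, 0.30],  # falling, complete
    [0.42, None, 0.38, 0.36, None, 0.31],  # falling, two missing values
]

corr = correlation_matrix(series)
dist = euclidean_from_corr(corr)
clusters = hierarchical_clusters(dist, k=2)

for number, members in enumerate(clusters, start=1):
    print(number, [names[i] for i in members])
```

Because the correlations are computed pair by pair, a series with gaps still contributes through the time points it shares with each other series, so no observation has to be dropped or imputed before clustering. Other choices, such as Ward linkage or a different correlation-to-distance conversion, would fit into the same structure.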

Published: 2018-04-30
Section: Articles