PERFORMANCE EVALUATION OF CLUSTER ENSEMBLE AND LATENT CLASS CLUSTERING METHODS ON MIXED VARIABLES
Keywords: clustering, cluster ensemble, LCC, mixed data, potential village data
Most traditional clustering algorithms are designed for either numeric data or categorical data, but data collected in the real world often contain both numeric and categorical attributes, which makes it difficult to apply traditional clustering algorithms directly. This paper therefore aims to identify the better method for mixed data between the cluster ensemble approach and latent class clustering (LCC). Cluster ensemble first splits the data into two sub-datasets, one with the categorical variables and one with the numerical variables, then applies a clustering algorithm designed for each data type to produce the corresponding clusters, and finally combines the resulting clusterings. Latent class clustering, on the other hand, is a model-based clustering method that can be used for any type of data; the number of clusters is determined from the estimated probability model. The results recommend LCC as the best clustering method, because it provides higher accuracy and the smallest standard deviation ratio. However, when both methods are applied to cluster potential village data in Bengkulu Province, LCC and cluster ensemble produce evaluation values that do not differ much.
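The cluster ensemble procedure described above can be illustrated with a minimal sketch. The abstract does not name the base algorithms or the combination rule, so everything here is an assumption made for illustration: k-means for the numerical sub-dataset, k-modes (simple matching dissimilarity) for the categorical sub-dataset, and a consensus step that treats the two label vectors as a new categorical dataset and clusters it again with k-modes. The function names and the toy data are hypothetical.

```python
from collections import Counter

def first_distinct(rows, k):
    """Pick the first k distinct rows as initial centers (a simple, assumed seeding)."""
    seeds = []
    for r in rows:
        if list(r) not in seeds:
            seeds.append(list(r))
        if len(seeds) == k:
            break
    return seeds

def kmeans(rows, k, iters=20):
    """Plain k-means with squared Euclidean distance on numeric rows."""
    centers = first_distinct(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda c: sum((x - m) ** 2 for x, m in zip(r, centers[c])))
                  for r in rows]
        for c in range(len(centers)):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def kmodes(rows, k, iters=20):
    """Plain k-modes: distance is the number of mismatching attributes."""
    modes = first_distinct(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(len(modes)),
                      key=lambda c: sum(a != b for a, b in zip(r, modes[c])))
                  for r in rows]
        for c in range(len(modes)):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:
                modes[c] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return labels

def cluster_ensemble(numeric, categorical, k):
    """Cluster each sub-dataset separately, then cluster the pairs of labels."""
    num_labels = kmeans(numeric, k)        # clustering of the numerical variables
    cat_labels = kmodes(categorical, k)    # clustering of the categorical variables
    combined = list(zip(num_labels, cat_labels))  # labels become categorical data
    return kmodes(combined, k)             # assumed consensus step

# Hypothetical toy data: six objects with two numeric and two categorical variables.
numeric = [[1.0, 2.0], [9.0, 8.5], [1.2, 1.8], [8.8, 9.1], [0.9, 2.2], [9.3, 8.9]]
categorical = [["yes", "low"], ["no", "high"], ["yes", "low"],
               ["no", "high"], ["yes", "high"], ["no", "high"]]

labels = cluster_ensemble(numeric, categorical, k=2)
print(labels)
```

The seeding and tie-breaking rules in this sketch are deliberately simple and deterministic; a real analysis would typically use several random restarts and an established implementation of each base algorithm.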