Ensemble Learning For Television Program Rating Prediction
Keywords: ensemble method, extreme gradient boosting, random forest, television rating, time series
Rating is one of the most frequently used metrics in the television industry for evaluating television programs or channels. This research develops a prediction model for television program ratings using rating data gathered from UseeTV, an internet-based television service from Telkom Indonesia. Two ensemble machine learning methods, Random Forest and Extreme Gradient Boosting, were trained on rating data from 20 television programs collected from January 2018 to August 2019 (training dataset) and evaluated on rating data from September 2019 (testing dataset). The results show that Random Forest performs better than Extreme Gradient Boosting on three evaluation metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). On the training dataset, Random Forest produced lower RMSE and MAE scores than Extreme Gradient Boosting for all programs, while on the testing dataset it produced lower RMSE and MAE scores for 16 programs. According to the MAPE score, Random Forest also produced good-quality predictions for more programs (4 in the training dataset, 16 in the testing dataset) than Extreme Gradient Boosting (1 in the training dataset, 12 in the testing dataset).
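As an illustration of the three evaluation metrics named above, the following minimal sketch computes RMSE, MAE, and MAPE from their standard definitions. The function names and the rating values are hypothetical and are not taken from the UseeTV data or from the study's own implementation.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute errors."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero, so such inputs raise an error.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical ratings for one program (illustrative values only).
actual = [2.0, 4.0, 5.0, 8.0]
predicted = [2.5, 3.5, 5.0, 7.0]

print("RMSE:", rmse(actual, predicted))
print("MAE: ", mae(actual, predicted))
print("MAPE:", mape(actual, predicted))
```

Computing these scores per program for each model, and counting the programs on which one model scores lower, gives a comparison of the kind reported above. Lower values indicate better predictions for all three metrics.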