Improving Classification Model Performances using an Active Learning Method to Detect Hate Speech in Twitter

Peningkatan Kinerja Model Klasifikasi dengan Pembelajaran Aktif dalam Mendeteksi Ujaran Kebencian di Twitter

Authors

  • Muhammad Ilham Abidin, Department of Statistics, IPB University, Indonesia
  • Khairil Anwar Notodiputro, Department of Statistics, IPB University, Indonesia
  • Bagus Sartono, Department of Statistics, IPB University, Indonesia

DOI:

https://doi.org/10.29244/ijsa.v5i1p26-38

Abstract

Police efforts to address hate speech on social media such as Twitter cannot rely solely on manual checks. Statistical models such as classification models are therefore needed to detect hate speech automatically. Classification is a type of predictive modelling that produces predictions from labelled data. In practice, the available data are usually unlabelled, so they must be labelled first. Data labelling is time-consuming, costly, and often fails to produce correct labels. This research aims to improve the performance of classification models by adding a small amount of data selected through the so-called active learning method. The results showed no significant difference between the performance of the logistic regression and naïve Bayes classification models in detecting hate speech. However, adding data through the active learning method substantially improved the performance of logistic regression in detecting hate speech compared with adding data selected by simple random sampling. Therefore, the performance of classification models in detecting hate speech on Twitter can be improved by using an active learning method.
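
The abstract contrasts two ways of choosing which additional tweets to label: active learning and simple random sampling. The sketch below illustrates that contrast under stated assumptions. The abstract does not say which query strategy the study used, so uncertainty sampling (picking the items whose predicted probability is closest to 0.5) is assumed here as one common choice. The function names and the probabilities in `pool_probabilities` are hypothetical and are not taken from the paper.

```python
import math
import random


def uncertainty(p):
    """Binary entropy of a predicted hate-speech probability p (highest at p = 0.5)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_by_uncertainty(probabilities, k):
    """Return indices of the k unlabelled items the model is least sure about."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: uncertainty(probabilities[i]),
        reverse=True,
    )
    return ranked[:k]


def select_at_random(probabilities, k, seed=0):
    """Baseline: return k distinct indices drawn by simple random sampling."""
    rng = random.Random(seed)
    return rng.sample(range(len(probabilities)), k)


# Hypothetical predicted probabilities for six unlabelled tweets
# (illustrative values only, not results from the study).
pool_probabilities = [0.02, 0.48, 0.91, 0.55, 0.99, 0.30]

to_label = select_by_uncertainty(pool_probabilities, k=2)  # tweets at 0.48 and 0.55
baseline = select_at_random(pool_probabilities, k=2)       # any two tweets
```

In a full loop, the selected tweets would be labelled by hand, added to the training set, and the classifier refitted before the next round of selection. The random baseline spends the same labelling budget without regard to where the model is uncertain.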

Published

2021-03-31

How to Cite

Abidin, M. I., Notodiputro, K. A., & Sartono, B. (2021). Improving Classification Model Performances using an Active Learning Method to Detect Hate Speech in Twitter: Peningkatan Kinerja Model Klasifikasi dengan Pembelajaran Aktif dalam Mendeteksi Ujaran Kebencian di Twitter. Indonesian Journal of Statistics and Its Applications, 5(1), 26–38. https://doi.org/10.29244/ijsa.v5i1p26-38

Section

Articles