Application of Adaptive Synthetic Nominal and Extreme Gradient Boosting Methods in Determining Factors Affecting Obesity: A Case Study of Indonesian Basic Health Research Survey 2013



  • Yoris Rombe Department of Statistics, Universitas Hasanuddin, Indonesia
  • Sri Astuti Thamrin Department of Statistics, Universitas Hasanuddin, Indonesia
  • Armin Lawi Department of Mathematics, Universitas Hasanuddin, Indonesia & Institut Teknologi Bacharuddin Jusuf Habibie, Indonesia



ADASYN-N, feature importance, information gain, obesity, XGBoost


Obesity is the excessive accumulation of body fat to a degree that can harm health. According to recent studies, factors contributing to the rising prevalence of obesity in Indonesia include poor diet, low consumption of vegetables and fruits, high consumption of fast food, area of residence, and lack of physical activity. Psychological factors, heavy alcohol and cigarette consumption, cultural differences, and stress also trigger obesity. The rapid development of the medical field is inseparable from growing medical knowledge and from data that are increasingly easy to access, which makes machine learning increasingly necessary for recognizing patterns in very large medical datasets, including obesity data. This study aims to determine the factors that influence obesity status in Indonesia. To this end, Extreme Gradient Boosting (XGBoost) was used, a classification method that is more scalable and more efficient than its predecessors. In addition, because the data are imbalanced, the Adaptive Synthetic Nominal algorithm (ADASYN-N) was used to balance the classes and improve prediction accuracy. Both ADASYN-N and XGBoost were applied to obesity data from the 2013 Indonesian Basic Health Research Survey. The results show that being female is the factor most strongly associated with obesity status in Indonesia, as it has the highest gain value (37%). Age 35-54 years, strenuous activity, and eating vegetables 6 days a week are also risk factors for obesity.
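The abstract names ADASYN-N without describing how it generates records. The Python sketch below is one plausible reading, assuming ADASYN-N pairs ADASYN's adaptive allocation (minority records surrounded by more majority neighbours receive more synthetic records) with SMOTE-N's rule for nominal features (each attribute of a synthetic record takes the most frequent value among the record and its nearest minority neighbours). The overlap (Hamming) distance, the function names `adasyn_n`, `nearest` and `overlap_distance`, the parameters `k` and `beta`, and the toy records are all hypothetical choices for illustration, not the authors' implementation.

```python
# Hypothetical sketch of ADASYN-N for nominal data (assumed: ADASYN allocation + SMOTE-N generation).
import random
from collections import Counter


def overlap_distance(a, b):
    """Overlap (Hamming) distance: number of nominal attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(i, candidates, X, k):
    """Indices of the k candidates closest to X[i] (i itself excluded; ties keep index order)."""
    others = [j for j in candidates if j != i]
    return sorted(others, key=lambda j: overlap_distance(X[i], X[j]))[:k]


def adasyn_n(X, y, minority, k=5, beta=1.0, seed=0):
    """Return (X_syn, y_syn): synthetic nominal records for the minority class."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_maj = len(y) - len(min_idx)
    G = int((n_maj - len(min_idx)) * beta)  # total number of synthetic records to create
    if G <= 0 or len(min_idx) < 2:
        return [], []

    # Step 1: r_i = share of majority records among the k nearest neighbours of each minority record.
    ratios = []
    for i in min_idx:
        neigh = nearest(i, range(len(X)), X, k)
        ratios.append(sum(y[j] != minority for j in neigh) / len(neigh))

    # Step 2: normalise r_i and allot g_i = r_hat_i * G records (harder points get more; rounded).
    total = sum(ratios)
    weights = [r / total for r in ratios] if total > 0 else [1 / len(min_idx)] * len(min_idx)
    allot = [round(w * G) for w in weights]

    # Step 3 (SMOTE-N style): each attribute is the majority value among x_i and its
    # k nearest minority neighbours; ties are broken at random.
    X_syn = []
    for i, g in zip(min_idx, allot):
        group = [i] + nearest(i, min_idx, X, k)
        for _ in range(g):
            record = []
            for col in range(len(X[i])):
                counts = Counter(X[j][col] for j in group)
                top = max(counts.values())
                record.append(rng.choice([v for v, c in counts.items() if c == top]))
            X_syn.append(tuple(record))
    return X_syn, [minority] * len(X_syn)


# Hypothetical toy records (sex, age group, activity); label 1 = obese (minority class).
X = [("F", "35-54", "low"), ("F", "35-54", "high"), ("F", "15-34", "low"),
     ("M", "35-54", "low"), ("M", "15-34", "high"), ("M", "15-34", "low"),
     ("M", "55+", "high"), ("F", "55+", "high"), ("M", "15-34", "high"),
     ("M", "35-54", "high")]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
X_syn, y_syn = adasyn_n(X, y, minority=1, k=3)
print(X_syn)
```

With 7 majority and 3 minority records, four synthetic records are requested, and record 1 (whose nearest neighbours are mostly non-obese) is allotted two of them. Because this toy set has only three minority records, every neighbourhood is the whole minority class and all four synthetic records coincide; on realistic survey data the neighbourhoods, and hence the generated records, differ.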

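The factors above are ranked by their gain value. For reference, the original XGBoost formulation scores a candidate split by the reduction in its regularised objective, where g_i and h_i are the first and second derivatives of the loss for record i, I_L and I_R are the records sent to the left and right child (G_R and H_R are defined like G_L and H_L), λ is the L2 penalty on leaf weights and γ is the per-leaf penalty. How the 37% figure was normalised is not stated here; the second line assumes each feature's aggregate gain is divided by the sum over all features so that the shares total 100%. In the `xgboost` Python package, `Booster.get_score(importance_type=...)` reports `gain` as the average gain over the splits that use a feature and `total_gain` as their sum; which of the two underlies the reported figure is likewise an assumption.

```latex
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda}
  - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma,
\qquad G_L = \sum_{i \in I_L} g_i,\quad H_L = \sum_{i \in I_L} h_i

\mathrm{share}(f) = \frac{\mathrm{Gain}(f)}{\sum_{f'} \mathrm{Gain}(f')} \times 100\%
```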







How to Cite

Rombe, Y., Thamrin, S. A., & Lawi, A. (2022). Application of Adaptive Synthetic Nominal and Extreme Gradient Boosting Methods in Determining Factors Affecting Obesity: A Case Study of Indonesian Basic Health Research Survey 2013: Aplikasi Metode Adaptive Synthetic Nominal dan Extreme Gradient Boosting dalam Menentukan Faktor yang Memengaruhi Obesitas: Studi Kasus Riset Kesehatan Dasar Indonesia 2013. Indonesian Journal of Statistics and Its Applications, 6(2), 309–317.


