On the Statistical Learning Analysis of Rain Gauge Data over the Natuna Islands
Keywords: Bayesian structural time series, cubic interpolation, isolation forest, Lomb-Scargle PSD, observational tropical meteorology
Located in the middle of the South China Sea, more than 700 km from the nearest mainland, the Natuna Islands remain a focus of scientific discussion. This article applies state-of-the-art statistical learning methods to rain gauge data from the Natuna Islands. Using shape-preserving piecewise cubic interpolation, we filled 671 missing values in the daily precipitation record. Dominant periodicity analysis of the daily precipitation signal using the Lomb-Scargle power spectral density reveals annual, intraseasonal, and interannual precipitation patterns over the Natuna Islands. Unsupervised anomaly analysis using the Isolation Forest algorithm flags 146 daily precipitation data points as anomalous. We also conducted an experiment to predict accumulated monthly precipitation over the Natuna Islands using the Bayesian structural time series algorithm. The results show that a local linear trend model with a seasonal component is able to model accumulated monthly precipitation over a twelve-month prediction horizon. The work presented here has profound implications for rainfall observations in this area.
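As an illustration of the first two steps named in the abstract, the following is a minimal sketch of gap filling with shape-preserving piecewise cubic (PCHIP) interpolation followed by a Lomb-Scargle periodicity search. It is not the authors' implementation: the data are synthetic (a noisy annual cycle with 671 values removed at random), the function names `make_synthetic_record`, `fill_gaps_pchip`, and `dominant_period` are hypothetical, and SciPy's `PchipInterpolator` and `lombscargle` are assumed as stand-ins for whatever tooling was actually used.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator
from scipy.signal import lombscargle


def make_synthetic_record(n_days=3650, n_missing=671, seed=0):
    """Build a synthetic daily series (assumption: not real gauge data)."""
    rng = np.random.default_rng(seed)
    days = np.arange(n_days, dtype=float)
    # Annual cycle plus Gaussian noise, clipped so values are non-negative.
    annual = 8.0 + 6.0 * np.sin(2.0 * np.pi * days / 365.25)
    rain = np.clip(annual + rng.normal(0.0, 3.0, n_days), 0.0, None)
    # Keep both endpoints observed so the interpolator never extrapolates.
    missing = rng.choice(np.arange(1, n_days - 1), size=n_missing, replace=False)
    rain[missing] = np.nan
    return days, rain


def fill_gaps_pchip(days, rain):
    """Replace NaN entries using PCHIP fitted to the observed values."""
    days = np.asarray(days, dtype=float)
    rain = np.asarray(rain, dtype=float)
    known = ~np.isnan(rain)
    interpolator = PchipInterpolator(days[known], rain[known])
    filled = rain.copy()
    filled[~known] = interpolator(days[~known])
    # Clipping only matters if gaps sit at the record edges (extrapolation).
    return np.clip(filled, 0.0, None)


def dominant_period(days, values, periods):
    """Return the trial period with the highest Lomb-Scargle power."""
    periods = np.asarray(periods, dtype=float)
    angular_freqs = 2.0 * np.pi / periods  # lombscargle expects rad per unit time
    centered = np.asarray(values, dtype=float) - np.mean(values)
    power = lombscargle(np.asarray(days, dtype=float), centered, angular_freqs)
    return periods[np.argmax(power)], power


days, rain = make_synthetic_record()
filled = fill_gaps_pchip(days, rain)
best_period, _ = dominant_period(days, filled, np.arange(30.0, 1001.0))
print(f"filled {int(np.isnan(rain).sum())} gaps; "
      f"strongest trial period: {best_period:.0f} days")
```

Because PCHIP does not overshoot between neighbouring observations, the filled values stay non-negative wherever the surrounding data are, which is one reason a shape-preserving scheme suits precipitation. The anomaly detection and forecasting steps are left out of this sketch.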