Indonesian Journal of Statistics and Its Applications <p><strong>Indonesian Journal of Statistics and Its Applications (<a href=";1510202061&amp;1&amp;&amp;2017">eISSN:2599-0802</a>) (formerly named <a href="" target="_blank" rel="noopener">Forum Statistika dan Komputasi</a>), </strong><strong>established in 2017</strong><strong>, </strong>publishes scientific papers in the area of statistical science and its applications. Published papers should be research papers on, but not limited to, the following topics: experimental design and analysis, survey methods and analysis, operations research, data mining, statistical modeling, computational statistics, time series and econometrics, and statistics education. All papers are reviewed by peer reviewers consisting of experts and academics from universities and agencies. This journal is <strong>nationally accredited (SINTA 3)</strong> by the Directorate General of Research and Development Strengthening (DGRDS), Ministry of Research, Technology and Higher Education of the Republic of Indonesia, No.: <a href="" target="_blank" rel="noopener">14/E/KPT/2019, dated 10 May 2019</a>. </p> <p><strong>Scope:</strong><br />Indonesian Journal of Statistics and Its Applications is a refereed journal committed to Statistics and its applications.</p> <p><strong>Issue Released</strong>: <em>March (No 1), July (No 2), and November (No 3). </em></p> Department of Statistics, IPB University, with Forum Perguruan Tinggi Statistika (FORSTAT) en-US Indonesian Journal of Statistics and Its Applications 2599-0802 Proposing Additional Indicators for Indonesia Youth Development Index with Smaller Level Analysis <p>South Kalimantan is a province in Indonesia with a large youth population, and it has the lowest score in the Indonesia Youth Development Index (YDI) 2017. 
However, its lowest-scoring dimension, gender and discrimination, cannot be analyzed fully because some indicators are not included in the dimension. To address this, a measurement that can monitor smaller levels is needed. In this research, the author provides a measurement describing the level of youth development by classification for South Kalimantan in 2018. The index is built with the factor analysis method. It consists of the five dimensions used in Indonesia YDI 2017 with some additional indicators. The results show that the index is a valid measure because of its significant correlation with Indonesia YDI 2017. The results also show that youth living in urban areas tend to have a higher index than youth living in rural areas, and that male youth tend to have a higher development index than female youth. The suggestion for the South Kalimantan government is that, to improve youth development, the priorities for each classification can start from the classification and dimension of the youth index with the lowest achievement.</p> Suryo Adhi Rakhmawan Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-25 2021-06-25 5 2 220 227 10.29244/ijsa.v5i2p220-227 K-prototypes Algorithm for Clustering Schools Based on The Student Admission Data in IPB University <p>New student admissions are held every year at all levels of education, including at IPB University. Since 2013, IPB University has kept a track record of every school that has succeeded in sending its graduates, up to the point where they successfully completed their education at IPB University. The data recorded 5,345 schools. 
It was necessary to group the schools in the data into clusters so that IPB could see, from the characteristics of the clusters, which schools were classified as good or not good in terms of sending their graduates to continue their education at IPB. This study used the k-prototypes algorithm because it can be applied to data consisting of both categorical and numerical variables (mixed-type data). The k-prototypes algorithm maintains the efficiency of the k-means algorithm in handling large data sets while eliminating the limitations of k-means. The results showed that the optimal number of clusters in this study was four. The fourth cluster (421 school members) was the best cluster with respect to student admission at IPB University. On the other hand, the third cluster (391 school members) was the worst cluster in this study.</p> Sri Sulastri Lismayani Usman Utami Dyah Syafitri Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-27 2021-06-27 5 2 228 242 10.29244/ijsa.v5i2p228-242 Comparison of Short-Term Load Forecasting Based on Kalimantan Data <p>This paper investigates a case study on short-term forecasting for East Kalimantan, with emphasis on special days, such as public holidays. A time series of electricity load demand recorded at hourly intervals contains more than one seasonal pattern. There is great appeal in using a time series modelling method that is able to capture triple seasonalities. The triple SARIMA model has been adapted for this purpose and is competitive for modelling load. Using the least squares method to estimate the coefficients in a triple SARIMA model, followed by model building, checking of model assumptions, and comparison of model criteria, we propose and demonstrate the triple Seasonal Autoregressive Integrated Moving Average model with AIC 290631.9 and SBC 290674.2 as the best model for this study. 
The triple seasonal ARIMA is one alternative strategy for producing accurate forecasts of electricity load from Kalimantan data for planning, operation, maintenance, and market-related activities.</p> Syalam Ali Wira Dinata Muhammad Azka Primadina Hasanah Suhartono Suhartono Moh Danil Hendry Gamal Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 243 259 10.29244/ijsa.v5i2p243-259 Analysis of Multiple Correspondence Against Crimes in Sleman Regency <p>Crime is bad behavior that deviates from social and religious norms and causes psychological and economic harm. Stealing, ill-treatment, embezzlement, deception, deception/embezzlement, and adultery were the most common crimes in the last 9 months. Therefore, to identify the types of crime in the community, we need a method for seeing the tendency of a category, namely multiple correspondence analysis. Multiple correspondence analysis is a descriptive statistical method used to describe a pattern of relationships from a contingency table, with the aim of finding associations between categories. The results of the correspondence analysis show that stealing and ill-treatment tend to be committed by male students under 25 years old, deception and adultery tend to be committed by unemployed women over 40 years old, and embezzlement tends to be committed by workers aged around 25 to 40 years. 
The association between criminal incidents and types of crime shows that ill-treatment and adultery are most prone to occur in shops, with vulnerable hours of 00:00-05:59 and 18:00-23:59.</p> E Widodo R Maggandari Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 260 272 10.29244/ijsa.v5i2p260-272 Response Surface Model with Comparison of OLS Estimation and MM Estimation <p>Response Surface Method (RSM) is a collection of statistical and mathematical techniques, in the form of experiments and regression, that is useful for developing, improving, and optimizing processes. In general, models in RSM are estimated by linear regression with Ordinary Least Squares (OLS) estimation. However, OLS estimation performs poorly when the data contain outliers, so determining the RSM model requires a robust and resistant estimation method, namely robust regression. One estimation method in robust regression is the Method of Moment (MM) estimation. This study aims to compare the OLS and MM estimation methods in obtaining the optimal response point in this case study. The estimated models are compared using MSE and adjusted R^2. MM estimation gives better results for the optimal response in this case study.</p> Salsabila Basalamah Edy Widodo Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 273 283 10.29244/ijsa.v5i2p273-283 Forecasting Currency in East Java: Classical Time Series vs. Machine Learning <p>Most research on currency inflow and outflow in Indonesia has shown that these data contain both linear and nonlinear patterns with calendar variation effects. 
The goal of this research is to propose a hybrid model combining ARIMAX and a Deep Neural Network (DNN), known as hybrid ARIMAX-DNN, to improve forecast accuracy in currency prediction in East Java, Indonesia. ARIMAX is a class of classical time series models that can accurately handle linear patterns and calendar variation effects, whereas DNN is a machine learning method that is powerful in tackling nonlinear patterns. Data on 32 denominations of inflow and outflow currency in East Java are used as case studies. The best model was selected based on the smallest RMSE and sMAPE values on the testing dataset. The results showed that the hybrid ARIMAX-DNN model improved forecast accuracy and outperformed the individual ARIMAX and DNN models for 26 denominations of inflow and outflow currency. Hence, it can be concluded that hybrids of classical time series and machine learning methods tend to yield more accurate forecasts than either classical time series or machine learning methods alone.</p> J A Putri Suhartono Suhartono H Prabowo N A Salehah D D Prastyo Setiawan Setiawan Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 284 303 10.29244/ijsa.v5i2p284-303 Sentiment Analysis on Overseas Tweets on the Impact of COVID-19 in Indonesia <p>This study analyzes the trend of sentiment in tweets about Covid-19 in Indonesia from overseas Twitter accounts, from a big data perspective. The data were obtained from Twitter for the period of April 2020, with the query "Indonesian Corona Virus", from foreign user accounts tweeting in English. The data were retrieved by crawling tweet text through Twitter's API (Application Programming Interface) using the Python programming language. Twitter was chosen because information spreads very quickly and easily through status updates from and among user accounts. 
The number of tweets obtained was 8,740 in text format, with a total engagement of 217,316. The data were sorted from the tweets with the largest to the smallest engagement, then cleaned of unnecessary characters and symbols as well as typos and abbreviations. The sentiment classification was carried out with analytical tools, extracting information through text mining, into positive, negative, and neutral polarity. To sharpen the analysis, only the cleaned tweets from the largest engagement down to 100 engagements were selected and then grouped into 30 sub-topics for analysis. Notably, most tweets and sub-topics were dominated by negative sentiment, and some unexpected sub-topics were discussed by many users.</p> Tigor Nirman Simanjuntak Setia Pramana Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 304 313 10.29244/ijsa.v5i2p304-313 Development of Automated Environmental Data Collection System and Environment Statistics Dashboard <p>Environmental data such as pollutants, temperature, and humidity play a role in the agricultural sector in predicting rainfall conditions. Pollutant data are also commonly used as a proxy for the density of industry and transportation. This creates a need for automated data from external websites that can provide data faster than satellite confirmation. Data sourced from IQair can be used as benchmark or confirmatory data for weather and environmental statistics in Indonesia. The data are collected by scraping the API available on the website. Scraping is divided into two stages: the first determines the locations in Indonesia, and the second collects statistics such as temperature, humidity, and pollutant data (AQI). The Python module used is scrapy, and the crawling has been in effect since May 2020. 
The data are recorded every three hours for all regions of Indonesia and displayed directly in a Power BI-based dashboard. We also illustrate that AQI data can be used as a proxy for socio-economic activity and as an indicator for monitoring green growth in Indonesia.</p> Dede Yoga Paramartha Ana Lailatul Fitriyani Setia Pramana Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 314 325 10.29244/ijsa.v5i2p314-325 Projection Pursuit Regression (PPR) on Statistical Downscaling Modeling for Daily Rainfall Forecasting <p>Rainfall forecasting has an important role in people's lives. Rainfall forecasting in Indonesia is a complex problem because the country has a tropical climate, and prediction is difficult due to the complex topography and the interactions between the oceans, land, and atmosphere. Under these conditions, an accurate rainfall forecasting model on a local scale is needed, taking into account information about the global atmospheric circulation obtained from General Circulation Model (GCM) output. GCM output can still be used to provide local or regional scale information by adding Statistical Downscaling (SD) techniques. SD is a regression-based model for determining the functional relationship between the response variable and the predictor variable. Rainfall observations obtained from the Meteorology, Climatology, and Geophysics Agency (BMKG) are the response variable in this study. The predictor variable is the global climate output from GCM. The research was conducted in Kupang City, East Nusa Tenggara, because it has low rainfall. Projection Pursuit Regression (PPR) is used as the SD method in this study. In PPR modeling, optimization is needed, and model validation is carried out using the smallest Root Mean Square Error (RMSE) criterion. 
The forecasts are expected to follow a pattern that matches or approaches the observational data. The PPR model is a good model for predicting rainfall because the rainfall forecasts closely follow the observational data.</p> Rio Pradani Putra Dian Anggraeni Alfian Futuhul Hadi Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 326 332 10.29244/ijsa.v5i2p326-332 Online Marketplace Data to Figure COVID-19 Impact on Micro and Small Retailers in Indonesia <p>The COVID-19 outbreak is not only a health crisis but also a social and economic crisis all over the world. In Indonesia, the outbreak has shaken almost all business sectors; however, it seems to have a silver lining for the e-commerce sector, since the pandemic has fostered online shopping habits. During the pandemic, information on the impact of COVID-19 on the Indonesian economy needs to be updated from time to time for use in quick policymaking. Therefore, big data plays an important role in providing information relatively quickly. This paper aims to describe how big data, i.e., marketplace data, can be used to assess the impact of the COVID-19 outbreak on micro and small retailers in Indonesia. The dataset was collected regularly from a marketplace website in Indonesia from January to June 2020. To see the change in sales during the COVID-19 period, sales before and after the implementation of the social distancing policy are compared. The results showed that the online marketplace in Indonesia is dominated by micro retailers, based on the number of products sold in the marketplace. The total revenue of micro retailers increased significantly during the pandemic, whereas for medium retailers the increase in total revenue was lower than that of micro retailers. 
This indicates a positive sign for the growth of micro retailers in the online marketplace.</p> Dhiar Niken Larasati Usman Bustaman Setia Pramana Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 333 342 10.29244/ijsa.v5i2p333-342 LQ45 Stock Portfolio Selection using Black-Litterman Model in Pandemic Time Covid-19 <p>The world was shocked by the emergence, at the end of 2019, of a virus that spread very quickly to several countries, including Indonesia. The disease caused by this virus is called Corona Virus Disease 2019 (Covid-19). The outbreak of Covid-19 not only threatens human lives but also disrupts various economic, financial, and business activities, especially in Indonesia. A stock portfolio is a collection of financial assets in a unit that is held or created by an investor, investment company, or financial institution. The Black-Litterman model is a portfolio model that combines the CAPM equilibrium return with investor views. The purpose of this study is to determine a stock portfolio with the Black-Litterman model using data on companies listed in the LQ45 stock index from January 2020 to June 2020. Four of the twenty-nine LQ45 stocks were selected as assets in the stock portfolio. The stock portfolio containing the four stocks, namely ICBP, KLBF, MNCN, and TLKM, with the Black-Litterman model resulted in an expected return of 2.07% and a risk of 2.82%.</p> Siska Yosmar S Damayanti S Febrika Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 343 354 10.29244/ijsa.v5i2p343-354 Nowcasting Indonesia’s GDP Growth Using Machine Learning Algorithms <p>GDP is very important to monitor in real time because of its usefulness for policy making. We built and compared ML models to forecast Indonesia's GDP growth in real time. We used 18 variables consisting of a number of quarterly macroeconomic and financial market statistics. 
We evaluated the performance of six popular ML algorithms, namely Random Forest, LASSO, Ridge, Elastic Net, Neural Networks, and Support Vector Machines, in producing real-time forecasts of GDP growth for the period 2013:Q3 to 2019:Q4. We used the RMSE, MAD, and Pearson correlation coefficient as measures of forecast accuracy. The results showed that all these models outperformed the AR(1) benchmark. The individual model with the best performance was Random Forest. To obtain more accurate forecasts, we ran forecast combinations using equal weighting and lasso regression. The best model was obtained from the forecast combination using lasso regression with selected ML models, namely Random Forest, Ridge, Support Vector Machine, and Neural Network.</p> Nadya Dwi Muchisha Novian Tamara Andriansyah Andriansyah Agus M Soleh Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 355 368 10.29244/ijsa.v5i2p355-368 Clustering with Euclidean Distance, Manhattan - Distance, Mahalanobis - Euclidean Distance, and Chebyshev Distance with Their Accuracy <p>There are several algorithms for solving problems in grouping data. Grouping data is also known as clustering, and it is useful for solving problems, especially in business. In this note, we modify the distance-based clustering algorithm that underlies the K-means algorithm (Euclidean distance). Manhattan, Mahalanobis-Euclidean, and Chebyshev distances are used to modify the K-means algorithm. 
We compare the clustering results in terms of accuracy and find that the Mahalanobis-Euclidean distance gives the best accuracy on our experimental data; some further results are also given in this note.</p> Said Al Afghani Widhera Yoza Mahana Putra Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 369 376 10.29244/ijsa.v5i2p369-376 Ensemble Learning For Television Program Rating Prediction <p>Rating is one of the most frequently used metrics in the television industry to evaluate television programs or channels. This research attempts to develop a prediction model of television program ratings using rating data gathered from UseeTV (an internet-based television service from Telkom Indonesia). Two machine learning methods (Random Forest and Extreme Gradient Boosting) were trained on rating data from 20 television programs collected from January 2018 to August 2019 (training dataset) and evaluated using September 2019 rating data (testing dataset). The results show that Random Forest gives better results than Extreme Gradient Boosting based on the evaluation metrics Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). On the training dataset, Random Forest produced lower RMSE and MAE scores than Extreme Gradient Boosting for all programs, while on the testing dataset, Random Forest produced lower RMSE and MAE scores than Extreme Gradient Boosting for 16 programs. 
According to the MAPE score, Random Forest produced good-quality predictions for more programs (4 programs in the training dataset, 16 programs in the testing dataset) than Extreme Gradient Boosting (1 program in the training dataset, 12 programs in the testing dataset).</p> Iqbal Hanif Regita Fachri Septiani Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 377 395 10.29244/ijsa.v5i2p377-395 Classification of Bidikmisi Scholarship Acceptance using Neural Network Based on Hybrid Method of Genetic Algorithm <p class="Abstract" style="margin-bottom: 28.4pt;"><span lang="EN-GB">A neural network is a series of algorithms that endeavours to recognize underlying relationships in a set of data through processes that mimic the way human brains operate. In classification, the fit of the model depends on various factors, such as the number of hidden nodes, the choice of relevant input variables, and the selection of optimal connection weights. One popular method for selecting optimal connection weights is the Genetic Algorithm (GA), whose basic concept is to iterate a process modelled on Darwinian evolution. This research presents the Backpropagation Neural Network (BPNN) and the combined method of BPNN with GA, where GA is used to initialize and optimize the connection weights of BPNN. 
Based on accuracy, the BPNN method combined with GA provides better classification, with an accuracy of 90.51%, in the case of Bidikmisi Scholarship classification in East Java.</span></p> N Cahyani Sinta Septi Pangastuti K Fithriasari Irhamah Irhamah N Iriawan Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 396 404 10.29244/ijsa.v5i2p396-404 Estimation of Value at Risk by Using GJR-GARCH Copula Based on Block Maxima <p>This paper discusses the risk estimation of a portfolio based on value at risk (VaR) using a copula-based asymmetric Glosten-Jagannathan-Runkle Generalized Autoregressive Conditional Heteroskedasticity (GJR-GARCH) model. Non-linear dependence among the variables leads to inaccurate VaR estimation, so we use copula functions to model the joint probability of large market movements. The data are GEV distributed; therefore, we use the Block Maxima method, which fits an extreme value distribution as the tail distribution, to compute VaR. The results show that VaR can estimate the risk of the portfolio return reasonably well because the model captures the properties of the data. Data volatility is accommodated by GJR-GARCH, the copula captures dependence between stocks, and Block Maxima accommodates the extreme tail behavior of the data.</p> Hasna Afifah Rusyda Fajar Indrayatna Lienda Noviyanti Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications 2021-06-30 2021-06-30 5 2 405 414 10.29244/ijsa.v5i2p405-414