Xplore: Journal of Statistics https://journal.stats.id/index.php/xplore <p><strong>Xplore: Journal of Statistics</strong> (<strong><a href="http://u.lipi.go.id/1543898192">eISSN: 2655-2744</a></strong>) is one of the journals managed by the <a href="http://www.stat.ipb.ac.id/"><strong>Department of Statistics, IPB University</strong></a>, in collaboration with the "<strong>Forum Pendidikan Tinggi Statistika Indonesia (<a href="https://forstat.org/jurnal/">FORSTAT</a>)</strong>" and "<strong>Ikatan Statistisi Indonesia (<a href="http://isi-indonesia.org/isi/frontend/web/jurnal-ilmiah">ISI</a>)</strong>".</p> <p><strong>Xplore: <em>Journal of Statistics</em></strong> is published three times a year and contains scholarly papers related to statistics. Its articles are research results or literature reviews in statistics and/or its applications.</p> <p><a href="http://u.lipi.go.id/1348816435" target="_blank" rel="noopener">ISSN: 2302-5751</a></p> <p>As of December 2018, Xplore: Journal of Statistics has a new ISSN for its online edition (eISSN: 2655-2744) under decree (SK) no. 0005.26552744/JI.3.1/SK.ISSN/2018.12 of 13 December 2018. In accordance with that decree, issues of Xplore: Journal of Statistics from December 2018 onward are numbered starting from Volume 7, No. 3.</p> <p><a href="http://u.lipi.go.id/1543898192" target="_blank" rel="noopener">eISSN: 2655-2744</a></p> Department of Statistics, IPB en-US Xplore: Journal of Statistics 2302-5751
Forecasting the Reference Coal Price (Harga Batu Bara Acuan) Using Autoregressive Integrated Moving Average and Transfer Function Methods https://journal.stats.id/index.php/xplore/article/view/1100 <p>Indonesia, as one of the world's largest coal producers, plays an important role in meeting global coal demand. Currently, most European countries are turning to coal as a source of electricity.
This is due to the Covid-19 pandemic and the conflict between Russia and Ukraine, which has put energy supplies at risk. Forecasting future coal prices is therefore needed to determine appropriate policies for meeting the high demand for coal. Coal price fluctuations are influenced by several factors, such as the prices of other commodities, for instance natural gas. The natural gas price is used as the input series when forecasting coal prices with the transfer function method. This study compares ARIMA and transfer function models for coal price forecasting. The MAPE values of the ARIMA and transfer function models are 23.14% and 17.66%, respectively. Based on MAPE, the transfer function method forecasts coal prices better than the ARIMA method.</p> Suci Pujiani Prahesti Itasia Dina Sulvianti Yenni Angraini Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 1 11 10.29244/xplore.v12i1.1100
Application of the Generalized Auto-Regressive Conditional Heteroscedasticity Method to Forecasting World Crude Oil Prices https://journal.stats.id/index.php/xplore/article/view/1096 <p>Crude oil is a commodity needed in many sectors. Continually fluctuating world crude oil prices have a large influence on a country's economy. The crude oil price data form a monthly time series. A method that can forecast future world crude oil prices is therefore needed, so that the forecasts can be considered by the government in decision-making. Methods that can be used to predict world crude oil prices include ARIMA (Auto-Regressive Integrated Moving Average) and GARCH (Generalized Auto-Regressive Conditional Heteroscedasticity) models.
The modeling shows that the world crude oil price data for January 2002 to June 2022 exhibit a heteroscedasticity effect that the ARIMA model alone cannot handle. The best model is ARIMA(0,1,2) followed by ARCH(2), with a MAPE of 5.32%. This accuracy is classified as very good for forecasting world crude oil prices.</p> Putri Zainal Yenni Angraini Akbar Rizki Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 12 21 10.29244/xplore.v12i1.1096
Comparison of Hot-deck, Regression, and K-Nearest Neighbor Imputation Methods for Estimating Missing Data in the 2020 Dapodik https://journal.stats.id/index.php/xplore/article/view/1056 <div>Data Pokok Pendidikan (Dapodik) is a nationwide data collection system that contains data on education units.</div> <div>Missing values in Dapodik cause the loss of important information. This problem can be addressed with imputation, a procedure that predicts missing values using a particular method. This study compares three imputation methods: hot-deck imputation, regression imputation, and K-Nearest Neighbor imputation (KNNI). Missing values were generated by simulation at rates of 2%, 3%, 4%, and 5% and then imputed with each of the three methods. The best method is the one with the lowest RMSE and MAPE values.
Regression imputation gave the lowest RMSE and MAPE values and is therefore the best of the three methods.</div> Inayatul Izzati Diana Yusuf Budi Susetyo La Ode Abdul Rahman Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 22 35 10.29244/xplore.v12i1.1056
Performance Comparison of Logistic Model Tree and Random Forest Methods for Data Classification https://journal.stats.id/index.php/xplore/article/view/858 <p>Multicollinearity and missing data are two common problems in big data. Missing data can reduce prediction accuracy. The logistic model tree (LMT) is used to handle multicollinearity because multicollinearity does not affect decision trees. Random forest can be used to reduce prediction variance. This study compares the two methods, LMT and random forest, under multicollinearity and missing data in various cases, using a simulation study and real datasets. The models are evaluated by classification accuracy and AUC. Random forest performed better when the multicollinearity level was moderate. LMT with missing data omitted performed better for big data, and when the percentage of missing data was high and multicollinearity was severe. The methods were then applied to real data with different sample sizes, where random forest performed better. Omitting the missing data performed better in classifying the “breast cancer” data, which contain 0.3% missing data.</p> Purnama Sari Kusman Sadik Mulianto Raharjo Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 36 49 10.29244/xplore.v12i1.858
Application of Bernoulli Naïve Bayes to Sentiment Analysis of Twitter Users toward Online Food Delivery Services in Indonesia https://journal.stats.id/index.php/xplore/article/view/1110 <p>Online food delivery is one of the drivers of the digital economy and is of broad public interest today.
Use of these services intensified as people's behavior and lifestyles changed during the Covid-19 pandemic. The digital food delivery platforms in Indonesia, GoFood, ShopeeFood, and GrabFood, offer consumers easy and competitive transactions as well as multiple options. The widespread use of these platforms generates a variety of reviews and public opinions, including tweets on Twitter. This study classifies the sentiment of these reviews into positive and negative labels using the Bernoulli Naïve Bayes algorithm. Most reviews from March 15 to March 30, 2022 carried positive sentiment, indicating that people had a positive impression of these online food delivery services. Bernoulli Naïve Bayes with information gain feature selection performed well in classifying sentiment labels, with accuracies of 89%, 87%, 86%, and 85% for all data and for each platform (GoFood, ShopeeFood, and GrabFood), respectively.</p> Dea Fisyahri Akhilah Putri Ir. Mohammad Masjkur, M.S. Indahwati Indahwati Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 50 62 10.29244/xplore.v12i1.1110
Support Vector Machine Algorithm for Aspect-Based Sentiment Analysis of Reviews of the Online Game Mobile Legends: Bang-Bang https://journal.stats.id/index.php/xplore/article/view/1064 <p>The digital technology era is supported by easily accessible internet connections that offer many features and forms of entertainment, one of which is online games. Mobile Legends: Bang-Bang is a Multiplayer Online Battle Arena (MOBA) online game that has been popular since its launch in 2016 and is still the top free game on the Google Play Store. This popularity comes with user reviews that carry varied information and sentiment.
This research identifies the sentiment of app user reviews for four aspects: gameplay, performance, visualization, and player. The classification method used is the Support Vector Machine (SVM). Reviews of Mobile Legends: Bang-Bang tend to be negative on the gameplay, performance, and player aspects, but positive on the visualization aspect. Evaluated by accuracy, F1-score, and AUC, the gameplay, performance, and player aspects gave better classification results than the visualization aspect.</p> Mar Atul Aji Tyas Utami Pika Silvianti Muhammad Masjkur Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 63 77 10.29244/xplore.v12i1.1064
Identifying Variables Affecting the Non-Completion of BUD IPB Undergraduate Students Using Binary Logistic Regression https://journal.stats.id/index.php/xplore/article/view/1055 <p>One of the admission routes at Bogor Agricultural University (IPB) is the Regional Representatives Scholarship (BUD). Not all BUD IPB students complete their studies, because some <em>drop out</em> (DO) or resign. One way IPB can reduce the dropout rate of BUD IPB students is to identify the variables that affect their failure to graduate. These variables are analyzed with binary logistic regression. Because the classes of the response variable are imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) is used to address the imbalance. With SMOTE, the average sensitivity of the classification model rose from 10.66% (without SMOTE) to 61.91%. This confirms that the model with SMOTE is better at predicting the minority class (BUD IPB students who do not graduate).
The variables that affect the failure of BUD IPB students to graduate are gender, status of the school of origin, study program group, the presence or absence of the Pre-University Program (PPU), type of sponsor, average report card grade, and GPA in the Joint Preparation Stage (TPB) or General Competency Education Program (PPKU).</p> Mahdiyah Riaesnianda Aam Alamudi Agus Soleh Septian Rahardiantoro Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 78 90 10.29244/xplore.v12i1.1055
Modeling the Crime Rate in Indonesia Using Geographically Weighted Panel Regression Analysis https://journal.stats.id/index.php/xplore/article/view/950 <p>Crime is one of the socio-economic problems that Indonesia has not yet resolved. Although Indonesia is considered a safe country to visit, many Indonesians still experience crime. Resolving this problem is very important because it concerns the safety and comfort of the community. This study aims to identify the factors that influence the crime rate in Indonesia and to determine the best model for each province by comparing a panel data regression model with a Geographically Weighted Panel Regression (GWPR) model. The data cover 34 provinces in Indonesia from 2016 to 2020. The best model is GWPR with an adaptive Gaussian kernel, with an R<sup>2</sup> of 69.89% and an AIC of 167.4585. GWPR modeling yields a model equation and a set of significant variables for each province.
In general, five variables have a significant effect on the crime rate: the percentage of poor population, the open unemployment rate, per capita Gross Regional Domestic Product at constant prices, the human development index, and mean years of schooling.</p> Endah Febrianti Budi Susetyo Pika Silvianti Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 91 109 10.29244/xplore.v12i1.950
Classification of Blood Glucose Levels from Non-invasive Device Output Using Ordinal Logistic Regression with Area Summarization https://journal.stats.id/index.php/xplore/article/view/1078 <p>Diabetes Mellitus (DM) is known as the silent killer because its symptoms tend to go unnoticed. The IPB Non-Invasive Biomarking Team developed a non-invasive device for checking blood. The device is based on spectroscopy and outputs residual light intensity values. A method is needed to predict the blood glucose level category from the non-invasive measurements. Classification modeling can be used to analyze the relationship between the blood glucose level class from invasive measurements and the residual intensity values from non-invasive measurements. A commonly used classification method is ordinal logistic regression. The light-spectrum data used as predictor variables X are often correlated with each other. Principal component analysis reduces their dimension to a new set of uncorrelated variables. Summarizing the area under the graph within each period is the best summarization method because it makes use of the overall information in the data. This study uses ordinal logistic regression with principal component analysis and graph area summarization, applied to the 2017 and 2019 data. The classification model for the 2017 data had a balanced accuracy of 64.64%.
The classification model for the 2019 data had a balanced accuracy of 57.57%. The 2017 and 2019 devices have different designs, so the residual intensity graphs of their non-invasive measurements are read differently. The 2017 data model is better suited to homogeneous data, and the 2019 data model to heterogeneous data.</p> Yuniar Istiqomah Erfiani Erfiani Utami Dyah Syafitri Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 110 121 10.29244/xplore.v12i1.1078
Clustering Panel Data of Mining Sector Issuers during the Covid-19 Pandemic https://journal.stats.id/index.php/xplore/article/view/948 <p>The Covid-19 pandemic led people to look for new sources of income, one of which is stock investment. Mining stocks recorded the highest sectoral index increase in 2020. However, the large rise in the mining sector index does not mean that all mining stocks performed well. Clustering mining stock data can help show which stocks performed best. The clustering variables are technical factors: return, trading volume, transaction frequency, bid volume, and foreign buy. The data are longitudinal, covering March 2020 to January 2022, and the clustering technique is k-means. Outlier and non-outlier data are clustered separately, with outliers identified exploratively through biplot analysis. Clustering the outlier data yields five clusters, and clustering the non-outlier data yields two clusters. The best cluster is the one containing ANTM, because it has the highest return, transaction frequency, and foreign buy.</p> Nadhif Nursyahban Aam Alamudi Farit Mochamad Afendi Copyright (c) 2023 Xplore: Journal of Statistics 2023-01-15 2023-01-15 12 1 122 133 10.29244/xplore.v12i1.948
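<p>The k-means step in the mining-stock abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes each stock's monthly panel has already been reduced to one row of per-variable averages (the abstract does not say how the longitudinal data were aggregated), standardizes the five technical variables, and runs Lloyd's algorithm with a deterministic farthest-first seeding chosen here only for reproducibility. The labels S1 to S6 and all numbers are hypothetical.</p>

```python
import math


def standardize(rows):
    """Z-score each column so that variables on large scales (e.g. trading
    volume) do not dominate variables on small scales (e.g. return)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # Sample standard deviation; fall back to 1.0 for a constant column.
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
        for j in range(p)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): take the first
    point, then repeatedly add the point farthest from every chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda pt: min(sq_dist(pt, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid (mean) update until the assignments stop changing."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(pt, centroids[c])) for pt in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-stock averages (made-up numbers, not from the study):
# [return, trading volume, transaction frequency, bid volume, foreign buy]
stocks = {
    "S1": [0.020, 150.0, 1200.0, 95.0, 40.0],
    "S2": [0.018, 140.0, 1100.0, 90.0, 38.0],
    "S3": [0.022, 160.0, 1250.0, 100.0, 42.0],
    "S4": [0.002, 20.0, 150.0, 10.0, 2.0],
    "S5": [0.001, 25.0, 180.0, 12.0, 3.0],
    "S6": [0.003, 18.0, 140.0, 9.0, 1.0],
}

labels, centroids = kmeans(standardize(list(stocks.values())), k=2)
for name, lab in zip(stocks, labels):
    print(name, "-> cluster", lab)
```

<p>Because the study clustered outlier and non-outlier stocks separately, the same function would be called once per subset, with k = 5 and k = 2 respectively. Standardizing first matters here because trading volume and transaction frequency are on much larger scales than return and would otherwise dominate the Euclidean distance.</p>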