Xplore: Journal of Statistics

Peramalan Harga Batu Bara Acuan Menggunakan Metode Autoregressive Integrated Moving Average dan Fungsi Transfer

Suci Pujiani Prahesti — 2023-01-15

Indonesia, as one of the largest coal-producing countries in the world, has an
important role in global coal demand. Currently, most countries in Europe are
turning to coal as a source of electricity. This is due to the Covid-19 pandemic and the
conflict between Russia and Ukraine, which endangers energy sources. Therefore,
forecasting future coal prices is needed to determine the right policy for
dealing with the large demand for coal. Coal price fluctuations are influenced by
several factors, such as the prices of other commodities, for instance the natural gas price.
The natural gas price is included in the coal price forecast as the input series of the
transfer function method. This study compares the ARIMA and
transfer function methods in coal price forecasting. The results show that the MAPE values
of the ARIMA and transfer function methods are 23.14% and 17.66%, respectively. Based on
these MAPE values, the transfer function method forecasts coal prices better than
the ARIMA method.
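
The abstract compares the two models by MAPE but does not state the formula used. A minimal sketch, assuming the usual definition (the mean of absolute errors divided by the actual values, expressed in percent), is given below. The function name `mape` and the sample numbers are hypothetical and are not taken from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumed standard definition)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average the absolute relative errors, then convert to percent.
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical prices, for illustration only.
actual_prices = [100.0, 200.0, 400.0]
forecast_prices = [110.0, 180.0, 400.0]
print(round(mape(actual_prices, forecast_prices), 2))
```

Under this definition a lower value means the forecasts are, on average, closer to the observed prices in relative terms, which is the sense in which 17.66% is preferred to 23.14%.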

Penerapan Metode Generalized Auto-Regressive Conditional Heteroscedasticity untuk Peramalan Harga Minyak Mentah Dunia

Putri Zainal — 2023-01-15

Crude oil is one of the commodities needed in various fields. World crude oil prices, which continue to fluctuate, have a large influence on a country's economy. The crude oil price data collected are a time series, that is, observations recorded over time at monthly intervals. Therefore, a system is needed that can forecast future world crude oil prices, which the government can take into consideration in decision making. Methods that can be used to predict world crude oil prices include the ARIMA (Auto-Regressive Integrated Moving Average) and GARCH (Generalized Auto-Regressive Conditional Heteroskedasticity) models. Modeling shows that the world crude oil price data for the period January 2002 to June 2022 exhibit a heteroscedasticity effect that cannot be overcome using the ARIMA model alone. The results of data processing show that ARIMA(0,1,2) followed by ARCH(2) is the best model, with a MAPE value of 5.32%. The accuracy obtained is classified as very good for forecasting world crude oil prices.

Perbandingan Metode Hot-deck, Regression dan K-Nearest Neighbor Imputation dalam Pendugaan Data Hilang pada Dapodik Tahun 2020

Inayatul Izzati Diana Yusuf — 2023-01-15

Data Pokok Pendidikan (Dapodik) is a nation-wide data collection system that contains data on education units. Missing values in Dapodik cause the loss of important information. Imputation can be used to solve this problem. Imputation is a procedure that predicts missing values with a certain method. This study aims to compare three imputation methods: Hot-deck imputation, Regression imputation, and K-Nearest Neighbor imputation (KNNI). Missing values were generated by simulation at rates of 2%, 3%, 4%, and 5%, and then imputed with the three methods. The best method is determined by the lowest RMSE and MAPE values. On that basis, the best imputation method is regression imputation.

Perbandingan Performa Metode Pohon Model Logistik dan Random Forest pada Pengklasifikasian Data

Purnama Sari — 2023-01-15

Multicollinearity and missing data are two common problems in big data. Missing data could decrease the prediction accuracy. Logistic model tree (LMT) is used to handle multicollinearity because multicollinearity does not affect the decision tree. Random forest can be used to decrease variance in prediction. This study aimed to compare two methods, LMT and random forest, under multicollinearity and missing data in various cases, using a simulation study and real data. Model evaluation is based on classification accuracy and AUC. The results show that random forest performs better when the multicollinearity level is moderate. LMT with missing data omitted performs better for big data and when a high percentage of missing data occurs and the multicollinearity level is severe. The next step analysed real data with different sample sizes. The results show that random forest performs better. Omitting missing data gives better performance in classifying the "breast cancer" data, which contain 0.3% missing data.

Penerapan Bernoulli Naïve Bayes untuk Analisis Sentimen Pengguna Twitter terhadap Layanan Online Food Delivery di Indonesia

Dea Fisyahri Akhilah Putri — 2023-01-15

Online food delivery is one of the drivers of the digital economy and attracts interest across society today. The trend toward these services has intensified with changes in people's behavior and lifestyle during the Covid-19 pandemic. The digital food delivery platforms in Indonesia, namely GoFood, ShopeeFood, and GrabFood, offer consumers ease in the form of competitive transactions and multiple options. The widespread use of these platforms generates a variety of reviews and public opinion, some of it expressed through tweets on Twitter. This study aims to classify the sentiment of these reviews as positive or negative using the Bernoulli Naïve Bayes algorithm. The majority of reviews from March 15, 2022 to March 30, 2022 were positive, which indicates that people had a positive impression of these online food delivery services. The results of this study show that Bernoulli Naïve Bayes with information gain feature selection performs well in classifying sentiment labels, with accuracy scores of 89%, 87%, 86%, and 85% for all data and for GoFood, ShopeeFood, and GrabFood, respectively.

Algoritme Support Vector Machine untuk Analisis Sentimen Berbasis Aspek Ulasan Game Online Mobile Legends: Bang-Bang

Mar Atul Aji Tyas Utami — 2023-01-15

The digital technology era is facilitated by an internet connection that is easily accessible and provides many features and forms of entertainment, one of which is online games. Mobile Legends: Bang-Bang is a Multiplayer Online Battle Arena (MOBA)-type online game that has been popular since its launch in 2016. Currently, Mobile Legends: Bang-Bang is still the top free game on the Google Play Store. This popularity is inseparable from user reviews that provide different information and sentiment. This research identifies the sentiment of application user reviews based on four aspects: gameplay, performance, visualization, and player. The classification method used in this study is the Support Vector Machine (SVM). Reviews of Mobile Legends: Bang-Bang tend to be negative on the gameplay, performance, and player aspects. However, on the visualization aspect, they tend to be positive. Evaluation of the model based on accuracy, F1-score, and AUC found that the gameplay, performance, and player aspects gave better classification results than the visualization aspect.

Identifikasi Peubah yang Berpengaruh terhadap Ketidaklulusan Mahasiswa Program Sarjana BUD IPB dengan Regresi Logistik Biner

Mahdiyah Riaesnianda — 2023-01-15

One of the admission routes available at the Bogor Agricultural University (IPB) is the Regional Representatives Scholarship (BUD). Not all BUD IPB students were able to complete their studies; some dropped out (DO) or resigned. One step IPB can take to reduce the dropout rate for BUD IPB students is to find out the variables that affect the failure of BUD IPB students. These variables are analyzed by binary logistic regression. The classes of the response variable are imbalanced, which can be addressed with the Synthetic Minority Over-Sampling Technique (SMOTE). The classification model with SMOTE achieved a higher average sensitivity than the model without SMOTE, rising from 10.66% to 61.91%. This indicates that the model with SMOTE is better at predicting the minority class (BUD IPB students who do not pass). The variables that affect the failure of BUD IPB students are gender, status of the school of origin, study program group, the presence or absence of Pre-University Programs (PPU), type of sponsor, average report card grades, and GPA in the Joint Preparation Stage (TPB) or General Competency Education Program (PPKU).
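
Sensitivity is the share of actual minority-class cases that the model identifies, which is why it is the metric of interest under class imbalance. A minimal sketch of the calculation follows, assuming the minority class (students who do not pass) is coded as 1. The labels are hypothetical and only illustrate how overall accuracy can look acceptable while sensitivity stays low.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Share of actual positive cases that were predicted positive (recall)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if true_pos + false_neg == 0:
        raise ValueError("no positive cases in y_true")
    return true_pos / (true_pos + false_neg)


def accuracy(y_true, y_pred):
    """Share of all cases that were predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical labels: 3 minority cases (1) among 10 students.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))               # 0.8
print(round(sensitivity(y_true, y_pred), 2))  # 0.33
```

In this toy case the classifier is right for 8 of 10 students yet finds only 1 of the 3 minority cases, the kind of gap that oversampling the minority class is meant to narrow.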

Pemodelan Tingkat Kriminalitas di Indonesia Menggunakan Analisis Geographically Weighted Panel Regression

Endah Febrianti — 2023-01-15

Crime is one of the socio-economic problems that Indonesia has not yet resolved. Although Indonesia is categorized as a safe country to visit, in reality, there are still many Indonesian people who experience crime. The resolution of this socio-economic problem is very important because it involves the safety and comfort of the community. This study aims to identify the factors that influence the crime rate in Indonesia and determine the best model for each province by comparing the panel data regression model and the Geographically Weighted Panel Regression (GWPR) model. The data cover 34 provinces in Indonesia from 2016 to 2020. The analysis uses panel data regression and GWPR. The result is that GWPR with an adaptive Gaussian kernel is the best model, with an R² of 69.89% and an AIC of 167.4585. The GWPR modeling produces model equations and significant variables for each province. In general, five variables have a significant effect on the crime rate, namely percentage of poor population, open unemployment rate, Gross Regional Domestic Product at constant prices per capita, human development index, and mean years of schooling.

Klasifikasi Kadar Glukosa Darah Keluaran Alat Non-invasif Menggunakan Regresi Logistik Ordinal dengan Peringkasan Luas

Yuniar Istiqomah — 2023-01-15

Diabetes Mellitus (DM) is known as a silent killer because its symptoms tend to go unnoticed. The IPB Non-Invasive Biomarking Team developed a non-invasive monitoring device to check blood glucose. The tool uses the spectroscopy principle and produces an output in the form of a residual value of light intensity. A method is needed to predict the category of blood glucose levels based on the measurement results of the non-invasive tool. Classification modeling is one of the methods that can be used to analyze the relationship between the blood glucose level class from invasive measurements and the residual intensity values from non-invasive measurements. One of the commonly used classification methods is ordinal logistic regression. Light spectrum-based data used as predictor variables (X) often yield variables that are correlated with each other. Principal component analysis reduces their dimension to a new set of uncorrelated variables. Summarizing each period by the area under the graph is the best summarization method because it makes use of the overall information in the data. This study uses ordinal logistic regression as the modeling method, with principal component analysis and graph area summarization, applied to 2017 data and 2019 data. Classification modeling on the 2017 data had a balanced accuracy of 64.64%. Classification modeling on the 2019 data produced a balanced accuracy of 57.57%. The designs of the 2017 tool and the 2019 tool are different, causing the residual intensity graphs of the non-invasive measurements to be read differently. The 2017 data model is better applied to homogeneous data and the 2019 data model is better applied to heterogeneous data.
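
The abstract reports balanced accuracy without defining it. A minimal sketch follows, assuming the common definition: the unweighted mean of the per-class recall values, so that each glucose category counts equally regardless of how many samples it has. The class names and labels are hypothetical and are not taken from the 2017 or 2019 data.

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition)."""
    classes = sorted(set(y_true))
    recalls = []
    for cls in classes:
        # Positions where the true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical ordinal glucose categories, for illustration only.
y_true = ["low", "low", "normal", "normal", "normal", "normal", "high", "high"]
y_pred = ["low", "normal", "normal", "normal", "normal", "high", "high", "high"]
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

Here the per-class recalls are 0.5 (low), 0.75 (normal), and 1.0 (high), so the balanced accuracy is 0.75 even though the classes differ in size.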

Penggerombolan Data Panel Emiten Sektor Pertambangan Selama Pandemi Covid-19

Nadhif Nursyahban — 2023-01-15

The Covid-19 pandemic has made people start looking for new sources of income, one of which
is stock investment. Mining stocks recorded the highest sectoral index increase in 2020.
The high increase in the mining sector index does not indicate that all of the stocks
performed well. Clustering mining stock data can help to show which stocks have the
best performance. The variables used in clustering are technical factors, namely return,
trading volume, transaction frequency, bid volume, and foreign buy. The data in this research
are longitudinal data from March 2020 until January 2022, and the clustering technique
used is k-means. Outlier and non-outlier data are clustered separately.
Outliers are identified exploratively using biplot analysis. Clustering the outlier data
yields five clusters, and clustering the non-outlier data yields
two clusters. The best cluster is the one containing ANTM, because it has the highest values of
return, transaction frequency, and foreign buy.