Indonesian Journal of Statistics and Its Applications 2021-03-31T12:03:15+07:00 Agus M Soleh Open Journal Systems <p><strong>Indonesian Journal of Statistics and Its Applications (<a href=";1510202061&amp;1&amp;&amp;2017">eISSN:2599-0802</a>) (formerly named <a href="" target="_blank" rel="noopener">Forum Statistika dan Komputasi</a>), </strong><strong>established in 2017</strong><strong>, </strong>publishes scientific papers in the area of statistical science and its applications. The published papers should be research papers on, but not limited to, the following topics: experimental design and analysis, survey methods and analysis, operations research, data mining, statistical modeling, computational statistics, time series and econometrics, and statistics education. All papers are reviewed by peer reviewers consisting of experts and academicians across universities and agencies. This journal is <strong>nationally accredited (SINTA 3)</strong> by the Directorate General of Research and Development Strengthening (DGRDS), Ministry of Research, Technology and Higher Education of the Republic of Indonesia No.: <a href="" target="_blank" rel="noopener">14/E/KPT/2019, dated 10 May 2019</a>. </p> <p><strong>Scope:</strong><br />Indonesian Journal of Statistics and Its Applications is a refereed journal committed to Statistics and its applications.</p> <p><strong>Issue</strong><em> </em><strong>Released</strong>: <em>March (No 1), July (No 2), and November (No 3). </em></p> Handling of Overdispersion in the Poisson Regression Model with Negative Binomial for the Number of New Cases of Leprosy in Java 2021-01-01T19:24:28+07:00 Yopi Ariesia Ulfa Agus M Soleh Bagus Sartono <p>According to data from the Directorate General of Disease Prevention and Control of the Ministry of Health of the Republic of Indonesia, Java Island recorded more new leprosy cases in 2017 than any other island in Indonesia. 
The purpose of this study is to compare Poisson regression with a negative binomial regression model applied to the data on the number of new cases of leprosy and to find out which explanatory variables have a significant effect on the number of new cases of leprosy in Java. The results indicate that the negative binomial regression model can handle the overdispersion found in the Poisson regression model. The variables that significantly affect the number of new cases of leprosy, based on the negative binomial regression model, are total population, the percentage of children under five years who had been immunized with BCG, and the percentage of the population with sustainable access to clean water.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Comparison of Functional Regression and Functional Principal Component Regression for Estimating Non-Invasive Blood Glucose Level 2021-01-01T18:51:22+07:00 Nurul Fadhilah Erfiani Erfiani Indahwati Indahwati <p>The calibration method is an alternative method that can be used to analyze the relationship between invasive and non-invasive blood glucose levels. Calibration modeling generally involves high-dimensional data and multicollinearity, because in functional data the number of independent variables (p) is usually greater than the number of observations (p&gt;n). Both problems can be overcome using Functional Regression (FR) and Functional Principal Component Regression (FPCR). FPCR is based on Principal Component Analysis (PCA). In FPCR, the data are transformed using a polynomial basis before data reduction. This research models the spectral calibration of the voltage values produced by non-invasive blood glucose monitoring devices to predict blood glucose using FR and FPCR. The aim is to determine the best calibration model for measuring non-invasive blood glucose levels with FR and FPCR. 
The results showed that the FR model had a higher coefficient of determination (R<sup>2</sup>) and lower Root Mean Square Error (RMSE) and Root Mean Square Error of Prediction (RMSEP) values than the FPCR model, namely 12.9%, 5.417, and 5.727, respectively. Overall, the FR model is a better calibration model for estimating blood glucose levels than the FPCR model.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Improving Classification Model Performances using an Active Learning Method to Detect Hate Speech in Twitter 2020-11-25T03:25:49+07:00 Muhammad Ilham Abidin Khairil Anwar Notodiputro Bagus Sartono <p>Police efforts to address hate speech on social media such as Twitter will not be sufficient if they rely solely on manual checks. Therefore, it is necessary to use statistical modelling, such as a classification model, to detect hate speech automatically. Classification is a type of predictive modelling that produces predictions based on labelled data. The available data are usually unlabelled, implying that the labelling process needs to be done beforehand. Data labelling is time consuming, costly, and often fails to produce correct labels. This research aims to improve the performance of classification models by adding a small amount of data through the so-called active learning method. The results showed that there was no significant difference in the performance of logistic regression and naïve Bayes classification models in detecting hate speech. However, the results also showed that adding data through the active learning method substantially improved the performance of logistic regression in detecting hate speech when compared to data addition based on simple random sampling. 
Therefore, the performance of classification models in detecting hate speech on Twitter could be improved by using an active learning method.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications ARFIMA Modelling for Tectonic Earthquakes in The Maluku Region 2021-03-18T12:45:01+07:00 Ferry Kondo Lembang Lexy Janzen Sinay Asrul Irfanullah <p>Maluku Province is one of the most seismically active and earthquake-prone regions in Indonesia because it is the meeting place of three plates, namely the Eurasian, Pacific, and Australian plates. Over the last 100 years, 25-30% of the tectonic earthquakes accompanied by tsunamis in Indonesia occurred in the Maluku Sea and Banda Sea. Based on this fact, this study aims to analyze the incidence of tectonic earthquakes in the Maluku region and its surroundings using the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model, which is able to capture long-term dependence in time series data (long memory). The results show that the best model for predicting the number of tectonic earthquakes in Maluku and its surroundings is ARFIMA (0; 0.712; 1) with an MSE value of 0.1156. Meanwhile, the best model for predicting the average magnitude of tectonic earthquakes in Maluku and its surroundings is ARFIMA (0; -3.224 x 10<sup>-9</sup>; 1) with an MSE value of 0.01237. Based on these two models, the forecasts for the next three periods are as follows: 31 tectonic earthquakes with an average magnitude of 4.38481 on the Richter scale (SR) in the first period, 
32 tectonic earthquakes with an average magnitude of 4.38407 in the second period, and 32 tectonic earthquakes with an average magnitude of 4.38333 in the third period.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Simulation Study of Robust Geographically Weighted Empirical Best Linear Unbiased Predictor on Small Area Estimation 2021-01-01T18:54:04+07:00 Naima Rakhsyanda Kusman Sadik Indahwati Indahwati <p>Small area estimation can be used to predict population parameters when sample sizes are small. In some cases, population units that are spatially close may be more related than units that are further apart. This research studies the use of spatial information such as geographic coordinates. Outlier contamination can also affect small area estimates. This study was conducted using simulation methods on generated data with six scenarios. The scenarios combine spatial effects (spatially stationary and spatially non-stationary) with outlier contamination (no outliers, symmetric outliers, and non-symmetric outliers). The purpose of this study was to compare the geographically weighted empirical best linear unbiased predictor (GWEBLUP) and robust GWEBLUP (RGWEBLUP) with the direct estimator, EBLUP, and REBLUP using simulated data. The performance of the predictors is evaluated using relative root mean squared error (RRMSE). The simulation results showed that geographically weighted predictors have the smallest RRMSE values in scenarios with spatial non-stationarity and therefore offer better predictions. 
For scenarios with outliers, robust predictors have smaller RRMSE values and are therefore more efficient than non-robust predictors.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications The Model of Per-Capita Expenditure Figures in Sumatera Selatan uses a Geographically Weighted Panel Regression 2020-12-16T05:17:43+07:00 Dia Cahya Wati Dea Alvionita Azka Herni Utami <p>Geographically Weighted Panel Regression (GWPR) is a development of the global regression model whose basic idea combines panel data with Geographically Weighted Regression (GWR). The GWPR model is built using a point approach based on latitude and longitude coordinates. The regression parameters take different values at each location. GWPR can accommodate spatial effects, so it can better explain the relationship between the response variable and the predictors. The purpose of this study is to compare GWPR models with Fixed Gaussian and Adaptive Bisquare weighting functions based on the AIC value. The data used in this study are secondary data taken from the website of the Central Statistics Agency (BPS) in the form of Per-Capita Expenditure Figures in South Sumatra in 2013-2019. The results show that, for the Per-Capita Expenditure Rate (AP), the GWPR model with a fixed Gaussian weighting function is preferable, with a coefficient of determination of 95.81% compared with 93.3% for the adaptive bisquare weighting function. 
Under the fixed Gaussian weighting, the factors that influence the Per-Capita Expenditure Rate (AP) in South Sumatra fall into 6 groups, whereas under the adaptive bisquare weighting they fall into 2 groups.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Exploration of Obesity Status of Indonesia Basic Health Research 2013 With Synthetic Minority Over-Sampling Techniques 2021-01-01T18:55:12+07:00 Sri Astuti Thamrin Dian Sidik Hedi Kuswanto Armin Lawi Ansariadi Ansariadi <p>Accurate class labels are very important in classification with a machine learning approach. The more accurate the data sets and classes, the better the output generated by machine learning. In practice, classification often faces class imbalance, in which the classes do not make up equal portions of the data set. Data imbalance affects classification accuracy. One of the simplest ways to correct imbalanced data classes is to balance them. This study aims to explore the problem of class imbalance in a dataset with moderate imbalance and to address that imbalance. The Synthetic Minority Over-Sampling Technique (SMOTE) is used to overcome the problem of class imbalance in obesity status in the 2013 Indonesia Basic Health Research (RISKESDAS). The results show that the obese class makes up 13.9% of the data and the non-obese class 84.6%. This means that the class imbalance is moderate. Moreover, SMOTE with 600% over-sampling increases the size of the minority class (obese). As a consequence, the obesity status classes become balanced. 
Therefore, using SMOTE was better than not using it for exploring the obesity status in Indonesia RISKESDAS 2013.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Segmentation and Positioning of Lecturers in the Department of Computer Science at Pakuan University Based on Student Assessment 2021-03-18T12:59:58+07:00 Yusma Yanti Asep Saepulrohman <p>Determining the segmentation and positioning of lecturers is very important when selecting thesis supervisors, because this information helps the thesis supervision process run well. This study analyzes the segmentation and positioning of lecturers for the purpose of determining thesis supervisors using the Clusterwise Bilinear Spatial Multidimensional Scaling Model (CBSMSM) method. The data used are survey data from fifth-semester bachelor students in the 2019/2020 academic year of the Department of Computer Science, Pakuan University. A total of 161 students assessed 10 attributes regarding the characteristics of the department's 32 lecturers. The segment coordinates, lecturer coordinates, dimensions, and attributes are estimated simultaneously using the alternating least squares (ALS) algorithm. The numbers of segments and dimensions are selected based on the smallest sum of squared errors (SSE) across the combinations of segments and dimensions considered. As a result, we obtain four segments and four dimensions with an SSE value of 4864.003. 
Furthermore, the department can use this result to illustrate student assessments of their lecturers' characteristics regarding thesis supervision.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Study of Bagging Application in the Safe-Level Smote Method in Handling Unbalanced Classification 2021-03-18T13:35:07+07:00 Qorry Meidianingsih Debby Agustine <p>Imbalanced class classification problems arise in many real applications. Class imbalance tends to cause minority class instances to be classified into the majority class. This study examined the performance of applying the bagging method to safe-level SMOTE with a Support Vector Machine classifier. The data used consisted of three types based on the proportion of observations in the majority and minority classes. Each type of data has three variables: two independent variables and one dependent variable. The observations of the independent variables were generated from a multivariate normal distribution, while the dependent variable is binary. The results showed that the classifier has high accuracy and sensitivity for all types of data, for both the imbalanced classes and the balanced classes (obtained by safe-level SMOTE and safe-level SMOTEBagging). Nevertheless, specificity was the main measure in assessing the performance of the classifier because it reflects accuracy in classifying the minority class observations. The specificity increased when the numbers of observations in the two classes were approximately balanced through the implementation of safe-level SMOTE. The best performance of the Support Vector Machine in predicting minority class observations was achieved when bagging was applied in safe-level SMOTE. 
The specificity rates for the three types of data were 77.93 percent, 78.46 percent, and 85.69 percent, respectively.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications The Clustering of Provinces in Indonesia by The Economic Impact of Covid-19 using Cluster Analysis 2021-03-18T16:25:31+07:00 Zerlita Fahdha Pusdiktasari Widiarni Ginta Sasmita Wulaida Rizky Fitrilia Rahma Fitriani Suci Astutik <p>The Covid-19 pandemic has hit Indonesia since March 2020. The Indonesian government has issued several policies to reduce the spread of Covid-19. These policies have affected various areas of life, especially the economy across its sectors. This study was conducted to group provinces whose economies are at risk of being affected by Covid-19 based on several economic indicators, namely the unemployment rate, the percentage of poor people, the provincial minimum wage, and the hotel occupancy rate, using cluster analysis. Cluster analysis was performed using several hierarchical methods, namely Single, Complete, Average, and Centroid Linkage, and Ward. The Cophenetic correlation coefficient (rCoph) was used to determine the best method, while the number of clusters was determined based on the Dunn, Connectivity, and Silhouette indexes. The analysis shows that Average Linkage is the best method, with two clusters. The first cluster consists of all provinces in Indonesia except Papua; their economies are highly at risk of being affected by Covid-19, characterized by a low percentage of poor people and a low provincial minimum wage, as well as high open unemployment and hotel occupancy rates. Meanwhile, the second cluster consists of the Province of Papua, whose economy is at low risk of being affected by Covid-19. 
Because Covid-19 has affected the economy of almost all provinces in Indonesia, the government can undertake recovery efforts and apply economic recovery policies broadly across provinces.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Comparing of Car-Bym, Generalized Poisson, and Negative Binomial Models on Tuberculosis Data in Banyumas Districs 2021-03-20T09:36:26+07:00 Jajang Jajang Budi Pratikno Mashuri Mashuri <p>In 2019, the number of people with tuberculosis (TB) in Banyumas, Central Java, was high (1,910 people were detected with TB). The number of people infected with TB in Banyumas is both count data and areal data. In modeling, the parameter estimation method and the characteristics of the data need to be considered. Here, we compare the Generalized Poisson (GP), negative binomial (NB), Poisson, and CAR-BYM models for TB cases in Banyumas. We use two methods for parameter estimation: maximum likelihood estimation (MLE) and Bayes. MLE is used for the GP and NB models, whereas Bayes is used for the Poisson and CAR-BYM models. The results showed overdispersion in the Poisson model, with a deviance of 67.38 on 22 degrees of freedom. The ratio of deviance to degrees of freedom is therefore 3.06 (&gt;1), which indicates overdispersion. The GP, NB, Poisson-Bayes, and CAR-BYM models are then used to model the TB data in Banyumas, and we compare their RMSE. 
Based on the RMSE criterion, we found that CAR-BYM is the best model for TB in Banyumas because its RMSE is the smallest.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Comparison of Soft and Hard Clustering: A Case Study on Welfare Level in Cities on Java Island 2021-03-18T16:25:39+07:00 Nurafiza Thamrin Arie Wahyu Wijayanto <p>The National Medium Term Development Plan 2020-2024 states that one of the visions of national development is to accelerate the distribution of welfare and justice. Cluster analysis groups objects into several smaller groups such that the objects in one group have similar characteristics. This study was conducted to find the best clustering method and to classify cities in Java based on their level of welfare. The methods used were hard clustering, namely K-Means, K-Medoids (PAM and CLARA), and Hierarchical Agglomerative clustering, as well as soft clustering, namely Fuzzy C-Means. This study uses the elbow method, the silhouette method, and the gap statistic to determine the optimal number of clusters. Based on the silhouette coefficient, Dunn index, connectivity coefficient, and S<sub>w</sub>/S<sub>b</sub> ratio, the best method was Agglomerative Ward Linkage, which produced three clusters. The first cluster consists of 27 cities with moderate welfare, the second cluster consists of 16 cities with high welfare, and the third cluster consists of 76 cities with low welfare. 
With the best clustering results, city governments in Java should be able to formulate better welfare policies based on the dominant indicators found in each cluster.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Determinant Factors of Working Children based on Conditional Logistics Regression for Matched Pairs Data 2021-03-18T16:25:48+07:00 Rizky Zulkarnain Tri Listianingrum Khairil Anwar Notodiputro <p>Child labor may create problems since it relates to human rights as well as to children's development, especially their access to sufficient education. This paper discusses the determinant factors of working children using conditional logistic regression for matched pairs data. Matching is employed to adjust for confounding factors and to avoid bias. Three confounding factors are considered in this paper, i.e. residential area, gender, and income of the household head. The results showed that the conditional logistic regression model outperformed the standard logistic regression model. The number of household members, whether the head of household was married or single, the age of the head of household, the educational attainment of the head of household, and the work status of the head of household were the determinant factors of working children.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Geographically Weighted Regression with Kernel Weighted Function on Poverty Cases in West Java Province 2021-03-18T16:25:57+07:00 Winda Nurpadilah I Made Sumertajaya Muhamad Nur Aidi <p>Spatial regression is a form of regression modeling that accounts for spatial effects. Geographically weighted regression (GWR) is a spatial regression method that can be used to deal with spatial heterogeneity. This method generates local model parameter estimates for each observation location. 
Spatial statistics can be applied in many areas, including poverty. Poverty can be influenced by the proximity between regions, so proximity cannot be ignored when determining the factors behind poverty. West Java is the province with the largest population, so this study aims to model the poverty data in West Java Province by incorporating spatial effects. The weighting functions considered for the GWR model are fixed and adaptive kernel functions. The analysis shows that the fixed exponential kernel function has the smallest cross validation (CV) value, so the weighting matrix used in the model is determined by the exponential kernel function. The largest coefficient of determination and the smallest AIC value are obtained by the GWR model with an exponential kernel function. Based on the ANOVA table used to test the overall goodness of fit of the GWR model, the GWR model is more effective than global regression. Therefore, the GWR model is the best model for West Java’s poverty cases. The effect of each explanatory variable on the percentage of poverty varies across the districts/cities of West Java Province.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Determinants of Male Adolescents Smoking Behavior in Indonesia using Negative Binomial Regression 2021-03-18T12:37:20+07:00 Angel Zushelma Hartono Siskarossa Ika Oktora <p>Adolescent smoking habits have become a major focus of the Ministry of Health's programs on tobacco consumption. In 2016, the prevalence of smoking among adolescents aged 10-18 years reached 8.8% and was rising, contrary to the Ministry of Health Strategic Plan 2015-2019 target of lowering adolescent smoking prevalence to 5.4%. Cigarette consumption is higher among male adolescents than among females. 
Yet high cigarette consumption in men increases the risk of impotence and reduces reproductive health, which in turn affects the quality of future generations. This study aims to describe the smoking behavior of male adolescents in Indonesia in 2018 and to identify the variables that affect the number of cigarettes consumed. The analytical methods used are Poisson Regression and Negative Binomial Regression. The data source is the Riskesdas 2018 raw data, with male adolescent smokers aged 10-18 years as the unit of analysis. The results indicate that most male adolescents are light smokers. Heavy smokers were predominantly older, living in rural areas, poorly educated, employed, and living with a household head who smoked and had a low level of education. Age, location of residence, education level, working status, smoking status, and household head education level significantly affect male adolescents' smoking behavior.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications A Conditional Logistic Regression Model for Analyzing Unemployment Rates in West Java 2021-01-21T12:32:09+07:00 Dwi Jayanti Septian P Palupi Khairil Anwar Notodiputro <p>Unemployment is a critical problem faced by developing countries. It is a complex problem that creates other social and economic problems such as poverty, economic gaps, and crime. This paper discusses the determinant factors of unemployment based on empirical data using the conditional logistic regression model. The model was used to analyze matched pair data using gender, age, and residence as matching factors. The results showed that household status, marital status, and level of education were the determinant factors of a person being unemployed in West Java. 
The results also show that conditional logistic regression outperformed standard logistic regression for analyzing the causes of unemployment.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications Determining Critical Yield Index of Area Yield Insurance based on Basis Risk Constraint 2021-03-18T10:06:57+07:00 Valantino Agus Sutomo Dian Kusumaningrum Aurellia Layvieda Rahma Anisa <p> Area yield index insurance at the district level faces heterogeneous basis risk due to geographical conditions, which leads to an imprecise critical yield index. Clustering and zone-based area yield schemes can reduce heterogeneous basis risk, which helps determine a suitable alternative for the critical yield index. In previous research, we obtained 7 clusters and 2 levels of paddy productivity based on clustering assumptions from primary data in Java. The suitable assumption for calculating the critical yield index is the cluster-based assumption, which gives homogeneous paddy productivity within the 7 clusters in Java. Therefore, our goal is to develop an area yield index at the district level (cluster based) that minimizes basis risk under certain constraints for paddy farmer productivity in Java, Indonesia. Several methods for calculating the critical yield index, namely the mean, median, winsor mean, one sigma, two sigma, and first quartile methods, are evaluated against the basis risk constraints using a confusion matrix. The two basis risk constraints are that the difference between overpayment and shortfall is not extreme and that total basis risk does not exceed 20% of total claim occurrences. The two sigma method has the lowest basis risk, overpayment, and shortfall, but it also has the lowest pure premium, a small probability of claim, and a low range of claims. Hence, we consider the first quartile method a suitable alternative for calculating the critical yield index, as it satisfies the two basis risk constraints. 
In conclusion, our research provides an analytical calculation of the area yield index at the district level, with a pure premium of Rp 152,151 using the first quartile method, which is sufficient to cover the total claim and is consistent with the simulation.</p> 2021-03-31T00:00:00+07:00 Copyright (c) 2021 Indonesian Journal of Statistics and Its Applications