Indonesian Journal of Statistics and Its Applications

Comparison of Negative Binomial Regression Model and Geographically Weighted Poisson Regression on Infant Mortality Rate in South Sulawesi Province

Siswanto Siswanto — 2022-08-31

The number of infant deaths is an important indicator of the quality of a country's public health. A number of studies argue that infant mortality is closely related to living conditions and the social status of the parents. Indirectly, the quality of life of babies in a country affects the nation's quality of life in general. Therefore, considerable effort is required to reduce infant mortality in Indonesia. One step toward addressing this issue is to analyze the causative factors. A statistical method developed for data analysis that takes spatial factors into account is Geographically Weighted Poisson Regression (GWPR), used here with a bisquare kernel weighting function. Based on partial estimation with the GWPR model, there are seven groups, defined by the variables that significantly affect the number of infant deaths in South Sulawesi Province. Of the seven groups, the first is the Selayar Islands, where all variables have a significant effect. This should concern the South Sulawesi provincial government and prompt it to improve facilities and infrastructure in the Selayar Islands, since their location, very far from the city center, can affect access to medicines, medical personnel, and other services. Based on the analysis of the factors that affect the number of infant deaths in South Sulawesi Province using negative binomial regression and GWPR with bisquare kernel weighting, the GWPR model is the better of the two for analyzing the number of infant deaths in the province because it has the smallest AIC value, 167.668.
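The two ingredients named in this abstract, the bisquare kernel and AIC-based model comparison, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the distances, bandwidth, log-likelihood, and parameter count are hypothetical values chosen only to show the arithmetic.

```python
def bisquare_weight(distance, bandwidth):
    """Bisquare kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 at or beyond it."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if distance >= bandwidth:
        return 0.0
    ratio = distance / bandwidth
    return (1.0 - ratio ** 2) ** 2


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*logL (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical distances (km) from one regression point to four neighbours.
distances_km = [0.0, 50.0, 100.0, 200.0]
weights = [bisquare_weight(d, bandwidth=100.0) for d in distances_km]

# Hypothetical log-likelihood and parameter count, for illustration only.
example_aic = aic(log_likelihood=-80.0, n_params=4)
```

Observations at or beyond the bandwidth receive zero weight, so each local fit uses only nearby locations. When two candidate models are fitted to the same data, the one with the smaller AIC is preferred.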

Comparison of Hierarchical Clustering, K-Means, K-Medoids, and Fuzzy C-Means Methods in Grouping Provinces in Indonesia according to the Special Index for Handling Stunting

Ghina Rofifa Suraya — 2022-08-31

Stunting is widely known as the most common form of malnutrition among toddlers worldwide and has a harmful impact on children's futures. In 2018, Indonesia ranked 31st highest in the world for stunting and 4th in Southeast Asia. About 30.8% (roughly 3 out of 10) of children under 5 years in Indonesia suffer from stunting. To support government policy making on stunting, it is necessary to classify the levels of stunting handling across regions in Indonesia. In this work, hierarchical agglomerative and non-hierarchical clustering methods are compared and evaluated on stunting data. The agglomerative hierarchical methods are Single Linkage, Average Linkage, Complete Linkage, and Ward's Method, while the non-hierarchical methods are K-Means, K-Medoids (PAM), and Fuzzy C-Means. This study uses data on 12 IKPS indicators for 34 provinces in Indonesia in 2018. Based on an evaluation using the Connectivity Coefficient, Dunn Index, Silhouette Coefficient, Davies-Bouldin Index, Xie & Beni Index, and Calinski-Harabasz Index, Average Linkage is the best clustering method, with an optimal number of four clusters. The first cluster has a good level of stunting handling and consists of 28 provinces. The second cluster consists of only one province, DI Yogyakarta, with a very good level of stunting handling. The third cluster consists of four provinces with a poor level of stunting handling. Finally, the last cluster, consisting of one province, Papua, has a very poor level of stunting handling.
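One of the validity measures listed above, the Silhouette Coefficient, can be computed directly from pairwise distances. The sketch below is a generic pure-Python version, not the study's code; the points and labels are made-up toy data, and singleton clusters are given a score of 0 by the usual convention (relevant here because two of the reported clusters contain a single province).

```python
import math


def silhouette_scores(points, labels):
    """Return the silhouette value s(i) = (b - a) / max(a, b) for every point."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Singleton cluster: silhouette is defined as 0 by convention.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


# Toy data (hypothetical): two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (50, 50)]
labels = [0, 0, 1, 1, 2]
scores = silhouette_scores(points, labels)
mean_silhouette = sum(scores) / len(scores)
```

Values near 1 indicate points that sit well inside their cluster; the mean over all points is the figure typically compared across clustering methods and numbers of clusters.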

A Dynamic Factor Model for Nowcasting Household Consumption

Az Zahra Amon Ra — 2022-08-31

A Dynamic Factor Model (DFM) is a time series model that can be used to forecast over a very short horizon, a task known as nowcasting. The model can accommodate the frequency difference between monthly explanatory variables and a response variable that is measured quarterly. It is commonly used in economics, especially to forecast household consumption for the purpose of constructing economic policies. The economic condition of a country is reflected in its Gross Domestic Product (GDP), and consumption is an important component of GDP because it accounts for a large share of it. Household consumption refers to household economic activity undertaken to meet various needs for goods and services. This paper discusses the DFM for forecasting household consumption based on the varimax and quartimax rotations. The results show that both rotation methods can be used for nowcasting household consumption with the same precision.

Low Welfare Status Modeling Using Mixed Geographically Weighted Regression Method with Fixed Tricube Weighting Function

Tri Yuliyanti — 2022-08-31

Mixed Geographically Weighted Regression (MGWR) is a regression method for analyzing spatial data that produces both local and global parameters. Parameters are estimated using weighted least squares (WLS) with a fixed tricube weighting function. The variables in this study are poor population (X1), female household heads (X2), education (X3), individuals with disabilities (X4), individuals with chronic disease (X5), working individuals (X6), uninhabitable houses (X7), and low welfare status (Y). This research was applied to the low welfare status (Y) of each district/town in Central Java in 2019; the local variables are X1, X3, and X5, and the global variables are X2, X4, X6, and X7. However, only X1, X4, and X7 have a significant effect on Y in every district/town in Central Java, X3 has a significant effect in only a few districts/towns, and X2, X5, and X6 have no significant effect in the model. The predictor variables explain 98.92% of the model, while the remaining 1.18% is affected by other factors. The MGWR method yields two groups based on significant variables: (a) districts/towns whose low welfare status is affected by X1, X3, X4, and X7, covering Cilacap, Purbalingga, Kendal, Batang, Brebes, Pekalongan Town, and Tegal Town; and (b) districts/towns whose low welfare status is affected by X1, X4, and X7, covering Banjarnegara, Purworejo, Temanggung, Kudus, Wonosobo, Pekalongan, Pemalang, Jepara, Wonogiri, Boyolali, Tegal, Magelang, Sukoharjo, Banyumas, Grobogan, Klaten, Karanganyar, Kebumen, Blora, Semarang Town, Pati, Sragen, Demak, Magelang Town, Salatiga Town, Surakarta Town, Semarang, and Rembang.
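The estimation step described in this abstract, WLS with a fixed tricube weighting function, can be illustrated with a single predictor. The sketch below assumes the common tricube form (1 - (d/h)^3)^3 for distances below a fixed bandwidth h; the data, distances, and bandwidth are hypothetical, and the one-predictor fit is a simplification rather than the full MGWR estimator.

```python
def tricube_weight(distance, bandwidth):
    """Fixed tricube kernel: (1 - (d/h)^3)^3 for d < h, otherwise 0."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 3) ** 3


def weighted_line_fit(xs, ys, weights):
    """Weighted least squares for y = intercept + slope * x (closed form)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("weighted predictor has no variation")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical distances from the regression location to four observations.
distances = [0.0, 30.0, 60.0, 120.0]
weights = [tricube_weight(d, bandwidth=100.0) for d in distances]

# Toy data lying exactly on y = 2 + 3x, so the local fit recovers that line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 8.0, 11.0, 14.0]
intercept, slope = weighted_line_fit(xs, ys, weights)
```

Repeating the fit with weights centered on each location gives location-specific (local) coefficients, while global coefficients are held constant across locations.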

Handling Multicollinearity Problems in Indonesia's Economic Growth Regression Modeling Based on Endogenous Economic Growth Theory

Aldino Yanke — 2022-08-31

One application of multiple linear regression in economics is modeling Indonesia’s economic growth based on the theory of endogenous economic growth. Endogenous growth theory is a development of classical theory, which cannot explain how the economy grows in the long run. The regression model based on the theory of endogenous economic growth used many independent variables, which caused multicollinearity problems. In this study, a multiple linear regression model estimated by least squares was fitted together with several methods for handling the multicollinearity problem. Variable selection methods (backward, forward, and stepwise), principal component regression (PCR), partial least squares (PLS), and regularization methods (Ridge, Lasso, and Elastic Net) were applied to solve the multicollinearity problem. Backward, forward, and stepwise variable selection were not able to overcome the multicollinearity problem. In contrast, PCR, PLS regression, and the regularization methods overcame it. We used leave-one-out cross-validation (LOOCV) to determine the best method for handling multicollinearity, defined as the one with the smallest mean squared error (MSE). Based on the MSE value, the best method for overcoming the multicollinearity problem in the economic growth model based on endogenous economic growth theory was Lasso regression.
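The selection rule described above, choosing the method with the smallest LOOCV mean squared error, can be sketched on a toy problem. The example below is not the study's analysis: it uses a one-predictor ridge fit as a stand-in model, and the data and candidate penalties are hypothetical.

```python
def fit_ridge_1d(xs, ys, penalty):
    """One-predictor ridge fit with an unpenalized intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx + penalty == 0:
        raise ValueError("predictor has no variation and penalty is zero")
    slope = sxy / (sxx + penalty)
    return y_mean - slope * x_mean, slope


def loocv_mse(xs, ys, penalty):
    """Leave-one-out CV: refit without each point, then predict that point."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_ridge_1d(train_x, train_y, penalty)
        prediction = intercept + slope * xs[i]
        squared_errors.append((ys[i] - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Toy data (hypothetical) lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

# Candidate penalties (hypothetical); the smallest LOOCV MSE wins.
penalties = [0.0, 1.0, 10.0]
cv_errors = {p: loocv_mse(xs, ys, p) for p in penalties}
best_penalty = min(cv_errors, key=cv_errors.get)
```

The same loop applies to any fitting routine: swap `fit_ridge_1d` for another estimator and compare the resulting LOOCV errors across methods.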

Identification Pharmacodynamic Interactions of Active Compounds of Diabetes Mellitus Type 2 Herbal Plants Using the Random Forest Method

M. Aiman Askari — 2022-08-31

A drug-drug interaction is defined as the modification of the effect of a drug by another drug given simultaneously or after an interval, or as the situation in which two or more drugs interact so that the effectiveness or toxicity of one or more of them changes. Pharmacodynamic interactions are one type of interaction that needs special attention because they act directly on the body's physiological systems and compete for the same receptors, so they can be antagonistic, additive, or synergistic. Medicinal plants are becoming an alternative because, in addition to having relatively safer side effects, they consist of active compounds that are appropriate for treating degenerative metabolic diseases triggered by mutations in many genes. As in the case of polypharmacy, interactions between active compounds in medicinal plants can also lead to pharmacodynamic interactions. Therefore, it is also necessary to identify the active compounds so that it can be determined whether their interaction will be beneficial or detrimental. In this study, pharmacodynamic identification was applied to Diabetes Mellitus Type 2 medicinal plant compounds using the independent variables Target Protein Connectedness (TPC), Side Effect Similarity (SES), and Chemical Similarity (CS) with the Random Forest classification method. From a search of various databases, 21 active compounds were obtained, and only 100 compound interactions could be calculated as independent variables. With an accuracy and AUC of 0.96, there were 93 pairs of compounds that interacted pharmacodynamically, and the remaining 7 did not interact.
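The two evaluation measures reported above, accuracy and AUC, can be computed from true labels and classifier scores as sketched below. The labels, scores, and 0.5 threshold are hypothetical illustration values, not the study's data; AUC is computed as the share of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def accuracy(true_labels, scores, threshold=0.5):
    """Share of cases whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for t, p in zip(true_labels, predictions) if t == p)
    return correct / len(true_labels)


def auc(true_labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(true_labels, scores) if t == 1]
    negatives = [s for t, s in zip(true_labels, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = interacting pair) and predicted probabilities.
true_labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]

example_accuracy = accuracy(true_labels, scores)
example_auc = auc(true_labels, scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are often reported together.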

Economic Order Quantity (EOQ) for Perishable Goods with Weibull Distribution and Exponential Demand Rate Proportional to Price

Motunrayo Bankole — 2022-08-31

Business organizations that deal with consumable and perishable items have consistently incurred enormous losses as a result of the nature of their goods. These losses have a direct negative impact on revenues. Poor planning and the lack of precise production prediction models are responsible for this. An appropriate prediction model, developed to guide production plans and processes, will help manufacturers decide which products to make and in what quantity. In this study, an Economic Order Quantity (EOQ) model was developed for perishable goods with a Weibull lifetime distribution and an exponential demand rate proportional to price. The differential equations governing the instantaneous state of inventory in the interval [0, t2] were obtained and solved for the quantity of inventory at time t. Using fixed parameters for the Weibull and exponential distributions, a simulation study was conducted on the derived EOQ model using the R programming language. The simulation shows that the EOQ increases as the Weibull parameter increases. Real data on six types of bread loaves obtained from the Afe Babalola University bakery were used to illustrate how the model works. The results show a good fit to the data, and the average EOQ ranges from 60 to 400 loaves with ordering intervals of either one or two days. The pattern of EOQ varies between types of loaves. These results show that the EOQ model developed is appropriate for perishable goods with a Weibull lifetime distribution and an exponential demand rate proportional to price.

Analysis of Net Enumeration Rate of Senior High School Using Fixed-Effect Clustered-Robust Standard Error Model

Leonita Amara Husna Metanda — 2022-08-31

The Net Enumeration Rate (NER) of senior high school (SHS) in Indonesia in 2017-2019 was consistently the lowest among the education levels and did not meet the target of the 2014-2019 National Medium-Term Development Plan (RPJMN). This study aims to analyze the determinants of the NER of SHS in Indonesia in 2017-2019 using panel data regression. The independent variables are child labor, child marriage, the Smart Indonesia Program (PIP), repeat rates, and poverty. The NER of SHS is the dependent variable. The modeling reveals heteroscedasticity and autocorrelation problems, and the fixed-effect clustered-robust standard error method is used to address them. The results show that the NER of SHS increased every year and poverty decreased every year, while the other variables fluctuated during 2017-2019. Furthermore, child labor and poverty significantly affect the NER of SHS in Indonesia, whereas child marriage, PIP, and repeat rates have no significant effect. Local governments can use this study to implement more effective policies based on the factors that have significant effects on the NER of SHS in Indonesia in 2017-2019.

Identification of Social Support and Knowledge of Covid-19 Survivors with Structural Equation Modeling in R

Nur Silviyah Rahmi — 2022-08-31

COVID-19 cases in Indonesia have reached a second peak, amounting to 4 million cases. The death rate was 3.4 percent, while the recovery rate was 95.9 percent. The Health Ministry of the Republic of Indonesia, through the Covid-19 Task Force, has issued guidelines for preventing and controlling Covid-19 to decrease the death rate and increase the recovery rate. According to the guidelines, a person who undergoes quarantine needs to be provided with health care and with social and psychosocial support. This study seeks to identify the influence of external factors, including social support, and internal factors, including patient motivation and knowledge, on the recovery of Covid-19 survivors. The study uses Structural Equation Modeling to determine the indicators that have the most significant influence on the latent variables of social support, knowledge, and motivation for healing from Covid-19. Primary data were collected online from a sample of 176 Covid-19 survivors across Indonesia in August 2021. The Shapiro-Wilk test for multivariate normality gives a p-value of 0.00, which the study reports as satisfying the assumption. The results show that social support has a significant effect on knowledge, with a regression coefficient of 0.263, and that knowledge has a regression coefficient of 0.645 for healing from Covid-19. In conclusion, greater social support from parties external to the patient (family, the surrounding environment, and public health center officers) is associated with greater patient knowledge and healing from Covid-19. Meanwhile, social support has no significant effect on healing actions.

Nested Mixed Models with Repeated Measurements for Analyzing Gross Profit of Public Companies in West Java

Alina Witri — 2022-08-31

A company's gross profit plays an important role in boosting the Gross Regional Domestic Product (PDRB), which in turn affects the revenue of local governments, known as Pendapatan Asli Daerah. Local governments often need information on how the gross profits of companies differ within each sector. This is not easy to investigate, especially if the companies are observed repeatedly and subsectors are nested within sectors. In this study, three factors were involved: sectors, subsectors nested within a particular sector, and time. Sectors and time of observation are assumed to be fixed, whereas subsectors are random. The response variable is the average gross profit per subsector of public companies in West Java. The objective of this study is to identify the variation among subsectors and the effects of sector and time on the average gross profit. Since the study involves fixed and random factors and gross profit was observed more than once, a nested mixed model with repeated measurements is used. The results showed that there was no sector effect on the average gross profit, that the average gross profit varied across subsectors nested within sectors, and that the time of observation did not influence the average gross profit.

Application of Adaptive Synthetic Nominal and Extreme Gradient Boosting Methods in Determining Factors Affecting Obesity: A Case Study of Indonesian Basic Health Research Survey 2013

Yoris Rombe — 2022-08-31

Obesity is the accumulation of excessive body fat and can be harmful to health. According to recent studies, factors that contribute to the increasing prevalence of obesity in Indonesia include poor diet, low consumption of vegetables and fruits, high consumption of fast food, area of residence, and lack of physical activity. In addition, psychological factors, high consumption of alcohol and cigarettes, cultural differences, and stress also trigger obesity. The rapid development of the medical field cannot be separated from the increasing accessibility of data and the growth of medical knowledge. This makes machine learning increasingly necessary for pattern recognition in very large medical data sets, including obesity data. This study determines the factors that influence obesity status in Indonesia using Extreme Gradient Boosting (XGBoost), a classification method that is more scalable and more efficient than its predecessors. To overcome class imbalance, the Adaptive Synthetic Nominal algorithm (ADASYN-N) is used to balance the data and improve prediction accuracy. Both ADASYN-N and XGBoost are applied to obesity data from the 2013 Indonesian Basic Health Research Survey. The study shows that being female is the most important risk factor for obesity status in Indonesia, with the highest gain value (37%). In addition, age 35-54 years, strenuous activity, and eating vegetables on 6 days are also risk factors for obesity.

An Empirical Comparison of Some Product Estimators

R.K. Sahoo — 2022-08-31

In this paper, we undertake an extensive comparative study of some biased, almost unbiased, and unbiased product estimators on the basis of different performance measures through Monte Carlo simulation, a comparison that has not yet been undertaken in the survey sampling literature. The simulation experiment is conducted using data on 20 natural populations available in the literature, and the performance indicators considered are the absolute relative bias, percentage relative efficiency, coverage rate of confidence intervals, standard deviation of the Student t-statistic, and approach to symmetry (normality). This empirical study will not only help assess the overall relative performance of different competing product or product-type estimators but will also provide some guidelines for further research in this direction.

Analysis of Covid-19 Risk Perception Survey Result Using Generalized Structured Component Analysis

Zahira Rahvenia Robert — 2022-08-31

The capital city of Indonesia, Jakarta, became the province with the highest number of Covid-19 cases. In response to this situation, LaporCovid-19, in collaboration with the Social Resilience Lab at Nanyang Technological University, conducted a survey from May 29 to June 20, 2020 to measure how Jakarta residents perceive the risk of Covid-19. Risk perception factors are variables that cannot be measured directly, so they are analyzed using a Structural Equation Modeling (SEM) approach, namely Generalized Structured Component Analysis (GSCA). The Likert scale used can be considered interval or ordinal, depending on the theoretical point of view. Therefore, this study compares the GSCA method with nonlinear GSCA and evaluates six variables: risk perception, knowledge, information, health behavior, social capital, and economy. Evaluation of the overall model showed that the nonlinear GSCA model, with FIT > 0.9, explains the diversity of qualitative data better than the GSCA model. Based on the nonlinear GSCA model, information has a significant influence on knowledge, economy and social capital have a real reciprocal relationship, and knowledge and risk perception have a significant influence on health behavior.

On the Statistical Learning Analysis of Rain Gauge Data over the Natuna Islands

Sandy Herho — 2022-08-31

Located in the middle of the South China Sea, at a distance of more than 700 m from the nearest mainlands, the Natuna Islands remain a focus of scientific conversation. This article presents state-of-the-art statistical learning methods for analyzing rain gauge data over the Natuna Islands. Using shape-preserving piecewise cubic interpolation, we interpolated 671 null values in the daily precipitation data. Dominant periodicity analysis of the daily precipitation signal using the Lomb-Scargle power spectral density shows annual, intraseasonal, and interannual precipitation patterns over the Natuna Islands. Unsupervised anomaly analysis using the Isolation Forest algorithm identifies 146 anomalous daily precipitation data points. We also conducted an experiment to predict accumulated monthly precipitation over the Natuna Islands using the Bayesian structural time series algorithm. The results show that the local linear trend with seasonality model is able to model accumulated monthly precipitation for a twelve-month prediction horizon. The work presented here has profound implications for rainfall observations in this area.