Repositories - Dept. of Statistics, IPB University

Bagging Based Ensemble Classification Method on Imbalance Datasets

Lukmanul Hakim, Bagus Sartono, Asep Saefuddin — Sun, 31 Dec 2017 00:00:00 +0700

In the last few years, class imbalance has been a challenging problem in the data mining community. Class imbalance occurs when one class in the data has far more observations than the others. This condition makes classification suboptimal because the larger class has more influence on the classifier. Class imbalance matters in many important applications, for example detecting fraud in banking operations, network trouble, cancer diagnosis, and prediction of technical failure. This study applies bagging-based ensemble methods to overcome the problem of class imbalance on 14 datasets. The purpose of this research is to assess the ability of several bagging-based ensemble methods to overcome the class imbalance problem. The results obtained with the OverBagging method are more stable than those of the other bagging-based methods across the datasets.
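As a rough illustration of the resampling idea behind OverBagging, the sketch below builds one balanced bootstrap bag by drawing, with replacement, as many observations from each class as the majority class contains. The function name `overbagging_bag`, the toy labels, and the seed are illustrative assumptions, not taken from the study. Fitting a base classifier on each bag and voting across bags is omitted.

```python
import random
from collections import Counter, defaultdict


def overbagging_bag(labels, rng):
    """Return indices of one balanced bootstrap bag (hypothetical helper).

    Every class is sampled with replacement up to the size of the
    largest class, so minority classes are oversampled.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Size of the majority class sets the per-class sample size.
    target = max(len(members) for members in by_class.values())
    bag = []
    for members in by_class.values():
        bag.extend(rng.choices(members, k=target))
    return bag


# Toy data (assumed): 90 majority-class and 10 minority-class labels.
labels = [0] * 90 + [1] * 10
rng = random.Random(0)
bag = overbagging_bag(labels, rng)
bag_counts = Counter(labels[i] for i in bag)
```

In a full ensemble, this step would be repeated once per base learner, so each learner sees a different balanced bag.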

Zero Inflated Binomial Models in Small Area Estimation with Application to Unemployment Data in Indonesia

Budi Hartono, Anang Kurnia, Indahwati Indahwati — Sun, 31 Dec 2017 00:00:00 +0700

Binary response variables are commonly modeled by binomial models. Binomial overdispersion occurs when the observed variability is greater than the variance of the assumed model. Overdispersion can be caused by excess zeros. It may produce an underestimated standard error, which in turn produces an underestimated p-value. Therefore, Zero Inflated Binomial (ZIB) models are considered to handle the excess zeros in binomial data. A simulation study is employed to evaluate the performance of the models using RRMSE and relative bias. The simulation showed that the proposed SAE ZIB method fits better than SAE ZIB Synthetic, in terms of a smaller RRMSE. The proposed SAE ZIB method is applied to unemployment data to estimate the proportion of unemployment in each district/regency of Jambi Province, Indonesia, for August 2016. The real data application showed that the SAE ZIB method is better than the direct estimation method, in terms of a smaller standard error.
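For readers unfamiliar with the two evaluation measures, the sketch below computes relative bias and RRMSE for one small area across simulation replicates. The abstract does not state the formulas used, so the definitions here (error scaled by a nonzero true value) are one common convention and should be read as an assumption. The function names and numbers are illustrative only.

```python
import math


def relative_bias(estimates, true_value):
    """Mean of (estimate - truth) / truth over replicates (assumed definition)."""
    return sum((e - true_value) / true_value for e in estimates) / len(estimates)


def rrmse(estimates, true_value):
    """Root mean squared error divided by the true value (assumed definition)."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value


# Made-up estimates of a proportion whose true value is 0.10.
estimates = [0.09, 0.11, 0.10, 0.12]
true_value = 0.10
rb = relative_bias(estimates, true_value)
rr = rrmse(estimates, true_value)
```

Both measures are undefined when the true value is zero, which would need separate handling in an area with no unemployment.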

Small Area Estimation in Estimating Unemployment Rate in Bogor District of Sampled and Non-Sampled Areas Using A Calibration Modeling Approach

Siti Aprizkiyandari, Anang Kurnia, Indahwati Indahwati — Sun, 31 Dec 2017 00:00:00 +0700

Unemployment is a main problem in Indonesia. Various government policies address unemployment, including making statistical data on unemployment available. The National Labor Survey conducted by Statistics Indonesia (BPS) only generates estimates at the national level, whereas carrying out government policies requires unemployment information at smaller levels. The Small Area Estimation (SAE) method is one solution for estimating small areas without adding sampling units. The method borrows strength from nearby sampled areas. The study focused on estimating the unemployment rate at the sub-district level in Bogor using the Generalized Linear Mixed Models (GLMM) method with a calibration approach. The proposed method reproduces the results published by BPS and also yields estimates at the sub-district level.

Forecasting The Broad Proportion Attack of Rice Blast Disease in Indonesia

Iman Setiawan, I Made Sumertajaya, Farit Mochammad Afendi — Sun, 31 Dec 2017 00:00:00 +0700

Classical regression analysis is a statistical technique for modeling, forecasting, and investigating the relationship between a response variable and explanatory variables. However, model adequacy must be checked on the model residuals, in particular for autocorrelation. The autocorrelation problem can be solved by modeling the residuals of the regression model with a model that specifically incorporates the autocorrelation structure. Autocorrelation can arise when the residuals of the regression model increase over time. The time series regression model is one of the analyses used to accommodate model residuals that increase over time. This study used data on the broad proportion of rice blast (Pyricularia grisea) attacks. The purpose of this study is to forecast the broad proportion of rice blast attacks using a classical regression model and a time series regression model. Forecast values were evaluated using the mean absolute percentage error (MAPE). The comparison showed that the time series regression model forecasts better than the classical regression model.
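The MAPE criterion mentioned above can be sketched as follows. This is the standard textbook form, expressed as a percentage. The function name and the sample numbers are illustrative assumptions and do not come from the rice blast data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same length and that no
    actual value is zero (the ratio is undefined otherwise).
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up observed and forecast values.
actual = [10.0, 20.0, 40.0]
forecast = [11.0, 18.0, 40.0]
score = mape(actual, forecast)
```

A lower MAPE indicates forecasts closer to the observed values, which is how two competing models can be ranked on the same hold-out period.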

Combined DEA and Classification Analysis with Case Study of Building Construction Company

Ray Tamtama, Bagus Sartono, Asep Saefuddin — Sun, 31 Dec 2017 00:00:00 +0700

Construction is an activity undertaken by a group of people to create the physical buildings needed to meet human needs. Construction companies will not survive intense competition unless they operate efficiently. This study identifies efficient construction companies with DEA (Data Envelopment Analysis) and then examines the characteristics of efficient construction companies in Southeast Asia using classification analysis. Among domestic building construction companies in Southeast Asia, inefficient companies outnumber efficient ones. This is due to company inputs that are higher than company output. Of the domestic building construction companies in Southeast Asia, 12.6% are efficient and 87.4% are inefficient.

The Estimation of The Total Number of Agricultural Families in Ogan Komering Ilir Regency of South Sumatra Province Under Incomplete Sampling Frame

Asih Maulida, Farit Mochamad Afendi, Kusman Sadik — Sun, 31 Dec 2017 00:00:00 +0700

The varied geography and topography of Indonesia leave several areas with limited access. Collecting data in these areas is costly and time-consuming, so some researchers tend to exclude them from the sampling frame. An incomplete sampling frame affects the inclusion probabilities of the units not included in the frame and gives rise to bias. Several approaches can be used to reduce this bias, one of which is the Predecessor-Successor method. We directly estimated the total number of agricultural families in Ogan Komering Ilir regency using classical sampling theory and the Predecessor-Successor method, then evaluated their estimators. The results showed that the Predecessor-Successor method could reduce bias more effectively than classical sampling theory for a large sample size. The best estimator is obtained by applying an appropriate estimation method to a complete frame. If a complete frame is unattainable, the Predecessor-Successor method can be used for direct estimation of population quantities.

Geographically and Temporally Weighted Regression (GTWR) for Modeling Economic Growth using R

Miftahus Sholihin, Agus M Soleh, Anik Djuraidah — Sun, 31 Dec 2017 00:00:00 +0700

Economic growth is a main condition for the sustainability of regional economic development. Spatially, the highest economic growth in Indonesia is concentrated in the provinces of Java. However, the economic growth rate of Central Java Province is the lowest compared to the other provinces. The Geographically and Temporally Weighted Regression (GTWR) method was used to model the economic growth of the districts of Central Java Province while accommodating spatial-temporal heterogeneity. The model involves four explanatory variables, namely the size of the labor force, local revenue, district minimum wage, and human development index (HDI), with gross regional domestic product as the response variable. The results showed that the GTWR method has a better coefficient of determination (99.8%), with root mean squared error and Akaike's Information Criterion values of 0.84 and 1051.98, respectively. In general, HDI has more influence on economic growth in each regency/city in Central Java during 2011-2015.

Forecasting Simulation with ARIMA and Combination of Stevenson-Porter-Cheng Fuzzy Time Series

Wahyu Dwi Sugianto, Agus M Soleh, Farit Mochamad Afendi — Sun, 31 Dec 2017 00:00:00 +0700

The simulation was carried out to assess the performance of a combination of methods in Stevenson-Porter-Cheng Fuzzy Time Series (FTS), based on 100 replicates on 100 generated data values following an ARIMA(1,0,0), or AR(1), model. Nine scenarios were used, combining 3 error variance values for data generation (0.5, 1, 3) with 3 AR(1) parameter values (0.3, 0.5, and 0.7). The simulation showed that the greater the error variance and the AR(1) parameter, the greater the variance of the MSE results with ARIMA (0.0634 to 15.7633). In contrast, the variance of the MSE results from forecasting with Cheng and Cheng2 (no sub-interval) FTS tends to be more stable (0.0712 to 2.9648 and 0.0640 to 2.7157). Using the percentage change of historical data as the universe of discourse, SP Cheng FTS produces forecasting MSE values ranging from 0.0722 to 14.7045, while SP Cheng2 FTS, using the difference of historical data, produces forecasting MSE values ranging from 0.0759 to 4.6803. Although neither set of MSE values looks much better than those of Cheng and Cheng2 FTS, the greater the AR(1) parameter, the better the forecasting MSE of SP Cheng and SP Cheng2 FTS becomes relative to ARIMA, and the closer it comes to that of Cheng and Cheng2 FTS.
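The data-generation step of such a design can be sketched as below: one AR(1) series of length 100 for each of the nine combinations of error variance and AR(1) parameter. The zero mean, Gaussian errors, stationary starting value, seed, and function name `simulate_ar1` are assumptions made for illustration. The abstract does not specify them, and only a single replicate per scenario is generated here rather than 100.

```python
import itertools
import random


def simulate_ar1(phi, error_variance, n, rng):
    """Generate n values of x_t = phi * x_{t-1} + e_t, with e_t ~ N(0, error_variance).

    Zero mean and Gaussian errors are assumptions; requires |phi| < 1.
    """
    sd = error_variance ** 0.5
    # Draw the starting value from the stationary distribution.
    x = rng.gauss(0.0, sd / (1.0 - phi ** 2) ** 0.5)
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sd)
        series.append(x)
    return series


error_variances = [0.5, 1, 3]
phis = [0.3, 0.5, 0.7]
rng = random.Random(2017)

# One series per (error variance, AR(1) parameter) scenario.
scenarios = {
    (var, phi): simulate_ar1(phi, var, 100, rng)
    for var, phi in itertools.product(error_variances, phis)
}
```

Repeating the dictionary construction 100 times, and recording the forecasting MSE of each method on every series, would give the per-scenario MSE distributions that the abstract compares.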

Zero Inflated Beta Model in Small Area Estimation to Estimate Poverty Rates on Village Level in Langsa Municipality

Meita Jumiartanti, Indahwati Indahwati, Anang Kurnia — Sun, 31 Dec 2017 00:00:00 +0700

Village-level poverty rates are needed as a consideration for allocating village funds. The national socio-economic survey samples are designed to estimate poverty rates at the province and district levels. Direct estimation of village-level poverty rates does not have good precision because of small sample sizes. The Small Area Estimation (SAE) technique is used to obtain good precision with small sample sizes. Estimates of poverty rates should also be produced for non-sampled areas and for areas where no poor households are included in the sample. We propose a zero inflated beta model because poverty rates take values in the interval [0,1). A clustering technique is used to accommodate the area random effect for non-sampled areas. The purpose of this research is to estimate poverty rates at the village level in Langsa Municipality. The results showed that village-level poverty rate estimates from the zero inflated beta model are better than direct estimates.

Prediction of CIF Components Proportion of Indonesian Import Value Using Multivariate Fractional Logit Model

Mardiah Mardiah, Asep Saefuddin, Indahwati Indahwati — Sun, 31 Dec 2017 00:00:00 +0700

International Merchandise Trade Statistics (IMTS) recommends using a free on board (FOB) valuation for exports and a cost, insurance, and freight (CIF) valuation for imports. CIF is the sum of the FOB, freight, and insurance values of imported goods. IMTS suggests that countries recording import value on a CIF basis have an additional method to decompose CIF into FOB, freight, and insurance values. The FOB, insurance, and freight fractions follow a multivariate fractional model. The model gives predicted values of the three CIF component fractions. Based on MAPE and RMSEP values, mode of transport, transit status, and two-digit HS code group are the three covariates that give the best precision of the predicted values.