Performance Comparison of the Linear Regression Model Based on Nonparametric Bootstrap Methods

Authors

  • Arisara Arunnobpharat, Department of Mathematics, Faculty of Science, Srinakharinwirot University, Thailand
  • Arissa Damon, Department of Mathematics, Faculty of Science, Srinakharinwirot University, Thailand
  • Nanticha Sanjaitham, Department of Mathematics, Faculty of Science, Srinakharinwirot University, Thailand
  • Angkana Kokaew, Department of Mathematics, Faculty of Science, Srinakharinwirot University, Thailand

Keywords:

Bootstrap method, confidence interval, Monte Carlo simulation, regression model, skew-normal distribution

Abstract

Background and Objectives: Regression analysis is a widely used statistical method for predicting the value of a dependent variable from given values of independent variables. Linear regression analysis rests on several assumptions about the errors, but in some situations the data do not satisfy these assumptions, which leads to prediction errors. To address this issue, researchers have developed nonparametric regression approaches that do not rely on assumptions about the error distribution and therefore offer greater flexibility. The objective of this research was to investigate and compare the performance of linear regression models under nonparametric bootstrap methods. Two nonparametric bootstrap methods were used: the percentile bootstrap method (bootstrap-p) and the studentized bootstrap method (bootstrap-t). The bootstrap-p method is widely used because of its simple resampling technique: the data are repeatedly sampled with replacement to create many bootstrap samples, and the estimator is computed on each of them to approximate its sampling distribution. This method is particularly useful when the sample size is small and the data distribution is unknown. The bootstrap-t method extends the bootstrap-p method through studentization: each bootstrap estimate is standardized by its standard error, which yields more accurate confidence intervals and reduced bias.
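The two intervals can be sketched in Python for a simple linear regression. This is a minimal illustration, not the authors' implementation. It assumes pairs (case) resampling, which the abstract does not specify, and it studentizes each replicate with its ordinary least-squares standard error instead of using a nested bootstrap. The function names `ols_fit`, `quantile`, and `bootstrap_cis` are hypothetical. The percentile interval takes the $\alpha/2$ and $1-\alpha/2$ quantiles of the bootstrap estimates $\hat\beta^*$. The studentized interval is $[\hat\beta - t^*_{1-\alpha/2}\,\widehat{SE},\ \hat\beta - t^*_{\alpha/2}\,\widehat{SE}]$, where $t^* = (\hat\beta^* - \hat\beta)/\widehat{SE}^*$.

```python
import math
import random


def ols_fit(x, y):
    """Fit y = b0 + b1*x by ordinary least squares.

    Returns (b0, b1, se_b0, se_b1) with the usual OLS standard errors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # residual variance with n - 2 degrees of freedom
    se_b0 = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    se_b1 = math.sqrt(s2 / sxx)
    return b0, b1, se_b0, se_b1


def quantile(sorted_vals, p):
    """Linear-interpolation quantile of a sorted list (R's default, type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_cis(x, y, B=1000, alpha=0.05, seed=None):
    """Percentile (bootstrap-p) and studentized (bootstrap-t) intervals for
    the intercept and slope.

    Assumption: (x, y) pairs are resampled with replacement, and each
    replicate is studentized with its own OLS standard error."""
    rng = random.Random(seed)
    n = len(x)
    fit = ols_fit(x, y)
    est, se = fit[:2], fit[2:]
    boot_est, boot_t = ([], []), ([], [])
    while len(boot_est[0]) < B:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        if len(set(xb)) < 3:
            continue  # degenerate resample: slope or standard error undefined
        fb = ols_fit(xb, [y[i] for i in idx])
        for j in range(2):
            boot_est[j].append(fb[j])
            boot_t[j].append((fb[j] - est[j]) / fb[j + 2])  # studentized replicate
    out = {}
    for j, name in enumerate(("intercept", "slope")):
        e, t = sorted(boot_est[j]), sorted(boot_t[j])
        out[name] = {
            "estimate": est[j],
            "percentile": (quantile(e, alpha / 2), quantile(e, 1 - alpha / 2)),
            "studentized": (est[j] - quantile(t, 1 - alpha / 2) * se[j],
                            est[j] - quantile(t, alpha / 2) * se[j]),
        }
    return out


if __name__ == "__main__":
    # Illustrative data only: y = 1 + 2x + N(0, 1) noise, X ~ N(3, 1.5^2).
    rng = random.Random(42)
    x = [rng.gauss(3.0, 1.5) for _ in range(50)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    print(bootstrap_cis(x, y, B=1000, seed=1)["slope"])
```

Studentizing with the analytic OLS standard error keeps the sketch fast; a nested bootstrap for each replicate's standard error is the assumption-free alternative but multiplies the cost by the inner number of resamples.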

Methodology: The performance of the bootstrap regression models was examined by Monte Carlo simulation, with the error term $\varepsilon$ following a skew-normal distribution of three types commonly observed in real data: left-skewed, symmetric, and right-skewed. The skew-normal distribution has a location parameter $\mu$, a scale parameter $\sigma^2$, and a shape parameter $\lambda$. These parameters were chosen to reflect different levels of skewness and variance, ensuring a comprehensive evaluation of bootstrap regression analysis across the three distribution types. They were set to $\mu = 0$, $\sigma^2 = 1, 2, 5, 10$, and $\lambda = -1, 0, 1$ (left-skewed, symmetric, and right-skewed, respectively), i.e., $\varepsilon \sim SN(0, \sigma^2, \lambda)$. The independent variable was simulated from a normal distribution with $\mu_x = 3$ and $\sigma_x^2 = 2.25$, i.e., $X \sim N(3, 2.25)$. Sample sizes of 20, 30, 50, and 100 were used. Confidence intervals for the regression coefficients and regression models were constructed with both bootstrap methods at the 95% confidence level, with 1,000 repetitions for each scenario. Point estimation was evaluated using the mean, the standard deviation, and the root mean squared error of the regression coefficient estimates; interval estimation was evaluated using the coverage probability, the average interval width, and the standard deviation of the interval width. Additionally, both bootstrap regression models were applied to two real datasets, newborn birth weights and grade point averages, and the best regression models were compared using the coefficient of determination, Akaike's information criterion, the Bayesian information criterion, and the mean squared error.
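One simulation scenario can be sketched in Python as follows. This is an illustrative sketch, not the authors' code, and several details are assumptions: $SN(\mu, \sigma^2, \lambda)$ is read with $\sigma^2$ as the squared scale, errors are drawn with Azzalini's stochastic representation, the true coefficients $\beta_0 = 1$ and $\beta_1 = 2$ are hypothetical (the abstract does not state them), pairs are resampled, and the bootstrap point estimate is taken to be the mean of the bootstrap replicates. To stay short, only the slope and its bootstrap-p interval are tracked; the function names are hypothetical.

```python
import math
import random
import statistics


def rskewnorm(rng, loc, scale2, shape):
    """One draw from SN(loc, scale2, shape) via Azzalini's representation.

    scale2 is the squared scale; shape = 0 gives a normal draw. With loc = 0
    the mean is sqrt(scale2) * delta * sqrt(2/pi), nonzero when shape != 0."""
    delta = shape / math.sqrt(1.0 + shape ** 2)
    u0, u1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z = delta * abs(u0) + math.sqrt(1.0 - delta ** 2) * u1
    return loc + math.sqrt(scale2) * z


def ols_slope(x, y):
    """OLS slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def percentile_slope(rng, x, y, B):
    """Pairs-bootstrap mean of the slope and its 95% percentile interval."""
    n = len(x)
    boots = []
    while len(boots) < B:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        if len(set(xb)) < 2:
            continue  # all x identical: slope undefined, redraw
        boots.append(ols_slope(xb, [y[i] for i in idx]))
    cuts = statistics.quantiles(boots, n=40, method="inclusive")
    return statistics.fmean(boots), cuts[0], cuts[-1]  # 2.5% and 97.5% points


def simulate(n, scale2, shape, beta0=1.0, beta1=2.0, M=1000, B=1000, seed=1):
    """Monte Carlo criteria for the bootstrap-p slope in one scenario.

    beta0 and beta1 are hypothetical true values chosen for illustration."""
    rng = random.Random(seed)
    est, widths, covered = [], [], 0
    for _ in range(M):
        x = [rng.gauss(3.0, 1.5) for _ in range(n)]  # X ~ N(3, 2.25), sd = 1.5
        y = [beta0 + beta1 * xi + rskewnorm(rng, 0.0, scale2, shape) for xi in x]
        b_mean, lo, hi = percentile_slope(rng, x, y, B)
        est.append(b_mean)
        widths.append(hi - lo)
        covered += lo <= beta1 <= hi
    return {
        "mean": statistics.fmean(est),
        "sd": statistics.stdev(est),
        "rmse": math.sqrt(statistics.fmean([(b - beta1) ** 2 for b in est])),
        "coverage": covered / M,
        "avg_width": statistics.fmean(widths),
        "sd_width": statistics.stdev(widths),
    }
```

A full grid would loop `simulate` over n in (20, 30, 50, 100), scale2 in (1, 2, 5, 10), and shape in (-1, 0, 1). In pure Python with M = B = 1000 this is slow, so vectorized code would be preferable in practice.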

Main Results: The simulation study comparing the efficiency of confidence intervals for the regression coefficients found that, as the sample size increased, the overall coverage probability increased, except for the intercept when the error distribution was skewed. In addition, the average interval width and the standard deviation of the interval width tended to decrease for all parameter settings. In most cases, the bootstrap-t method achieved the highest coverage probability. For point estimation, the mean regression coefficient, which is the bootstrap estimate of the regression coefficient, closely approximated the parameter value for all sample sizes. As the sample size increased, the standard deviation and the root mean squared error of the regression coefficient decreased in all cases, implying that the efficiency of the regression models improves with larger samples. Furthermore, for all sample sizes, the bootstrap-t regression models yielded a lower root mean squared error of the regression coefficient than the bootstrap-p regression models, indicating better performance. However, when the random error terms followed a symmetric distribution and the sample size was small, the bootstrap-p method estimated the regression coefficients more efficiently.

Conclusions: The comparison of bootstrap regression models revealed that both methods provided estimates close to the true parameters, indicating that they handle different error distributions effectively. However, the bootstrap-t method demonstrated superior performance overall, particularly in coverage probability and root mean squared error, which are key measures of accuracy in estimating regression coefficients. This suggests that the bootstrap-t method is the more reliable choice for constructing confidence intervals and estimating regression coefficients. As the sample size increased, the accuracy and reliability of the bootstrap methods improved, yielding narrower confidence intervals and more precise regression coefficient estimates. The results from applying the bootstrap methods to real data were also consistent with the simulation study under skew-normal distributions. Bootstrap methods are therefore viable alternatives for estimating confidence intervals and regression coefficients, and they can improve the predictive efficiency of regression models, especially when parametric assumptions are violated. This study is useful for both researchers and practitioners, as it emphasizes careful consideration of sample size and error distribution when selecting an appropriate regression method.


Published

2025-07-24

How to Cite

Arunnobpharat, A., Damon, A., Sanjaitham, N., & Kokaew, A. (2025). Performance Comparison of the Linear Regression Model Based on Nonparametric Bootstrap Methods. Burapha Science Journal, 30(2, May–August), 720–739. Retrieved from https://li05.tci-thaijo.org/index.php/buuscij/article/view/702