Predictive Modelling: An Assessment Through Validation Techniques

  • M. Iqbal Jeelani, Division of Statistics and Computer Science, Faculty of Basic Sciences, SKUAST-J, J&K, India
  • Faizan Danish, Division of Biostatistics, Department of Population Health, School of Medicine, New York University, New York, NY, USA
  • Saquib Khan, Division of Statistics and Computer Science, Faculty of Basic Sciences, SKUAST-J, J&K, India
Keywords: Cross validation, prediction error rate, linear and non-linear model

Abstract

In this investigation, various statistical models were fitted to simulated symmetric and asymmetric data. The models were fitted using various libraries in RStudio, and several selection criteria were applied during fitting. To evaluate different validation techniques, the simulated data were divided into training and testing sets, and R functions were developed for validation. The coefficient summaries showed that all statistical models were statistically significant across both symmetric and asymmetric distributions. In the preliminary analysis, TFEM (Type First Exponential Model) was found to be the best linear model across both symmetric and asymmetric distributions, with lower values of RMSE, MAE, bias, AIC and BIC. Among the non-linear models, the Haung model was found to be the best across both distributions, as it had lower values of RMSE, MAE, etc. Among the validation techniques compared in the present study, 5-fold cross-validation performed better across all the statistical models, with lower prediction error rates than its counterparts.
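The validation workflow summarized above (simulate data, hold out part of it, and compare RMSE, MAE and bias on the held-out part) can be illustrated with a minimal sketch. It is written in Python rather than the R used by the authors, and it is not their code: the straight-line model, the sample size, the coefficients, the error distributions (normal for the symmetric case, a mean-centred gamma for the asymmetric case) and all function names are assumptions chosen only for illustration. TFEM and the Haung model are not reproduced here because the abstract does not give their functional forms.

```python
import math
import random


def error_metrics(y_true, y_pred):
    """Return RMSE, MAE and bias (mean signed error) of the predictions."""
    resid = [p - t for t, p in zip(y_true, y_pred)]
    n = len(resid)
    return {
        "rmse": math.sqrt(sum(r * r for r in resid) / n),
        "mae": sum(abs(r) for r in resid) / n,
        "bias": sum(resid) / n,
    }


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def k_fold_cv(x, y, k=5, seed=0):
    """Average held-out RMSE, MAE and bias of the line fit over k folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [b0 + b1 * x[i] for i in held_out]
        scores.append(error_metrics([y[i] for i in held_out], pred))
    return {m: sum(s[m] for s in scores) / k for m in scores[0]}


# Simulated data (assumed settings, not the authors' design).
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(200)]
# Symmetric case: normal errors with standard deviation 1.
y_sym = [2.0 + 1.5 * xi + rng.gauss(0, 1) for xi in x]
# Asymmetric case: gamma(shape=2, scale=1) errors shifted to have mean zero.
y_asym = [2.0 + 1.5 * xi + rng.gammavariate(2.0, 1.0) - 2.0 for xi in x]

cv_sym = k_fold_cv(x, y_sym)
cv_asym = k_fold_cv(x, y_asym)
print("symmetric errors: ", cv_sym)
print("asymmetric errors:", cv_asym)
```

Changing `k` in `k_fold_cv` gives other fold counts for comparison, and setting `k` equal to the number of observations gives leave-one-out validation.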

Author Biographies

M. Iqbal Jeelani, Division of Statistics and Computer Science, Faculty of Basic Sciences, SKUAST-J, J&K, India

M. Iqbal Jeelani is an Assistant Professor of Statistics at the Division of Statistics and Computer Sciences, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, J&K, India. He obtained his B.Sc. degree in Forestry from Wadura College, Sher-e-Kashmir University of Agricultural Sciences & Technology of Kashmir, J&K, India in 2008, and his M.Sc. (2011) and Ph.D. (2014) in Agricultural Statistics from the Division of Agricultural Statistics, SKUAST-Kashmir, and is a recipient of the Gold Medal in M.Sc. Dr. Jeelani specializes in Applied Statistics and is well versed in R. Besides teaching undergraduate and postgraduate students, he has published a large number of papers in reputed international and national statistics journals. He has also guided various postgraduate students in Statistics as Major Advisor.

Faizan Danish, Division of Biostatistics, Department of Population Health, School of Medicine, New York University, New York, NY, USA

Faizan Danish is currently a Postdoctoral Fellow in the Division of Biostatistics, Department of Population Health, School of Medicine, New York University, New York, 10016, USA. He received his Doctor of Philosophy (Ph.D.) in Statistics, with specialization in Sampling Theory and Operations Research, from the Division of Statistics and Computer Sciences, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, J&K, India, and graduated in Mathematics, Statistics and Economics from the University of Kashmir, J&K, India. Dr. Danish has around 2 years of teaching experience and has published around 25 research papers in reputed journals. He has also worked as a Biostatistician with Research Consultation Services, Doha, Qatar. His research expertise covers Sampling Theory, Mathematical Programming, Applied Statistics and Biostatistics. Dr. Danish has proposed several methods for obtaining stratification points using both the classical technique and the Mathematical Programming approach. He is well versed in statistical software, including R, STATA, SPSS, Python, Matlab, Fortran 77, O.P STAT, WINDOSTAT and Mathematica, and has completed several online software courses from prominent global universities. Dr. Danish is a member of several statistical associations, including the American Statistical Association, the Institute of Mathematical Statistics and the International Indian Statistical Association, and is a reviewer for several reputed journals.

Saquib Khan, Division of Statistics and Computer Science, Faculty of Basic Sciences, SKUAST-J, J&K, India

Saquib Khan is pursuing a Master’s degree in Statistics in the Division of Statistics and Computer Sciences, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, J&K, India. He works in Applied Statistics, using cross-validation to choose models suitable for different data sets. He completed his Bachelor’s degree in Agriculture at SKUAST-Jammu. Mr. Saquib is exploring Applied Statistics using different platforms.

Published
2022-02-14
Section
Articles