08.00.13 Mathematical and instrumental methods of Economics
PROBABILITY-STATISTICAL MODELS OF CORRELATION AND REGRESSION
Description
The correlation and determination coefficients are widely used in statistical data analysis. According to measurement theory, Pearson's linear paired correlation coefficient is applicable to variables measured on an interval scale and cannot be used in the analysis of ordinal data. The nonparametric Spearman and Kendall rank coefficients estimate the relationship between ordinal variables. The critical value for testing whether the correlation coefficient differs significantly from 0 depends on the sample size; therefore, using the Chaddock scale is incorrect. In a passive experiment, correlation coefficients can reasonably be used for prediction, but not for control. To obtain probabilistic-statistical models intended for control, an active experiment is required. Outliers have a very large effect on the Pearson correlation coefficient. As the number of analyzed sets of predictors increases, the maximum of the corresponding correlation coefficients, which serve as indicators of approximation quality, increases noticeably (the effect of “inflation” of the correlation coefficient). Four main regression analysis models are considered. Models of the least squares method with a deterministic independent variable are distinguished. The distribution of deviations is arbitrary; however, to obtain the limit distributions of the parameter estimates and of the regression dependence, we assume that the conditions of the central limit theorem are satisfied. The second type of model is based on a sample of random vectors. The dependence is nonparametric, and the distribution of the two-dimensional vector is arbitrary. The estimation of the variance of the independent variable, as well as the use of the determination coefficient as a quality criterion for the model, can be discussed only in the model based on a sample of random vectors. Time series smoothing is discussed. Methods of restoring dependencies in spaces of a general nature are considered.
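The dependence of the critical value on the sample size can be illustrated with a short sketch. It is not taken from the paper: it assumes a two-sided test of zero correlation and uses the Fisher z-transformation as a large-sample approximation, and the function name `critical_r` and the chosen sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate two-sided critical value of |r| for H0: rho = 0.

    Assumption: the Fisher z-transformation, under which atanh(r) is
    approximately normal with standard deviation 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("the approximation needs n > 3")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    return math.tanh(z / math.sqrt(n - 3))


# The threshold for a "significant" correlation shrinks as n grows,
# so one fixed verbal scale cannot serve every sample size.
for n in (10, 30, 100, 1000):
    print(n, round(critical_r(n), 3))
```

Under these assumptions, a coefficient that is not significantly different from 0 in a sample of 10 observations can be clearly significant in a sample of 1000.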
It is shown that the limiting distribution of the natural estimate of the dimensionality of the model is geometric, and that the construction of an informative subset of features encounters the effect of “inflation” of the correlation coefficient. Various approaches to the regression analysis of interval data are discussed. Analysis of the variety of regression analysis models leads to the conclusion that there is no single “standard model”.
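The “inflation” effect can be reproduced in a small simulation. The sketch below is a simplification rather than the paper's procedure: it assumes single candidate predictors instead of sets of predictors, draws all data as independent standard normal values so that the true correlation is 0, and uses hypothetical names (`pearson`, `running_max_abs_corr`) and parameters.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def running_max_abs_corr(n, counts, seed=0):
    """Largest |r| between a response and the first k candidate predictors.

    Every predictor is generated independently of the response, so any
    large value of |r| here is a pure selection artifact.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    targets = set(counts)
    best, result = 0.0, {}
    for k in range(1, max(counts) + 1):
        x = [rng.gauss(0, 1) for _ in range(n)]
        best = max(best, abs(pearson(x, y)))
        if k in targets:
            result[k] = best
    return result


inflation = running_max_abs_corr(n=20, counts=[1, 10, 100, 1000])
for k, r in inflation.items():
    print(f"best of {k} candidates: max |r| = {r:.3f}")
```

Because the maximum is taken over a growing pool of candidates, the reported value cannot decrease as candidates are added, even though none of them is related to the response.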