#### Name

Orlov Alexander Ivanovich

professor

#### Research interests

Statistical methods, organizational and economic modeling. Developed a new area of applied statistics: the statistics of objects of non-numerical nature.

## Articles count: 145

• Description
The correlation and determination coefficients are widely used in statistical data analysis. According to measurement theory, Pearson's linear paired correlation coefficient is applicable to variables measured on an interval scale and cannot be used to analyze ordinal data. The nonparametric Spearman and Kendall rank coefficients estimate the relationship between ordinal variables. The critical value for testing whether the correlation coefficient differs significantly from 0 depends on the sample size, so using the Chaddock scale is incorrect. In a passive experiment, correlation coefficients can reasonably be used for prediction, but not for control. To obtain probabilistic-statistical models intended for control, an active experiment is required. Outliers have a very large effect on the Pearson correlation coefficient. As the number of analyzed sets of predictors increases, the maximum of the corresponding correlation coefficients (indicators of approximation quality) increases noticeably (the effect of "inflation" of the correlation coefficient). Four main regression analysis models are considered. Least squares models with a deterministic independent variable are distinguished. The distribution of deviations is arbitrary; however, to obtain the limit distributions of parameter estimates and regression dependences, we assume that the conditions of the central limit theorem are satisfied. The second type of model is based on a sample of random vectors. The dependence is nonparametric, and the distribution of the two-dimensional vector is arbitrary. The estimation of the variance of an independent variable can be discussed only in the model based on a sample of random vectors, as can the determination coefficient as a quality criterion for the model. Time series smoothing is discussed. Methods of restoring dependencies in spaces of a general nature are considered.
It is shown that the limiting distribution of the natural estimate of the dimensionality of the model is geometric, and that the construction of an informative subset of features encounters the effect of "inflation" of the correlation coefficient. Various approaches to the regression analysis of interval data are discussed. Analysis of the variety of regression analysis models leads to the conclusion that there is no single "standard model".
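The abstract's remark about outliers can be illustrated with a minimal sketch. The data below are invented for the example and are not taken from the article; the rank helper assumes there are no ties. One added point is enough to reverse the sign of the Pearson coefficient, while the Spearman rank coefficient stays positive.

```python
from statistics import mean

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Ranks 1..n; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman coefficient computed as the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data with a clear positive relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# The same data with a single outlier appended.
x_out = x + [100]
y_out = y + [-100]

print(pearson(x, y))            # about 0.905
print(pearson(x_out, y_out))    # close to -1: the outlier dominates
print(spearman(x_out, y_out))   # about 0.333: still positive
```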
• Description
Classification methods need to be put in order. This will increase their role in solving applied problems, in particular in the diagnostics of materials. To this end, it is first of all necessary to develop requirements that classification methods must satisfy. The initial formulation of such requirements is the main content of this work. Mathematical classification methods are considered as part of the methods of applied statistics. We discuss the natural requirements for data analysis methods and for the presentation of calculation results that follow from the achievements and ideas accumulated by the national probabilistic and statistical scientific school. Concrete recommendations are given on a number of issues, along with criticism of individual errors. In particular, data analysis methods must be invariant with respect to the permissible transformations of the scales in which the data are measured, i.e., methods should be adequate in the sense of measurement theory. A specific statistical method of data analysis is always based on some probabilistic model. The model should be clearly described and its premises justified, either from theoretical considerations or experimentally. Data processing methods intended for real-world problems should be investigated for stability with respect to permissible deviations of the initial data and model premises. The accuracy of the solutions given by the method should be indicated. When publishing the results of statistical analysis of real data, it is necessary to indicate their accuracy (confidence intervals). As a measure of the quality of a classification algorithm, it is recommended to use predictive power rather than the proportion of correct forecasts. Mathematical research methods are divided into "exploratory analysis" and "evidence-based statistics". Specific requirements for data processing methods arise in connection with their "docking" during sequential execution.
The article discusses the limits of applicability of probabilistic-statistical methods. Concrete statements of classification problems and typical errors in applying various methods to solve them are also considered.
• Description
When solving some problems of economics and management at an enterprise, it becomes necessary to determine the retail price of a product or service given a known wholesale price or producer price. We propose determining the retail price by analyzing a survey of potential consumers about the maximum price they would pay for the product or service in question. We calculate the retail price by optimizing the economic effect, equal to the product of the profit from selling one unit of goods and the demand function, which we estimate by interviewing consumers. To solve the optimization problem, we approximate the demand function using the least squares method. As examples, the linear and power models of the demand function are analyzed. Ways of further developing the proposed approach are discussed. Unresolved scientific problems are formulated. Methods for estimating the demand function need further elaboration for the case where many respondents give identical answers and tend toward "round numbers", because of which the Kolmogorov criterion cannot be used to assess the accuracy of the restored demand function. Various parametric and nonparametric approaches of regression analysis should be adapted to the problem of restoring the dependence of demand on price, as should methods for solving the corresponding optimization problems.
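The approach in this abstract can be sketched for the linear demand model. The survey answers, the wholesale price, and all function names below are hypothetical, and the closed-form optimum is a standard calculus result for a linear demand function rather than a formula quoted from the article: if the fitted demand is D(p) = a - b·p and the unit cost is p0, then (p - p0)·D(p) is maximized at p* = (a/b + p0)/2.

```python
def empirical_demand(max_prices):
    """Demand at price p = number of respondents willing to pay at least p."""
    prices = sorted(set(max_prices))
    demand = [sum(1 for m in max_prices if m >= p) for p in prices]
    return prices, demand

def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def optimal_price(intercept, slope, unit_cost):
    """Maximizer of (p - unit_cost) * (intercept + slope * p) for slope < 0."""
    a, b = intercept, -slope
    return (a / b + unit_cost) / 2

# Hypothetical survey: the maximum price each respondent would accept.
survey = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
wholesale_price = 30  # hypothetical wholesale (unit) price

prices, demand = empirical_demand(survey)
intercept, slope = fit_line(prices, demand)
best_price = optimal_price(intercept, slope, wholesale_price)
print(intercept, slope, best_price)  # approximately 11, -0.1, 70
```

For a power model of demand the same steps apply after taking logarithms of price and demand, but the optimum then has a different closed form.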
• Description
The new paradigm of mathematical research methods allows us to give a systematic analysis of various statements of statistical analysis problems and methods for solving them, based on a probabilistic-statistical model of data generation accepted by the researcher. Methods for testing the homogeneity of two independent samples are a classic area of mathematical statistics. In the more than 110 years since the publication of Student's fundamental article, various criteria have been developed for testing the statistical hypothesis of homogeneity in various statements, and their properties have been studied. However, there is an urgent need to put the accumulated scientific results in order. It is necessary to analyze the whole variety of problem statements for testing statistical hypotheses of the homogeneity of two independent samples, as well as the corresponding statistical criteria. This article is devoted to such an analysis. It contains a summary of the main results concerning methods for testing the homogeneity of two independent samples and a comparative study of them, allowing a systematic analysis of the diversity of such methods in order to select the most appropriate one for processing specific data. Based on the basic probabilistic-statistical model, the main statements of the problem of testing the homogeneity of two independent samples are formulated. A comparative analysis of the Student and Cramer-Welch criteria, designed to test the homogeneity of mathematical expectations, is given, and a recommendation for the widespread use of the Cramer-Welch criterion is substantiated. Among nonparametric methods for testing homogeneity, the criteria of Wilcoxon, Smirnov, and Lehmann-Rosenblatt are considered. Two myths about the Wilcoxon criterion are dispelled. Based on an analysis of the founders' publications, the incorrectness of the term "Kolmogorov-Smirnov criterion" is shown. To verify absolute homogeneity, i.e., coincidence of the distribution functions of the samples, it is recommended to use the Lehmann-Rosenblatt criterion. Current problems in the development and application of nonparametric criteria are discussed, including the difference between nominal and real significance levels, which makes it difficult to compare the power of criteria, and the need to take into account coincidences of sample values (from the point of view of the classical theory of mathematical statistics, the probability of coincidences is 0).
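As an illustration of testing the homogeneity of mathematical expectations, here is a minimal sketch of the Cramer-Welch statistic. It assumes the usual form of the statistic (difference of sample means divided by the square root of the sum of the sample variances, each divided by its sample size) and an asymptotic comparison with the standard normal distribution; the function name and the sample data are hypothetical.

```python
from statistics import NormalDist, mean, variance

def cramer_welch(x, y):
    """Return the Cramer-Welch statistic and its two-sided asymptotic p-value.

    The p-value uses the standard normal limit, so it is only an
    approximation for small samples.
    """
    standard_error = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    t = (mean(x) - mean(y)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_value

# Hypothetical samples: identical values, then a sample shifted by 10.
sample_a = [1, 2, 3, 4, 5]
sample_b = [11, 12, 13, 14, 15]

t_same, p_same = cramer_welch(sample_a, sample_a)
t_shift, p_shift = cramer_welch(sample_a, sample_b)
print(t_same, p_same)    # 0.0 and 1.0
print(t_shift, p_shift)  # statistic -10.0, p-value far below 0.05
```

Unlike Student's criterion, this statistic does not pool the two sample variances, which is why it does not require equal variances.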
• Description
In 1979, non-numerical data statistics was singled out as an independent area of applied statistics. Initially, the term "statistics of objects of non-numerical nature" was used to denote this area of mathematical methods of economics. Our basic textbook on the subject is called "Non-Numeric Statistics". Non-numerical data statistics is one of the four main areas of applied statistics (along with the statistics of numbers, multidimensional statistical analysis, and the statistics of time series and random processes). Non-numerical data statistics is divided into statistics in spaces of a general nature and sections devoted to specific types of non-numerical data (statistics of interval data, statistics of fuzzy sets, statistics of binary relations, etc.). Currently, statistics in spaces of a general nature is the central part of applied statistics, and non-numerical data statistics, which includes it, is the main area of applied statistics. This statement is confirmed, in particular, by an analysis of publications in the section "Mathematical Research Methods" of the journal "Industrial Laboratory. Diagnostics of Materials", the main place of publication of Russian studies on applied statistics. This article analyzes the basic ideas of non-numerical data statistics against the background of the development of applied statistics, from the perspective of the new paradigm of mathematical research methods. Various types of non-numerical data are described. The historical path of statistical science is analyzed. The development of non-numerical data statistics is discussed. The article analyzes the basic ideas of statistics in spaces of a general nature: average values, laws of large numbers, extremal statistical problems, nonparametric estimates of the probability density, classification methods (diagnostics and cluster analysis), and statistics of the integral type.
Some statistical methods for analyzing data lying in specific spaces of non-numerical nature are briefly considered: nonparametric statistics (real distributions usually differ significantly from normal), statistics of fuzzy sets, the theory of expert estimates (the Kemeny median is a sample average of expert orderings), etc. Some unsolved problems in non-numerical data statistics are also discussed.
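The remark that the Kemeny median is a sample average of expert orderings can be made concrete with a brute-force sketch. It assumes strict orderings without ties and measures the distance between two orderings by the number of pairs they order differently; the expert data and function names are hypothetical, and enumerating all permutations is practical only for a handful of objects.

```python
from itertools import combinations, permutations

def pair_disagreements(order_a, order_b):
    """Number of object pairs that the two strict orderings rank oppositely."""
    pos_a = {item: i for i, item in enumerate(order_a)}
    pos_b = {item: i for i, item in enumerate(order_b)}
    return sum(
        1
        for first, second in combinations(order_a, 2)
        if (pos_a[first] - pos_a[second]) * (pos_b[first] - pos_b[second]) < 0
    )

def kemeny_median(expert_orderings):
    """Ordering minimizing the total distance to all expert orderings.

    If several orderings attain the minimum, the first one found is returned.
    """
    items = expert_orderings[0]
    return min(
        permutations(items),
        key=lambda candidate: sum(
            pair_disagreements(candidate, expert) for expert in expert_orderings
        ),
    )

# Hypothetical orderings of three objects given by three experts.
experts = [("A", "B", "C"), ("A", "C", "B"), ("B", "A", "C")]
median = kemeny_median(experts)
print(median)  # ('A', 'B', 'C')
```

The average is obtained here by solving a minimization problem rather than by summation, since orderings cannot be added.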
• Description
Dynamic programming is designed to solve discrete optimal control problems. According to this method, the optimal solution of a multidimensional problem is found by decomposing it into stages, each of which is a subproblem in one variable. In economic problems, the number of stages is the planning horizon. Choosing a planning horizon is necessary for a rigorous statement of an applied problem in economics and management, but the choice is often difficult to justify. We see a way out in the use of asymptotically optimal plans, for which the values of the optimization criterion differ little from its values for optimal plans at all sufficiently large planning horizons. The main result of the paper is the existence of an asymptotically optimal plan. The proof is carried out in several steps. If the sum of the maximums of the transition functions tends to 0, the existence of an asymptotically optimal plan is obtained in Theorem 1. A special case is models with discounting in which the discount coefficient is less than 1. The main part of the article is devoted to models with a discount coefficient equal to 1. Theorem 2, a highway (magistral) theorem, is proved for a base set with a finite number of elements. Theorem 3 gives a statement on the approximation of an arbitrary set by a finite one. In the final Theorem 4, the existence of an asymptotically optimal plan is proved in the general case. The term "magistral" is associated with a well-known recommendation to drivers: to get from point A to point B, it is advisable to join the highway (magistral) on the initial section of the route, and then leave the highway and proceed to point B. A similar recommendation arises when the optimal trajectory is chosen using the Pontryagin maximum principle in the model of the optimal distribution of time between acquiring knowledge and developing skills. This fact underlines the methodological proximity of dynamic programming and the Pontryagin maximum principle.
• Description
The instrumental methods of economics include the Monte Carlo method (the method of statistical simulation). It is widely used in the development, study, and application of mathematical research methods in econometrics, applied statistics, and organizational and economic modeling, in developing and making management decisions, and as the basis of simulation modeling. The new paradigm of mathematical research methods developed by us is based on the use of the Monte Carlo method. In mathematical statistics, limit theorems on the asymptotic behavior of the random variables under consideration as sample sizes grow without bound have been obtained for many methods of data analysis. The next step is to study the properties of these random variables for finite sample sizes. The Monte Carlo method is used for such a study. In this article, we use this method to study the properties of statistical criteria for testing the homogeneity of two independent samples. We consider the criteria most used in the analysis of real data: Cramer-Welch (which coincides with Student's criterion when the sample sizes are equal), Lord, Wilcoxon (Mann-Whitney), Wolfowitz, Van der Waerden, Smirnov, and the omega-square type (Lehmann-Rosenblatt). The Monte Carlo method allows us to estimate the rates of convergence of the distributions of criteria statistics to their limits and to compare the properties of the criteria for finite sample sizes. To use the Monte Carlo method, it is necessary to select the distribution functions of the elements of the two samples. For this purpose, normal and Weibull-Gnedenko distributions are used. The following recommendation is obtained: to test the hypothesis that the distribution functions of two samples coincide, it is advisable to use the Lehmann-Rosenblatt (omega-square type) test. If there is reason to assume that the distributions differ mainly by a shift, then the Wilcoxon and Van der Waerden criteria can also be used.
However, even in this case, the omega-square type test may be more powerful. In the general case, besides the Lehmann-Rosenblatt criterion, the use of the Smirnov criterion is permissible, although for this criterion the real significance level may differ from the nominal one. We studied how often the statistical conclusions reached with different criteria disagree.
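The kind of finite-sample study described in this abstract can be sketched as follows. The sketch estimates, by simulation, the real significance level of a test that rejects homogeneity when the Cramer-Welch statistic exceeds the asymptotic normal critical value 1.96 (nominal level 0.05). The choice of standard normal samples, the sample sizes, the number of repetitions, and the seed are all assumptions made for the illustration, not the article's experimental design.

```python
import random
from statistics import mean, variance

def cramer_welch_stat(x, y):
    """Difference of sample means divided by its estimated standard error."""
    return (mean(x) - mean(y)) / (variance(x) / len(x) + variance(y) / len(y)) ** 0.5

def real_level(m, n, reps=2000, critical=1.96, seed=0):
    """Monte Carlo estimate of the rejection rate when both samples are N(0, 1).

    Both samples come from the same distribution, so every rejection is a
    false one and the rejection rate estimates the real significance level.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(cramer_welch_stat(x, y)) > critical:
            rejections += 1
    return rejections / reps

level_small = real_level(5, 5)    # small samples: level tends to exceed 0.05
level_large = real_level(50, 50)  # larger samples: level is near 0.05
print(level_small, level_large)
```

Replacing the normal generator with another distribution, or the statistic with another criterion, gives the corresponding comparison for that case.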
• Description
Among the widely used economic-mathematical models, dynamic programming models play an important role, and among them, models with discounting. The most famous example is the model for calculating the net present value (NPV) as an estimate of the efficiency of an investment project. The article clarifies which features distinguish models with discounting among all models of dynamic programming. In models with discounting, the comparison of plans does not change when the starting time of the plans changes, i.e., the results of comparing plans are stable. It is proved that if the results of comparing plans for 1 and 2 steps are stable in a dynamic programming model, then this model is a model with discounting. This theorem shows that the introduction of discounted functions for estimating the effect is justified only in stable economic conditions, in which the ordering of managerial decisions does not change from year to year. In other words, if at the beginning of the period under consideration the first solution is better than the second, then at all other times, up to the end of the period, the first solution is better than the second. Stable economic conditions are rarely found in the modern economy with its constant changes, including those caused by innovations. Therefore, the decision to choose (to implement) an investment project from a set of possible ones cannot be based solely on the calculation of discounted project performance indicators, such as net present value and internal rate of return. Such indicators can play only a supporting role. The decision on which investment project to implement must be based on the whole range of social, technological, environmental, economic, and political factors.
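The stability property described here can be checked numerically in a small sketch. The cash flows and weights are invented. With geometric weights (discounting), shifting the start of both plans multiplies both values by the same positive factor, so the comparison cannot change; with non-geometric weights it can. The sketch illustrates only this direction and does not reproduce the article's theorem, which is the converse statement.

```python
def weighted_value(flows, weights, start=0):
    """Total value of a plan whose t-th cash flow is weighted by weights[start + t]."""
    return sum(flow * weights[start + t] for t, flow in enumerate(flows))

# Hypothetical cash flows of two plans over three periods.
plan_a = [-100, 60, 70]
plan_b = [-100, 20, 120]

# Discounting: the weight of period t is q ** t with q = 0.9.
geometric = [0.9 ** t for t in range(6)]
# Hypothetical weights that are not a geometric sequence.
irregular = [1.0, 0.9, 0.5, 0.45, 0.4, 0.35]

for start in range(4):
    a = weighted_value(plan_a, geometric, start)
    b = weighted_value(plan_b, geometric, start)
    print(start, a < b)  # True for every start: plan_b is always preferred

# With irregular weights the preferred plan depends on the starting time.
print(weighted_value(plan_a, irregular, 0) > weighted_value(plan_b, irregular, 0))  # True
print(weighted_value(plan_a, irregular, 1) > weighted_value(plan_b, irregular, 1))  # False
```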
• Description
According to measurement theory, statistical data are measured on various scales. The most widely used are the ordinal scale and the interval and ratio scales. Statistical methods of data analysis should correspond to the scales on which the data are measured. The term "correspondence" is made precise with the help of the concepts of an adequate function and a permissible scale transformation. The main content of the article is a description of the average values that can be used to analyze data measured on the ordinal, interval, and ratio scales, and on some others. The main attention is paid to Cauchy means and Kolmogorov means. In addition to means, polynomials and correlation indices are also analyzed from this point of view. Detailed mathematical proofs of characterization theorems are given for the first time in scientific periodicals. It is shown that on the ordinal scale there are exactly n average values that can be used, namely, the n order statistics. The proof is presented as a chain of 9 lemmas. On the interval scale, of all Kolmogorov means only the arithmetic mean can be used. On the ratio scale, of all Kolmogorov means only the power means and the geometric mean are permissible. The form of adequate polynomials on the ratio scale is indicated.
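Why the arithmetic mean is not adequate on the ordinal scale can be seen in a minimal sketch with invented data. On the ordinal scale any strictly increasing transformation is permissible; the square root is used here as one such transformation. The comparison of arithmetic means reverses after the transformation, while the comparison of medians (an order statistic) does not.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical ordinal scores for two groups (positive values).
group_x = [1, 2, 13]
group_y = [4, 5, 6]

# A strictly increasing, hence permissible, transformation of an ordinal scale.
transformed_x = [sqrt(v) for v in group_x]
transformed_y = [sqrt(v) for v in group_y]

# Arithmetic means: the conclusion depends on the chosen scale.
print(mean(group_x) > mean(group_y))              # True
print(mean(transformed_x) > mean(transformed_y))  # False

# Medians: the conclusion is the same on both scales.
print(median(group_x) < median(group_y))              # True
print(median(transformed_x) < median(transformed_y))  # True
```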
• Description
Many procedures of applied mathematical statistics are based on the solution of extremal problems. As examples, it is enough to name the methods of least squares, maximum likelihood, minimal contrast, and principal components. In accordance with the new paradigm of applied mathematical statistics, the central part of this scientific and practical discipline is the statistics of non-numerical data (also called the statistics of objects of non-numerical nature, or non-numeric statistics), in which the empirical and theoretical averages are determined by solving extremal problems. As shown in this paper, laws of large numbers are valid, according to which empirical averages approach the theoretical ones as the sample size increases. Of great importance are limit theorems describing the asymptotic behavior of solutions of extremal statistical problems. For example, in the method of least squares, sample estimates of the parameters of the dependence approach the theoretical values, the maximum likelihood estimates tend to the estimated parameters, etc. It is quite natural to seek to study the asymptotic behavior of solutions of extremal statistical problems in the general case. The corresponding results can then be used in various special cases. This is the theoretical and practical value of limiting results obtained under the weakest assumptions. The present article is devoted to a series of limit theorems concerning the asymptotics of solutions of extremal statistical problems in the most general formulations. Along with the results of probability theory, the apparatus of general topology is used. The main differences between the results of this article and numerous studies on related topics are as follows: we consider spaces of a general nature; the behavior of solutions is studied for extremal statistical problems of general form; and the usual requirements of bicompactness type can be weakened by introducing conditions of the type of asymptotic uniform divisibility.