Institute of Informatics Problems of the Russian Academy of Sciences




«INFORMATICS AND APPLICATIONS»
Scientific journal
Volume 15, Issue 1, 2021


Abstract and Keywords

NORMAL SUBOPTIMAL FILTERING FOR DIFFERENTIAL STOCHASTIC SYSTEMS WITH UNSOLVED DERIVATIVES
  • I. N. Sinitsyn  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article continues a series of papers devoted to stochastic systems with unsolved derivatives. Methodological aspects of normal suboptimal filtering (NSOF) for stochastic systems with unsolved derivatives are presented. Nonlinear differential equations for the state and the observations are given under the following conditions: the observation equations are Gaussian and do not depend on the state variable. One section is devoted to NSOF for Gaussian and non-Gaussian systems. The corresponding normal suboptimal filters are given for additive noises. An illustrative example is given, and the quality analysis of NSOF is considered.

Keywords: method of analytical modeling (MAM); method of normal approximation (MNA); method of statistical linearization (MSL); normal suboptimal filter; stochastic system (StS); stochastic systems with unsolved derivatives; shaping filter

ON SOME SPECIAL CASES IN THE PROBLEM OF STOCHASTIC DIFFERENTIAL SYSTEM OUTPUT CONTROL BY THE QUADRATIC CRITERION
  • A. V. Bosov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: A general study of the optimal control problem for the Ito diffusion process and linear controlled output with the quadratic quality criterion was carried out in the author's previous works (coauthored by A. I. Stefanovich).
An analysis of the available results makes it possible to single out some models that are special cases of the general setting but have particular practical significance. This article examines two such models. The first model assumes linear drift in the state equation while retaining nonlinear diffusion. It is shown that in this model, the optimal control is linear and can be implemented without solving a parabolic equation. However, a quadratic Bellman function does not arise in this case: the corresponding expression, as in the general case, is described by the solution of a parabolic equation and retains a meaningful stochastic interpretation given by the Feynman-Kac formula. The second model assumes that the disturbances in the state and output equations are dependent. The modified dynamic programming equation is solved in the same way as in the general case considered in the previous works, including within a combined model that covers both cases. This model will be especially useful in problems with incomplete information, where the assumption of complete information about the state and output is replaced by a description of an observation system in which the output is interpreted as indirect observations of the state. A numerical example studied in detail in the author's previous works (coauthored by A. I. Stefanovich) is briefly discussed, since it turns out to satisfy the assumption of linear drift in the state equation and, accordingly, the previously obtained approximate solutions can be refined.

Keywords: stochastic differential equation; optimal control; system output control; stochastic differential systems with multiplicative and dependent disturbances

CONNECTIVITY OF CONFIGURATION GRAPHS IN COMPLEX NETWORK MODELS
  • Yu. L. Pavlov  Institute of Applied Mathematical Research of the Karelian Research Centre of the Russian Academy of Sciences, 11 Pushkinskaya Str., Petrozavodsk 185910, Russian Federation

Abstract: The author considers configuration graphs whose degrees of vertices are independent and identically distributed according to the generalized power-law distribution. Connections between vertices are equiprobably formed in compliance with their degrees. Such random graphs are often used for modeling complex communication networks like the Internet and social networks. It is assumed that the distribution of vertex degrees is unknown because it depends on a slowly varying function with unknown properties. The conditions are found under which a graph is asymptotically almost surely connected as the number of vertices tends to infinity. Under these conditions, estimates of the convergence rate to zero of the probability that the graph is not connected are obtained. The results in the present paper are proved using the properties of stable distributions and slowly varying functions.

Keywords: random graphs; configuration graphs; random vertex degrees; graph connectivity
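
The construction described in the abstract above can be illustrated with a minimal Python sketch. It is not the author's method: it assumes a pure power law P(degree >= k) = k^(-tau) (the slowly varying factor is taken to be 1), caps degrees for practicality, and makes the degree sum even by adding one stub to a random vertex. The names `sample_degree`, `configuration_graph`, `is_connected`, and `estimate_disconnect_probability`, as well as the default `tau`, are hypothetical. The sketch makes no claim about whether these parameters satisfy the connectivity conditions found in the paper.

```python
import random


def sample_degree(rng, tau=1.5, max_degree=10**4):
    """Draw a degree k >= 1 with P(degree >= k) = k**(-tau).

    Assumption: a pure power law; the cap max_degree is only a practical guard.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return min(int(u ** (-1.0 / tau)), max_degree)


def configuration_graph(n, rng, tau=1.5):
    """Pair half-edges (stubs) equiprobably; loops and multi-edges are allowed."""
    degrees = [sample_degree(rng, tau) for _ in range(n)]
    if sum(degrees) % 2 == 1:
        # Assumption: fix parity by giving one random vertex an extra stub.
        degrees[rng.randrange(n)] += 1
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = list(zip(stubs[0::2], stubs[1::2]))
    return degrees, edges


def is_connected(n, edges):
    """Check connectivity with a union-find structure."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components == 1


def estimate_disconnect_probability(n, trials, rng, tau=1.5):
    """Monte Carlo estimate of the probability that the graph is not connected."""
    bad = 0
    for _ in range(trials):
        _, edges = configuration_graph(n, rng, tau)
        bad += not is_connected(n, edges)
    return bad / trials


if __name__ == "__main__":
    rng = random.Random(2021)
    for n in (100, 1000):
        print(n, estimate_disconnect_probability(n, 200, rng))
```

Repeating the estimate for growing `n` gives an empirical view of how the probability of a disconnected graph behaves for a chosen degree law.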

METHODS OF THE CATEGORY THEORY IN DIGITAL DESIGN OF HETEROGENEOUS CYBER-PHYSICAL SYSTEMS
  • S. P. Kovalyov  V. A. Trapeznikov Institute of Control Sciences, Russian Academy of Sciences, 65 Profsoyuznaya Str., Moscow 117997, Russian Federation

Abstract: The paper further develops a mathematical apparatus based on category theory that was previously proposed to formally describe and rigorously explore engineering procedures based on mathematical and computer modeling.
With the help of this apparatus, highly automated procedures for designing heterogeneous cyber-physical systems on top of digital twins, demanded by the upcoming fourth industrial revolution, are described and explored. For this purpose, a novel construction, the multicomma category, is introduced. Its objects are architectural models of a heterogeneous cyber-physical system with a fixed structural hierarchy scheme, represented from a certain architecture viewpoint, and its morphisms describe actions associated with selecting the constituents from which a system is assembled. The application of the multicomma category to solving direct and inverse problems of designing individual systems and complex systems of systems is considered.

Keywords: cyber-physical system; digital twin; generative design; system of systems; category theory; multicomma category

METHODS OF CROSS-LINGUAL TEXT REUSE DETECTION IN LARGE TEXTUAL COLLECTIONS
  • R. V. Kuznetsova  Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation
  • O. Yu. Bakhteev  Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation, Antiplagiat Co., 42-1 Bolshoy Blvd., Moscow 121205, Russian Federation
  • Yu. V. Chekhovich  A. A. Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper investigates the cross-lingual text reuse detection problem. It proposes a monolingual approach to this problem: the suspicious document is translated into the language of the collection for further monolingual analysis. One of the major requirements for the proposed method is robustness to machine translation ambiguity. The further document analysis is divided into two steps. In the first step, the authors retrieve candidate documents that are likely to be the source of the text reuse. For robustness, the authors propose to retrieve the documents using word clusters constructed with distributional semantics. In the second step, the authors compare the suspicious document with the candidates using sentence embeddings obtained by deep neural networks. The experiment was conducted for the "English-Russian" language pair both on synthetic data and on articles included in the Russian Science Citation Index.

Keywords: natural language processing; machine translation; deep learning; cross-lingual text reuse detection; distributional semantics

VARIATIONAL DEEP LEARNING MODEL OPTIMIZATION WITH COMPLEXITY CONTROL
  • O. S. Grebenkova  Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation
  • O. Yu. Bakhteev  Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation, Antiplagiat Co., 42-1 Bolshoy Blvd., Moscow 121205, Russian Federation
  • V. V. Strijov  Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation, A. A. Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation

Abstract: This paper investigates the problem of deep learning model optimization. The authors propose a method to control model complexity. The minimum description length is interpreted as the complexity of the model: it is the minimum amount of information required to transmit the model and the dataset. The proposed method is based on a representation of a deep learning model. The authors propose a form of a hypernet using Bayesian inference. A hypernet is a model that generates parameters of an optimal model. The authors introduce probabilistic assumptions about the distribution of the parameters of the deep learning model. The paper suggests maximizing the evidence lower bound of the Bayesian model. The authors consider the evidence lower bound as a conditional value that depends on the required model complexity. The authors analyze this method in computational experiments on the MNIST dataset.

Keywords: model variational optimization; hypernets; deep learning; neural networks; Bayesian inference; model complexity control
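
For reference, the evidence lower bound mentioned in the abstract above has the following standard form in variational Bayesian inference. The notation is an assumption made here and is not taken from the paper: D denotes the dataset, w the model parameters, p(w) the prior, and q(w) the variational distribution. The complexity-dependent version considered by the authors is not reproduced.

```latex
\log p(D)
  \;\ge\;
  \mathsf{E}_{q(\mathbf{w})} \log p(D \mid \mathbf{w})
  \;-\;
  D_{\mathrm{KL}}\bigl( q(\mathbf{w}) \,\|\, p(\mathbf{w}) \bigr)
```

The first term rewards fit to the data, while the Kullback-Leibler term penalizes deviation of q(w) from the prior and thus plays the role of a complexity penalty.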

INFORMATION MODEL OF AIRCRAFT WEIGHT PROFILE
  • L. L. Vyshinsky  A. A. Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation
  • Yu. A. Flerov  A. A. Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article is devoted to the description of the information model of the weight profile of an aircraft. The weight profile of an aircraft is understood as a set of interconnected information objects containing a description of the structure, parameters, and characteristics of the aircraft sufficient for weight calculations, weight analysis, and weight control at all stages of the product life cycle. The described information model can serve as the basis for a scheme of a database in the development of automated weight design systems. In the article, the weight model of an aircraft is described in terms of network data structures.

Keywords: design automation; aircraft; weight design; weighting model; design tree; project generator

OPTIMAL THRESHOLD-BASED ADMISSION CONTROL IN THE M/M/s SYSTEM WITH HETEROGENEOUS SERVERS AND A COMMON QUEUE
  • Ya. M. Agalarov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article discusses the M/M/s system with heterogeneous servers and a common queue equipped with a mechanism that controls the queue length in order to maximize the average marginal profit. The profit function includes a fee for each successfully serviced customer, a fine for each rejected customer, a fine for the idle period of each server, a fine for waiting (or for exceeding the allowable waiting time), and costs associated with queue maintenance.
The problem is to maximize the marginal profit over the set of simple threshold-based queue length control policies.
The convexity of the profit function is proved, and conditions for the existence of a finite optimal queue length threshold are obtained.

Keywords: queuing system; optimization; threshold strategy; queue length
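
The threshold optimization described in the abstract above can be sketched numerically under strong simplifying assumptions. This is not the model of the paper: the servers here are homogeneous (a single rate `mu`), queue maintenance costs are omitted, and the waiting fine is charged per waiting customer per unit time. All function and parameter names (`stationary_distribution`, `profit_rate`, `best_threshold`, `reward`, `reject_fine`, `idle_fine`, `wait_fine`, `max_k`) are hypothetical.

```python
def stationary_distribution(lam, mu, s, k):
    """Stationary probabilities of an M/M/s queue with at most k waiting customers.

    Assumption: homogeneous servers, so the number of customers in the system
    is a birth-death process with death rate mu * min(n, s).
    """
    weights = [1.0]
    for n in range(1, s + k + 1):
        weights.append(weights[-1] * lam / (mu * min(n, s)))
    total = sum(weights)
    return [w / total for w in weights]


def profit_rate(lam, mu, s, k, reward, reject_fine, idle_fine, wait_fine):
    """Long-run average profit per unit time for queue length threshold k."""
    p = stationary_distribution(lam, mu, s, k)
    p_block = p[-1]  # arrivals are rejected only when the system is full
    mean_idle = sum((s - n) * p[n] for n in range(s))
    mean_queue = sum((n - s) * p[n] for n in range(s + 1, s + k + 1))
    return (reward * lam * (1.0 - p_block)
            - reject_fine * lam * p_block
            - idle_fine * mean_idle
            - wait_fine * mean_queue)


def best_threshold(lam, mu, s, reward, reject_fine, idle_fine, wait_fine, max_k=200):
    """Brute-force search for the best threshold among 0..max_k."""
    return max(
        range(max_k + 1),
        key=lambda k: profit_rate(lam, mu, s, k, reward, reject_fine, idle_fine, wait_fine),
    )


if __name__ == "__main__":
    k_star = best_threshold(lam=4.0, mu=1.0, s=5, reward=10.0,
                            reject_fine=2.0, idle_fine=0.5, wait_fine=1.0)
    print("best threshold:", k_star)
```

A brute-force search over thresholds is used only because it is short; a shape property of the profit function such as the one proved in the paper would allow stopping at the first threshold where the profit stops improving.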

PROBABILISTIC CHARACTERISTICS OF BALANCE INDEX OF FACTORS WITH GENERALIZED GAMMA DISTRIBUTION
  • E. N. Arutyunov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A. A. Kudryavtsev  Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation
  • Iu. N. Nedolivko  Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation

Abstract: The main probabilistic characteristics of the balance index in the Bayesian formulation are given under the assumption that the negative and positive factors have a priori generalized gamma distributions. This problem is equivalent to studying the characteristics of scale mixtures of generalized gamma laws. Special attention is paid to the case in which the distributions of the factors have shape parameters of different signs. Moment characteristics and various representations of the density in terms of the gamma-exponential function, the H-function, the Macdonald function, and the generalized hypergeometric function are given. The analysis is based on the Mellin transform and its inverse. New properties of the gamma-exponential function are given. The results can be widely applied in natural science models that use distributions with unbounded positive support to describe processes and phenomena.

Keywords: Bayesian approach; generalized gamma distribution; gamma-exponential function; balance models; Mellin transform; H-function; hypergeometric function

NONASYMPTOTIC ANALYSIS OF BARTLETT-NANDA-PILLAI STATISTIC FOR HIGH-DIMENSIONAL DATA
  • A. A. Lipatiev  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation

Abstract: The author obtains computable error bounds for the normal approximation of the Bartlett-Nanda-Pillai statistic when the dimensionality grows proportionally to the sample size. This result enables more precise calculation of p-values in applications of multivariate analysis. In practice, analysts increasingly encounter situations in which the number of factors is large and comparable with the sample size; examples include signal processing. The proof is essentially based on the normality of the distribution of the elements of the matrices under consideration with the Wishart distribution. For random variables that are the matrix traces of the product and squares of matrices with the normalized Wishart distribution, convenient upper bounds for 1 - F are found, where F is the distribution function of the corresponding matrix trace. Applying the properties of inverse matrices and positive semidefinite matrices, the Bartlett-Nanda-Pillai statistic is bounded from above by a combination of the above-mentioned matrix traces.

Keywords: computable estimates; accuracy of approximation; MANOVA; computable error bounds; Bartlett-Nanda-Pillai statistic; high-dimensional data

AN ARCHITECTURE FOR DISTRIBUTED DATA ANALYSIS PROBLEM SOLVING IN NEUROPHYSIOLOGY
  • D. O. Briukhov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • S. A. Stupnikov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • D. Yu. Kovalev  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • I. A. Shanin  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The growth in the volume and variety of data in the field of neurophysiology increases the need to apply computer science methods such as statistical analysis, machine learning, and neural networks to data analysis. Infrastructures are required that provide storage of large volumes of neurophysiological data as well as distributed data processing and analysis. This article proposes a software architecture for solving such problems based on the Hadoop distributed storage and analysis framework and GPU-assisted high-performance computing technologies.

Keywords: neurophysiology; neurophysiological resources; neuroinformatics; data intensive research; problem solving infrastructure; analysis of neurophysiological data

REPRESENTATION OF NEW LEXICOGRAPHICAL KNOWLEDGE IN DYNAMIC CLASSIFICATION SYSTEMS
  • A. A. Goncharov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • I. M. Zatsman  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • M. G. Kruzhkov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The distinctive feature of dynamic classification systems is that new categories may be introduced in the course of their use and definitions of existing categories may be modified, including cases where semantic content is rearranged between categories. On the one hand, this feature makes it possible to integrate new knowledge on the fly and to start using it immediately for linguistic annotation. On the other hand, if a category is changed, then, in some cases, the annotations to which it was previously applied have to be reclassified. This paper has a twofold purpose: first, to compare approaches to the classification of entities based on (i) dynamic classification systems and (ii) ontologies that change over time; and second, to describe how new lexicographical knowledge is represented in dynamic classification systems.

Keywords: dynamic classification system; ontology versioning; linguistic annotation; reclassification of annotations

PROBLEM-ORIENTED UPDATING OF DICTIONARY ENTRIES OF BILINGUAL DICTIONARIES AND MEDICAL TERMINOLOGY: COMPARATIVE ANALYSIS
  • I. M. Zatsman  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: Two approaches to the goal-oriented discovery of new knowledge from text data are compared. The first approach relates to the subject domain of lexicography. It focuses on extracting new meanings of linguistic units from texts to replenish the entries of bilingual dictionaries. The second approach relates to medical science and focuses on discovering new meanings of terms to update a disease's description in the form of its terminological portrait. The portrait includes definitions of terms that reflect their dynamics over time, relationships between terms, contexts of their use, and links to the sources of contexts. The approaches are compared on the following points: the problem that the discovered knowledge is meant to solve, the purpose of its discovery, the sources of concepts of new knowledge, the standard against which the novelty of concepts is judged, concept-source linkages, and concept dynamics. The purpose of the paper is to describe the outcomes of this comparative analysis. It is proposed to use the outcomes as initial data for creating the concept of a human-artificial intelligence system for goal-oriented discovery of new knowledge from big data that is applicable in different subject domains.

Keywords: new knowledge generation; discovering knowledge from texts; artificial intelligence; human-artificial intelligence system

MODELING OF THE STOCHASTIC DYNAMICS OF CHANGES IN NODE STATES AND PERCOLATION TRANSITIONS IN SOCIAL NETWORKS WITH SELF-ORGANIZATION AND MEMORY
  • D. O. Zhukov  Russian Technological University (MIREA), 78 Vernadskogo Ave., Moscow 119454, Russian Federation
  • T. Yu. Khvatova  Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya Str., St. Petersburg 195251, Russian Federation
  • A. D. Zaltcman  Russian Technological University (MIREA), 78 Vernadskogo Ave., Moscow 119454, Russian Federation

Abstract: This paper explores the use of theoretical informatics for analyzing and modeling processes in sociotechnical systems (social networks). A stochastic model was developed that describes the dynamic changes of the states (opinions or moods) of users (network nodes) and the percolation threshold in a social network with random connections among nodes. The model demonstrates the possibility of jump-like transitions in the states (opinions, moods) of the nodes in a social network over a short period of time without external influence. While developing the model, the probabilistic schemes of state-to-state transitions of nodes (users having certain opinions and views) were considered; a second-order nonlinear differential equation was derived; and a boundary value problem for calculating the probability density function of the system being in a certain state depending on the time interval was formulated. The differential equation of the model contains a term representing the possibility of self-organization; it also accounts for the presence of memory. The results of the analysis of the stochastic model support those previously obtained by the authors when investigating social network processes using percolation theory. This theory was used to determine the time needed to reach the threshold share of social network nodes at which certain opinions or preferences can spread freely within the whole social network.

Keywords: stochastic dynamics; states of social network nodes; system self-organization; processes involving memory; percolation in social networks

ON THE SYSTEM HIERARCHY OF ARTIFICIAL INTELLIGENCE
  • S. N. Grinchenko  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: Artificial intelligence (AI) is considered from the point of view of informatics-cybernetic modeling (ICM) of the development process of the self-controlling hierarchical-network system of Humankind as a natural phenomenon, closely associated with the concepts of "cognitive functions of man" and "intellectual activity of man." Based on the information-communication-infrastructural component of the AI definition and on ICM, the concept of "human-hardware intellectual unit" is naturally generalized to all levels/tiers of the Humanity system located in its hierarchy above and below the level/tier of "personality." As a result, the phenomenon of "personal natural-artificial intelligence" is supplemented by the phenomenon of "hierarchical AI." Its formation became possible starting from ~1946, with the advent of the basic information technology (BIT) of computers, and took on an explosive character from ~1979, with the advent of telecommunication/network BIT. Typical sizes of the ranges of levels/tiers in the hierarchy of the AI of the Humankind system are given (the indicated dates and sizes are the result of a model calculation).

Keywords: artificial intelligence; information technology; informatics-cybernetic model; self-controlling hierarchical-network system of Humankind; human-hardware intellectual unit

ON THE ACCURACY OF THE NORMAL APPROXIMATION UNDER THE VIOLATION OF THE NORMAL CONVERGENCE
  • V. Yu. Korolev  Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow 119991, Russian Federation, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A. V. Dorofeeva  Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow 119991

Abstract: When solving applied problems in various fields, it is conventional to use the normal approximation to the distribution of data with additive structure. As a criterion of the adequacy of such a model, it is possible to use bounds for the convergence rate in the central limit theorem (CLT) of probability theory, which states that under certain conditions (say, under the Lindeberg condition), the total effect of very many random factors acts as a random variable with the normal distribution. The classical bounds for the convergence rate in the CLT, such as the Berry-Esseen inequality, are proved under the condition that the third moments of the summands exist. Also, bounds are known that require the existence of the moments of orders 2 + δ with 0 < δ < 1. If only the moments of the second order exist, then the convergence in the CLT can be arbitrarily slow. But if the second-order moments of the summands do not exist, then the distributions of sums of independent random variables do not converge to the normal law. It is practically impossible to reliably check the conditions of the central limit theorem with the limited size of the available sample. Therefore, the question of what the real accuracy of the normal approximation is when it is theoretically impossible is of great interest. Moreover, in some situations, in computer simulation of sums of random variables whose distributions belong to the domain of attraction of a stable distribution with the characteristic exponent less than two, as the number of summands grows, the distance between the distribution of the normalized sum and the normal law at first decreases and starts to increase only when the number of summands becomes sufficiently large. In this paper, an attempt is undertaken to give some theoretical explanation of this effect and to answer the question posed above.

Keywords: central limit theorem; accuracy of normal approximation; heavy tails; uniform distance
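
For reference, the Berry-Esseen inequality mentioned in the abstract above can be stated as follows in its classical form for independent identically distributed summands. The notation is an assumption made here and is not taken from the paper: X_1, ..., X_n have zero mean, variance σ² > 0, and finite third absolute moment ρ = E|X_1|³; Φ is the standard normal distribution function; C is an absolute constant.

```latex
\sup_{x \in \mathbb{R}}
  \left|
    \mathsf{P}\!\left( \frac{X_1 + \dots + X_n}{\sigma \sqrt{n}} \le x \right) - \Phi(x)
  \right|
  \;\le\;
  \frac{C \, \rho}{\sigma^{3} \sqrt{n}}
```

The left-hand side is the uniform distance listed in the keywords. The bound is meaningful only when ρ is finite, which is exactly the condition that fails for the heavy-tailed summands discussed in the abstract.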