Институт проблем информатики Российской Академии наук
Институт проблем информатики Российской Академии наук
Российская Академия наук

Институт проблем информатики Российской Академии наук




«INFORMATICS AND APPLICATIONS»
Scientific journal
Volume 17, Issue 3, 2023

Content | About  Authors

Abstract and Keywords

ON THE FORMATION OF SETS OF PRECEDENTS BASED ON TABLES OF HETEROGENEOUS FEATURE DESCRIPTIONS BY METHODS OF TOPOLOGICAL THEORY OF DATA ANALYSIS
  • I. Yu. Torshin  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: Factorization of the contributions of various variables in the analysis of heterogeneous feature descriptions is an urgent task of complex data mining. The paper proposes the development of the lattice formalism of the topological theory of data analysis, within which new methods for generating parametric estimates and metrics on lattices formed over the topologies of sets of objects are obtained. The formalism was tested on the problem of forming sets of precedents for conducting chemomicrobiome analysis. Whereas the generation of a set of initial information based on regression coefficients and the difference in the values of the learning material corresponded to an extremely low generalizing ability of custom algorithms (correlation coefficient in the control 0.32 ± 0.20), the use of the proposed estimates for generating sets of precedents in chemomicrobiomics problems made it possible to significantly increase the generalizing ability of the corresponding algorithms (correlation coefficient in control 0.79 ± 0.21).

Keywords: topological data analysis; lattice theory; parametrization of lattice terms; human microbiome; pharmacoinformatics, algebraic approach of Yu. I. Zhuravlev.

NONLINEAR DYNAMIC SYSTEM STATE OPTIMAL FILTERING BY OBSERVATIONS WITH RANDOM DELAYS
  • A. V. Bosov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: A mathematical model of a nonlinear dynamic observation system with a discrete time which allows taking into account the dependence of the time of receiving observations on the state of the observed object is proposed. The model implements the assumption that the time between the moment when the measurement of the state is formed and the moment when the measured state is received by the observer depends randomly on the position of the moving object. Such an assumption source is the process of observation by stationary means of an autonomous underwater apparatus in which the time of obtaining up-to-date data depends on the unknown distance between the object and the observer. Unlike deterministic delays formed by the known state of the observation environment, to account for the dependence of time delays on the unknown state of the object of observation, it is required to use random functions to describe them. The main result of the study of the proposed model is the solution ofthe optimal filtering problem. For this purpose, recurrent Bayesian relations describing the evolution of the a posteriori probability density are obtained. The difficulties ofusing a semifinished filter for practical purposes are discussed. The proposed model is illustrated by a practical example of the task of tracking a moving underwater object based on the results of measurements performed by typical acoustic sensors. It is assumed that the object moves under the water in a plane with a known average speed, constantly performs chaotic maneuvers, and is observed by two independent complexes of acoustic sensors measuring the distances to the object and the guiding cosines. The complexity of determining the position of such an object is illustrated by a simple filter using the geometric properties ofthe measured quantities and the least squares method.

Keywords: stochastic dynamic observation system; state filtering; optimal Bayesian filter; mean square evaluation criterion; autonomous underwater vehicle; acoustic sensor; target tracking

MARKET WITH MARKOV JUMP VOLATILITY II: ALGORITHM OF DERIVATIVE FAIR PRICE CALCULATION
  • A. V. Borisov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The second part of the series is devoted to the numerical realization of the fair derivative price in the case of the incomplete financial market with a Markov jump stochastic volatility. The concept of the market price of risk applied to this model by Runggaldier (2004) allows derivation of a system of partial differential equations describing the temporal evolution of the derivative price as a function of the current underlying price and the implied stochastic volatility. This system represents a generalization of the classic Black-Sholes equation. By contrast with this classic version, the proposed system of equations does not permit an analytical solution. The paper presents an approximate analytical method of the fractional steps. The author equips the temporal axis with a grid, then the author approximates the required solution as a combination of the solution to the classical heat equation and the system of the ordinary linear differential equations. The paper contains the results of the numerical experiments illustrating the properties of both the Black-Sholes generalization and the joint evolution of the derivative and corresponding underlying prices.

Keywords: Markov jump process; optimal filtering; stochastic volatility; market price of risk; prevailing martingale measure

MODELING INSISTENT USER BEHAVIOR IN 5G NEW RADIO NETWORKS WITH RATE ADAPTATION AND BLOCKAGE
  • E. S. Sopin  RUDN University, 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A. R. Maslov  RUDN University, 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation
  • V. S. Shorgin  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • V. O. Begishev  RUDN University, 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation

Abstract: Millimeter wave 5G New Radio access technology and future 6G terahertz systems are designed for rate-demanding multimedia applications. Such applications are adaptive to the network transmission rate. The unreliable nature of 5G/6G communications networks, subject to loss of connection due to dynamic blockage of radio propagation paths, can cause impatient user behavior leading to retries to continue service. For such systems, the authors propose a service model with impatient behavior and service interruption based on a resource queuing system with an orbit. The authors derive the new and ongoing session loss probabilities as well as characterize the efficiency of resource usage. Numerical results show that user persistence reduces the probabilities under consideration: performing an average of two retries reduces both indicators by 20%-70% while with an average of 10 attempts, the gain can reach 200%-300%. Persistence not only increases the use of system resources by 20%-40% but also allows them to be used efficiently reducing the percentage of wasted resources by two to three times.

Keywords: 5G; New Radio; resource queuing system; callbacks; propagation blocking; service interruption

MULTIUSER NETWORK LOAD ANALYSIS BY SPLITTING FLOWS ALONG THE SHORTEST ROUTES
  • Yu. E. Malashenko  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • I. A. Nazarova  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: In computational experiments on a multicommodity network model, two ways of transmitting flows of different types along shortest routes are investigated. In the first case, the transmitted internodal flows are equal in magnitude. In the other - a nondiscriminatory distribution is defined in which all pairs of nodes are distributed the same resources. The total load of the network edges resulting from the simultaneous transmission of all internodal flows is considered to be given. The proposed method allows one to obtain guaranteed estimates of the specific resource costs of the network and the maximum feasible load of the edges while transmitting split internodal flows along the shortest routes found. The results of a comparative analysis of the equalization distribution of flows and resources in networks with different structural features are given. The algorithmic scheme has a polynomial estimate of the required number of operations.

Keywords:  multicommodity flow model; distribution of internodal flows and loads; network peak load

OPTIMIZATION OF THE BUFFER MEMORY ALLOCATION SCHEME OF THE PACKET SWITCHING NODE
  • Ya. M. Agalarov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The buffer of the packet switching node shared by several output communication lines is considered. Sharing buffer memory by multiple users reduces the amount of memory needed to meet latency requirements and the likelihood of packet loss. However, there is a problem of allocating buffer memory between different users, since individual users, having occupied all the memory, can restrict (or close) access to communication lines to other users which can significantly reduce the performance of the switching node as a whole. There are many different buffer memory allocation schemes, one of which, called SMA (Sharing with Minimum Allocation), is being investigated in this paper in order to reduce the costs associated with packet rejection and delay and the operation of the drive and communication lines. A multithreaded queuing system with parallel devices of the M/M/s/K type with a buffer shared according to the SMA scheme with a fixed number of storage locations reserved for each device is used as a model of the switching node. The mathematical formulation of the problem of optimizing the SMA scheme in terms of the volume of publicly accessible buffer locations is formulated in order to minimize system losses arising from rejection of applications, delay of applications in the queue, and operation of the buffer and devices. The theorem on the boundaries of the domain containing the point of the global optimum is proved. A number of statements are also given which are the consequences of the theorem about the point of the global optimum of the objective function for other models of the switching node and special cases of SMA.

Keywords: switching node; buffer memory allocation; optimization; queuing system

ON THE RATE OF CONVERGENCE AND LIMITING CHARACTERISTICS FOR ONE QUASI-BIRTH-DEATH PROCESS
  • I. A. Usov  Vologda State University, 15 Lenin Str., Vologda 160000, Russian Federation
  • Y. A. Satin  Vologda State University, 15 Lenin Str., Vologda 160000, Russian Federation
  • A. I. Zeifman  Vologda State University, 15 Lenin Str., Vologda 160000, Russian Federation, Moscow Center for Fundamental and Applied Mathematics, M.V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation, Vologda Research Center of the Russian Academy of Sciences, 56A Gorky Str., Vologda 160014, Russian Federation

Abstract: A queuing system with one server and different repair and failure options is considered, the number of requirements in which is described by a quasi-birth-death process. To reasonably find the limiting probabilistic characteristics of the system, the rate of convergence to them is studied (that is, the rate at which the initial conditions of the system are "forgotten"). To study the rate of convergence to the limiting regime, a recently developed version of the approach based on the concept of the logarithmic norm of the operator function corresponding to the estimate of the norm of the Cauchy matrix as well as a modernized special transformation of the forward Kolmogorov system was applied. A numerical example is considered for which the estimation of the rate of convergence is shown in detail as well as the construction of some limiting characteristics of the model based on these estimates.

Keywords:  quasi-birth-death processes; rate of convergence; ergodicity bounds; logarithmic norm; queuing systems

A METHOD FOR ESTIMATING PARAMETERS OF THE GAMMA-EXPONENTIAL DISTRIBUTION FROM A SAMPLE WITH WEAKLY DEPENDENT COMPONENTS
  • A. A. Kudryavtsev  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Moscow Center for Fundamental and Applied Mathematics, M.V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation
  • O. V. Shestakov  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Moscow Center for Fundamental and Applied Mathematics, M.V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article proves the asymptotic normality of the estimators for the gamma-exponential distribution parameters obtained using the modified method of moments in the case of a weak dependence of the sample components. For the estimators of the bent and scale parameters of the gamma-exponential distribution with fixed shape and concentration parameters, the central limit theorem is proved in the case when the maximum correlation coefficient between the sample elements tends to zero. The proof is based on the study of the sample spectral density and the results of the theory of stationary random sequences. The results of the article can be used to substantiate the asymptotic normality of the estimators for the parameters of the digamma distribution, the particular types of which include the generalized gamma distribution and the generalized beta distribution of the second kind that arise when describing processes modeled with distributions having a nonnegative unbounded support.

Keywords: weak dependence; parameter estimation; gamma-exponential distribution; mixed distributions; method of moments; asymptotic normality

LOGICAL METHODS OF CORRECT DATA CLASSIFICATION<
  • E. V. Djukova  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • G. O. Masliakov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A. P. Djukova  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The work is devoted to issues of the application of discrete apparatus (logical methods of integer data analysis) for the supervised classification problem. Three main directions of logical classification are considered: Correct Voting Procedures (CVP), Logical Analysis of Data (LAD), and Formal Concept Analysis (FCA). Using the terminology of the CVP direction, the basic concepts applied in LAD and FCA are presented. The general scheme of the logical classifier is described, according to which each logical classifier at the training stage sets some partial order on a special set of fragments of precedents descriptions and searches for the maximum elements of this set relative to the given order. Such studies are important for creating a general theory of correct classification according to precedents based on the use of a discrete apparatus.

Keywords: supervised classification; logical classifier; Correct Voting Procedures; Logical Analysis of Data; Formal Concept Analysis; irredundant representative elementary classifier; strong pattern; JSM-hypothesis; partial order

CLASSIFICATION BY CAUSE-AND-EFFECT RELATIONSHIPS
  • A. A. Grusho  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • N. A. Grusho  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • M. I. Zabezhailo  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • D. V. Smirnov  Sberbank of Russia, 19 Vavilov Str., Moscow 117999, Russian Federation
  • E. E. Timonina  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: By definition, property A in object O is the cause for the occurrence of consequence B which is available for observation in information space I if characteristics of A can generate an object in space I containing consequence B. In this case, B determinedly appears with the appearance of A. Therefore, one can consider the classification problem as calculating the consequences of the characteristics of the object where the consequences act as characteristics of the class. In this case, the characteristics of the classification object can be considered as the cause that deterministically (classification as mapping) generates consequences (characteristics of the class).
Each of the properties Ai, i = 1,..., k, is the cause of the deterministic appearance of a nonempty set of its consequences. If the number of classes is large as well as the sets of consequences of each, then the classification problem can be complex to compute due to the fact that repetitions of consequences in the sets of consequences are possible. Therefore, it is advisable to look for simplified schemes for classifying objects according to the causes for the consequences in them. For this, an apparatus of systems of various representatives can be used. In the context of the problem of classifying causes due to consequences, it is impossible to directly use F Hall's theorem on systems of various representatives, since elements of cause-and-effect chains cannot be broken. The paper shows that the transformation of each of the same chains of cause-and-effect relationships into one common new element in the sets of consequences forms the possibility of applying the conditions of F Hall's theorem.

Keywords: cause-and-effect relationships; finite classification; searching for the properties in unobservable data

TOWARD CLUSTERING OF NETWORK COMPUTING INFRASTRUCTURE OBJECTS BASED ON ANALYSIS OF STATISTICAL ANOMALIES IN NETWORK TRAFFIC
  • A. K. Gorshenin  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation, M. V. Lomonosov Moscow State University, 1 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation
  • S. A. Gorbunov  M. V. Lomonosov Moscow State University, 1 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Moscow Center for Fundamental and Applied Mathematics, M.V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation
  • D. Yu. Volkanov  M. V. Lomonosov Moscow State University, 1 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation

Abstract: The problem of detecting statistical anomalies (that is, outliers in relation to the typical values of upload and download traffic) of the load on the nodes of the network computing infrastructure is considered. The regular scaling in computing resources and storage as well as redirection of data flows is needed due to the increase of load in real systems. The procedure for detecting statistical anomalies in network traffic is proposed using the approximation of observations by the generalized gamma distribution for further clustering of network computing infrastructure objects in order to evaluate resource need. All computational statistical procedures described in the paper are implemented using the R programming language and they are applied for network traffic, simulated using a specialized architectural and software stand. The proposed approaches can also be used for a wider class of telecommunication problems.

Keywords: network infrastructure; network traffic; generalized gamma distribution; computational statistics; statistical hypothesis testing; anomaly detection; clustering

EFFICIENCY OF BINARY NEURAL NETWORKS FOR OBJECT DETECTION ON AN IMAGE
  • D. O. Korolev  Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya Str., St. Petersburg 195251, Russian Federation
  • O. G. Maleev  Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya Str., St. Petersburg 195251, Russian Federation

Abstract: Deep convolutional neural networks are widely used for object detection. However, modern deep convolutional neural network models are computationally expensive hindering their deployment in resource- constrained mobile and embedded devices. Binary neural networks help to alleviate the resource requirements of devices. Activations and weights in binary neural networks are limited by binary values (-1,1). The proposed method implements balancing and standardization of floating-point weights in forward propagation and two-stage sign function approximation in back propagation. The paper presents the results of detection accuracy on the PASCAL Face dataset as well as the results of speed and model size on the mobile device for the proposed method, the model without binarization, the TinyML network, and Bi-Real Net and ABC-Net methods.

Keywords: binary neural networks; convolutional neural networks; objects detection; model acceleration

FORMALIZED DESCRIPTION OF STATISTICAL INFORMATION PROCESSING IN DATABASES
  • V. V. Vakulenko  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • I. M. Zatsman  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper presents an overview of the stages of statistical processing of text data, from specific informational objects in databases to the values of the numerical characteristics of these objects. For example, if the database contains the descriptions of full-text research articles, then they represent specific informational objects. With the appropriate population of such a database, the multistage procedure of their processing makes it possible to determine the values of the numerical characteristics of the publication activity of a researcher, a scientific division, and a scientific organization as a whole. Such procedures begin with the processing of specific informational objects and end with computing of the values of the numerical characteristics of these objects. At intermediate stages, tables and other both verbal and numerical objects may form. If the stages of the statistical processing are designed to be reversible and the database implements the function of verifying the values of the numerical characteristics, then the procedure of their verification begins with the values of the characteristics and ends with access to specific informational objects that were used to compute these values. The paper proposes a formalized description of the stages of statistical processing of text data in databases. Informational-mathematic transformation (IM-transformation) is the proposed name for such transformation of text data into numerical values. It combines the processing of specific informational objects, the formation of verbal and numerical objects, and the mathematical computation of the values of numerical characteristics. Such transformation of text data may include mathematic processes at certain stages; however, it does not completely reverse back to them. The goal of the paper is to propose the principles of formalized description of IM-transformation of texts in databases. To illustrate this, the paper provides the example of formalizing the process of determining the frequency of translation variants of connectives expressing intertextual relations between text fragments in the supracorpora database of connectives developed in the FRC CSC RAS.

Keywords: informational-mathematic transformation; text information; statistical processing of text information; supracorpora database

EVALUATION CRITERIA FOR DISCOURSE RELATIONS SEMANTIC AFFINITY
  • O. Yu. Inkova  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation, University of Geneva, 22 Bd des Philosophes, CH-1205 Geneva 4, Switzerland
  • M. G. Kruzhkov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper presents an overview of structured definitions of discourse relations created based on classification principles and criteria for evaluating their semantic affinity. The authors point out the shortcomings of existing classification approaches that are sometimes inconsistent or contradictory and outline the benefits of an alternative approach to classification of discourse relations which is based on their structured definition.
The paper provides examples of such definitions created within the Supracorpora Database of Connectives and discusses the criteria for evaluating their semantic closeness. As the structured definitions are represented by sets of distinguishing features, the authors discuss the problem of identifying proximity factors for each of these features.
The gathered data suggest a hypothesis that among the three groups of features: "Level," "Basic operation," and "Feature family", it is the last one that should have the most impact. Finally, directions for further research of this problem are considered, namely, the option of taking into account such factors as compatibility of discourse relations, correspondence of relations between the source text and its translation, and such cases where certain relation markers may express different discourse relations in various contexts.

Keywords: supracorpora database; logical-semantic relations; connectives; annotation; faceted classification

TRANSF0RMATI0N 0F THE ACK0FF'S HIERARCHY IN THE SCIENTIFIC PARADIGM 0F INF0RMATICS
  • I. M. Zatsman  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The DIKW (data, information, knowledge, and wisdom) hierarchy, which was published in 1989 by Russell Ackoff, is considered. In it, wisdom is at the top of the hierarchy followed by knowledge, information, and, at the very bottom, data. It was originally intended that the DIKW hierarchy could be used to describe the relationships between its four components. However, the problem of describing the mutual transformations of two neighboring components, especially for knowledge and information, turned out to be very difficult to solve within the DIKW hierarchy. The complexity of its solution lies in the fact that the DIKW hierarchy implies the generation of knowledge as an outcome of the process of filtering neighboring information but the means of implementing this process were not defined by Ackoff. It is also impossible to describe the semantic interpretation of data, since they are not directly adjacent to knowledge in the DIKW hierarchy which implies the existence of relations between neighboring components only. The aim of the paper is to transform the DIKW hierarchy within the framework of the scientific paradigm of informatics which is based on the medium division of its subject domain into mental, informational, digital, and a number of other media. While Ackoff used the principle of vertical placement of the components, this paper instead proposes to correlate the interfaces and sign systems used in informatics with the relationships between the three components of the hierarchy: data, information, and knowledge. If one uses the principle of vertical placement not of the components but of the media of the subject domain of informatics, then it can be proposed an approach to solving the problem of describing the mutual transformations of the three components of the hierarchy comparing them with informatics interfaces and sign systems. Such a comparison will make it possible to see those pairs of components for which the interfaces are not currently formalized, do not have a computer implementation, and are performed by experts. The paper provides the example of a knowledge discovery technology that combines automatic and expert (nonformalized) technological stages.

Keywords: scientific paradigm of informatics; data; information; knowledge; DIKW hierarchy; knowledge discovery technology