Institute of Informatics Problems of the Russian Academy of Sciences
Russian Academy of Sciences




«INFORMATICS AND APPLICATIONS»
Scientific journal
Volume 17, Issue 4, 2023


Abstract and Keywords

NONLINEAR REGULARIZATION OF THE INVERSION OF LINEAR HOMOGENEOUS OPERATORS USING THE BLOCK THRESHOLDING METHOD
  • O. V. Shestakov  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Moscow Center for Fundamental and Applied Mathematics, M. V. Lomonosov Moscow State University, 1 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • E. P. Stepanov  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation

Abstract: The methods of thresholding the coefficients of wavelet expansions have become a popular tool for regularization of inverse statistical problems due to their simplicity, computational efficiency, and the ability to adapt both to the type of operators and to the features of the function under study. This approach proved to be the most fruitful for inversion of linear homogeneous operators arising in some signal and image processing problems.
The paper considers the block thresholding method, in which the decomposition coefficients are processed in groups, which allows taking into account information about neighboring coefficients. In a data model with additive Gaussian noise, an unbiased estimate of the mean-square risk is analyzed and it is shown that, under certain conditions, this estimate is strongly consistent and asymptotically normal. These properties allow constructing asymptotic confidence intervals for the theoretical mean-square risk of the method under consideration.

Keywords: linear homogeneous operator; wavelets; block thresholding; unbiased risk estimate; asymptotic normality; strong consistency
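As a generic illustration of the block thresholding idea summarized above (not the authors' specific estimator), the sketch below applies James-Stein-type block shrinkage to a vector of empirical wavelet coefficients. The function name, the block rule, and the default threshold constant (a value commonly used in the block thresholding literature) are assumptions made for this example only.

```python
import numpy as np

def block_threshold(coeffs, block_size, sigma, lam=4.50524):
    """James-Stein-type block thresholding of wavelet coefficients.

    Coefficients are processed in groups: if a block's energy is below
    lam * (block length) * sigma**2, the whole block is zeroed;
    otherwise, the block is jointly shrunk toward zero.
    """
    out = np.zeros_like(coeffs, dtype=float)
    n = len(coeffs)
    for start in range(0, n, block_size):
        block = coeffs[start:start + block_size]
        s2 = float(np.sum(block ** 2))          # block energy
        thresh = lam * len(block) * sigma ** 2  # block-level threshold
        shrink = max(0.0, 1.0 - thresh / s2) if s2 > 0 else 0.0
        out[start:start + block_size] = shrink * block
    return out
```

Processing coefficients jointly in blocks, rather than one by one, is exactly what lets the rule exploit information about neighboring coefficients: a strong neighbor keeps a small coefficient in the same block from being zeroed.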

MARKET WITH MARKOV JUMP VOLATILITY III: PRICE OF RISK MONITORING ALGORITHM GIVEN DISCRETE-TIME OBSERVATIONS OF ASSET PRICES
  • A. V. Borisov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The third part of the series is devoted to the online estimation of the market price of risk. The market model includes a deposit, underlying, and derivative assets. The model of the underlying asset prices contains stochastic volatility represented by a Markov jump process (MJP). There is no arbitrage in the considered market; hence, the market price of risk is a function of the MJP's current value, and the MJP monitoring problem transforms into the MJP state filtering one. The statistical data are available at discrete moments and contain the direct observations of the underlying assets and indirect observations of the derivative ones. The paper presents the solution to the optimal filtering problem and the corresponding algorithm of its numerical realization. The paper also contains a numerical example demonstrating the performance of the MJP state estimates depending on the type and structure of the available observations.

Keywords: Markov jump process; optimal filtering; stochastic volatility; market price of risk; prevailing martingale measure

PROCEDURE OF CONSTRUCTING A PARETO SET FOR DIFFERENTIABLE CRITERIA FUNCTIONS
  • Ya. I. Rabinovich  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: A universal computational procedure for multicriteria optimization allows one to approximate the Pareto set under different requirements on the vector of particular efficiency criteria and on the set of feasible solutions. In the paper, it is assumed that the particular efficiency criteria are pseudoconcave in an open neighborhood of a compact convex set of feasible solutions which can be given by differentiable functional constraints. To build specific numerical methods for approximating the Pareto set, a rule for choosing the initial approximation and a rule for moving from the current reference solution to the next one are proposed.

Keywords: multicriteria optimization; Pareto set; numerical methods of approximation; universal procedure

NONPARAMETRIC ALGORITHM FOR AUTOMATIC CLASSIFICATION OF REMOTE SENSING DATA
  • V. P. Tuboltsev  M. F. Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy Av., Krasnoyarsk 660037, Russian Federation
  • A. V. Lapko  M. F. Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy Av., Krasnoyarsk 660037, Russian Federation, Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences, 50/44 Akademgorodok, Krasnoyarsk 660036, Russian Federation
  • V. A. Lapko  M. F. Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy Av., Krasnoyarsk 660037, Russian Federation, Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences, 50/44 Akademgorodok, Krasnoyarsk 660036, Russian Federation

Abstract: A nonparametric algorithm for automatic classification of large-volume statistical data is proposed.
The algorithm under consideration assumes compression of initial information based on decomposition of multidimensional feature space. As a result, a large statistical sample is transformed into a data array composed of the centers of multidimensional sampling intervals and their corresponding frequencies of random variables.
The information obtained is used in the synthesis of the regression estimate of the probability density. A class is understood as a compact group of observations of a random variable corresponding to a unimodal fragment of the probability density function. On this basis, a nonparametric automatic classification algorithm is developed which is based on the sequential procedure for checking the proximity of the centers of multidimensional sampling intervals and the ratios between the frequencies with which random variables from the original sample fall into these intervals.
To improve the computational efficiency of the proposed automatic classification algorithm, a multithreaded method of its software implementation is used. The practical significance of the developed algorithm is confirmed by the results of its application to assessing the state of forest areas using remote sensing data.

Keywords: automatic classification; large-volume samples; sampling of the range of values of random variables; regression estimation of probability density; remote sensing data
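The compression step described above, replacing a large sample by the centers of occupied multidimensional sampling intervals and their frequencies, can be sketched as follows. This is a minimal illustration under assumed conventions (uniform binning per axis, made-up names), not the authors' implementation.

```python
import numpy as np

def compress_sample(data, bins):
    """Compress a large (n, d) sample into bin centers and frequencies.

    The range of each coordinate is split into `bins` equal intervals;
    each sample point is assigned to its multidimensional interval.
    Returns the centers of the occupied intervals and the relative
    frequency of sample points falling into each of them.
    """
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    width = (hi - lo) / bins
    # interval index of each point along each axis (clip the right edge)
    idx = np.clip(((data - lo) / width).astype(int), 0, bins - 1)
    cells, counts = np.unique(idx, axis=0, return_counts=True)
    centers = lo + (cells + 0.5) * width
    freqs = counts / len(data)
    return centers, freqs
```

The returned (center, frequency) pairs are exactly the compressed data array the abstract refers to; a regression-type density estimate can then be built on these pairs instead of the full sample.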

MULTIFACTOR CLASSIFICATION TECHNOLOGY OF MATHEMATICAL CONTENT OF E-LEARNING SYSTEM
  • A. V. Bosov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A. V. Ivanov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article continues the study of the problem of classifying the content of an e-learning system. The previously developed technology for thematic classification of mathematical content contained in the blocks of tasks and examples of the e-learning system has been improved and supplemented with new functions. For this purpose, the previously used content model with two properties - a text description of a task and its formula part in TeX format - has been supplemented with a number of formal numerical attributes, such as the presence of transcendental and derived functions and the number of formulas in the task. This block of attributes made it possible to improve the quality of the existing thematic classifier and to implement two new ones. The first classifier determines the level of complexity of the task. The second, multilabel, classifier determines the set of student competencies that the task should form. Such a multifactor classification is an important stage in a promising direction of the development of e-learning systems - automated assessment of the quality of educational content. Performance testing of the proposed algorithms, training of classifiers, and analysis of classification quality were carried out using the tasks from the same discipline of the theory of functions of a complex variable but on a significantly expanded set of data, including tasks for independent work of students - calculation and examination tasks.

Keywords: e-learning system; mathematical content; machine learning; multifactor classification; content quality assessment
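The numerical attributes mentioned above (presence of transcendental and derived functions, number of formulas) can be extracted from the TeX part of a task roughly as follows. This is a hypothetical sketch: the function name, the list of commands treated as transcendental, and the derivative heuristics are all assumptions for illustration, not the authors' feature set.

```python
import re

# commands treated as transcendental functions (illustrative list)
TRANSCENDENTAL = ("\\sin", "\\cos", "\\tan", "\\exp", "\\ln", "\\log")

def formula_attributes(tex):
    """Extract simple numeric attributes from the TeX part of a task:
    the number of inline formulas ($...$ spans), the presence of
    transcendental functions, and the presence of derivatives."""
    formulas = re.findall(r"\$[^$]+\$", tex)
    return {
        "n_formulas": len(formulas),
        "has_transcendental": any(cmd in tex for cmd in TRANSCENDENTAL),
        # crude derivative markers: d/dz fractions, primes, partials
        "has_derivative": "\\frac{d" in tex or "'" in tex or "\\partial" in tex,
    }
```

Attributes of this kind can then be concatenated with text features of the task description and fed to the complexity and competency classifiers.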

AN EXTENSIBLE APPROACH TO DATA FUSION IN DISTRIBUTED COMPUTING ENVIRONMENTS
  • V. V. Sazontev  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • S. A. Stupnikov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • V. N. Zakharov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper belongs to the area of development of methods and tools for data integration. One of the most important stages of data integration is data fusion, i.e., the combination of records relating to the same real-world entity into a single record with conflict resolution for each of the attributes. The paper considers the formal statement of the data fusion problem and provides a brief review of the major groups of data fusion methods. An approach to implementation of the data fusion stage within an extensible heterogeneous data integration system in a distributed computing environment is proposed. The software architecture and basic implementation ideas of the approach are considered.

Keywords: data fusion; distributed computing environment

SOLUTION OF THE PROBLEM OF OPTIMAL CONTROL OF THE STOCK OF A CONTINUOUS PRODUCT IN A STOCHASTIC MODEL OF REGENERATION WITH RANDOM COST CHARACTERISTICS
  • P. V. Shnurkov  National Research University Higher School of Economics, 34 Tallinskaya Str., Moscow 123458, Russian Federation

Abstract: The work is devoted to the study of the problem of managing the stock of a certain continuous product, the evolution of the volume of which is described by a regenerating stochastic process. The main feature of the considered mathematical model is that the cost characteristics that determine the price of supplying the product to the consumer and the costs associated with ensuring the functioning of the system depend on random external factors. The random control parameter is the time from the moment of the next replenishment of the stock to the moment of the next order for replenishment. It is proved that the stationary cost indicator of control efficiency in the optimization problem under consideration is, in its analytical structure, a linear-fractional integral functional depending on the distribution function of the control parameter. The theoretical solution of the optimization problem is based on the extremum theorem for linear-fractional integral functionals.

Keywords: continuous product inventory control problem; random cost characteristics of the system; controlled regenerative stochastic processes; linear-fractional integral functionals in problems of stochastic optimal control
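The extremum theorem referred to above can be stated, in a generic form with assumed notation (the abstract itself gives no formulas), as follows: the stationary cost index is a linear-fractional integral functional of the control distribution, and its extremum is attained on degenerate distributions.

```latex
% Linear-fractional integral functional of a distribution function G on U
% (A and B are integrable cost functions, B > 0; notation assumed):
\[
  I(G) \;=\; \frac{\int_{U} A(u)\, \mathrm{d}G(u)}
                  {\int_{U} B(u)\, \mathrm{d}G(u)} .
\]
% Its extremum over all distribution functions G is attained on a
% degenerate distribution concentrated at a point
\[
  u^{*} \;=\; \arg\max_{u \in U} \frac{A(u)}{B(u)}
\]
% (with arg max replaced by arg min when the cost index is minimized),
% so the search over distributions reduces to a search over points of U.
```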

BOUNDS OF THE WORKLOAD IN A MULTICLASS RETRIAL QUEUE WITH EXPONENTIAL SERVICES
  • I. V. Peshkova  Petrozavodsk State University, 33 Lenina Pr., Petrozavodsk 185910, Russian Federation, Karelian Research Center of the Russian Academy of Sciences, 11 Pushkinskaya Str., Petrozavodsk 185910, Russian Federation

Abstract: A multiclass retrial queue with Poisson input and M classes of customers is investigated. For the given retrial system with exponential service times, the lower and upper bounds of the workload are derived. It is shown that the workload in the classical M/H_M/1 system with hyperexponential service times is the lower bound for the workload of the given retrial system. The upper bound is the workload in the classical M/G/1 system where each customer occupies the server for the given service time plus an additional time corresponding to the inter-retrial time from the "slowest" orbit. The presented simulation results confirm the theoretical conclusions.

Keywords: retrial queue; workload; stochastic ordering
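The upper-bound construction above can be illustrated by a coupling argument: with the same arrival stream, pathwise larger service requirements yield pathwise larger workloads under the Lindley recursion. The sketch below is a minimal simulation under assumed rates; the fixed extra delay standing in for the inter-retrial time, and all names, are illustrative assumptions rather than the paper's model.

```python
import random

def workload_at_arrivals(arrival_rate, services, seed=0):
    """Workload seen by each arriving customer, via the Lindley
    recursion W_{n+1} = max(W_n + S_n - T_n, 0) with Poisson input
    (exponential interarrival times of the given rate)."""
    rng = random.Random(seed)
    w, path = 0.0, []
    for s in services:
        path.append(w)
        t = rng.expovariate(arrival_rate)  # next interarrival time
        w = max(w + s - t, 0.0)
    return path

# Same Poisson arrivals (same seed) in both systems; the "upper" system
# adds a fixed extra delay per customer, standing in for the additional
# inter-retrial time from the slowest orbit.
rng = random.Random(1)
base = [rng.expovariate(2.0) for _ in range(10_000)]  # exponential services
upper = [s + 0.1 for s in base]

w_base = workload_at_arrivals(1.0, base, seed=42)
w_upper = workload_at_arrivals(1.0, upper, seed=42)
```

Because the interarrival sequences coincide and the services in the second system are pathwise larger, every workload value in `w_upper` dominates the corresponding value in `w_base`, mirroring the stochastic-ordering conclusion of the paper.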

PRIORITY-BASED eMBB AND URLLC TRAFFIC COEXISTENCE MODELS IN 5G NR INDUSTRIAL DEPLOYMENTS
  • D. V. Ivanova  RUDN University, 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation
  • E. V. Markova  RUDN University, 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation
  • S. Ya. Shorgin  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • Yu. V. Gaidamaka  RUDN University, 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The 5G New Radio technology simultaneously supports both the Ultra-Reliable Low-Latency Communication (URLLC) and enhanced Mobile Broadband (eMBB) services. Owing to the extreme latency and reliability requirements of these services, prioritization needs to be provided. The authors consider an industrial environment where production equipment utilizes the URLLC service for motion control and synchronous operation while the eMBB service is used for remote monitoring. A model with priority service at the base station (BS), with and without direct device-to-device (D2D) communications, is proposed. The obtained numerical results indicate that priorities allow one to isolate URLLC and eMBB traffic efficiently. The D2D-aware strategy, where the BS explicitly reserves resources for direct communications, significantly outperforms the strategies without explicit reservation as well as the strategy where all the traffic goes through the BS.

Keywords: 5G; NR (New Radio); D2D; URLLC; eMBB; resource allocation; priority service

MODELS FOR STUDY OF THE INFLUENCE OF STATISTICAL CHARACTERISTICS OF COMPUTER NETWORKS TRAFFIC ON THE EFFICIENCY OF PREDICTION BY MACHINE LEARNING TOOLS
  • S. L. Frenkel  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • V. N. Zakharov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article is an attempt to streamline and categorize the huge stream of publications on modern methods, techniques, and models for forecasting data of various nature in terms of their applicability to traffic forecasting in computer networks. This ordering is performed within the framework of the proposed conceptual model of forecasting algorithms. Within this conceptual model, the characteristics of both computer network traffic models and traffic control methods that can be explicitly or implicitly used in modern prediction software tools are highlighted. It is shown that the analysis of such probabilistic aspects of data description as the presence of significant nonstationarity, some nonlinear effects in data models, as well as the specifics of data distribution laws can influence the efficiency of learning predictors.

Keywords: network traffic prediction; probabilistic models

PARALLEL CORPUS ANNOTATION: APPROACHES AND DIRECTIONS FOR DEVELOPMENT
  • A. A. Goncharov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: Possible directions for the development of parallel corpus annotation tools are presented, taking into account the current state of this area. The main approaches to conducting research on corpus material - (i) corpus-based; (ii) corpus-driven; and (iii) corpus-illustrated - are considered and the differences between them are briefly described. It is demonstrated that, despite the abundance of corpus annotation tools, the vast majority of them are designed to deal with monolingual corpora and/or support very limited functionality for annotating textual data. The largest number of functions is provided by the supracorpora databases and the web applications to access them which are being developed at FRC CSC RAS: (i) forming original and translated text blocks necessary and sufficient for analyzing the occurrence of the studied language unit and its translation variant; (ii) identification of the occurrence of the studied language unit and its translation variant; (iii) selection of features characterizing the use of the studied language unit and its translation variant; and (iv) selection of features characterizing the translation correspondence. This set of functions provides solutions to a significant part of research problems but it can be extended. Three directions for the development of the existing functionality are suggested which can provide a more detailed description of linguistic material.

Keywords: parallel corpus; corpus linguistics; corpus annotation; linguistic annotation

EVALUATING THE DEGREE OF DISCOURSE RELATIONS SEMANTIC AFFINITY: METHODS AND INSTRUMENTS
  • O. Yu. Inkova  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation, University of Geneva, 22 Bd des Philosophes, CH-1205 Geneva 4, Switzerland
  • M. G. Kruzhkov  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The methods for evaluating semantic affinity of discourse relations are examined. The authors propose several approaches to this problem using two information resources: a collection of structured definitions of logical-semantic relations (LSRs) formed by the authors and the Supracorpora Database of Connectives incorporating corpus-based annotations of translation correspondences that include text fragments with LSR markers in Russian, French, and Italian. It is demonstrated that when it comes to assessing the semantic affinity of LSRs, the following factors will be of a higher priority: affiliation of distinctive features of LSRs with the same family in the structured definitions of relations; correspondences between markers of different LSRs in the source and target texts; and cases when different LSRs are regularly expressed by the same markers in different contexts. Of a lesser importance is the factor of compatibility of different LSRs within the same context. It is assumed that based on the proposed methods, it will become possible to specify more precisely which distinguishing features of LSRs have the greatest impact on their potential semantic affinity.

Keywords: supracorpora database; logical-semantic relations; connectives; annotation; faceted classification

SCIENTIFIC PARADIGM OF INFORMATICS: CLASSIFICATION OF DOMAIN OBJECTS
  • I. M. Zatsman  Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: A description is given of the three top levels of classification of domain's objects of informatics which is positioned as an integral part of the system of scientific knowledge that covers a wide range of information and computer sciences. With such positioning, the boundaries of the domain expand significantly and largely correspond to the concept of polyadic computing by Paul Rosenbloom. All entities of informatics in the proposed scientific paradigm are divided into two global classes: objects and their transformations. For each class, in the process of creating the paradigm, its own classification is constructed; the paradigm's creation began with the formation of these classifications. The paper discusses the three top levels of classification of domain's objects of informatics. The basis for constructing the first (the highest) level is the division of the domain of informatics into media: mental, sensory, digital, and a number of other media. The basis for constructing the second level of objects' classification is the division of sensory perceived objects of informatics into data and sign information, the latter being the outcome of transformation of human cognitive structures into a sign form. The basis for constructing the third level of classification of objects is the typology of sign systems by A. Solomonick. The aim of the paper is to describe the approach to constructing the three top levels of classification of domain's objects of informatics and to compare it with the previously used approaches to describing its subject domain. Also, based on the proposed approach, partial answers are formulated to those questions of Thomas Kuhn about the basic entities of the subject domain which the paradigm of any science, not just informatics, should contain.

Keywords: scientific paradigm; classification of domain's objects of informatics; basis of classification; subject domain media