Institute of Informatics Problems of the Russian Academy of Sciences
Russian Academy of Sciences




«INFORMATICS AND APPLICATIONS»
Scientific journal
Volume 15, Issue 2, 2021


Abstract and Keywords

LINEAR OUTPUT CONTROL OF MARKOV CHAINS BY THE QUADRATIC CRITERION
  • A. V. Bosov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper solves the problem of optimal output control of a stochastic observation system whose state is an unobservable Markov jump process and whose linear observations are given by a system of Ito differential equations driven by a Wiener process. The control vector enters the observations additively, so that a controlled output of the system is formed. The optimization goal is set by a general quadratic criterion. To solve the control problem, a separation theorem is formulated that uses the solution to the optimal filtering problem provided by the Wonham filter. The separation yields an equivalent problem of output control of a diffusion process of a particular type, namely, one with linear drift and nonlinear diffusion. This problem is solved by direct application of the dynamic programming method.

Keywords: Markov jump process; Ito stochastic differential system; optimal control; quadratic criterion; stochastic filtering; Wonham filter
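The separation step relies on the Wonham filter. Below is a minimal sketch of that filter. It assumes a two-state chain with a hypothetical intensity matrix `Lam`, scalar observations dY = f(X_t) dt + sigma dW with hypothetical values `f` and `sigma`, and a plain Euler discretization. Because the control enters the observations additively and is known, it can be subtracted from dY beforehand, so it is set to zero here. The sketch illustrates the standard filter only, not the paper's optimal control law.

```python
import numpy as np

def wonham_step(pi, dY, dt, Lam, f, sigma):
    """One Euler step of the Wonham filter for dY = f[X_t] dt + sigma dW."""
    fbar = f @ pi                                 # current estimate of f(X_t)
    gain = pi * (f - fbar) / sigma**2             # (diag(pi) - pi pi^T) f / sigma^2
    pi_new = pi + (Lam.T @ pi) * dt + gain * (dY - fbar * dt)
    pi_new = np.clip(pi_new, 0.0, None)           # Euler steps can leave the simplex
    return pi_new / pi_new.sum()

def simulate(T=10.0, dt=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    Lam = np.array([[-1.0, 1.0], [2.0, -2.0]])    # hypothetical jump intensities
    f = np.array([0.0, 1.0])                      # hypothetical observation levels
    sigma = 0.5
    x, pi = 0, np.array([0.5, 0.5])
    n = int(round(T / dt))
    hits = 0
    for _ in range(n):
        if rng.random() < -Lam[x, x] * dt:        # two states: a jump switches the state
            x = 1 - x
        dY = f[x] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        pi = wonham_step(pi, dY, dt, Lam, f, sigma)
        hits += int(np.argmax(pi) == x)
    return pi, hits / n

pi, accuracy = simulate()
print(pi, accuracy)  # posterior at time T and share of steps where the MAP state was correct
```

Clipping and renormalizing is a pragmatic guard against discretization error. Schemes that preserve positivity more carefully are also possible.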

FILTERING OF MARKOV JUMP PROCESSES GIVEN COMPOSITE OBSERVATIONS I: EXACT SOLUTION
  • A. V. Borisov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation, Moscow Aviation Institute (National Research University), 4 Volokolamskoe Shosse, Moscow 125080, Russian Federation, Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation, Moscow Center for Fundamental and Applied Mathematics, M. V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation
  • D. Kh. Kazanchyan  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation

Abstract: The first part of the series is devoted to optimal filtering of finite-state Markov jump processes (MJP) given an ensemble of diffusion and counting observations. The noise intensity in the observable diffusion depends on the estimated MJP state. A special equivalent observation transformation converts the observations into a collection consisting of a diffusion process of unit intensity, counting processes, and indirect measurements performed at some nonrandom discrete instants. The filtering estimate is expressed as the solution to a discrete-continuous stochastic differential system with the transformed observations on the right-hand side. An identifiability condition, under which the MJP state can be reconstructed precisely from indirect noisy observations, is presented.

Keywords: Markov jump process; optimal filtering; multiplicative observation noises; stochastic differential equation; continuous and counting observations; identifiability condition

ON ONE NONSTATIONARY SERVICE MODEL WITH CATASTROPHES AND HEAVY TAILS
  • A. I. Zeifman  Department of Applied Mathematics, Vologda State University, 15 Lenin Str., Vologda 160000, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation, Vologda Research Center of the Russian Academy of Sciences, 56A Gorky Str., Vologda 160014, Russian Federation
  • Ya. A. Satin  Department of Applied Mathematics, Vologda State University, 15 Lenin Str., Vologda 160000, Russian Federation
  • I. A. Kovalev  Department of Applied Mathematics, Vologda State University, 15 Lenin Str., Vologda 160000, Russian Federation

Abstract: The paper considers a nonstationary queuing system with catastrophes, one server, and special group arrivals of requests. The arrival intensities of groups of requests can decrease rather slowly as the group size increases. For the process X(t) describing the number of requests in such a system, the existence of a limiting regime of the state probability distribution and of a limiting mean is proved, and estimates of the rate of convergence to the limiting regime and to the limiting mean are obtained. Approximation estimates are obtained using truncations by finite processes. As an example, the authors consider a simple model of a nonstationary system in which the arrival rates of customer groups decrease rather slowly as the group size grows.

Keywords: nonstationary queuing system; countable Markov chains; limiting characteristics; rate of convergence; approximation
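Truncation by finite processes can be illustrated numerically. The sketch below assumes a hypothetical model in the spirit of the abstract:

- groups of k requests arrive with intensity lam(t, k), which decays like 1/k^3;
- one server completes requests with intensity mu(t);
- catastrophes empty the system with intensity xi(t).

The countable chain is truncated to states 0..N. Arrivals that would exceed N are dropped, which is one of several possible truncation rules. The forward Kolmogorov system is then integrated by the Euler method. The rates and the truncation rule are assumptions, not the paper's example.

```python
import math
import numpy as np

def generator(t, N, lam, mu, xi):
    """Intensity matrix Q(t) (rows sum to 0) of the process truncated to states 0..N."""
    Q = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for k in range(1, N - i + 1):   # group of k requests; jumps past N are dropped
            Q[i, i + k] += lam(t, k)
        if i >= 1:
            Q[i, i - 1] += mu(t)        # service completion (one server)
            Q[i, 0] += xi(t)            # catastrophe: the system is emptied
        Q[i, i] = -Q[i].sum()
    return Q

def distribution(N, T, dt, lam, mu, xi):
    """Euler scheme for the forward Kolmogorov system dp/dt = p Q(t), with X(0) = 0."""
    p = np.zeros(N + 1)
    p[0] = 1.0
    for step in range(int(round(T / dt))):
        p = p + dt * (p @ generator(step * dt, N, lam, mu, xi))
    return p

# Hypothetical 1-periodic intensities; group rates decay polynomially in the group size k.
lam = lambda t, k: (1.0 + 0.5 * math.sin(2 * math.pi * t)) / k**3
mu = lambda t: 3.0 + math.cos(2 * math.pi * t)
xi = lambda t: 0.5

for N in (20, 40):
    p = distribution(N, T=2.0, dt=0.002, lam=lam, mu=mu, xi=xi)
    print(N, np.arange(N + 1) @ p)      # mean of X(2) under the two truncations
```

Comparing the means for N = 20 and N = 40 gives only a rough empirical check of the truncation error. The paper's contribution is rigorous bounds of this kind, which the sketch does not reproduce.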

THE MULTIVARIATE DISTRIBUTIONS OF OUTPUT STREAMS IN A QUEUEING SYSTEM WITH PREEMPTIVE REPEAT PRIORITY
  • V. G. Ushakov  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • N. G. Ushakov  Institute of Microelectronics Technology and High-Purity Materials of the Russian Academy of Sciences, 6 Academician Osipyan Str., Chernogolovka, Moscow Region 142432, Russian Federation, Norwegian University of Science and Technology, 15A S. P. Andersensvei, Trondheim 7491, Norway

Abstract: The paper studies a single-server queuing system with r types of customers, preemptive repeat priority, and an infinite number of waiting positions in the queue. The arrival stream of customers of each type is Poisson, and each type has its own generally distributed service time. The main result is the Laplace-Stieltjes transforms of the one- and two-dimensional stationary distribution functions of the interdeparture times for each type of customers. The output process is analyzed by the method of embedded Markov chains, with successive service completion instants of customers of the same type taken as the embedded times. From a practical perspective, an accurate characterization of the interdeparture time process is necessary when studying open networks of queues.

Keywords: output stream; preemptive repeat priority; embedded Markov chain; single server

ANALYSIS OF THE UNBIASED MEAN-SQUARE RISK ESTIMATE OF THE BLOCK THRESHOLDING METHOD
  • O. V. Shestakov  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: Signal and image processing methods based on wavelet decomposition and thresholding have become very popular for compression and noise suppression. This is due to their ability to adapt to local features of functions, the high speed of the processing algorithms, and the optimality of the resulting estimates. In this paper, a block thresholding method is considered, in which expansion coefficients are processed in groups, so that information about neighboring coefficients can be taken into account. In the model with additive noise, an unbiased estimate of the mean-square risk is analyzed. It is shown that, under certain regularity conditions, this estimate is strongly consistent and asymptotically normal. These properties allow the risk estimate to be used as a quality criterion for the method and asymptotic confidence intervals to be constructed for the theoretical mean-square risk.

Keywords: wavelets; block thresholding; mean-square risk estimate; asymptotic normality; strong consistency
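Below is a sketch of an unbiased (Stein-type) risk estimate for one block. It assumes Gaussian noise N(0, sigma^2 I) and the block soft-thresholding rule y (1 - T/||y||)_+. The paper's exact thresholding rule and noise assumptions may differ.

By Stein's lemma, for a block of length L the estimate equals T^2 + L sigma^2 - 2 sigma^2 T (L - 1)/||y|| when ||y|| > T, and ||y||^2 - L sigma^2 otherwise. The Monte Carlo loop compares the average of this estimate with the actual squared error for a hypothetical coefficient block `theta`.

```python
import numpy as np

def block_soft_threshold(y, T):
    """Shrink a whole block toward zero: y * max(0, 1 - T / ||y||)."""
    norm = np.linalg.norm(y)
    return np.zeros_like(y) if norm <= T else (1.0 - T / norm) * y

def sure_block(y, T, sigma):
    """Stein unbiased estimate of E||block_soft_threshold(y, T) - theta||^2."""
    y = np.asarray(y, dtype=float)
    L, norm = y.size, np.linalg.norm(y)
    if norm <= T:
        return norm**2 - L * sigma**2
    return T**2 + L * sigma**2 - 2.0 * sigma**2 * T * (L - 1) / norm

rng = np.random.default_rng(1)
theta = np.array([2.0, -1.0, 0.5, 0.0])   # hypothetical true coefficients of one block
T, sigma = 1.5, 1.0
sure_vals, losses = [], []
for _ in range(20000):
    y = theta + sigma * rng.standard_normal(theta.size)
    sure_vals.append(sure_block(y, T, sigma))
    losses.append(np.sum((block_soft_threshold(y, T) - theta) ** 2))
print(np.mean(sure_vals), np.mean(losses))   # the two averages should be close
```

For a whole signal, the block estimates are summed over all blocks. The paper studies the consistency and asymptotic normality of such a risk estimate.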

INTELLIGENT ANALYSIS OF BIG DATA EXTENDIBLE COLLECTIONS UNDER THE LIMITS OF PROCESS-REALTIME
  • A. A. Grusho  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • M. I. Zabezhailo  A. A. Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation
  • D. V. Smirnov  Sberbank of Russia, 19 Vavilov Str., Moscow 117999, Russian Federation
  • E. E. Timonina  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper discusses how to extract data relevant to a fixed goal from Big Data collections that are regularly extended with new information, without violating the given limits on data analysis and decision making (i. e., in agreement with the so-called process-real time restrictions). The proposed approach is based on modern artificial intelligence techniques, including knowledge representation and formalization of reasoning for so-called Intelligent Data Analysis (IDA) computer systems. Some critical barriers to the efficient application of IDA of this type are analyzed. These include the computational complexity of some IDA-related combinatorial problems (including proofs that some of them belong to well-known classes of computationally hard problems), some characteristic features of knowledge representation and of control over search iteration enumeration, and optimization of the accuracy and completeness of search results. A formalized description of the designed set of IDA procedures is presented. The approach is illustrated by examples of its implementation in a corporate computer system for identifying and countering malicious insider activities that operates in a large Russian commercial bank.

Keywords: Big Data; process-real time; intelligent data analysis; information security; insider malicious activities

SOME PROPERTIES OF GAUSSIAN MIXTURES AND APPLICATIONS TO MAGNETOENCEPHALOGRAPHY PROBLEMS
  • M. B. Goncharenko  INTEL A/O, 17-4 Krylatskaya Str., Moscow 121614, Russian Federation
  • T. V. Zakharova  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article studies various properties of compound probability distributions (mixture distributions), with special attention to the case when the mixed distribution is Gaussian. The authors establish similarities in the behavior of Gaussian mixtures and Gaussian distributions under transformations and study applications to magnetoencephalographic brain research. The authors determine the conditions under which the Aitken estimator (generalized least squares) is applicable for localizing sources of neurophysiological activity when the noise has a compound Gaussian distribution.

Keywords: compound distributions; compound Gaussian distribution; compound Student distribution; compound lognormal distribution; compound gamma distributions; magnetoencephalography; MEG; inverse MEG problem; Aitken's estimator
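The Aitken estimator itself is short to write down. The sketch assumes a hypothetical random lead-field matrix `X`, source vector `beta`, and noise covariance `Sigma`. It draws compound Gaussian noise as a Gaussian scale mixture sqrt(V) Z with V ~ Gamma(2, 1). The noise covariance is then E[V] Sigma, a scalar multiple of Sigma, so the scalar cancels in the estimator and Sigma itself can be used for weighting. The general applicability conditions are the subject of the paper and are not reproduced here.

```python
import numpy as np

def aitken(X, y, Sigma):
    """Aitken (generalized least squares) estimate (X^T Sigma^-1 X)^-1 X^T Sigma^-1 y."""
    Si_X = np.linalg.solve(Sigma, X)   # Sigma^-1 X without forming the inverse
    Si_y = np.linalg.solve(Sigma, y)   # Sigma^-1 y
    return np.linalg.solve(X.T @ Si_X, X.T @ Si_y)

rng = np.random.default_rng(0)
n_sensors, n_sources = 30, 3
X = rng.standard_normal((n_sensors, n_sources))      # hypothetical lead-field matrix
beta = np.array([1.0, -2.0, 0.5])                     # hypothetical source amplitudes
A = rng.standard_normal((n_sensors, n_sensors))
Sigma = A @ A.T / n_sensors + np.eye(n_sensors)       # positive definite noise covariance

# Compound (scale-mixture) Gaussian noise: sqrt(V) * N(0, Sigma) with V ~ Gamma(2, 1).
V = rng.gamma(shape=2.0, scale=1.0)
noise = np.sqrt(V) * (np.linalg.cholesky(Sigma) @ rng.standard_normal(n_sensors))
beta_hat = aitken(X, X @ beta + noise, Sigma)
print(beta_hat)
```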

SOFT COMPUTING IN PROBLEMS OF MEDICAL DIAGNOSTICS
  • M. P. Krivenko  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: In recent years, informatics has become increasingly important for the interpretation and analysis of data by computational methods, in particular, the so-called "soft" computing (Soft Computing, SC). The article discusses the possibilities of using SC to solve medical problems, especially problems of decision support. At the same time, it is demonstrated that innovations should not be used artificially, especially since, at the cost of little effort, one can turn to classical approaches that are methodologically rigorous and lead to guaranteed results. The undoubted interest in SC methodologies in various disciplines (genetics, physiology, radiology, cardiology, neurology, etc.) shows that their study is highly fruitful. Future research in medicine is expected to use the corresponding methods more widely than today and for more complex tasks.

Keywords: medicine; soft computing; reference values; Bayesian approach

METHOD OF STRAIGHTENING DISTORTED DUE TO MULTICOLLINEARITY COEFFICIENTS IN REGRESSION MODELS
  • M. P. Bazilevskiy  Department of Mathematics, Irkutsk State Transport University, 15 Chernyshevskogo Str., Irkutsk 664074, Russian Federation

Abstract: When regression models are constructed, strong multicollinearity of the explanatory variables distorts the regression coefficients, in particular their signs, which degrades the interpretability of such a regression. This article develops a method for straightening coefficients distorted due to multicollinearity. The method is based on a property of the fully connected linear regression models proposed by the author. A nonlinear system used to estimate fully connected regressions is investigated, and it is shown that its solution can be obtained numerically by the method of simple iterations. A method for choosing the unknown lambda-parameters in fully connected regression is proposed. It was found that in multivariate fully connected models with a strong correlation of all factors, the signs of the coefficients of the variables in the secondary equation coincide with the signs of the corresponding correlation coefficients. Based on this research, the "Selection B" algorithm was developed to straighten the distorted coefficients. The developed straightening method has been successfully demonstrated on the example of modeling Russia's gross domestic product (GDP).

Keywords: regression analysis; fully connected linear regression model; multicollinearity; interpretation; numerical method; GDP of Russia

COORDINATION OF AGENTS' GOALS IN COHESIVE HYBRID INTELLIGENT MULTIAGENT SYSTEMS
  • I. A. Kirikov  Kaliningrad Branch of the Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 5 Gostinaya Str., Kaliningrad 236000, Russian Federation
  • S. V. Listopad  Kaliningrad Branch of the Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 5 Gostinaya Str., Kaliningrad 236000, Russian Federation

Abstract: When developing an intelligent system as a community of heterogeneous intelligent agents, it is important to organize their interaction. To reduce the complexity of this procedure, it is proposed to simulate with methods of cohesive hybrid intelligent multiagent systems the mechanisms of cohesion emergence in teams of specialists solving problems "at a round table." Agents of such systems should be able to independently coordinate their goals and domain models and develop a protocol to solve the posed problem. The article proposes a model for coordinating the goals of agents of cohesive hybrid intelligent multiagent systems.

Keywords: cohesion; hybrid intelligent multiagent system; team of specialists

CHEBYSHEV-EDGEWORTH EXPANSIONS FOR DISTRIBUTIONS OF GENERALISED HOTELLING-TYPE STATISTICS BASED ON RANDOM SIZE SAMPLES
  • M. M. Monakhov  Moscow Center for Fundamental and Applied Mathematics, M. V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation

Abstract: The general transfer theorem for the distribution function of asymptotically normal statistics was generalized to the case of Hotelling-type statistics. An analog of the general transfer theorem was proved for the distribution function of Hotelling-type statistics based on samples of random size. This made it possible to obtain the Chebyshev-Edgeworth expansion for the initial Hotelling-type statistics. The explicit form of the Chebyshev-Edgeworth expansion was obtained for the case when the random sample size has the negative binomial distribution shifted by 1; the limit distribution in this case is the F-distribution. The Cornish-Fisher expansion was obtained for a special value of the random sample size parameter. A computational experiment was conducted, and graphs illustrating the Chebyshev-Edgeworth expansion were plotted.

Keywords: generalised Chebyshev-Edgeworth expansion; Cornish-Fisher expansion; sample with random size; F-distribution; Hotelling-type statistics

COMPRESSION ALGORITHMS FOR FORCE VOLUME DATA I: CODING OF PREDICTION ERRORS
  • D. V. Sushko  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The author considers the problem of reversible (lossless) compression of force volume data, which are three-dimensional arrays with 16-bit integer elements. Such arrays result from atomic force microscopy scanning of microobjects in the force mapping mode. The author proposes reversible compression algorithms for force volume data based on universal arithmetic coding of their prediction errors. Two methods of universal coding are used. The first method, based on a statistical model of the source with a computable sequence of states, decomposes the prediction error sequence into two subsequences that are coded independently. The second method chooses an appropriate weight when constructing the code probabilities used in arithmetic coding. The author obtains bit rate estimates of the proposed algorithms for five test arrays. The results show that combining these two universal coding methods significantly reduces the bit rate. For the test arrays, the bit rates of the most efficient of the proposed practically applicable algorithms are 3.9285, 3.5268, 3.5024, 4.2813, and 4.2246 bit/pixel.

Keywords: atomic force microscope; force volume data; reversible compression; arithmetic coding; universal coding
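The gain from coding prediction errors instead of raw samples can be gauged with a crude proxy. The sketch below uses a hypothetical previous-sample predictor along one force curve and measures zero-order empirical entropy, roughly the rate an arithmetic coder with a well-fitted memoryless model would approach if model cost is ignored. It does not implement the paper's universal coding methods (state-based decomposition or weighted code probabilities). The curve values are illustrative only.

```python
import math
from collections import Counter

def prediction_errors(curve):
    """Residuals of the previous-sample predictor; the first sample is kept as is."""
    return [curve[0]] + [b - a for a, b in zip(curve, curve[1:])]

def empirical_entropy(symbols):
    """Zero-order empirical entropy in bits per symbol."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

curve = [1000 + (i * i) // 8 for i in range(256)]   # hypothetical smooth force curve
raw = empirical_entropy(curve)
err = empirical_entropy(prediction_errors(curve))
print(f"raw: {raw:.3f} bit/sample, prediction errors: {err:.3f} bit/sample")
```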

STRUCTURING PRINCIPLES OF ELECTRONIC DICTIONARY'S ENTRIES
  • A. A. Goncharov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • I. M. Zatsman  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: Two tasks that arise when converting paper dictionaries into electronic form are considered. The first is structuring inherited dictionary entries, which enriches the functionality of the electronic dictionary. The second is replacing the decorative design of the structural elements of dictionary entries with tagging that provides their addressing in databases. It is shown that the structure of dictionary entries used in traditional lexicography should be detailed. At the same time, some of the structural elements need to be categorized to enrich the electronic dictionary's functionality. An approach to creating a classification system that is integrated into an electronic dictionary and classifies the structural items of dictionary entries is described. The proposed solutions make it possible to significantly enrich the electronic dictionary's functionality compared to its paper version and to overcome the limitations of traditional lexicography related to the paper form of dictionary representation.

Keywords: structuring principles; electronic dictionary; electronic lexicography; classification system

EXTRACTING KNOWLEDGE ABOUT MEANS OF EXPRESSION OF LOGICAL-SEMANTIC RELATIONS FROM THE SUPRACORPORA DATABASE
  • A. A. Goncharov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • O. Yu. Inkova  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The goal of this paper is to demonstrate how parallel texts annotated with a supracorpora database (SCDB) can be efficiently used to extract knowledge about alternative means of expression of logical-semantic relations (LSR). The authors review the most prominent discursively annotated corpora (Penn Discourse Treebank, Prague Dependency Treebank, and Rhetorical Structure Theory Discourse Treebank) to support the observation that there is no consensus among the researchers as to which linguistic means are to be considered connectives (i. e., prototypical markers of LSR) and which means are deemed "alternative." The research shows that application of the comparative method while leveraging the capabilities of the SCDB of connectives makes it possible not only to extract new knowledge about LSR markers but also to create thesauri of various means of LSR expression in the languages involved, including the alternative ones. In addition, the SCDB data makes it possible to generate new knowledge on correlations between specific LSRs and unconventional means of LSR expression and calculate frequencies of utilization of these means for the studied languages.

Keywords: supracorpora database; logical-semantic relations; connectives; knowledge generation; parallel texts

METHODS OF QUALITY ESTIMATION FOR MACHINE TRANSLATION: STATE-OF-THE-ART
  • V. A. Nuriev  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A. Yu. Egorova  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper reviews state-of-the-art methods of quality estimation for machine translation. These methods are grounded in two general approaches: automatic and manual. Automatic assessment is based on comparing the output of the machine translation system against a human-generated reference translation. Manual (human) evaluation primarily takes into account pragmatic and functional aspects: translation quality is assessed in terms of how well the system output is suited to fulfill the translation tasks. The first part presents some automatic metrics for evaluating machine translation quality and discusses both the shortcomings of such metrics and new trends in their development. The second part focuses on human evaluation of machine translation. It describes: (i) evaluation of adequacy and fluency; (ii) ranking of translations; (iii) direct assessment; (iv) computation of the human translation edit rate; and (v) translation annotation involving an error typology.

Keywords: machine translation; translation quality; evaluation of machine translation quality; automatic metrics; direct assessment; typology of machine translation errors
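Edit-rate metrics such as the human translation edit rate count the edits needed to turn the system output into a reference (for HTER, a human post-edited version of the output), normalized by the reference length. The sketch below uses plain word-level Levenshtein distance. Full TER also counts block shifts of word sequences as single edits, which is omitted here, so this is a simplified approximation rather than the official metric.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))                 # distances for an empty hypothesis prefix
    for i, hw in enumerate(h, start=1):
        cur = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cur[j] = min(prev[j] + 1,               # delete hw
                         cur[j - 1] + 1,            # insert rw
                         prev[j - 1] + (hw != rw))  # substitute or match
        prev = cur
    return prev[-1]

def edit_rate(hyp, ref):
    """Edits per reference word; shifts (as in full TER) are not modeled."""
    return word_edit_distance(hyp, ref) / len(ref.split())

print(edit_rate("the cat sat", "the cat sat down"))  # 0.25: one insertion over four words
```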

STOCHASTIC DYNAMICS OF SELF-ORGANIZING SOCIAL SYSTEMS WITH MEMORY (ELECTORAL PROCESSES)
  • A. S. Sigov  Russian Technological University (MIREA), 78 Vernadskogo Ave., Moscow 119454, Russian Federation
  • E. G. Andrianova  Russian Technological University (MIREA), 78 Vernadskogo Ave., Moscow 119454, Russian Federation
  • L. A. Istratov  Russian Technological University (MIREA), 78 Vernadskogo Ave., Moscow 119454, Russian Federation

Abstract: The paper discusses how methods and approaches common to theoretical computer science, and its applications, can be used to analyze and model social group processes. The dynamics of voter preferences during the 2016 U.S. presidential campaign were analyzed with the developed model for describing stochastic processes, which takes into account self-organization and the presence of memory.
Processing of sociological data made it possible to plot probability density histograms for the amplitudes of voter preference deviation depending on their determination interval. It also made it possible to develop a model that describes the main characteristics of the observed processes well (the appearance of oscillations, changes in the height and width of the distribution as the amplitude calculation interval changes, etc.). In building the model, the probability schemes of transitions between the possible states of the social system (voter preferences) were considered, and a second-order nonlinear differential equation was derived. In addition, a boundary value problem for the probability density function of the amplitude of voter preference deviation depending on its determination interval was formulated and solved. The model differential equation contains a term responsible for the possibility of self-organization and takes into account the presence of memory. Whether oscillations occur depends on the initial conditions. The developed model can be used to analyze election campaigns and make relevant decisions.

Keywords: oscillation amplitude distribution function; stochastic dynamics; self-organization; presence of memory; probability density oscillations; electoral processes