Institute of Informatics Problems of the Russian Academy of Sciences
Russian Academy of Sciences

«INFORMATICS AND APPLICATIONS»
Scientific journal
Volume 14, Issue 1, 2020

Abstract and Keywords.

ASYMPTOTIC REGULARITY OF THE WAVELET METHODS OF INVERTING LINEAR HOMOGENEOUS OPERATORS FROM OBSERVATIONS RECORDED AT RANDOM TIMES
  • O. V. Shestakov  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: When solving inverse statistical problems, it is often necessary to invert a linear homogeneous operator, and since the observed data are noisy, regularization methods are usually required. Popular noise suppression methods threshold the expansion coefficients of the observed function. Their advantages are computational efficiency and the ability to adapt both to the type of operator and to the local features of the estimated function. Analyzing the errors of these methods is an important practical task, since it allows one to evaluate the quality of both the methods themselves and the equipment used. Sometimes the nature of the data is such that observations are recorded at random times. If the observation points form the variational series (order statistics) of a sample from the uniform distribution on the data recording interval, then conventional thresholding procedures remain adequate. The author analyzes the estimate of the mean square risk in the problem of inverting linear homogeneous operators and demonstrates that, under certain conditions, this estimate is strongly consistent and asymptotically normal.
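A minimal sketch of thresholding with a mean square risk estimate, under stated assumptions: we use soft thresholding and Stein's unbiased risk estimate (SURE) for coefficients observed with i.i.d. Gaussian noise of known standard deviation `sigma`. This is a generic illustration, not the paper's estimator; in particular, for a linear homogeneous operator the noise level of the coefficients generally depends on the resolution level, which the sketch ignores. All function names are hypothetical.

```python
import math

def soft_threshold(x, t):
    """Shrink x toward zero by t (soft thresholding)."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def sure_soft(coeffs, sigma, t):
    """Stein's unbiased estimate of the mean square risk of soft thresholding
    for coefficients observed with i.i.d. N(0, sigma^2) noise."""
    risk = 0.0
    for x in coeffs:
        if abs(x) <= t:
            risk += x * x - sigma ** 2    # coefficient is set to zero
        else:
            risk += sigma ** 2 + t ** 2   # coefficient is shrunk by t
    return risk

def universal_threshold(n, sigma):
    """Donoho-Johnstone universal threshold sigma * sqrt(2 ln n)."""
    return sigma * math.sqrt(2.0 * math.log(n))
```

For example, with `sigma = 1` and threshold `t = 1`, the coefficients `[3.0, 0.5, -2.0]` are shrunk to `[2.0, 0.0, -1.0]`, and the risk estimate equals 3.25.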

Keywords: threshold processing; linear homogeneous operator; random observation points; mean square risk estimate

ANALYSIS OF CONFIGURATIONS OF LSTM NETWORKS FOR MEDIUM-TERM VECTOR FORECASTING
  • A. K. Gorshenin  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow 119991, Russian Federation
  • V. Yu. Kuzmin  "Wi2Geo LLC," 3-1 Mira Ave., Moscow 129090, Russian Federation

Abstract: The paper analyzes 36 configurations of LSTM (long short-term memory) architectures for forecasting up to 70 steps ahead from data sets of 300-500 elements. Observations are approximated probabilistically by finite normal mixtures; therefore, the mathematical expectation, variance, skewness, and kurtosis of these mixtures serve as the input data for forecasting. The optimal neural network configurations were determined, and it was demonstrated that high-quality medium-term forecasts can be constructed with limited training time. The results are important for developing a probabilistic-statistical approach to describing the evolution of turbulent processes in a magnetically active high-temperature plasma.
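The four mixture characteristics used as forecasting inputs can be computed in closed form from the mixture parameters. The sketch below is our illustration, not the authors' code; it returns the non-excess (Pearson) kurtosis, which equals 3 for a single normal component, and that convention is an assumption on our part.

```python
def mixture_moments(weights, means, sds):
    """Mean, variance, skewness and non-excess kurtosis of the finite normal
    mixture sum_i weights[i] * N(means[i], sds[i]^2)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mu = sum(w * m for w, m in zip(weights, means))
    m2 = m3 = m4 = 0.0
    for w, m, s in zip(weights, means, sds):
        d, v = m - mu, s * s
        # central moments of N(m, v) taken about the mixture mean mu
        m2 += w * (d ** 2 + v)
        m3 += w * (d ** 3 + 3 * d * v)
        m4 += w * (d ** 4 + 6 * d ** 2 * v + 3 * v ** 2)
    return mu, m2, m3 / m2 ** 1.5, m4 / m2 ** 2
```

For instance, a symmetric two-component mixture with means -1 and 1 and unit variances has mean 0, variance 2, skewness 0, and kurtosis 2.5.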

Keywords: LSTM; forecasting; deep learning; high-performance computing; CUDA

NUMERICAL SCHEMES OF MARKOV JUMP PROCESS FILTERING GIVEN DISCRETIZED OBSERVATIONS II: ADDITIVE NOISE CASE
  • A. V. Borisov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The note continues the investigation begun in the article: Borisov, A. 2019. Numerical schemes of Markov jump process filtering given discretized observations I: Accuracy characteristics. Inform. Appl. 13(4):68-75.
It builds on the accuracy characteristics of the approximate solution of the filtering problem for the state of a homogeneous Markov jump process given continuous indirect noisy observations. The paper presents several algorithms for their numerical realization together with a comparative analysis. The class of observation systems under investigation is restricted to those with additive observation noise, which presumes that the observation noise intensity is a nonrandom constant. To construct the approximation, the author uses the left and midpoint rectangle rules of accuracy order 2 and 3, respectively, and Gaussian quadrature of order 5. The presented numerical schemes have accuracy of order 1/2, 1, and 2.
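The three quadrature rules named above can be sketched on a deterministic integrand as follows. We assume the order-5 Gaussian quadrature is the three-node Gauss-Legendre rule, which integrates polynomials up to degree 5 exactly; this is our reading, not something the abstract states. The sketch does not reproduce the filtering schemes themselves.

```python
import math

def left_rectangle(f, a, b):
    """Left rectangle rule on [a, b]: exact for constants."""
    return (b - a) * f(a)

def midpoint(f, a, b):
    """Midpoint rectangle rule on [a, b]: exact for polynomials of degree <= 1."""
    return (b - a) * f((a + b) / 2)

def gauss3(f, a, b):
    """Three-node Gauss-Legendre rule on [a, b]: exact for degree <= 5."""
    nodes = (-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5))
    weights = (5 / 9, 8 / 9, 5 / 9)
    c, h = (a + b) / 2, (b - a) / 2
    return h * sum(w * f(c + h * x) for w, x in zip(weights, nodes))

def composite(rule, f, a, b, n):
    """Apply a single-interval rule on n equal subintervals of [a, b]."""
    h = (b - a) / n
    return sum(rule(f, a + i * h, a + (i + 1) * h) for i in range(n))
```

Halving the step roughly halves the composite left-rule error and divides the composite midpoint error by about four, in line with local errors of order 2 and 3 for a single subinterval.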

Keywords: Markov jump process; optimal filtering; additive and multiplicative observation noises; stochastic differential equation; analytical and numerical approximation

STOCHASTIC DIFFERENTIAL SYSTEM OUTPUT CONTROL BY THE QUADRATIC CRITERION. IV. ALTERNATIVE NUMERICAL DECISION
  • A. V. Bosov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A. I. Stefanovich  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: In the study of the optimal control problem for an Ito diffusion process and a controlled linear output with a quadratic quality criterion, an intermediate result is summarized: an alternative to classical numerical integration, based on computer simulation, is proposed for approximate calculation of the optimal solution.
The method applies statistical estimation to determine the coefficients $\beta_t(y)$ and $\gamma_t(y)$ of the previously obtained Bellman function $V_t(y, z) = \alpha_t z^2 + \beta_t(y) z + \gamma_t(y)$, which determines the optimal solution of the original optimal stochastic control problem. The method relies on properties of the linear parabolic partial differential equations describing $\beta_t(y)$ and $\gamma_t(y)$: their equivalent description in the form of stochastic differential equations and a probabilistic representation of the solution, known as the A. N. Kolmogorov equation, or an equivalent integral form known as the Feynman-Kac formula. The stochastic equations and the relations for the optimal control and auxiliary parameters are combined into one differential system, for which an algorithm for simulating the solution is given. The algorithm provides the samples needed for statistical estimation of the coefficients $\beta_t(y)$ and $\gamma_t(y)$. The previously performed numerical experiment is supplemented by calculations using the alternative method and a comparative analysis of the results.
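A minimal sketch of the Monte Carlo idea behind the Feynman-Kac representation, assuming a scalar diffusion simulated by the Euler-Maruyama scheme. The coefficient functions `drift`, `diffusion`, `terminal`, and `running` are hypothetical placeholders; this is not the authors' combined differential system for the Bellman function coefficients.

```python
import math
import random

def feynman_kac_mc(y0, t0, T, drift, diffusion, terminal, running=None,
                   n_steps=100, n_paths=20000, seed=0):
    """Monte Carlo estimate of u(t0, y0) = E[g(Y_T) + int_t0^T f(Y_s) ds | Y_t0 = y0]
    for dY = drift(Y) dt + diffusion(Y) dW, simulated with Euler-Maruyama.

    By the Feynman-Kac formula, u solves
        u_t + drift(y) u_y + 0.5 * diffusion(y)^2 u_yy + f(y) = 0,  u(T, y) = g(y),
    where g = terminal and f = running (f = 0 if running is None).
    """
    rng = random.Random(seed)
    dt = (T - t0) / n_steps
    sqdt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        y, integral = y0, 0.0
        for _ in range(n_steps):
            if running is not None:
                integral += running(y) * dt  # left-point rule for the time integral
            y += drift(y) * dt + diffusion(y) * sqdt * rng.gauss(0.0, 1.0)
        total += terminal(y) + integral
    return total / n_paths
```

For Brownian motion (`drift = 0`, `diffusion = 1`) and `terminal(y) = y**2`, the exact solution is `u(t, y) = y**2 + (T - t)`, so the estimate at `y0 = 0.5`, `t0 = 0`, `T = 1` should be close to 1.25.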

Keywords: stochastic differential equation; optimal control; Bellman function; linear differential equations of parabolic type; Kolmogorov equation; Feynman-Kac formula; computer simulations; Monte Carlo method

ALIGNMENT OF ORDERED SET CARTESIAN PRODUCT
  • A. V. Goncharov  Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation
  • V. V. Strijov  Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation, A. A. Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The work is devoted to metric methods for analyzing objects with complex structure. It proposes generalizing the dynamic time warping (DTW) method for two time series to objects defined on two or more time axes. In the discrete representation, such objects are matrices, so DTW for time series is generalized to a method of dynamic alignment of matrices. The paper proposes a distance function that is robust to monotonic nonlinear deformations of the Cartesian product of two time scales and defines the alignment path between objects, where an object is a matrix whose rows and columns correspond to time axes. The properties of the proposed distance function are investigated. To illustrate the method, metric classification problems are solved on synthetic data and on the MNIST dataset.
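For reference, the classical one-dimensional DTW recursion that the paper generalizes can be sketched as follows. This is the textbook algorithm with an absolute-difference cost (our choice), not the authors' matrix alignment method.

```python
def dtw(x, y):
    """Classical DTW between sequences x and y.
    Returns (distance, path), where path lists aligned index pairs (i, j)."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # monotone warping: extend a match, an insertion, or a deletion
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # backtrack the optimal warping path from (n, m) to (1, 1)
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = min(((i - 1, j - 1), (i - 1, j), (i, j - 1)),
                   key=lambda ij: D[ij[0]][ij[1]])
    return D[n][m], path[::-1]
```

For example, `[1, 2, 3]` and `[1, 2, 2, 3]` are at DTW distance 0: the warping path aligns the single 2 of the first series with both 2s of the second.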

Keywords: distance function; dynamic alignment; distance between matrices; nonlinear time warping; space-time series

NEUROPHYSIOLOGY AS A SUBJECT DOMAIN FOR DATA INTENSIVE PROBLEM SOLVING
  • D. O. Briukhov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • S. A. Stupnikov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • D. Yu. Kovalev  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • I. A. Shanin  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The goal of this survey is to analyze neurophysiology as a data intensive domain. The amount of research on the human brain is growing, and international projects aim to improve understanding of human brain function. The amount of data obtained in typical neurophysiology laboratories is growing exponentially, and these data are represented in a large number of different formats.
This requires infrastructures, databases, and websites that provide unified access to data and support data exchange between researchers all over the world. Specific methods and tools forming the field of neuroinformatics (the intersection of neurophysiology and computer science) are used to analyze the collected data and solve neurophysiological problems. These methods include, in particular, statistical analysis, machine learning, and neural networks.

Keywords: neurophysiology; neurophysiological resources; neuroinformatics; data intensive research; analysis of neurophysiological data

RISK-NEUTRAL DYNAMICS FOR THE ARIMA-GARCH RANDOM PROCESS WITH ERRORS DISTRIBUTED ACCORDING TO THE JOHNSON'S SU LAW
  • A. R. Danilishin  Department of Operations Research, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, Moscow 119991, GSP-1, Russian Federation
  • D. Yu. Golembiovsky  Department of Operations Research, Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, Moscow 119991, GSP-1, Russian Federation, Department of Banking, Synergy University, 80-G Leningradskiy Prospect, Moscow 125190, Russian Federation

Abstract: The risk-neutral world is one of the fundamental principles of financial mathematics, used to determine the fair value of derivative financial instruments. The article deals with constructing risk-neutral dynamics for the ARIMA-GARCH (AutoRegressive Integrated Moving Average, Generalized AutoRegressive Conditional Heteroskedasticity) random process with errors distributed according to Johnson's SU law. Methods for finding risk-neutral coefficients require the existence of a moment generating function (examples of such transformations are the Esscher transform and the extended Girsanov principle). A moment generating function is not known for the Student and Johnson's SU distributions. The authors form a moment generating function for the Johnson's SU distribution and prove that a modification of the extended Girsanov principle allows one to obtain a risk-neutral measure with respect to the chosen distribution.
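To make the model concrete, the sketch below simulates an AR(1) model for returns (a minimal ARIMA(1, 1, 0) stand-in for log prices) with GARCH(1, 1) volatility and Johnson SU innovations standardized to zero mean and unit variance, using the closed-form mean and variance of the Johnson SU law. It simulates the physical (real-world) dynamics only and does not perform the risk-neutral change of measure studied in the paper; all function names and parameter values are hypothetical.

```python
import math
import random

def johnson_su_mean_var(gamma, delta, xi=0.0, lam=1.0):
    """Mean and variance of X = xi + lam * sinh((Z - gamma) / delta), Z ~ N(0, 1)."""
    a = 1.0 / delta ** 2
    mean = xi - lam * math.exp(a / 2) * math.sinh(gamma / delta)
    var = 0.5 * lam ** 2 * (math.exp(a) - 1) * (math.exp(a) * math.cosh(2 * gamma / delta) + 1)
    return mean, var

def standardized_johnson_su(rng, gamma, delta):
    """Draw a Johnson SU variate rescaled to zero mean and unit variance."""
    mean, var = johnson_su_mean_var(gamma, delta)
    x = math.sinh((rng.gauss(0.0, 1.0) - gamma) / delta)
    return (x - mean) / math.sqrt(var)

def simulate_ar1_garch11(n, phi, omega, alpha, beta, gamma, delta, seed=0):
    """Simulate r_t = phi * r_{t-1} + eps_t, eps_t = sigma_t * z_t,
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2,
    with standardized Johnson SU innovations z_t (requires alpha + beta < 1)."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    r_prev, eps_prev = 0.0, 0.0
    returns, variances = [], []
    for _ in range(n):
        sigma2 = omega + alpha * eps_prev ** 2 + beta * sigma2
        eps = math.sqrt(sigma2) * standardized_johnson_su(rng, gamma, delta)
        r = phi * r_prev + eps
        returns.append(r)
        variances.append(sigma2)
        r_prev, eps_prev = r, eps
    return returns, variances
```

Standardizing the innovations keeps `sigma_t^2` interpretable as the conditional variance, which is the usual convention for GARCH models with non-normal errors.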

Keywords: ARIMA; GARCH; risk-neutral measure; Girsanov extended principle; Johnson's SU; option pricing

IMPROVEMENT OF THE ACCURACY OF SOLUTION OF TASKS FOR THE ACCOUNT OF THE CONSTRUCTION OF BOUNDARY CONDITIONS
  • S. M. Serebryanskii  Troitsk Branch of Chelyabinsk State University, 9 S. Rasin Str., Troitsk 457100, Russian Federation
  • A. N. Tyrsin  Science and Engineering Center "Reliability and Resource of Large Systems and Machines," Ural Branch of the Russian Academy of Sciences; 54a Studencheskaya Str., Yekaterinburg 620049, Russian Federation

Abstract: The paper considers the stability of solutions of inverse problems with respect to the exact specification of boundary conditions. In practical applications, the theoretical form of the functional dependence in the boundary conditions is, as a rule, undefined or unknown, and random measurement errors are also present. Studies have shown that this significantly reduces the accuracy of solving the inverse problem. To increase this accuracy, it is proposed to refine the functional form of the boundary conditions by recognizing the form of the mathematical model of the dependence and then approximating the behavior of the physical quantity at the boundary with this function. The dependence was recovered using dependence recognition methods based on structural difference schemes and inverse mapping recognition. Model examples are given for the case of additive random measurement errors and an unknown type of dependence in the boundary conditions.

Keywords: inverse problem; recognition; functional dependence; model; difference schemes; inverse function; sampling; variance; approximation

ON METHODS FOR IMPROVING THE ACCURACY OF MULTICLASS CLASSIFICATION ON IMBALANCED DATA
  • L. A. Sevastianov  Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation
  • E. Yu. Shchetinin  Financial University under the Government of the Russian Federation, 49 Leningradsky Prospekt, Moscow 125993, Russian Federation

Abstract: This paper studies methods for overcoming class imbalance in order to achieve higher classification accuracy than applying classification algorithms directly to imbalanced data. The proposed scheme for improving classification accuracy combines classification algorithms and feature selection methods such as RFE (Recursive Feature Elimination), Random Forest, and Boruta with preliminary class balancing by random sampling methods, SMOTE (Synthetic Minority Oversampling TEchnique), and ADASYN (ADAptive SYNthetic sampling). Computer experiments on skin disease data showed that using sampling algorithms to eliminate class imbalance, together with selecting the most informative features, significantly increases classification accuracy. The highest classification accuracy was achieved by the Random Forest algorithm on data resampled with ADASYN.
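The core interpolation idea of SMOTE can be sketched as follows: pick a minority sample, pick one of its k nearest minority neighbours, and create a synthetic point at a random position on the segment between them. This is a minimal illustration with a hypothetical `smote` function, not the imbalanced-learn implementation and not necessarily the configuration used in the paper. ADASYN differs mainly in allocating more synthetic samples to minority points whose neighbourhoods contain more majority-class samples.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # precompute the k nearest minority neighbours of every minority point (brute force)
    neighbours = []
    for i, p in enumerate(minority):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1)
        p, q = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, oversampling stays inside the region already occupied by the minority class.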

Keywords: imbalanced data; classification; sampling; random forest; ADASYN; SMOTE

MODELING OF MONITORING OF INFORMATION SECURITY PROCESS ON THE BASIS OF QUEUING SYSTEMS
  • G. A. Popov  Astrakhan State Technical University, 16 Tatischeva Str., Astrakhan 414056, Russian Federation
  • S. Zh. Simavoryan  Sochi State University, 94 Plastunskaya Str., Sochi 354003, Russian Federation
  • A. R. Simonyan  Sochi State University, 94 Plastunskaya Str., Sochi 354003, Russian Federation
  • E. I. Ulitina  Sochi State University, 94 Plastunskaya Str., Sochi 354003, Russian Federation

Abstract: The paper is devoted to mathematical modeling of the monitoring process performed by information security systems to detect hidden malicious attacks. The modeling is based on the queueing theory formalism. The monitoring process is reduced to analyzing the customer flow arriving at a queueing system in which each customer is regarded as potentially carrying malicious attacks. Functional relations are obtained between the probability distribution of the system state and the distribution of the number of undetected malicious attacks at service completion epochs. These characteristics may help improve the efficiency of detecting malicious attacks in data processing systems.

Keywords: protection of information; information security; queuing system; probability

ON CAUSAL REPRESENTATIVENESS OF TRAINING SAMPLES OF PRECEDENTS IN DIAGNOSTIC TYPE TASKS
  • A. A. Grusho  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • M. I. Zabezhailo  Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation
  • E. E. Timonina  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The work focuses on some features of causality analysis in data mining tasks. The possibility of using so-called open logic theories in diagnostic (classification) tasks to describe replenished sets of empirical data is discussed. In tasks of this type, it is necessary to establish (predict, diagnose, etc.) the presence or absence of a target property in a new precedent, described in the same language for heterogeneous data that is used to describe examples having the target property and counterexamples lacking it. A variant of constructing open theories that describe collections of precedents by means of special logical expressions, called characteristic functions, is presented. Characteristic functions make it possible to eliminate heterogeneity in the descriptions of precedents. A procedure for forming the characteristic functions of a training sample of precedents is proposed. The properties of characteristic functions and some conditions for their existence are studied.

Keywords: diagnostics; causal analysis; intelligent data analysis; open logic

PERFORMANCE OF THE BOUNDED PIPELINE
  • A. A. Khusainov  Komsomolsk-na-Amure State University, 27 Lenina Prosp., Komsomolsk-on-Amur, Khabarovsk Region 681013, Russian Federation

Abstract: The paper studies the performance of a bounded pipeline, that is, a computational pipeline whose number of active stages is bounded by a fixed number at any time. Bounded pipelines with a given sum and maximum of stage delays are considered; the stages can have different delays. The main problem is to build an analytical model for calculating the time needed to process a given amount of data with such a pipeline. The solution is simplified if the constraint is treated as a structural pipeline hazard. The analytical model is constructed for the case when the bounded pipeline has the property of continuity of processing for each input element. For such pipelines, the paper proves the conjecture that the minimum number of processors at which the greatest performance is achieved equals the smallest integer not less than the ratio of the sum of stage delays to the maximum delay. It is established that the conjecture does not hold if the continuity property is not required. The constructed model can be used to synchronize the stages of a bounded pipeline with the continuity property. Without the continuity property, one obtains an asynchronous bounded pipeline whose stages are synchronized on the basis of data readiness. Software based on the theory of trace monoids has been developed to calculate the processing time of an asynchronous bounded pipeline.
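Two ingredients mentioned above can be sketched under simple assumptions: the processor bound from the proved conjecture, assuming positive integer stage delays, and the Foata normal form of a trace, which groups actions into steps of pairwise independent actions. With unit-time actions and no processor limit, the number of Foata steps equals the length of the longest chain of dependent actions. Both functions are hypothetical illustrations, not the author's analytical model or software.

```python
def min_processors(delays):
    """Smallest integer not less than sum(delays) / max(delays),
    for positive integer stage delays."""
    total, longest = sum(delays), max(delays)
    return -(-total // longest)  # integer ceiling division

def foata_normal_form(word, dependent):
    """Foata normal form of the trace of `word`.

    `dependent(a, b)` must be a reflexive, symmetric dependence relation.
    Each letter occurrence is placed one step after the latest earlier
    occurrence it depends on; letters within a step are sorted.
    """
    steps = []
    placed = []  # (letter, step index) of earlier occurrences
    for letter in word:
        level = 0
        for other, lvl in placed:
            if dependent(letter, other):
                level = max(level, lvl + 1)
        if level == len(steps):
            steps.append([])
        steps[level].append(letter)
        placed.append((letter, level))
    return [sorted(step) for step in steps]
```

For example, stage delays 2, 3, 1, 4 give a sum of 10 and a maximum of 4, so the bound is the ceiling of 2.5, that is, 3 processors. If actions `a` and `b` are independent and `c` depends on both, the word `abc` has the Foata form `[[a, b], [c]]`, while `acb` needs three steps.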

Keywords: computational pipeline; trace monoid; Foata normal form; pipeline performance; structural hazard

METHOD FOR DEFINING FINITE NONCOMMUTATIVE ASSOCIATIVE ALGEBRAS OF ARBITRARY EVEN DIMENSION FOR DEVELOPMENT OF THE POSTQUANTUM CRYPTOSCHEMES
  • A. A. Kostina  St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, 39, 14th Line V.O., St. Petersburg 199178, Russian Federation
  • A. Yu. Mirin  St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, 39, 14th Line V.O., St. Petersburg 199178, Russian Federation
  • D. N. Moldovyan  St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, 39, 14th Line V.O., St. Petersburg 199178, Russian Federation
  • R. Sh. Fahrutdinov  St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, 39, 14th Line V.O., St. Petersburg 199178, Russian Federation

Abstract: The paper introduces a new unified method for defining finite noncommutative associative algebras of arbitrary even dimension m and describes the properties of the algebras investigated for m = 4 and 6, when the algebras are defined over the ground field GF(p) with a large prime p. Formulas describing the set of $p^2$ ($p^4$) global left-sided units contained in the 4-dimensional (6-dimensional) algebra are derived. Only local invertibility holds in the investigated algebras. For each algebra, formulas are derived for computing the unique local two-sided unit related to a fixed locally invertible vector. A new form of the hidden discrete logarithm problem is proposed as a postquantum cryptographic primitive and is used to develop a postquantum digital signature scheme.

Keywords: finite noncommutative algebra; associative algebra; computationally difficult problem; discrete logarithm; digital signature; postquantum cryptography

SIMULTANEOUS LOCALIZATION AND MAPPING METHOD IN THREE-DIMENSIONAL SPACE BASED ON THE COMBINED SOLUTION OF THE POINT-POINT VARIATION PROBLEM ICP FOR AN AFFINE TRANSFORMATION
  • A. V. Vokhmintcev  Chelyabinsk State University, 129 Br. Kashirinyh Str., Chelyabinsk 454001, Russian Federation, Ugra State University, 16 Chekhov Str., Khanty-Mansiysk 628012, Russian Federation
  • A. V. Melnikov  Ugra State University, 16 Chekhov Str., Khanty-Mansiysk 628012, Russian Federation
  • S. A. Pachganov  Ugra State University, 16 Chekhov Str., Khanty-Mansiysk 628012, Russian Federation

Abstract: Simultaneous localization and mapping is a problem in which frame data are used as the only source of external information to determine the position of a moving camera in space and, at the same time, to reconstruct a map of the area under study. This problem is now considered solved for constructing two-dimensional maps of small static scenes using range sensors such as lasers or sonar. However, for dynamic, complex, and large-scale scenes, constructing an accurate three-dimensional map of the surrounding space remains an active area of research. To address this problem, the authors propose a solution of the point-point problem for an affine transformation and develop a fast iterative algorithm for registering point clouds in three-dimensional space. The performance and computational complexity of the proposed method are presented and discussed using reference data. The results can be applied to real-time navigation of a mobile robot.
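A minimal sketch of point-to-point ICP with an affine motion model, assuming 3-D point lists and brute-force nearest-neighbour matching. At every iteration it refits the full affine map from the original source points to their current matches by ordinary least squares. This is a generic illustration, not the authors' combined variational solution, and all names are hypothetical. With poor initial correspondences an unconstrained affine fit can drift toward degenerate maps, so a reasonable initial alignment is assumed.

```python
import math

def _solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_affine(src, dst):
    """Least-squares affine map q ~ A p + t from 3-D point pairs.
    Returns three rows (A_k0, A_k1, A_k2, t_k)."""
    # normal equations: (sum h h^T) w_k = sum q_k h, with h = (x, y, z, 1)
    H = [[0.0] * 4 for _ in range(4)]
    rhs = [[0.0] * 4 for _ in range(3)]
    for p, q in zip(src, dst):
        h = (p[0], p[1], p[2], 1.0)
        for i in range(4):
            for j in range(4):
                H[i][j] += h[i] * h[j]
            for k in range(3):
                rhs[k][i] += q[k] * h[i]
    return [_solve(H, rhs[k]) for k in range(3)]

def apply_affine(rows, p):
    return tuple(r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3] for r in rows)

def icp_affine(src, dst, iterations=20, tol=1e-10):
    """Alternate nearest-neighbour matching and affine refitting."""
    rows = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    prev = float("inf")
    for _ in range(iterations):
        moved = [apply_affine(rows, p) for p in src]
        matches = [min(dst, key=lambda q: math.dist(m, q)) for m in moved]
        rows = fit_affine(src, matches)
        err = sum(math.dist(apply_affine(rows, p), q) ** 2 for p, q in zip(src, matches))
        if prev - err < tol:  # stop when the squared residual no longer decreases
            break
        prev = err
    return rows, err
```

Refitting from the original source points, rather than composing incremental updates, keeps the estimated map a single affine transformation throughout.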

Keywords: registration problem; localization; simultaneous localization and mapping; affine transformation; two-dimensional descriptors; iterative closest point

ANALYTICAL TEXTOLOGY IN INTELLIGENT PROCESSING SYSTEMS FOR UNSTRUCTURED DATA
  • E. B. Kozerenko  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • M. Y. Mikheev  Research Computing Center Lomonosov Moscow State University, 1, bld. 4 Leninskie Gory, Moscow, GSP-1, 119991, Russian Federation
  • N. V. Somin  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • L. I. Ehrlich  Research Computing Center Lomonosov Moscow State University, 1, bld. 4 Leninskie Gory, Moscow, GSP-1, 119991, Russian Federation
  • K. I. Kuznetsov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper presents a new field of research at the intersection of linguistics, computer science, and philology that uses logical and statistical methods to analyze unstructured data in the form of natural language texts. The tasks addressed include extracting explicit and implicit knowledge from texts with a semantics-oriented linguistic processor, forming lexical statistical representations of texts, building analytical conclusions, discovering the author's idiostyle and the textual similarity of literary works by analyzing service words and other microtext elements, identifying the sentiment of texts, and building a full profile of an author's text through a superposition of these methods. The textological analysis of the "Blue Book" of the "Petersburg Diary" by Zinaida Hippius is considered as an example.
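As a rough illustration of comparing texts by service words, the sketch below builds a relative-frequency profile over a fixed list of function words and compares two profiles by cosine similarity. This is a generic stylometric sketch, not the authors' linguistic processor; the English word list `SERVICE_WORDS` is a hypothetical placeholder, and an analysis of Russian texts would require a Russian list chosen by the researcher.

```python
import math
import re
from collections import Counter

# Hypothetical list of service (function) words, for illustration only.
SERVICE_WORDS = ["the", "and", "of", "to", "in", "a", "that", "but", "not", "with"]

def service_word_profile(text, words=SERVICE_WORDS):
    """Relative frequencies of the given service words among all word tokens."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    n = len(tokens) or 1
    return [counts[w] / n for w in words]

def cosine_similarity(u, v):
    """Cosine of the angle between two profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, "The cat and the dog" has five tokens, so its profile starts with 0.4 for "the" and 0.2 for "and".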

Keywords: natural language processing; statistical methods; cognitive technology; lexical semantic analysis; knowledge extraction from texts; analytical systems

ENCAPSULATION OF SEMANTIC REPRESENTATIONS INTO ELEMENTS OF A GRAMMAR
  • Sh. B. Shihiev  Department of Discrete Mathematics and Computer Science, Dagestan State University, 43-a Gadzhiyev Str., Makhachkala 367000, Republic of Dagestan, Russian Federation
  • F. Sh. Shihiev  Department of Discrete Mathematics and Computer Science, Dagestan State University, 43-a Gadzhiyev Str., Makhachkala 367000, Republic of Dagestan, Russian Federation

Abstract: The article proposes a new mathematical apparatus for representing natural language in computational linguistics: morphology, syntax, and semantics are described as objects of discrete mathematics that form a hierarchy and an integral information system. The proposed constructive language theory is a new approach to studying language that separates the domains of syntax and semantics, constructs autonomous models of syntax and semantics, and treats language formation as a mapping between elements of two sets: syntax and semantics.

Keywords: natural language; graph; syntax; semantics; lexicon; word form; morphological feature; lexical group; dictionary; sentence; algorithm

INFORMATION FUSION OF DOCUMENTS
  • S. K. Dulin  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • N. G. Dulina  A. A. Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation
  • P. V. Ermakov  TeleRetail GmbH, 30 Markenstraße, Düsseldorf 40227, Germany

Abstract: The paper considers the problems associated with creating an expert base of documents that requires prompt processing of incoming information and, as a consequence, restructuring of the knowledge base. The authors propose procedures that narrow the search for the optimal consistent state of interrelated documents. An approach was developed for assessing the relationships between text documents and information messages as poorly structured objects. The practical implementation of this approach is described.

Keywords: information fusion; controlled data and knowledge consistency; knowledge base restructuring