
“Informatics and Applications” scientific journal

Volume 7, Issue 3, 2013


UNSUPERVISED APPROACH TO WEB WRAPPER MAINTENANCE.

  • A.M. Andreev   Bauman Moscow State Technical University, arkandreev@gmail.com
  • D. V. Berezkin  Bauman Moscow State Technical University, dmitryb2007@yandex.ru
  • I.A. Kozlov   Bauman Moscow State Technical University, kozlovilya89@gmail.com
  • K. V. Simakov   Bauman Moscow State Technical University, skv@ixlab.ru

Abstract: HTML-wrapper applications rely on the formatting regularities of the targeted websites. Therefore, maintaining such applications involves the problem of detecting changes in the markup of web pages. This article describes an unsupervised approach to this problem. The proposed detection method consists of two parts: a real-time part based on clustering, in which an HTML document is represented as a vector of features, and a time-lagged part based on comparing the distributions of these features for the learning and testing sets of HTML documents. Several experiments were carried out with data obtained from a real wrapper. The results demonstrate the feasibility of the suggested approach.

Keywords:  wrapper maintenance; web-site parsing; clustering; HTML-markup statistical processing
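A minimal Python sketch of the two checks, under assumptions that are ours rather than the authors': start-tag counts serve as the feature vector, cosine distance to the centroid of the learning set stands in for the real-time clustering step, and total variation distance compares the tag distributions of the learning and testing sets. All function names and the 0.3 threshold are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class TagCounter(HTMLParser):
    """Counts start tags; tag frequencies are a hypothetical feature choice."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def features(html: str) -> Counter:
    """Feature vector of one HTML document: start-tag name -> count."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def aggregate(docs) -> Counter:
    """Summed tag counts of a document set (its centroid up to scale)."""
    total: Counter = Counter()
    for doc in docs:
        total.update(features(doc))
    return total


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def is_outlier(html: str, learning_centroid: Counter, threshold: float = 0.3) -> bool:
    """Real-time check: flag a page whose tag profile is far from the learning set."""
    return cosine_distance(features(html), learning_centroid) > threshold


def distribution(docs) -> dict:
    """Normalized tag frequencies of a document set."""
    total = aggregate(docs)
    n = sum(total.values())
    return {tag: count / n for tag, count in total.items()}


def total_variation(p: dict, q: dict) -> float:
    """Time-lagged check: distance between tag distributions of two document sets."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))
```

In such a scheme the threshold would presumably have to be tuned on pages known to be unchanged.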

BUILDING REAL-TIME NEWS RECOMMENDATION SERVICE USING NoSQL DBMS.

  • P.A. Klemenkov   M.V. Lomonosov Moscow State University, parser@cs.msu.su

Abstract: The analysis of user interaction with a Web application, the methods of conducting such an analysis, and their shortcomings are discussed. An implementation of a news recommendation service using existing approaches is described. A new NoSQL approach to building recommendation systems that operate in near real time is suggested.

Keywords:  recommendation systems; MinHash; MapReduce; NoSQL
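The keywords mention MinHash. As an illustration only, assuming each user is represented by the set of IDs of the news items they have read (our assumption; the abstract does not say how MinHash is used), a MinHash signature lets the Jaccard similarity of two users be estimated from fixed-length signatures instead of the full sets. Seeded BLAKE2 hashes stand in for independent random hash functions; the function names are hypothetical.

```python
import hashlib


def _hash(seed: int, item) -> int:
    """Seeded 64-bit hash; each seed plays the role of one random hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes: int = 256) -> list:
    """Signature of a non-empty set: the minimum hash value under each seed."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of agreeing positions estimates |A & B| / |A | B|."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)
```

Signatures have a fixed size regardless of how many items a user has read, which is what makes them convenient to store and compare in a key-value style store.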

A VERIFIABLE MAPPING OF A MULTIDIMENSIONAL ARRAY DATA MODEL INTO AN OBJECT DATA MODEL.

  • S.A. Stupnikov   IPI RAN, ssa@ipi.ac.ru

Abstract: The paper considers a mapping of a multidimensional array data model into an object data model. General principles for mapping array data models into object data models are formulated. A mapping of concrete models is also considered: the source model is the Array Data Model used in the SciDB DBMS, and the target model is the SYNTHESIS language, used as the canonical data model in the subject mediation technology. A method for verifying the mapping is considered. Verification means a formal proof that the mapping preserves information and the semantics of the operations. Verification is realized using the AMN formal specification language. The practical aim of the paper is to provide a basis for virtual or materialized integration of array-based information resources.

Keywords:  multidimensional arrays; object data model; data model mapping; database integration

STUDY OF THE WIKIPEDIA (EN) CATEGORIES GRAPH.

  • A. V. Shkotin   GIS department, State Geological Museum of the Russian Academy of Sciences, ashkotin@acm.org

Abstract: Wikipedia is an outstanding project for accumulating knowledge, both of general use and in various specialized areas. Checking the quality of this knowledge, especially automatically, is very important. This paper presents the results of a study of the structure of the English-language Wikipedia Categories Graph (WCG) as a whole. The WCG is the system that supports the structure of this knowledge, so it is of interest what the WCG contains and how it is organized. It is shown that the graph contains unacceptable logical violations, and organizational and technical methods for eliminating them are discussed.

Keywords:  Wikipedia; digraph; connected components; logical analysis
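One kind of logical violation that a structural study of a category digraph could look for is a cycle in the subcategory relation, where a category ends up as its own (possibly indirect) subcategory. The abstract does not say which violations were found, so the following Python sketch is only an illustration under that assumption: it finds strongly connected components with Kosaraju's algorithm and reports those that contain a cycle. The edge format (subcategory, parent) and the category names are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Iterative Kosaraju's algorithm; edges are (subcategory, parent) pairs."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # First pass: record nodes in order of DFS completion.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                stack.pop()
                order.append(node)
            elif nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(graph[nxt])))

    # Second pass: DFS on the reversed graph in decreasing finishing order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def category_cycles(edges):
    """Components of size > 1, or a single category that is its own parent."""
    self_loops = {src for src, dst in edges if src == dst}
    return [sorted(c) for c in strongly_connected_components(edges)
            if len(c) > 1 or c[0] in self_loops]
```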

ACTIVE AUTHENTICATION METHODS USING KEYSTROKE DYNAMICS.

  • V. Yu. Kaganov   Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, vladhid@mlab.cs.msu.su
  • A.K. Korolyov   Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, akorolev@mlab.cs.msu.su
  • M.N. Krylov   Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, krylovm@mlab.cs.msu.su
  • I. V. Mashechkin   Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, mash@cs.msu.su
  • M. I. Petrovskiy   Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, michael@cs.msu.su

Abstract: An overview is presented of several effective authentication methods that use behavior models built from keystroke dynamics data. In addition, a new data representation model is proposed, and a number of experiments with this model and various machine learning algorithms are reported.

Keywords:  wavelets; thresholding; risk estimate; normal distribution; rate of convergence
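Keystroke dynamics data are commonly summarized by dwell times (how long a key is held) and flight times (the gap between releasing one key and pressing the next). The Python sketch below computes these features and scores a sample with the scaled Manhattan distance, a well-known baseline detector; it is not the data representation model proposed in the paper, and the event format (key, press_ms, release_ms) and function names are our assumptions.

```python
from statistics import mean


def timing_features(events):
    """events: (key, press_ms, release_ms) tuples in typing order (hypothetical format).

    Returns dwell times (release - press of each key) followed by flight times
    (press of the next key - release of the current one; may be negative)."""
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell + flight


def train_template(samples):
    """Per-feature mean and mean absolute deviation over a user's enrollment samples."""
    columns = list(zip(*(timing_features(s) for s in samples)))
    means = [mean(col) for col in columns]
    # A zero deviation is replaced by 1.0 to avoid division by zero.
    spreads = [mean(abs(x - m) for x in col) or 1.0 for col, m in zip(columns, means)]
    return means, spreads


def scaled_manhattan(sample, template) -> float:
    """Anomaly score: smaller means closer to the enrolled user's typing rhythm."""
    means, spreads = template
    return sum(abs(x - m) / s for x, m, s in zip(timing_features(sample), means, spreads))
```

For active (continuous) authentication, such a score would be recomputed on each new window of keystrokes and compared against a per-user threshold.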

PROBLEMS OF THE ONLINE ACCESS TO SCIENTIFIC JOURNALS.

  • A. V. Glushanovskii   Library for Natural Sciences, Russian Academy of Sciences, avglush@benran.ru
  • N. E. Kalenov   Library for Natural Sciences, Russian Academy of Sciences, nek@benran.ru

Abstract: The problems of providing institutions of the Russian Academy of Sciences (RAS) with Internet access to full-text scientific information are considered. In world practice, this task is handled by scientific libraries and library consortia, which obtain the most favorable financial terms. The practice of organizing such access in Russia through the Russian Foundation for Basic Research and the National Electronic-Information Consortium (NEICON) is described. Statistics on the use of NEICON-provided online journals by RAS staff are considered. Organizational proposals are suggested for optimally solving the task of online access to scientific information under the financial constraints of RAS.

Keywords:  scientific journals; full texts; Internet; remote access; libraries; consortia

DECISION SUPPORT SYSTEMS MODELING WITH SYNERGETIC ARTIFICIAL INTELLIGENCE.

  • I. A. Kirikov   Kaliningrad Branch of Institute of Informatics Problems, Russian Academy of Sciences, baltbipiran@mail.ru
  • A. V. Kolesnikov  Kaliningrad Branch of Institute of Informatics Problems, Russian Academy of Sciences, avkolesnikov@yandex.ru
  • S. V. Listopad   Kaliningrad Branch of Institute of Informatics Problems, Russian Academy of Sciences, ser-list-post@yandex.ru

Abstract: An approach to modeling the collective effects of decision support systems within the paradigm of synergetic artificial intelligence is considered. A model and the functional structure of a hybrid intelligent multiagent system for modeling decision support systems are proposed. The results of computational experiments demonstrating a positive impact of the self-organization effect on the quality of collective decisions are presented.

Keywords: decision support computer system; hybrid intelligent multiagent system with self-organization

SEMANTICS OF ASPECT-ORIENTED MODELING OF DATA AND PROCESSES.

  • S. P. Kovalyov   Institute of Control Problems, Russian Academy of Sciences, kovalyov@nm.ru

Abstract: An approach to the semantic unification of aspect-oriented programming (AOP) technologies, based on formalization by means of category theory, is presented. An AOP technology is represented as a category of formal models of aspect-oriented programs and their interconnections, equipped with a functor that takes the aspectual structure (the labeling of models by concerns). Weaving of aspect-oriented programs is formalized as a certain universal construction in this category. Formal AOP technologies applicable to reducing the cost of modeling data and process scenarios are defined and considered. A condition for the existence of weaving for scenario models is stated and justified.

Keywords: aspect-oriented programming; category theory; aspect weaving

COGNITIVE INTEROPERABILITY OF EXPERT COLLABORATION IN THE TASK OF THE RUSSIAN-FRENCH PARALLEL TEXTS PROCESSING: LINGUISTIC AND COGNITIVE ASPECTS.

  • O. S. Kozhunova   IPI RAN, kozhunovka@mail.ru

Abstract: Two information and communication technology resources, the “Refillable linguistic data base on translation difficulties” and the “Subject-oriented thesaurus of Russian-French parallel texts”, are discussed. The resources are at the design stage and are to be implemented simultaneously with the Russian-French parallel corpus of belles-lettres. Apart from their functionality, the linguistic and cognitive aspects of expert interaction in the cooperative processing of Russian-French parallel texts are considered.

Keywords:  cognitive interoperability; task of natural language processing; Russian-French parallel texts

DATA ACQUISITION SIMULATION FOR NICA EXPERIMENT.

  • V. V. Korenkov  Joint Institute for Nuclear Research, Laboratory of Information Technologies, Dubna, korenkov@cv.jinr.ru
  • A. V. Nechaevskiy   Joint Institute for Nuclear Research, Laboratory of Information Technologies, Dubna, Andrey.Nechaevskiy@gmail.com
  • V. V. Trofimov  Joint Institute for Nuclear Research, Laboratory of Information Technologies, Dubna, trofimov@jinr.ru

Abstract: The need for a simulation model of data storage and processing for the NICA accelerator complex is shown. The simulation model is based on GridSim. This paper describes an approach to simulating dCache and the network. A simple example illustrates the use of the model.

Keywords:  grid technologies; grid infrastructures; data storage systems; optimization; simulation; research; development; dCache; Tier1; NICA; Grid

ESTIMATES OF THE RATE OF CONVERGENCE OF THE DISTRIBUTIONS OF SOME RANDOM SUMS TO STABLE LAWS.

  • V. Yu. Korolev   Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University; IPI RAN, vkorolev@cs.msu.su
  • L. M. Zaks  Department of Modeling and Mathematical Statistics, Alpha-Bank, lily.zaks@gmail.com

Abstract: Estimates are presented for the rate of convergence of the distributions of special random sums of independent identically distributed random variables with finite variances to symmetric strictly stable laws. The distribution of the random index is assumed to be mixed Poisson, the mixing distribution being a stable law concentrated on the positive half-line. The absolute constants are written out explicitly.

Keywords:  stable distribution; Berry–Esseen inequality; random sum; doubly stochastic Poisson process (Cox process); mixed Poisson distribution
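For orientation, the setting can be written out as follows. The notation and the extra assumptions (zero mean of the summands, independence of the index from the summands, the normalization of the Laplace transform of the mixing law G_α) are ours for illustration and may differ from the paper's. The last line is the standard computation showing that a normal variance mixture with a one-sided stable mixing law of exponent α in (0, 1) is symmetric strictly stable with characteristic exponent 2α.

```latex
\begin{aligned}
&\text{Assumed: } X_1, X_2, \dots \text{ i.i.d., independent of } N_\lambda, \quad \mathsf{E}X_1 = 0, \quad 0 < \mathsf{D}X_1 = \sigma^2 < \infty,\\
&\mathsf{P}(N_\lambda = k) = \int_0^{\infty} e^{-\lambda u}\,\frac{(\lambda u)^k}{k!}\,dG_\alpha(u), \qquad k = 0, 1, 2, \dots,\\
&S_{N_\lambda} = X_1 + \dots + X_{N_\lambda}, \qquad \int_0^{\infty} e^{-su}\,dG_\alpha(u) = e^{-s^{\alpha}}, \quad 0 < \alpha < 1,\\
&\frac{S_{N_\lambda}}{\sigma\sqrt{\lambda}} \Longrightarrow Z\sqrt{U} \quad (\lambda \to \infty), \qquad Z \sim N(0,1) \text{ independent of } U \sim G_\alpha,\\
&\mathsf{E}\,e^{itZ\sqrt{U}} = \mathsf{E}\,e^{-t^2 U/2} = \exp\bigl(-2^{-\alpha}\,|t|^{2\alpha}\bigr).
\end{aligned}
```

Berry–Esseen-type estimates of the kind named in the keywords bound the uniform distance between the distribution function of the normalized sum and that of the limit law.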

UNIVERSAL METRIC THESAURUS OF RUSSIAN LANGUAGE.

  • L. A. Kuznetsov   Russian Presidential Academy of National Economy and Public Administration (Lipetsk Branch), kuznetsov.leonid48@gmail.com
  • V. F. Kuznetsova  Russian Presidential Academy of National Economy and Public Administration (Lipetsk Branch), kuznetsov.leonid48@gmail.com
  • A. V. Kapnin   Lipetsk State Technical University, gert@inbox.ru

Abstract: All available thesauri of the Russian language are compiled by expert groups. In the paper, tools for the automatic generation of a thesaurus are presented. The tools are based on a formal representation of the texts explaining the semantics of words and on a quantitative assessment of the semantic distance between words as a measure of their proximity. The proposed solutions allow the use of formal mathematical representations that minimize subjectivity in assessing the proximity of words. They make it possible to build automatic systems for evaluating the semantic proximity of words and to solve other text-processing problems.

Keywords:  computational linguistics; universal thesaurus; metric thesaurus; semantic proximity assessment; semantic distance; information theory

APPROXIMATION OF A MULTIDIMENSIONAL DEPENDENCY BASED ON LINEAR EXPANSION IN A DICTIONARY OF PARAMETRIC FUNCTIONS.

  • M. G. Belyaev   Institute for Information Transmission Problems RAS, Moscow Institute of Physics and Technology, Datadvance LLC, belyaev@iitp.ru
  • E. V. Burnaev  Institute for Information Transmission Problems RAS, Moscow Institute of Physics and Technology, Datadvance LLC, burnaev@iitp.ru

Abstract: The problem of approximating a multidimensional function from a finite set of pairs “point”–“function value at this point” is considered. An expansion in a dictionary of nonlinear parametric functions is used as the model of the function. Several subproblems must be solved when constructing an approximation based on such a model: extraction of a validation sample, initialization of the parameters of the dictionary functions, and tuning of these parameters. Efficient methods for solving these subproblems are proposed. The efficiency of the proposed approach is demonstrated on several engineering design problems.

Keywords:  nonlinear approximation; parametric dictionaries
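The three subproblems named in the abstract can be illustrated in a simplified form. In this NumPy sketch, which is not the authors' method, the dictionary functions are assumed to be tanh(w·x + b), the validation sample is a random hold-out, the inner parameters are initialized randomly, and, in place of tuning them, only the linear expansion coefficients are fitted by least squares while the dictionary size is chosen by validation error. All names are hypothetical.

```python
import numpy as np


def make_dictionary(num_functions, dim, rng):
    """Random inner parameters (w_j, b_j) of psi_j(x) = tanh(w_j . x + b_j); a hypothetical initialization."""
    return rng.normal(size=(num_functions, dim)), rng.normal(size=num_functions)


def design_matrix(x, weights, biases):
    """Columns psi_j(x) for every dictionary function, plus a constant column."""
    psi = np.tanh(x @ weights.T + biases)
    return np.hstack([psi, np.ones((x.shape[0], 1))])


def fit(x, y, num_functions, rng):
    weights, biases = make_dictionary(num_functions, x.shape[1], rng)
    # Only the linear coefficients are fitted (least squares); the inner
    # parameters stay at their initial values instead of being tuned.
    coef, *_ = np.linalg.lstsq(design_matrix(x, weights, biases), y, rcond=None)
    return weights, biases, coef


def predict(model, x):
    weights, biases, coef = model
    return design_matrix(x, weights, biases) @ coef


def select_by_validation(x, y, sizes, rng, val_fraction=0.25):
    """Hold out a random validation sample and keep the dictionary size with the lowest error."""
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val, train = order[:n_val], order[n_val:]
    best = None
    for size in sizes:
        model = fit(x[train], y[train], size, rng)
        err = float(np.mean((predict(model, x[val]) - y[val]) ** 2))
        if best is None or err < best[0]:
            best = (err, size, model)
    return best
```

Tuning the inner parameters as well (for example, by gradient-based optimization of the training error) would turn this into a full nonlinear fit; it is omitted here to keep the sketch short.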

 
