Systems and Means of Informatics
2022, Volume 32, Issue 1, pp 160-167
SEARCH OF ANOMALIES IN BIG DATA
 A. A. Grusho
 N. A. Grusho
 M. I. Zabezhailo
 D. V. Smirnov
 E. E. Timonina
 S. Ya. Shorgin
Abstract
When big data contain too little information to identify a search object, the search method may, in the presence of noise, miss the object or, conversely, flag objects that possess its features only by chance. The paper discusses a simple approach to estimating whether the problem of searching for required information in big data is solvable under weak assumptions about the informativeness of the identification features of search objects. In the simplest case, big data consist of a set of objects, each described by a set of parameters. The domain of each parameter is its own information space. Parameter values help identify the searched object and filter out false objects. If there are few parameters, unambiguous identification of the desired object is possible only under stronger restrictions on the volume of big data. Since it is not known in advance whether the desired object can be identified unambiguously, it is necessary to estimate, at least approximately, the limits on the volume of big data within which the desired information can be identified unambiguously.
For such estimates, the authors propose using limit theorems of probability theory in the scheme of series (triangular arrays).
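The kind of estimate described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes, purely for illustration, that an unrelated object matches each identification feature independently with a known probability, so that the number of false matches among n objects is binomial and, when n is large and the per-object match probability is small, approximately Poisson. All function names and parameters below are hypothetical.

```python
import math


def false_match_probability(feature_probs):
    """Chance that one unrelated object matches every feature.

    Assumption (not from the paper): features match independently,
    feature i with probability feature_probs[i].
    """
    return math.prod(feature_probs)


def prob_no_false_match(n_objects, feature_probs):
    """Exact probability that none of n_objects is a false match."""
    q = false_match_probability(feature_probs)
    return (1.0 - q) ** n_objects


def poisson_prob_no_false_match(n_objects, feature_probs):
    """Poisson approximation exp(-n*q), valid for large n and small q."""
    q = false_match_probability(feature_probs)
    return math.exp(-n_objects * q)


def max_unambiguous_volume(feature_probs, eps):
    """Largest n for which P(no false match) >= 1 - eps.

    Solves (1 - q)**n >= 1 - eps for n under the independence assumption.
    """
    q = false_match_probability(feature_probs)
    if not 0.0 < q < 1.0:
        raise ValueError("combined match probability must lie in (0, 1)")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    # log1p keeps precision when q and eps are very small.
    return math.floor(math.log1p(-eps) / math.log1p(-q))


if __name__ == "__main__":
    few = [0.1] * 4   # four weakly informative features
    many = [0.1] * 6  # six such features
    print(max_unambiguous_volume(few, eps=0.01))
    print(max_unambiguous_volume(many, eps=0.01))
```

Under these assumptions, adding features shrinks the chance of a false match for each object, so the admissible data volume grows. With fewer features the same reliability is reached only for a much smaller data set, which matches the qualitative claim in the abstract.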
About this article
Cover Date
2022-05-10
DOI
10.14357/08696527220115
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
information security; search for anomalies; algorithms for filtering false alarms
Authors
A. A. Grusho, N. A. Grusho, M. I. Zabezhailo, D. V. Smirnov, E. E. Timonina, and S. Ya. Shorgin
Author Affiliations
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
Sberbank of Russia, 19 Vavilov Str., Moscow 117999, Russian Federation
