Informatics and Applications
2020, Volume 14, Issue 2, pp 40–49
JOINT ASSESSMENT OF DATA PREDICTABILITY AND QUALITY PREDICTORS
 S. L. Frenkel
 V. N. Zakharov
Abstract
The paper proposes and analyzes a new approach to selecting predictors for forecasting future values of a data sequence over a specific time period. The goal is a set of low-cost techniques that either select an acceptable predictor for the current prediction session or conclude that a reliable forecast is impossible because the given section of the sequence lacks the predictability property. To this end, the predictability of a sequence is defined as the maximum, over the set of available predictors, of the conditional probability of a correct prediction given the observed values. Predictors are selected using both the magnitude of the estimated conditional probability and the degree to which a specific predictor differs from the predictor that is optimal for predicting the next outcome of a sequence of Bernoulli trials.
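The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the empirical hit rate stands in for the conditional probability of a correct prediction, the three toy predictors, the `threshold` value, and all function names are hypothetical, and the Bernoulli-optimal predictor is taken to be "always predict the more frequent symbol", with accuracy max(p, 1 - p).

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A predictor maps the observed binary history to a guess for the next symbol.
Predictor = Callable[[Sequence[int]], int]


def last_value(history: Sequence[int]) -> int:
    """Toy predictor (assumption): repeat the most recent symbol."""
    return history[-1]


def alternate(history: Sequence[int]) -> int:
    """Toy predictor (assumption): flip the most recent symbol."""
    return 1 - history[-1]


def majority(history: Sequence[int]) -> int:
    """Toy predictor (assumption): predict the more frequent symbol so far."""
    return int(2 * sum(history) >= len(history))


def hit_rate(predictor: Predictor, seq: Sequence[int]) -> float:
    """Empirical frequency of correct one-step-ahead predictions on seq."""
    hits = sum(int(predictor(seq[:t]) == seq[t]) for t in range(1, len(seq)))
    return hits / (len(seq) - 1)


def bernoulli_baseline(seq: Sequence[int]) -> float:
    """Accuracy of the best constant guess if seq were i.i.d. Bernoulli(p)."""
    p = sum(seq) / len(seq)
    return max(p, 1.0 - p)


def select_predictor(
    predictors: Dict[str, Predictor],
    seq: Sequence[int],
    threshold: float = 0.6,  # hypothetical acceptance level
) -> Tuple[Optional[str], float, float]:
    """Return (chosen name or None, estimated predictability, gap to baseline)."""
    if len(seq) < 2:
        raise ValueError("need at least two observations")
    rates = {name: hit_rate(f, seq) for name, f in predictors.items()}
    best = max(rates, key=rates.get)
    predictability = rates[best]  # maximum over the available predictors
    gap = predictability - bernoulli_baseline(seq)
    if predictability < threshold:
        return None, predictability, gap  # segment judged not predictable
    return best, predictability, gap


if __name__ == "__main__":
    candidates = {"last": last_value, "alt": alternate, "maj": majority}
    print(select_predictor(candidates, [0, 1] * 10))
```

Returning `None` when the best hit rate falls below the threshold mirrors the abstract's option of declining to forecast; the gap to the Bernoulli baseline is reported separately so that a predictor that merely exploits symbol imbalance can be told apart from one that captures real structure.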
About this article
Cover Date
2020-06-30
DOI
10.14357/19922264200206
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
random sequences prediction; predictors; data analysis
Authors
S. L. Frenkel and V. N. Zakharov
Author Affiliations
Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
