Informatics and Applications
2014, Volume 8, Issue 2, pp 86–97
OBTAINING AN AGGREGATED FORECAST OF RAILWAY FREIGHT
TRANSPORTATION USING KULLBACK–LEIBLER DISTANCE
 A. P. Motrenko
 V. V. Strijov
Abstract
This study addresses the problem of obtaining an aggregated forecast of railway freight transportation.
To improve the quality of the aggregated forecast, the time series are clustered so that the series in each
cluster follow the same distribution. Clustering requires an estimate of the distance between the empirical
distributions of the time series. A two-sample test based on the Kullback–Leibler distance between histograms
of the time series is introduced and studied both theoretically and experimentally. As a demonstration, a set
of railway time series is clustered using the Kullback–Leibler distance between the series.
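The idea of comparing two time series through the Kullback–Leibler distance between their histograms can be illustrated with a minimal Python sketch. It is not the authors' test. The shared bin edges, the additive smoothing constant eps, and the permutation-based p-value are assumptions introduced here for illustration, and the function names are hypothetical.

```python
import numpy as np

def kl_between_histograms(x, y, bins=10, eps=0.5):
    """Kullback-Leibler divergence D(P||Q) between histograms of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shared bin edges so that both histograms live on the same support.
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    # Additive smoothing (an assumption) avoids log(0) and division by zero.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def permutation_two_sample_test(x, y, bins=10, n_perm=500, seed=0):
    """Permutation p-value for H0: x and y come from the same distribution.

    The permutation scheme is a stand-in chosen for this sketch; the paper
    studies its own test based on the Kullback-Leibler distance.
    """
    rng = np.random.default_rng(seed)
    observed = kl_between_histograms(x, y, bins)
    pooled = np.concatenate([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        # Count relabelings whose divergence is at least as large as observed.
        if kl_between_histograms(perm[:n], perm[n:], bins) >= observed:
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

A small p-value suggests that the two series should not be placed in the same cluster. The divergence above is asymmetric, so a clustering procedure that needs a symmetric dissimilarity could use the sum of the divergences taken in both directions.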
About this article
Cover Date
2014-03-31
DOI
10.14357/19922264140209
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
empirical distribution function; distance between histograms; Kullback–Leibler distance; two-sample
problem
Authors
A. P. Motrenko and V. V. Strijov
Author Affiliations
Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region 141700, Russian
Federation
Dorodnicyn Computing Centre, Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian
Federation
