Informatics and Applications

2016, Volume 10, Issue 3, pp 32-40


  • M. P. Krivenko


The paper considers the problem of feature selection for classification and the related problem of assessing the quality of the resulting solutions. Among the various feature selection methods, the focus is on sequential procedures; classification quality is measured by the probability of correct classification. Cross-validation and the bootstrap method are proposed for estimating this probability. To analyze the resulting set of sample values of the probability of correct classification, the paper suggests a comparative analysis of confidence intervals and a test for homogeneity of binomial proportions. The Bayesian classifier is constructed with a mixture of normal distributions as the data model, and the model parameters are estimated by the expectation-maximization algorithm. As an experiment, the paper considers the well-grounded choice of classification features for predicting the type of urinary stones in urology. It is demonstrated that the set of features used can be reduced not only without loss of decision quality, but also with an increase in the probability of correct prediction of the stone type.
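The comparison of sample values of the probability of correct classification can be illustrated with a minimal sketch. It assumes that each feature subset is summarized by a count of correctly classified cases out of a number of test cases. The Wilson score interval and the Pearson chi-square statistic used here are common choices, not necessarily the ones used in the paper. The function names and all counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


def homogeneity_chi2(successes, trials):
    """Pearson chi-square statistic for equality of k binomial proportions.

    Returns (statistic, degrees_of_freedom), where the degrees of freedom
    equal k - 1.
    """
    if len(successes) != len(trials) or len(trials) < 2:
        raise ValueError("need at least two groups of equal-length inputs")
    pooled = sum(successes) / sum(trials)
    if pooled in (0.0, 1.0):
        # All outcomes identical: the proportions cannot differ.
        return 0.0, len(trials) - 1
    stat = 0.0
    for s, n in zip(successes, trials):
        expected_ok = n * pooled
        expected_err = n * (1.0 - pooled)
        stat += (s - expected_ok) ** 2 / expected_ok
        stat += ((n - s) - expected_err) ** 2 / expected_err
    return stat, len(trials) - 1


def p_value_two_groups(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical counts: correct classifications for two feature subsets,
    # each evaluated on 100 test cases.
    correct = [90, 70]
    tested = [100, 100]
    for s, n in zip(correct, tested):
        print(s, n, wilson_interval(s, n))
    stat, df = homogeneity_chi2(correct, tested)
    print(stat, df, p_value_two_groups(stat))
```

Overlapping intervals and a large p-value would indicate that the two feature subsets cannot be distinguished by accuracy at the given sample size. In that case the smaller subset is a reasonable candidate.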

