Title

Bayesian classification using noninformative Dirichlet priors

Date of Completion

January 1999

Keywords

Statistics|Engineering, Electronics and Electrical

Degree

Ph.D.

Abstract

In this dissertation, the Combined Bayes Test (CBT) and its average probability of error, P(e), are developed. The CBT combines training and test data to infer symbol probabilities, where a completely noninformative Dirichlet prior is assumed for all classes. Using P(e), several results are established concerning the best quantization complexity, M* (which is related to the Hughes phenomenon). For example, it is shown that M* increases with the amount of training and test data. It is also demonstrated that the CBT outperforms both a more conventional maximum likelihood (ML) based test and the Kolmogorov-Smirnov test (KST). Building on these results, the Bayesian Data Reduction Algorithm (BDRA) is developed. The BDRA uses P(e), conditioned on the training data, together with a "greedy" search to remove irrelevant features from each class, and its performance is shown to be superior to that of a neural network. The CBT is then extended to the case in which the training data of each class are mislabeled. Performance is shown to degrade when mislabeling exists in the training data, with the degradation depending on the severity of the mislabeling probabilities; however, it is also illustrated that the BDRA can be used to diminish the effect of mislabeling. Further, the BDRA is modified, using two different approaches, to classify test observations when the training data of each class contain missing feature values. In the first approach, each missing feature is assumed to be uniformly distributed over its range of values; in the second, the number of discrete levels for each feature is increased by one. Both methods of modeling missing features are shown to perform similarly, and both outperform a neural network. With these results, the BDRA is applied to three problems of interest in classification. In the first problem, the BDRA is applied to training data that contain class-specific features.
In the second problem, the BDRA is used to fuse features extracted from independent sonar echoes. Finally, in the third problem, the BDRA is trained and tested on the Australian Credit Card Data (ACCD). In each of these cases, the BDRA is shown to improve performance over existing methods.
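As a hedged illustration of the kind of test the abstract describes (not the dissertation's exact formulation): under a completely noninformative Dirichlet(1, ..., 1) prior, the symbol probabilities of a class have a closed-form posterior given that class's training counts, and a block of quantized test observations can then be scored by its Dirichlet-multinomial marginal likelihood under each class. The function names and the example counts below are invented for illustration only.

```python
import math

def log_marginal(test_counts, train_counts):
    """Log marginal likelihood of the test counts under a uniform
    Dirichlet(1, ..., 1) prior updated with one class's training counts.
    The multinomial coefficient of the test counts is omitted, since it
    is identical for every class and cancels in the comparison."""
    alpha = [n + 1 for n in train_counts]   # posterior Dirichlet parameters
    a0 = sum(alpha)
    n_test = sum(test_counts)
    log_p = math.lgamma(a0) - math.lgamma(a0 + n_test)
    for a_k, n_k in zip(alpha, test_counts):
        log_p += math.lgamma(a_k + n_k) - math.lgamma(a_k)
    return log_p

def classify(test_counts, train_counts_by_class):
    """Assign the block of test data to the class with the largest
    marginal likelihood (equal prior class probabilities assumed)."""
    scores = {c: log_marginal(test_counts, tc)
              for c, tc in train_counts_by_class.items()}
    return max(scores, key=scores.get)

# Hypothetical example: two classes over M = 3 symbols.
train = {"A": [30, 5, 5], "B": [5, 30, 5]}
print(classify([8, 1, 1], train))   # symbol histogram resembling class A
```

Note how both the training counts (through the posterior parameters) and the test counts (through the marginal likelihood) enter the decision, which is the sense in which such a test "combines" training and test data.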