
    A study of hierarchical and flat classification of proteins

    Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article, we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies with those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies; the latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not on the protein classification problems. Based on this, we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
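The flat variant of a nested dichotomy described above can be sketched as a recursive binary split of the class set, with a base learner at each internal node and class probabilities obtained as products of branch probabilities along the root-to-leaf path. Below is a minimal illustration assuming scikit-learn's DecisionTreeClassifier (one of the paper's base learners) and random rather than hierarchy-guided splits; the class and method names are illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class NestedDichotomy:
    """One nested dichotomy: a binary tree over the class set, with a
    binary base learner at each internal node (random splits here)."""

    def __init__(self, classes, rng):
        self.classes = list(classes)
        if len(self.classes) > 1:
            split = rng.permutation(self.classes)
            k = len(split) // 2
            self.left_set = set(split[:k].tolist())
            self.right_set = set(split[k:].tolist())
            self.clf = DecisionTreeClassifier(random_state=0)
            self.left = NestedDichotomy(self.left_set, rng)
            self.right = NestedDichotomy(self.right_set, rng)

    def fit(self, X, y):
        if len(self.classes) > 1:
            mask = np.isin(y, self.classes)
            # binary target: 1 if the true class falls in the left subset
            t = np.isin(y[mask], list(self.left_set)).astype(int)
            self.clf.fit(X[mask], t)
            self.left.fit(X, y)
            self.right.fit(X, y)
        return self

    def class_proba(self, X, c):
        """P(class = c | x): product of branch probabilities on c's path."""
        if len(self.classes) == 1:
            return np.ones(len(X))
        p_left = self.clf.predict_proba(X)[:, 1]
        if c in self.left_set:
            return p_left * self.left.class_proba(X, c)
        return (1.0 - p_left) * self.right.class_proba(X, c)

    def predict(self, X):
        probs = np.column_stack([self.class_proba(X, c) for c in self.classes])
        return np.asarray(self.classes)[probs.argmax(axis=1)]
```

An ensemble of nested dichotomies, as used in the paper, would train several such trees with different random splits and average their class probabilities.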

    Improving Classification When a Class Hierarchy is Available Using a Hierarchy-Based Prior

    We introduce a new method for building classification models when we have prior knowledge of how the classes can be arranged in a hierarchy, based on how easily they can be distinguished. The new method uses a Bayesian form of the multinomial logit (MNL, a.k.a. "softmax") model, with a prior that introduces correlations between the parameters for classes that are nearby in the tree. We compare the performance on simulated data of the new method, the ordinary MNL model, and a model that uses the hierarchy in a different way. We also test the new method on a document labelling problem, and find that it performs better than the other methods, particularly when the amount of training data is small.
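One simple way to realise such a hierarchy-based prior is to give each tree node its own Gaussian-distributed weight vector and define a class's softmax parameters as the sum of the node vectors on its root-to-leaf path, so sibling classes share a component and are correlated a priori. The sketch below is a MAP gradient-descent stand-in for the paper's fully Bayesian treatment; the toy 4-class hierarchy and all names are assumptions for illustration.

```python
import numpy as np

# Hypothetical 4-class hierarchy: classes 0 and 1 hang under internal
# node "A", classes 2 and 3 under node "B". A class's softmax weights
# are the sum of the node vectors on its root-to-leaf path, so siblings
# share the "A"/"B" component and are correlated under the prior.
paths = {0: ["A", "l0"], 1: ["A", "l1"], 2: ["B", "l2"], 3: ["B", "l3"]}
nodes = sorted({m for p in paths.values() for m in p})

def fit_hierarchical_mnl(X, y, n_iter=500, lr=0.1, tau=1.0):
    """MAP gradient-descent fit of the hierarchy-structured MNL
    (a simple stand-in for the paper's full Bayesian inference)."""
    n, d = X.shape
    W = {m: np.zeros(d) for m in nodes}           # per-node weight vectors
    Y = np.eye(len(paths))[y]                     # one-hot targets
    for _ in range(n_iter):
        beta = np.stack([sum(W[m] for m in paths[c]) for c in sorted(paths)])
        logits = X @ beta.T
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        G = X.T @ (P - Y) / n                     # gradient w.r.t. each beta_c
        grad = {m: W[m] / (tau * n) for m in nodes}   # Gaussian prior term
        for c, path in paths.items():
            for m in path:                        # shared nodes accumulate
                grad[m] += G[:, c]                # gradients from all children
        for m in nodes:
            W[m] -= lr * grad[m]
    return {c: sum(W[m] for m in paths[c]) for c in sorted(paths)}
```

Because the "A" and "B" vectors are shared, data from one class also shifts the parameters of its sibling, which is the mechanism that helps most when training data is scarce.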

    Classification in the Presence of Ordered Classes and Weighted Evaluative Attributes

    We are interested in an important family of problems at the interface of the Multi-Attribute Decision-Making and Data Mining fields. This is a special case of the general classification problem, in which records describing entities of interest are expressed in terms of a number of evaluative attributes. These attributes are associated with weights of importance, and both the data and the classes are ordinal. Our goal is to use historical records and the corresponding decisions to best estimate the class values of new data points in a way consistent with prior classification decisions, without knowledge of the weights of the evaluative attributes. We study three variants of this problem. The first is when all decisions are consistent with a single set of attribute weights (the separable case). The second is when all decisions are consistent but involve two sets of attribute weights, corresponding to two decision makers who determine the classification of the data together (the two-plane separable case). The third is when there is some inconsistency in the set of weights that must be accounted for (the non-separable case). Furthermore, we examine both two-class and multi-class problems. We propose the Ordinal Boundary method, which has a significant advantage over traditional approaches in multi-class problems. Linear programming (optimization) based approaches provide a promising avenue for dealing with these problems effectively. We present computational results that support this argument.
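For the separable case, a linear program along these lines can recover a single set of nonnegative attribute weights and ordered class thresholds from consistent past decisions, here by maximising a common margin. This is a hedged sketch using scipy.optimize.linprog, not the paper's Ordinal Boundary method itself; the synthetic data, the normalisation, and the variable layout are all assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Synthetic separable data: hidden weights (0.7, 0.3) and two ordered
# thresholds generate three ordinal classes from two attributes.
rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 2))
score = X @ np.array([0.7, 0.3])
y = np.digitize(score, [0.35, 0.65])       # classes 0 < 1 < 2

d, K = X.shape[1], 3
# LP variables z = [w_0, w_1, t_0, t_1, m]; maximise the margin m.
c = np.zeros(d + K)
c[-1] = -1.0
rows, rhs = [], []
for xi, yi in zip(X, y):
    if yi < K - 1:                         # w.x <= t_{yi} - m
        r = np.zeros(d + K); r[:d] = xi; r[d + yi] = -1; r[-1] = 1
        rows.append(r); rhs.append(0.0)
    if yi > 0:                             # w.x >= t_{yi-1} + m
        r = np.zeros(d + K); r[:d] = -xi; r[d + yi - 1] = 1; r[-1] = 1
        rows.append(r); rhs.append(0.0)
r = np.zeros(d + K); r[d] = 1; r[d + 1] = -1; r[-1] = 1   # t_0 + m <= t_1
rows.append(r); rhs.append(0.0)
eq = np.zeros(d + K); eq[:d] = 1.0         # fix the scale: sum of weights = 1
res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
              A_eq=[eq], b_eq=[1.0],
              bounds=[(0, None)] * d + [(None, None)] * (K - 1) + [(0, None)])
w, t = res.x[:d], res.x[d:d + K - 1]
```

In the non-separable case one would replace the hard margin constraints with per-record slack variables and minimise total slack, which keeps the problem a linear program.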

    Automated Machine Learning for Multi-Label Classification


    Using domain knowledge for interpretable and competitive multi-class human activity recognition

    Human activity recognition (HAR) has become an increasingly popular application of machine learning across a range of domains. Typically, the HAR task that a machine learning algorithm is trained for requires separating multiple activities such as walking, running, sitting, and falling from each other. Despite a large body of work on multi-class HAR, and the well-known fact that the performance on a multi-class problem can be significantly affected by how it is decomposed into a set of binary problems, there has been little research into how the choice of multi-class decomposition method affects the performance of HAR systems. This paper presents the first empirical comparison of multi-class decomposition methods in a HAR context, estimating the performance of five machine learning algorithms when used in their multi-class formulation, with four popular multi-class decomposition methods, with five expert hierarchies (nested dichotomies constructed from domain knowledge), or with an ensemble of expert hierarchies, on a 17-class HAR dataset consisting of features extracted from tri-axial accelerometer and gyroscope signals. We further compare performance on two binary classification problems, each based on the topmost dichotomy of an expert hierarchy. The results show that expert hierarchies can indeed compete with one-vs-all, both on the original multi-class problem and on a more general binary classification problem, such as that induced by an expert hierarchy’s topmost dichotomy. Finally, we show that an ensemble of expert hierarchies performs better than one-vs-all and comparably to one-vs-one on the multi-class problem, despite its lower time and space complexity, and outperforms all other multi-class decomposition methods on the two dichotomous problems.
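Two of the standard decomposition methods compared here, one-vs-all and one-vs-one, can be exercised directly via scikit-learn's wrappers. The snippet below uses a synthetic 5-class stand-in for the HAR features (the real study used a 17-class accelerometer/gyroscope dataset), so the numbers it prints are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic multi-class data standing in for extracted HAR features.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
base = LogisticRegression(max_iter=1000)
scores = {}
for name, clf in [("one-vs-all", OneVsRestClassifier(base)),
                  ("one-vs-one", OneVsOneClassifier(base))]:
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Note the complexity trade-off the paper highlights: one-vs-all trains K binary models on all the data, one-vs-one trains K(K-1)/2 models on class pairs, and a nested dichotomy trains only K-1 models.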

    Evaluation of random forests on large-scale classification problems using a bag-of-visual-words representation

    Random Forest is a very efficient classification method that has shown success in tasks like image segmentation and object detection, but it has not yet been applied in large-scale image classification scenarios using a bag-of-visual-words representation. In this work we evaluate the performance of Random Forest on the ImageNet dataset and compare it to standard approaches in the state of the art.
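A bag-of-visual-words pipeline of the kind evaluated here quantises each image's local descriptors against a learned visual vocabulary and feeds the resulting word histograms to a classifier. The sketch below substitutes synthetic descriptors for real SIFT features and a small k-means codebook for the large vocabularies used on ImageNet; it illustrates the representation, not the paper's experimental setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def bovw_histograms(descriptor_sets, codebook):
    """Quantise each image's local descriptors against the visual
    vocabulary and return one normalised word histogram per image."""
    k = codebook.n_clusters
    H = np.zeros((len(descriptor_sets), k))
    for i, D in enumerate(descriptor_sets):
        words = codebook.predict(D)
        H[i] = np.bincount(words, minlength=k) / len(D)
    return H

# Synthetic stand-in for SIFT-like descriptors: each "image" draws 30
# 8-dimensional local descriptors around one of two class prototypes.
rng = np.random.default_rng(0)
protos = rng.normal(size=(2, 8)) * 3.0
images = [rng.normal(protos[c], 1.0, size=(30, 8))
          for c in (0, 1) for _ in range(40)]
labels = np.repeat([0, 1], 40)

# Learn the visual vocabulary, build histograms, and train the forest.
codebook = KMeans(n_clusters=16, n_init=4, random_state=0).fit(np.vstack(images))
H = bovw_histograms(images, codebook)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(H, labels)
```

The histogram features are what make Random Forest applicable here: the forest never sees raw descriptors, only a fixed-length vocabulary profile per image.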