
    Evaluation Measures for Hierarchical Classification: a unified view and novel approaches

    Hierarchical classification addresses the problem of classifying items into a hierarchy of classes. An important issue in hierarchical classification is the evaluation of different classification algorithms, which is complicated by the hierarchical relations among the classes. Several evaluation measures that use the hierarchy in different ways have been proposed for hierarchical classification. This paper studies the problem of evaluation in hierarchical classification by analyzing and abstracting the key components of the existing performance measures. It also proposes two alternative generic views of hierarchical evaluation and introduces two corresponding novel measures. The proposed measures, along with the state-of-the-art ones, are empirically tested on three large datasets from the domain of text classification. The empirical results illustrate the undesirable behavior of existing approaches and show how the proposed methods overcome most of these shortcomings across a range of cases. Comment: Submitted to journal
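To make "using the hierarchy" concrete, here is a minimal sketch of one common existing set-based measure (not one of the paper's novel measures): hierarchical precision and recall, computed after augmenting both the true and the predicted label sets with all of their ancestors. The toy hierarchy `PARENT` and the function names are hypothetical, and the hierarchy is assumed to be a tree whose shared root is omitted.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical toy hierarchy: child -> parent. Top-level categories have no entry;
# the shared implicit root is left out, since every item would trivially match it.
PARENT: Dict[str, str] = {
    "football": "sports",
    "tennis": "sports",
    "eu_politics": "politics",
}


def with_ancestors(labels: Iterable[str], parent: Dict[str, str]) -> Set[str]:
    """Augment a label set with every ancestor of every label (assumes no cycles)."""
    augmented: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        while node is not None:
            augmented.add(node)
            node = parent.get(node)
    return augmented


def hierarchical_prf(
    true_labels: Iterable[str], pred_labels: Iterable[str], parent: Dict[str, str]
) -> Tuple[float, float, float]:
    """Set-based hierarchical precision, recall and F1 for a single item."""
    t = with_ancestors(true_labels, parent)
    p = with_ancestors(pred_labels, parent)
    overlap = len(t & p)
    hp = overlap / len(p) if p else 0.0
    hr = overlap / len(t) if t else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf


# A sibling mistake earns partial credit; flat accuracy would score it 0.
print(hierarchical_prf({"football"}, {"tennis"}, PARENT))       # (0.5, 0.5, 0.5)
print(hierarchical_prf({"football"}, {"eu_politics"}, PARENT))  # (0.0, 0.0, 0.0)
```

The ancestor augmentation is what rewards "near misses" in the hierarchy; how ancestors are counted (and whether deep errors are over-penalized) is exactly the kind of design choice such evaluation studies examine.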

    An Empirical Study on Budget-Aware Online Kernel Algorithms for Streams of Graphs

    Kernel methods are considered an effective technique for on-line learning. Many approaches have been developed for compactly representing the dual solution of a kernel method when the problem imposes memory constraints. However, no work in the literature is specifically tailored to streams of graphs. Motivated by the fact that the feature space representation of many state-of-the-art graph kernels is relatively small and thus explicitly computable, we study whether executing kernel algorithms in the feature space can be more effective than the classical dual approach. We study three different algorithms and various strategies for managing the budget. The efficiency and efficacy of the proposed approaches are experimentally assessed on relatively large graph streams exhibiting concept drift. It turns out that, when strict memory budget constraints have to be enforced, working in feature space, given the current state of the art on graph kernels, is more than a viable alternative to dual approaches, both in terms of speed and classification performance. Comment: Author's version of the manuscript, to appear in Neurocomputing (ELSEVIER)
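The contrast between the dual and the feature-space (primal) approach can be illustrated with a small sketch: a perceptron that stores an explicit sparse weight vector instead of a set of support graphs, and enforces a memory budget on the number of nonzero weights. Everything here is an assumption for illustration: `edge_label_features` is a toy stand-in for a real graph-kernel feature map, and evicting the smallest-magnitude weight is just one possible budget strategy, not necessarily one studied in the paper.

```python
from typing import Dict, Hashable, List, Tuple

Features = Dict[Hashable, float]


def edge_label_features(labels: Dict[int, str], edges: List[Tuple[int, int]]) -> Features:
    """Toy explicit feature map (hypothetical): counts of sorted endpoint-label pairs."""
    feats: Features = {}
    for u, v in edges:
        key = tuple(sorted((labels[u], labels[v])))
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


class BudgetedPrimalPerceptron:
    """Online perceptron kept in feature space, storing at most `budget` nonzero
    weights. The eviction policy (drop the smallest |w|) is an assumption."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.w: Features = {}

    def score(self, x: Features) -> float:
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x: Features) -> int:
        return 1 if self.score(x) >= 0 else -1

    def update(self, x: Features, y: int) -> bool:
        """Process one labelled example (y in {-1, +1}); return True on a mistake."""
        if y * self.score(x) > 0:
            return False  # correct with positive margin: no update
        for f, v in x.items():
            new = self.w.get(f, 0.0) + y * v
            if new == 0.0:
                self.w.pop(f, None)
            else:
                self.w[f] = new
        # Enforce the memory budget by evicting the least influential features.
        while len(self.w) > self.budget:
            weakest = min(self.w, key=lambda f: abs(self.w[f]))
            del self.w[weakest]
        return True


stream = [
    (edge_label_features({0: "A", 1: "A"}, [(0, 1)]), +1),
    (edge_label_features({0: "B", 1: "B"}, [(0, 1)]), -1),
    (edge_label_features({0: "A", 1: "B"}, [(0, 1)]), +1),
]
model = BudgetedPrimalPerceptron(budget=2)
mistakes = sum(model.update(x, y) for x, y in stream)
print(mistakes, len(model.w))  # 3 2
```

The memory cost here is bounded by the number of stored features rather than by the number of stored graphs, which is the trade-off the abstract describes when feature spaces are small enough to compute explicitly.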

    Machine Learning Classification of Digitally Modulated Signals

    Automatic classification of digitally modulated signals is a challenging problem that has traditionally been approached using signal processing tools such as log-likelihood algorithms for signal classification or cyclostationary signal analysis. These approaches are generally computationally intensive and cumbersome, and in recent years alternative approaches that use machine learning have been presented in the literature for automatic classification of digitally modulated signals. This thesis studies deep learning approaches for classifying digitally modulated signals that use deep artificial neural networks in conjunction with the canonical representation of digitally modulated signals in terms of in-phase and quadrature components. Specifically, capsule networks are trained to recognize common types of PSK and QAM digital modulation schemes, and their classification performance is tested on two distinct publicly available datasets. Results show that capsule networks outperform convolutional neural networks and residual networks, which have been used previously to classify signals in the same datasets, and indicate that they are a meaningful alternative for machine learning approaches to digitally modulated signal classification. The thesis also includes a discussion of practical implementations of the proposed capsule networks in an FPGA-powered embedded system.
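The "canonical representation in terms of in-phase and quadrature components" can be illustrated with a minimal sketch: generate QPSK baseband symbols and split them into a 2 x N real array (row 0 = I, row 1 = Q), which is the kind of input a network would consume. The Gray-coded constellation and the function names are illustrative assumptions, not the thesis's data pipeline; the network itself is out of scope.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Gray-coded QPSK constellation (an illustrative choice, not taken from the thesis
# or its datasets): neighbouring points differ in exactly one bit.
QPSK: Dict[Tuple[int, int], complex] = {
    (0, 0): complex(1, 1),
    (0, 1): complex(-1, 1),
    (1, 1): complex(-1, -1),
    (1, 0): complex(1, -1),
}


def qpsk_modulate(bits: Sequence[int]) -> List[complex]:
    """Map bit pairs to complex baseband symbols with unit average energy."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    scale = 1 / math.sqrt(2)
    return [QPSK[(bits[i], bits[i + 1])] * scale for i in range(0, len(bits), 2)]


def to_iq(samples: Sequence[complex]) -> List[List[float]]:
    """2 x N real array: row 0 is the in-phase (I) part, row 1 the quadrature (Q) part."""
    return [[s.real for s in samples], [s.imag for s in samples]]


iq = to_iq(qpsk_modulate([0, 0, 1, 1, 1, 0]))
print(len(iq), len(iq[0]))  # 2 3
```

Real recordings would add pulse shaping, noise and carrier offsets before this split; the point here is only the I/Q layout itself.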

    Random Prism: a noise-tolerant alternative to Random Forests

    Ensemble learning can be used to increase the overall classification accuracy of a classifier by generating multiple base classifiers and combining their classification results. Decision trees are a frequently used family of base classifiers for ensemble learning. However, alternative approaches can potentially be used, such as the Prism family of algorithms, which also induce classification rules. Compared with decision trees, Prism algorithms generate modular classification rules that cannot necessarily be represented in the form of a decision tree. Prism algorithms achieve classification accuracy similar to that of decision trees, and in some cases, for example when there is noise in the training and test data, they can outperform decision trees by achieving a higher classification accuracy. Nevertheless, Prism still tends to overfit on noisy data; hence, ensemble learners have been adopted in this work to reduce the overfitting. This paper describes the development of an ensemble learner using a member of the Prism family as the base classifier to reduce the overfitting of Prism algorithms on noisy datasets. The developed ensemble classifier is compared with a stand-alone Prism classifier in terms of classification accuracy and resistance to noise.
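To show what "modular classification rules" and an ensemble of them look like, here is a minimal sketch of basic Prism-style rule induction for categorical data, wrapped in plain bagging with majority voting. This is an illustration under stated assumptions: first-matching-rule classification, a majority-class default, and bootstrap sampling only. The paper's Random Prism may randomise differently (for example over features), and all names here are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Instance = Dict[str, str]          # attribute name -> categorical value
Rule = Tuple[Dict[str, str], str]  # (conjunction of attribute=value tests, class)


def matches(conditions: Dict[str, str], x: Instance) -> bool:
    return all(x.get(a) == v for a, v in conditions.items())


def induce_prism(X: Sequence[Instance], y: Sequence[str]) -> List[Rule]:
    """Basic Prism-style induction: for each class, grow rules one term at a time."""
    rules: List[Rule] = []
    data = list(zip(X, y))
    for target in sorted(set(y)):
        remaining = list(data)  # each class restarts from the full training set
        while any(label == target for _, label in remaining):
            conditions: Dict[str, str] = {}
            covered = remaining
            # Specialise until the rule covers only `target` or attributes run out.
            while any(label != target for _, label in covered):
                totals: Counter = Counter()
                hits: Counter = Counter()
                for x, label in covered:
                    for attr, val in x.items():
                        if attr not in conditions:
                            totals[(attr, val)] += 1
                            hits[(attr, val)] += int(label == target)
                if not totals:
                    break  # no unused attributes left; accept an impure rule
                # Highest p(target | attr = val), ties broken by larger coverage.
                attr, val = max(totals, key=lambda k: (hits[k] / totals[k], hits[k]))
                conditions[attr] = val
                covered = [(x, label) for x, label in covered if x.get(attr) == val]
            rules.append((conditions, target))
            remaining = [(x, lab) for x, lab in remaining if not matches(conditions, x)]
    return rules


def predict_prism(rules: List[Rule], default: str, x: Instance) -> str:
    for conditions, label in rules:
        if matches(conditions, x):
            return label
    return default


def fit_bagged_prism(X, y, n_estimators: int = 10, seed: int = 0):
    """Train one rule set per bootstrap sample (plain bagging)."""
    rng = random.Random(seed)
    default = Counter(y).most_common(1)[0][0]
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(induce_prism([X[i] for i in idx], [y[i] for i in idx]))
    return models, default


def predict_bagged(models, default: str, x: Instance) -> str:
    votes = Counter(predict_prism(rules, default, x) for rules in models)
    return votes.most_common(1)[0][0]


X = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
     {"outlook": "rainy", "windy": "no"}, {"outlook": "rainy", "windy": "yes"}]
y = ["play", "play", "stay", "stay"]
print(induce_prism(X, y))
```

Each base rule set can overfit a noisy bootstrap sample in its own way; the majority vote is what is meant to smooth those individual errors out.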

    Finding groups in data: Cluster analysis with ants

    We present in this paper a modification of Lumer and Faieta's algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on the effects on the final clustering of using different dissimilarity metrics during classification: Euclidean, cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as k-means. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focuses on the behavior of clustering procedures in these new approaches. The proposed algorithm and its modifications are evaluated on a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms perform well compared with other techniques.
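The core ingredients of this family of algorithms can be sketched as follows: the three dissimilarity metrics named above, the Lumer-Faieta local density of an item among its grid neighbours, and the pick-up/drop probabilities derived from that density. The formulas are written in their commonly cited form; the constants `k1`, `k2`, `alpha` and the neighbourhood side `s` are illustrative assumptions, and the paper's modifications are not reproduced. A full simulation would move ants on a grid and call these functions at each step.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_dissimilarity(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; treats a zero vector as maximally dissimilar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def make_gower(ranges: Sequence[float]) -> Callable[[Vector, Vector], float]:
    """Gower dissimilarity for numeric features: mean of range-normalised |x - y|."""
    def gower(a: Vector, b: Vector) -> float:
        terms = [abs(x - y) / r if r > 0 else 0.0 for x, y, r in zip(a, b, ranges)]
        return sum(terms) / len(terms)
    return gower


def local_density(item: Vector, neighbours: Sequence[Vector],
                  dist: Callable[[Vector, Vector], float],
                  alpha: float = 0.5, s: int = 3) -> float:
    """f = max(0, (1/s^2) * sum_j (1 - d(item, j) / alpha)) over items in the s x s patch."""
    total = sum(1.0 - dist(item, other) / alpha for other in neighbours)
    return max(0.0, total / (s * s))


def p_pick(f: float, k1: float = 0.1) -> float:
    """Unloaded ant picks up an item: likely when the item is poorly surrounded."""
    return (k1 / (k1 + f)) ** 2


def p_drop(f: float, k2: float = 0.15) -> float:
    """Loaded ant drops its item: likely among similar items."""
    return 2 * f if f < k2 else 1.0


f = local_density((1.0, 1.0), [(1.0, 1.0), (1.0, 1.0)], euclidean)
print(f, p_pick(f), p_drop(f))
```

Swapping `euclidean` for `cosine_dissimilarity` or a `make_gower(...)` closure is all it takes to change the metric, which is the experimental axis the abstract describes; note that `alpha` must be rescaled to match each metric's range.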

    How is a data-driven approach better than random choice in label space division for multi-label classification?

    We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification, as an alternative to the random partitioning into equal subsets performed by RAkELd: the modularity-maximizing fastgreedy and leading eigenvector algorithms, infomap, walktrap, and label propagation. We construct a label co-occurrence graph (in both weighted and unweighted versions) based on training data and perform community detection to partition the label set. We include Binary Relevance and Label Powerset classification methods for comparison. We use Gini-index-based Decision Trees as the base classifier. We compare educated approaches to label space division against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases seven educated-guess approaches are more likely to outperform RAkELd than not, in all measures except Hamming Loss. We show that fastgreedy and walktrap community detection methods on weighted label co-occurrence graphs are 85-92% more likely to yield better F1 scores than random partitioning. Infomap on the unweighted label co-occurrence graphs is, on average, better than random partitioning 90% of the time in terms of Subset Accuracy and 89% of the time in terms of Jaccard similarity. Weighted fastgreedy is better on average than RAkELd when it comes to Hamming Loss.
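The data-driven division step can be sketched as: build the label co-occurrence graph from the training label sets, then run one of the listed community detection methods (here label propagation) to obtain the label subsets. This is a minimal sketch; standard label propagation visits nodes in random order and breaks ties at random, whereas this version uses a fixed order and deterministic tie-breaking (an assumption made so the output is reproducible). In a RAkELd-style setup, each resulting subset would then get its own Label Powerset classifier.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Dict[str, float]]


def cooccurrence_graph(label_sets: Iterable[Set[str]], weighted: bool = True) -> Graph:
    """Nodes are labels; an edge joins two labels assigned together to some example.
    Weighted: edge weight = number of co-occurrences. Unweighted: every weight is 1."""
    label_sets = list(label_sets)
    graph: Graph = {label: {} for labels in label_sets for label in labels}
    for labels in label_sets:
        for a, b in combinations(sorted(labels), 2):
            if weighted:
                graph[a][b] = graph[a].get(b, 0.0) + 1.0
                graph[b][a] = graph[b].get(a, 0.0) + 1.0
            else:
                graph[a][b] = graph[b][a] = 1.0
    return graph


def label_propagation(graph: Graph, max_iter: int = 100) -> List[List[str]]:
    """Asynchronous label propagation with deterministic tie-breaking."""
    community = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores: Dict[str, float] = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[community[neighbour]] += weight
            if not scores:
                continue  # isolated label keeps its own community
            best = max(scores.values())
            tied = sorted(c for c, s in scores.items() if s == best)
            new = community[node] if community[node] in tied else tied[0]
            if new != community[node]:
                community[node] = new
                changed = True
        if not changed:
            break
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in sorted(community):
        groups[community[node]].append(node)
    return sorted(groups.values())


train_label_sets = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"},
                    {"x", "y"}, {"x", "y"}, {"y", "z"}, {"x", "z"}]
print(label_propagation(cooccurrence_graph(train_label_sets)))  # [['a', 'b', 'c'], ['x', 'y', 'z']]
```

Unlike random equal-size partitioning, the subsets found this way keep frequently co-occurring labels together, so each Label Powerset model sees the label combinations that actually occur.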

    A Review of Classification Problems and Algorithms in Renewable Energy Applications

    Classification problems and their corresponding solution approaches constitute one of the fields of machine learning. The application of classification schemes in Renewable Energy (RE) has gained significant attention in the last few years, contributing to the deployment, management and optimization of RE systems. The main objective of this paper is to review the most important classification algorithms applied to RE problems, including both classical and novel algorithms. The paper also provides a comprehensive literature review and discussion of different classification techniques in specific RE problems, including wind speed/power prediction, fault diagnosis in RE systems, power quality disturbance classification and other applications in alternative RE systems. In this way, the paper describes classification techniques and metrics applied to RE problems, making it useful both for researchers dealing with this kind of problem and for practitioners in the field.

    Survey on Combinatorial Register Allocation and Instruction Scheduling

    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible, at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques most commonly applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization.
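The phrase "optimal solutions according to a model, at the expense of compilation time" can be illustrated with the simplest of the listed techniques, enumeration. The sketch below uses a deliberately tiny, hypothetical model: each variable gets one of `k` registers or is spilled (`None`), interfering variables may not share a register, and the objective is total spill cost. Real models also cover live-range splitting, coalescing and scheduling, which are omitted here.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple


def optimal_allocation(
    variables: Sequence[str],
    interference: Sequence[Tuple[str, str]],
    spill_cost: Dict[str, float],
    k: int,
) -> Tuple[float, Dict[str, Optional[int]]]:
    """Exhaustively enumerate assignments (None = spilled to memory) and return the
    cheapest one in which no two interfering variables share a register."""
    options: List[Optional[int]] = list(range(k)) + [None]
    best_cost = float("inf")
    best: Dict[str, Optional[int]] = {}
    for choice in product(options, repeat=len(variables)):
        assign = dict(zip(variables, choice))
        if any(assign[u] is not None and assign[u] == assign[v] for u, v in interference):
            continue  # two simultaneously live values in one register
        cost = sum(spill_cost[v] for v in variables if assign[v] is None)
        if cost < best_cost:
            best_cost, best = cost, assign
    return best_cost, best


# Three mutually interfering values but only two registers: one must be spilled,
# and the optimum spills the cheapest one.
cost, assignment = optimal_allocation(
    ["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], {"a": 5, "b": 3, "c": 1}, k=2
)
print(cost, assignment)  # 1 {'a': 0, 'b': 1, 'c': None}
```

The search space grows as (k + 1) to the power of the number of variables, which is why practical approaches rely on integer programming, constraint programming or PBQP solvers, or on heavily pruned enumeration, rather than on brute force.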

    Gibbs Max-margin Topic Models with Data Augmentation

    Max-margin learning is a powerful approach to building classifiers and structured output predictors. Recent work on max-margin supervised topic models has successfully integrated it with Bayesian topic models to discover discriminative latent semantic structures and make accurate predictions for unseen test data. However, the resulting learning problems are usually hard to solve because of the non-smoothness of the margin loss. Existing approaches to building max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents an alternative approach by defining a new max-margin loss. Namely, we present Gibbs max-margin supervised topic models, a latent variable Gibbs classifier to discover hidden topic representations for various tasks, including classification, regression and multi-task learning. Gibbs max-margin supervised topic models minimize an expected margin loss, which is an upper bound of the existing margin loss derived from an expected prediction rule. By introducing augmented variables and integrating out the Dirichlet variables analytically by conjugacy, we develop simple Gibbs sampling algorithms with no restrictive assumptions and no need to solve SVM subproblems. Furthermore, each step of the "augment-and-collapse" Gibbs sampling algorithms has an analytical conditional distribution, from which samples can be easily drawn. Experimental results demonstrate significant improvements in time efficiency. The classification performance is also significantly improved over competitors on binary, multi-class and multi-label classification tasks. Comment: 35 pages
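The claim that the expected margin loss upper-bounds the existing one follows from Jensen's inequality, because the hinge function is convex. A short derivation for the binary case, in notation assumed here in the style of supervised topic models (it may not match the paper's exact symbols):

```latex
% Assumed notation: y_d \in \{-1,+1\} is the label of document d, \bar{z}_d its
% average topic-assignment vector, \eta the classifier weights, \ell > 0 the margin
% cost, and q the posterior distribution over (\eta, \bar{z}_d).
\begin{align*}
\zeta_d &:= \ell - y_d\,\eta^\top \bar{z}_d, \qquad h(x) := \max(0, x) \quad \text{(convex)} \\
\mathcal{R}(q)  &= \sum_d h\bigl(\mathbb{E}_q[\zeta_d]\bigr)
  && \text{hinge loss of the expected prediction rule} \\
\mathcal{R}'(q) &= \sum_d \mathbb{E}_q\bigl[h(\zeta_d)\bigr]
  && \text{expected hinge loss of the Gibbs classifier} \\
\mathcal{R}'(q) &\ge \mathcal{R}(q)
  && \text{Jensen: } \mathbb{E}[h(\zeta)] \ge h(\mathbb{E}[\zeta]) \text{ for convex } h
\end{align*}
```

Because the expectation now sits outside the hinge, each term can be handled sample by sample, which is what makes the data-augmentation Gibbs sampler possible instead of solving SVM subproblems on posterior means.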

    Adaptive Learning for the Resource-Constrained Classification Problem

    Resource-constrained classification tasks are common in real-world applications such as allocating tests for disease diagnosis, hiring decisions when filling a limited number of positions, and defect detection in manufacturing settings under a limited inspection budget. Typical classification algorithms treat the learning process and the resource constraints as two separate and sequential tasks. Here we design an adaptive learning approach that considers resource constraints and learning jointly by iteratively fine-tuning misclassification costs. Via a structured experimental study using a publicly available data set, we evaluate a decision tree classifier that utilizes the proposed approach. The adaptive learning approach performs significantly better than alternative approaches, especially for difficult classification problems in which the performance of common approaches may be unsatisfactory. We envision the adaptive learning approach as an important addition to the repertoire of techniques for handling resource-constrained classification problems.
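The idea of coupling learning and the resource constraint by iteratively tuning misclassification costs can be sketched as a loop: fit a cost-sensitive classifier, count how many items it would flag, and raise the false-positive cost until that count fits the budget. This is a hypothetical illustration, not the paper's procedure: a one-feature decision stump stands in for the decision tree, the multiplicative cost update (`step`) is an assumption, and the budget is checked on the training data as a stand-in for the deployment pool.

```python
from typing import Sequence, Tuple


def fit_cost_sensitive_stump(x: Sequence[float], y: Sequence[int],
                             c_fp: float, c_fn: float = 1.0) -> float:
    """Pick threshold t (predict positive iff x >= t) minimising weighted error."""
    candidates = sorted(set(x)) + [float("inf")]  # inf = flag nothing
    best_t, best_cost = candidates[-1], float("inf")
    for t in candidates:
        cost = 0.0
        for xi, yi in zip(x, y):
            pred = xi >= t
            if pred and yi == 0:
                cost += c_fp      # false positive wastes a unit of the resource
            elif not pred and yi == 1:
                cost += c_fn      # false negative misses a true case
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


def adaptive_fit(x: Sequence[float], y: Sequence[int], budget: int,
                 c_fp: float = 1.0, step: float = 1.5,
                 max_iter: int = 50) -> Tuple[float, float]:
    """Raise the false-positive cost until the number of flagged items fits the budget."""
    for _ in range(max_iter):
        t = fit_cost_sensitive_stump(x, y, c_fp)
        n_flagged = sum(xi >= t for xi in x)
        if n_flagged <= budget:
            return t, c_fp
        c_fp *= step
    raise RuntimeError("budget not reachable by cost tuning alone")


x = [1, 2, 3, 4, 5, 6]
y = [0, 1, 0, 1, 1, 1]
print(adaptive_fit(x, y, budget=3))  # (4, 1.5)
```

With equal costs the stump flags five items; one cost increase makes the tighter threshold cheaper, so the learned model itself (not a post-hoc cut-off) respects the budget of three. If even error-free positive predictions exceed the budget, cost tuning alone cannot help and the sketch gives up.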