1,226 research outputs found

    Deep Extreme Multi-label Learning

    Full text link
    Extreme multi-label learning (XML) or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space which involves 2L2^L possible label sets especially when the label dimension LL is huge, e.g., in millions for Wikipedia labels. This paper is motivated to better explore the label space by originally establishing an explicit label graph. In the meanwhile, deep learning has been widely studied and used in various classification problems including multi-label classification, however it has not been properly introduced to XML, where the label space can be as large as in millions. In this paper, we propose a practical deep embedding method for extreme multi-label classification, which harvests the ideas of non-linear embedding and graph priors-based label space modeling simultaneously. Extensive experiments on public datasets for XML show that our method performs competitive against state-of-the-art result

    Review of Extreme Multilabel Classification

    Full text link
    Extreme multilabel classification or XML, is an active area of interest in machine learning. Compared to traditional multilabel classification, here the number of labels is extremely large, hence, the name extreme multilabel classification. Using classical one versus all classification wont scale in this case due to large number of labels, same is true for any other classifiers. Embedding of labels as well as features into smaller label space is an essential first step. Moreover, other issues include existence of head and tail labels, where tail labels are labels which exist in relatively smaller number of given samples. The existence of tail labels creates issues during embedding. This area has invited application of wide range of approaches ranging from bit compression motivated from compressed sensing, tree based embeddings, deep learning based latent space embedding including using attention weights, linear algebra based embeddings such as SVD, clustering, hashing, to name a few. The community has come up with a useful set of metrics to identify correctly the prediction for head or tail labels.Comment: 46 pages, 13 figure

    Hierarchical cluster guided labeling: efficient label collection for visual classification

    Get PDF
    2015 Summer.Visual classification is a core component in many visually intelligent systems. For example, recognition of objects and terrains provides perception during path planning and navigation tasks performed by autonomous agents. Supervised visual classifiers are typically trained with large sets of images to yield high classification performance. Although the collection of raw training data is easy, the required human effort to assign labels to this data is time consuming. This is particularly problematic in real-world applications with limited labeling time and resources. Techniques have emerged that are designed to help alleviate the labeling workload but suffer from several shortcomings. First, they do not generalize well to domains with limited a priori knowledge. Second, efficiency is achieved at the cost of collecting significant label noise which inhibits classifier learning or requires additional effort to remove. Finally, they introduce high latency between labeling queries, restricting real-world feasibility. This thesis addresses these shortcomings with unsupervised learning that exploits the hierarchical nature of feature patterns and semantic labels in visual data. Our hierarchical cluster guided labeling (HCGL) framework introduces a novel evaluation of hierarchical groupings to identify the most interesting changes in feature patterns. These changes help localize group selection in the hierarchy to discover and label a spectrum of visual semantics found in the data. We show that employing majority group-based labeling after selection allows HCGL to balance efficiency and label accuracy, yielding higher performing classifiers than other techniques with respect to labeling effort. Finally, we demonstrate the real-world feasibility of our labeling framework by quickly training high performing visual classifiers that aid in successful mobile robot path planning and navigation

    SuRVoS: Super-Region Volume Segmentation workbench

    Get PDF
    Segmentation of biological volumes is a crucial step needed to fully analyse their scientific content. Not having access to convenient tools with which to segment or annotate the data means many biological volumes remain under-utilised. Automatic segmentation of biological volumes is still a very challenging research field, and current methods usually require a large amount of manually-produced training data to deliver a high-quality segmentation. However, the complex appearance of cellular features and the high variance from one sample to another, along with the time-consuming work of manually labelling complete volumes, makes the required training data very scarce or non-existent. Thus, fully automatic approaches are often infeasible for many practical applications. With the aim of unifying the segmentation power of automatic approaches with the user expertise and ability to manually annotate biological samples, we present a new workbench named SuRVoS (Super-Region Volume Segmentation). Within this software, a volume to be segmented is first partitioned into hierarchical segmentation layers (named Super-Regions) and is then interactively segmented with the user's knowledge input in the form of training annotations. SuRVoS first learns from and then extends user inputs to the rest of the volume, while using Super-Regions for quicker and easier segmentation than when using a voxel grid. These benefits are especially noticeable on noisy, low-dose, biological datasets

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    Full text link
    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techniques of data-preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expected maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, and more accurate and robust classification results.Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625

    Chest X-Rays Image Classification from beta-Variational Autoencoders Latent Features

    Get PDF
    Chest X-Ray (CXR) is one of the most common diagnostic techniques used in everyday clinical practice all around the world. We hereby present a work which intends to investigate and analyse the use of Deep Learning (DL) techniques to extract information from such images and allow to classify them, trying to keep our methodology as general as possible and possibly also usable in a real world scenario without much effort, in the future. To move in this direction, we trained several beta-Variational Autoencoder (beta-VAE) models on the CheXpert dataset, one of the largest publicly available collection of labeled CXR images; from these models, latent features have been extracted and used to train other Machine Learning models, able to classify the original images from the features extracted by the beta-VAE. Lastly, tree-based models have been combined together in ensemblings to improve the results without the necessity of further training or models engineering. Expecting some drop in pure performance with the respect to state of the art classification specific models, we obtained encouraging results, which show the viability of our approach and the usability of the high level features extracted by the autoencoders for classification tasks.Comment: 8 pages, 5 figure

    Fast global interactive volume segmentation with regional supervoxel descriptors

    Get PDF
    In this paper we propose a novel approach towards fast multi-class volume segmentation that exploits supervoxels in order to reduce complexity, time and memory requirements. Current methods for biomedical image segmentation typically require either complex mathematical models with slow convergence, or expensive-to-calculate image features, which makes them non-feasible for large volumes with many objects (tens to hundreds) of different classes, as is typical in modern medical and biological datasets. Recently, graphical models such as Markov Random Fields (MRF) or Conditional Random Fields (CRF) are having a huge impact in different computer vision areas (e.g. image parsing, object detection, object recognition) as they provide global regularization for multiclass problems over an energy minimization framework. These models have yet to find impact in biomedical imaging due to complexities in training and slow inference in 3D images due to the very large number of voxels. Here, we define an interactive segmentation approach over a supervoxel space by first defining novel, robust and fast regional descriptors for supervoxels. Then, a hierarchical segmentation approach is adopted by training Contextual Extremely Random Forests in a user-defined label hierarchy where the classification output of the previous layer is used as additional features to train a new classifier to refine more detailed label information. This hierarchical model yields final class likelihoods for supervoxels which are finally refined by a MRF model for 3D segmentation. Results demonstrate the effectiveness on a challenging cryo-soft X-ray tomography dataset by segmenting cell areas with only a few user scribbles as the input for our algorithm. Further results demonstrate the effectiveness of our method to fully extract different organelles from the cell volume with another few seconds of user interaction. © (2016) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only
    corecore