Deep Extreme Multi-label Learning
Extreme multi-label learning (XML) or classification has been a practical and
important problem since the boom of big data. The main challenge lies in the
exponential label space, which involves 2^L possible label sets, especially
when the label dimension L is huge, e.g., in the millions for Wikipedia labels.
This paper is motivated to better explore the label space by originally
establishing an explicit label graph. Meanwhile, deep learning has been
widely studied and used in various classification problems, including
multi-label classification; however, it has not been properly introduced to XML,
where the label space can be as large as millions. In this paper, we propose
a practical deep embedding method for extreme multi-label classification, which
harvests the ideas of non-linear embedding and graph priors-based label space
modeling simultaneously. Extensive experiments on public datasets for XML show
that our method performs competitively against state-of-the-art results.
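To make the size of that label space concrete, here is a minimal sketch (not taken from the paper) that counts the possible label sets for a given label dimension, assuming every subset of labels is a valid label set. The function name `num_label_sets` is hypothetical.

```python
def num_label_sets(num_labels: int) -> int:
    """Count all subsets of `num_labels` labels, including the empty set."""
    if num_labels < 0:
        raise ValueError("num_labels must be non-negative")
    return 2 ** num_labels


if __name__ == "__main__":
    # The count doubles with every additional label.
    for label_dim in (10, 20, 30):
        print(label_dim, num_label_sets(label_dim))
```

With a label dimension in the millions, the count has more than 300,000 decimal digits, which is why enumerating label sets directly is not an option.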
Review of Extreme Multilabel Classification
Extreme multilabel classification, or XML, is an active area of interest in
machine learning. Compared to traditional multilabel classification, here the
number of labels is extremely large, hence the name extreme multilabel
classification. Classical one-versus-all classification won't scale in
this case due to the large number of labels, and the same is true for any other
classifiers. Embedding labels as well as features into a smaller label space
is an essential first step. Other issues include the existence of head
and tail labels, where tail labels are labels that occur in a relatively small
number of the given samples. The existence of tail labels creates issues during
embedding. This area has invited a wide range of approaches,
including bit compression motivated by compressed sensing, tree-based
embeddings, deep learning-based latent space embeddings (including those using
attention weights), linear algebra-based embeddings such as SVD, clustering, and
hashing, to name a few. The community has come up with a useful set of metrics
to assess whether predictions for head or tail labels are correct.
Comment: 46 pages, 13 figures
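The abstract does not name the metrics it has in mind. As an illustration only, precision@k is one ranking metric widely used in XML evaluations; the sketch below assumes per-label scores and a set of true label indices, and the function name is hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(scores: Sequence[float], relevant: Set[int], k: int) -> float:
    """Fraction of the k highest-scoring labels that are truly relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank label indices by descending score; ties keep the lower index first.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    top_k = ranked[:k]
    return sum(1 for i in top_k if i in relevant) / k


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.3, 0.8]
    print(precision_at_k(scores, relevant={1, 2}, k=2))
```

Because plain precision@k is dominated by frequent head labels, evaluations that care about tail labels typically reweight or stratify it; that variant is not shown here.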
Hierarchical cluster guided labeling: efficient label collection for visual classification
2015 Summer.
Visual classification is a core component in many visually intelligent systems. For example, recognition of objects and terrains provides perception during path planning and navigation tasks performed by autonomous agents. Supervised visual classifiers are typically trained with large sets of images to yield high classification performance. Although the collection of raw training data is easy, the required human effort to assign labels to this data is time consuming. This is particularly problematic in real-world applications with limited labeling time and resources. Techniques have emerged that are designed to help alleviate the labeling workload but suffer from several shortcomings. First, they do not generalize well to domains with limited a priori knowledge. Second, efficiency is achieved at the cost of collecting significant label noise which inhibits classifier learning or requires additional effort to remove. Finally, they introduce high latency between labeling queries, restricting real-world feasibility. This thesis addresses these shortcomings with unsupervised learning that exploits the hierarchical nature of feature patterns and semantic labels in visual data. Our hierarchical cluster guided labeling (HCGL) framework introduces a novel evaluation of hierarchical groupings to identify the most interesting changes in feature patterns. These changes help localize group selection in the hierarchy to discover and label a spectrum of visual semantics found in the data. We show that employing majority group-based labeling after selection allows HCGL to balance efficiency and label accuracy, yielding higher performing classifiers than other techniques with respect to labeling effort. Finally, we demonstrate the real-world feasibility of our labeling framework by quickly training high performing visual classifiers that aid in successful mobile robot path planning and navigation.
SuRVoS: Super-Region Volume Segmentation workbench
Segmentation of biological volumes is a crucial step needed to fully analyse their scientific content. Not having access to convenient tools with which to segment or annotate the data means many biological volumes remain under-utilised. Automatic segmentation of biological volumes is still a very challenging research field, and current methods usually require a large amount of manually-produced training data to deliver a high-quality segmentation. However, the complex appearance of cellular features and the high variance from one sample to another, along with the time-consuming work of manually labelling complete volumes, make the required training data very scarce or non-existent. Thus, fully automatic approaches are often infeasible for many practical applications. With the aim of unifying the segmentation power of automatic approaches with the user expertise and ability to manually annotate biological samples, we present a new workbench named SuRVoS (Super-Region Volume Segmentation). Within this software, a volume to be segmented is first partitioned into hierarchical segmentation layers (named Super-Regions) and is then interactively segmented with the user's knowledge input in the form of training annotations. SuRVoS first learns from and then extends user inputs to the rest of the volume, while using Super-Regions for quicker and easier segmentation than when using a voxel grid. These benefits are especially noticeable on noisy, low-dose, biological datasets.
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
This work is motivated by the needs of predictive analytics on healthcare
data as represented by Electronic Medical Records. Such data is invariably
problematic: noisy, with missing entries, and with imbalance in the classes of
interest, leading to serious bias in predictive modeling. Since standard data
mining methods often produce poor performance measures, we argue for
development of specialized techniques of data-preprocessing and classification.
In this paper, we propose a new method to simultaneously classify large
datasets and reduce the effects of missing values. It is based on a multilevel
framework of the cost-sensitive SVM and the expectation-maximization imputation
method for missing values, which relies on iterated regression analyses. We
compare classification results of multilevel SVM-based algorithms on public
benchmark datasets with imbalanced classes and missing values as well as real
data in health applications, and show that our multilevel SVM-based method
produces fast, more accurate, and more robust classification results.
Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
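A cost-sensitive SVM penalizes errors on different classes differently. The abstract does not say how the per-class costs are chosen, so the following is only a sketch of one common heuristic, inverse-frequency weighting (the same formula as scikit-learn's "balanced" class weights). The function name is hypothetical, and this is not presented as the authors' method.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    # Rare classes get weights above 1, frequent classes get weights below 1.
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


if __name__ == "__main__":
    # Assumed toy data: 8 negatives and 2 positives.
    print(balanced_class_weights([0] * 8 + [1] * 2))
```

Weights like these would scale the misclassification penalty per class, so that errors on the minority class cost more than errors on the majority class.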
Chest X-Rays Image Classification from beta-Variational Autoencoders Latent Features
Chest X-Ray (CXR) is one of the most common diagnostic techniques used in
everyday clinical practice all around the world. We hereby present a work that
investigates and analyses the use of Deep Learning (DL) techniques to
extract information from such images and classify them, while trying to keep
our methodology as general as possible and usable in a real-world
scenario without much effort in the future. To move in this direction, we
trained several beta-Variational Autoencoder (beta-VAE) models on the CheXpert
dataset, one of the largest publicly available collections of labeled CXR
images; from these models, latent features have been extracted and used to
train other Machine Learning models, able to classify the original images from
the features extracted by the beta-VAE. Lastly, tree-based models have been
combined in ensembles to improve the results without the need
for further training or model engineering. Expecting some drop in pure
performance with respect to state-of-the-art classification-specific
models, we obtained encouraging results, which show the viability of our
approach and the usability of the high-level features extracted by the
autoencoders for classification tasks.
Comment: 8 pages, 5 figures
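For readers unfamiliar with the beta-VAE objective, the sketch below shows its usual form: a reconstruction error plus a KL term scaled by beta. It assumes a diagonal Gaussian posterior and a standard normal prior, which the abstract does not state; the function names and the example value of beta are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss(recon_error: float, mu: Sequence[float],
                  log_var: Sequence[float], beta: float) -> float:
    """Reconstruction error plus beta times the KL regularizer."""
    return recon_error + beta * gaussian_kl(mu, log_var)


if __name__ == "__main__":
    # Assumed toy values for a single image with a 2-dimensional latent code.
    print(beta_vae_loss(recon_error=2.0, mu=[1.0, 0.0], log_var=[0.0, 0.0], beta=4.0))
```

Setting beta to 1 recovers the standard VAE objective; larger values weight the KL term more heavily. The latent means are the kind of features a downstream classifier could then consume.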
Fast global interactive volume segmentation with regional supervoxel descriptors
In this paper we propose a novel approach towards fast multi-class volume segmentation that exploits supervoxels in order to reduce complexity, time and memory requirements. Current methods for biomedical image segmentation typically require either complex mathematical models with slow convergence, or expensive-to-calculate image features, which makes them non-feasible for large volumes with many objects (tens to hundreds) of different classes, as is typical in modern medical and biological datasets. Recently, graphical models such as Markov Random Fields (MRF) or Conditional Random Fields (CRF) are having a huge impact in different computer vision areas (e.g. image parsing, object detection, object recognition) as they provide global regularization for multiclass problems over an energy minimization framework. These models have yet to find impact in biomedical imaging due to complexities in training and slow inference in 3D images due to the very large number of voxels. Here, we define an interactive segmentation approach over a supervoxel space by first defining novel, robust and fast regional descriptors for supervoxels. Then, a hierarchical segmentation approach is adopted by training Contextual Extremely Random Forests in a user-defined label hierarchy where the classification output of the previous layer is used as additional features to train a new classifier to refine more detailed label information. This hierarchical model yields final class likelihoods for supervoxels which are finally refined by a MRF model for 3D segmentation. Results demonstrate the effectiveness on a challenging cryo-soft X-ray tomography dataset by segmenting cell areas with only a few user scribbles as the input for our algorithm. Further results demonstrate the effectiveness of our method to fully extract different organelles from the cell volume with another few seconds of user interaction. © (2016) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). 
Downloading of the abstract is permitted for personal use only.