
1,226 research outputs found

Deep Extreme Multi-label Learning

Author: Wang Xiangfeng
Yan Junchi
Zha Hongyuan
Zhang Wenjie
Publication date: 08/06/2018

Extreme multi-label learning (XML) or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space, which involves 2^L possible label sets, especially when the label dimension L is huge, e.g., in the millions for Wikipedia labels. This paper is motivated to better explore the label space by establishing an explicit label graph. Meanwhile, deep learning has been widely studied and used in various classification problems, including multi-label classification; however, it has not been properly introduced to XML, where the label space can be as large as millions. In this paper, we propose a practical deep embedding method for extreme multi-label classification, which harvests the ideas of non-linear embedding and graph priors-based label space modeling simultaneously. Extensive experiments on public datasets for XML show that our method performs competitively against state-of-the-art results.
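The 2^L growth mentioned in the abstract can be made concrete with a small sketch. The function names below are illustrative and do not come from the paper; the digit count uses logarithms so the huge integer never has to be built.

```python
import math


def num_label_sets(num_labels: int) -> int:
    """Number of distinct label subsets over num_labels binary labels (2**L)."""
    return 2 ** num_labels


def decimal_digits_of_label_space(num_labels: int) -> int:
    """Decimal digits of 2**L, via log10, without materialising the integer."""
    return math.floor(num_labels * math.log10(2)) + 1


# Ten labels already allow 1024 label sets.
print(num_label_sets(10))
# With a million labels, the count of label sets has hundreds of thousands of digits.
print(decimal_digits_of_label_space(1_000_000))
```

Enumerating label sets is therefore out of the question at this scale, which is why methods in this area work with embeddings or other structure on the label space.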

arXiv.org e-Print Archive

Crossref

Review of Extreme Multilabel Classification

Author: Das Shrutimoy
Dasgupta Arpan
Katyan Siddhant
Kumar Pawan
Publication date: 26/03/2023

Extreme multilabel classification, or XML, is an active area of interest in machine learning. Compared to traditional multilabel classification, the number of labels here is extremely large, hence the name. Classical one-versus-all classification won't scale in this case because of the large number of labels, and the same is true for other classifiers. Embedding labels as well as features into a smaller label space is an essential first step. Other issues include the existence of head and tail labels, where tail labels are those that occur in a relatively small number of the given samples. The existence of tail labels creates issues during embedding. This area has invited a wide range of approaches: bit compression motivated by compressed sensing, tree-based embeddings, deep learning based latent space embedding (including the use of attention weights), linear algebra based embeddings such as SVD, clustering, and hashing, to name a few. The community has come up with a useful set of metrics to identify correctly the prediction for head or tail labels. Comment: 46 pages, 13 figures
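The abstract does not name the metrics it refers to. As an assumption, the sketch below uses precision@k, a ranking metric widely reported in this area, together with a simple frequency threshold for picking out tail labels. The function names, the dictionary-based score format, and the threshold are all hypothetical.

```python
from collections import Counter


def precision_at_k(scores, true_labels, k):
    """Fraction of the k highest-scored labels that appear in true_labels.

    scores maps each label to a predicted relevance score.
    """
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for label in top_k if label in true_labels) / k


def tail_labels(label_sets, max_count):
    """Labels that occur in at most max_count samples (threshold is an assumption)."""
    counts = Counter(label for labels in label_sets for label in labels)
    return {label for label, count in counts.items() if count <= max_count}


scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05}
print(precision_at_k(scores, {"a", "c"}, k=2))
print(tail_labels([{"a", "b"}, {"a"}, {"a", "c"}], max_count=1))
```

Evaluating precision@k separately on samples whose true labels fall in the tail set is one way to see whether a method only does well on frequent labels.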

arXiv.org e-Print Archive

Hierarchical cluster guided labeling: efficient label collection for visual classification

Author: Wigness Maggie
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2015

2015 Summer. Visual classification is a core component in many visually intelligent systems. For example, recognition of objects and terrains provides perception during path planning and navigation tasks performed by autonomous agents. Supervised visual classifiers are typically trained with large sets of images to yield high classification performance. Although the collection of raw training data is easy, the human effort required to assign labels to this data is time-consuming. This is particularly problematic in real-world applications with limited labeling time and resources. Techniques have emerged that are designed to help alleviate the labeling workload but suffer from several shortcomings. First, they do not generalize well to domains with limited a priori knowledge. Second, efficiency is achieved at the cost of collecting significant label noise, which inhibits classifier learning or requires additional effort to remove. Finally, they introduce high latency between labeling queries, restricting real-world feasibility. This thesis addresses these shortcomings with unsupervised learning that exploits the hierarchical nature of feature patterns and semantic labels in visual data. Our hierarchical cluster guided labeling (HCGL) framework introduces a novel evaluation of hierarchical groupings to identify the most interesting changes in feature patterns. These changes help localize group selection in the hierarchy to discover and label a spectrum of visual semantics found in the data. We show that employing majority group-based labeling after selection allows HCGL to balance efficiency and label accuracy, yielding higher performing classifiers than other techniques with respect to labeling effort. Finally, we demonstrate the real-world feasibility of our labeling framework by quickly training high performing visual classifiers that aid in successful mobile robot path planning and navigation.

Mountain Scholar (Digital Collections of Colorado and Wyoming)

SuRVoS: Super-Region Volume Segmentation workbench

Author: Alun W. Ashton
Andrew P. French
Cynthia Y. He
Elizabeth M.H. Duke
Imanol Luengo
Mark Basham
Matthew C. Spink
Michele C. Darrow
Tony Pridmore
Wah Chiu
Wei Dai
Ying Sun
Publication venue: Elsevier BV
Publication date: 27/02/2017

Segmentation of biological volumes is a crucial step needed to fully analyse their scientific content. Not having access to convenient tools with which to segment or annotate the data means many biological volumes remain under-utilised. Automatic segmentation of biological volumes is still a very challenging research field, and current methods usually require a large amount of manually-produced training data to deliver a high-quality segmentation. However, the complex appearance of cellular features and the high variance from one sample to another, along with the time-consuming work of manually labelling complete volumes, makes the required training data very scarce or non-existent. Thus, fully automatic approaches are often infeasible for many practical applications. With the aim of unifying the segmentation power of automatic approaches with the user expertise and ability to manually annotate biological samples, we present a new workbench named SuRVoS (Super-Region Volume Segmentation). Within this software, a volume to be segmented is first partitioned into hierarchical segmentation layers (named Super-Regions) and is then interactively segmented with the user's knowledge input in the form of training annotations. SuRVoS first learns from and then extends user inputs to the rest of the volume, while using Super-Regions for quicker and easier segmentation than when using a voxel grid. These benefits are especially noticeable on noisy, low-dose, biological datasets

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

Author: Marko Nicholas
Razzaghi Talayeh
Roderick Oleg
Safro Ilya
Publication venue: Public Library of Science (PLoS)
Publication date: 07/04/2016

This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: it is noisy, has missing entries, and has imbalance in the classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized techniques for data preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expected maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data in health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results. Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
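A cost-sensitive SVM penalises errors on the minority class more heavily than errors on the majority class. The abstract does not say how the costs are chosen, so the sketch below uses a common inverse-frequency heuristic as an assumption; it is not necessarily the authors' scheme, and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy imbalanced data: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
print(balanced_class_weights(labels))
```

In a weighted SVM, such a weight would typically scale the misclassification penalty for samples of the corresponding class.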

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Chest X-Rays Image Classification from beta-Variational Autoencoders Latent Features

Author: Chiti Arturo
Crespi Leonardo
Loiacono Daniele
Publication date: 01/01/2021

Chest X-Ray (CXR) is one of the most common diagnostic techniques used in everyday clinical practice all around the world. We present a work that investigates the use of Deep Learning (DL) techniques to extract information from such images and classify them, while keeping our methodology as general as possible and, ideally, usable in a real-world scenario without much effort in the future. To move in this direction, we trained several beta-Variational Autoencoder (beta-VAE) models on the CheXpert dataset, one of the largest publicly available collections of labeled CXR images; from these models, latent features have been extracted and used to train other Machine Learning models that classify the original images from the features extracted by the beta-VAE. Lastly, tree-based models have been combined into ensembles to improve the results without the need for further training or model engineering. Expecting some drop in pure performance with respect to state-of-the-art classification-specific models, we obtained encouraging results, which show the viability of our approach and the usability of the high-level features extracted by the autoencoders for classification tasks. Comment: 8 pages, 5 figures
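For orientation, the standard beta-VAE objective adds a KL term, scaled by beta, to the reconstruction error. The sketch below assumes a diagonal Gaussian posterior and a standard normal prior, which is the usual setup; the abstract does not give the authors' exact loss, and the function names are illustrative.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def beta_vae_loss(reconstruction_error, mu, logvar, beta):
    """Reconstruction error plus beta times the KL term."""
    return reconstruction_error + beta * gaussian_kl(mu, logvar)


# A posterior equal to the prior contributes no KL penalty.
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))
# A one-dimensional latent with mean 1 and unit variance, weighted with beta = 4.
print(beta_vae_loss(2.0, [1.0], [0.0], beta=4.0))
```

After training, the encoder means (mu) are a natural choice for the latent features passed to downstream classifiers such as tree-based models.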

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Fast global interactive volume segmentation with regional supervoxel descriptors

Author: Basham Mark
French Andrew P.
Luengo Imanol
Publication date: 21/03/2016

In this paper we propose a novel approach towards fast multi-class volume segmentation that exploits supervoxels in order to reduce complexity, time and memory requirements. Current methods for biomedical image segmentation typically require either complex mathematical models with slow convergence or expensive-to-calculate image features, which makes them infeasible for large volumes with many objects (tens to hundreds) of different classes, as is typical in modern medical and biological datasets. Recently, graphical models such as Markov Random Fields (MRF) or Conditional Random Fields (CRF) have had a huge impact in different computer vision areas (e.g. image parsing, object detection, object recognition), as they provide global regularization for multiclass problems over an energy minimization framework. These models have yet to find impact in biomedical imaging because of complexities in training and slow inference in 3D images caused by the very large number of voxels. Here, we define an interactive segmentation approach over a supervoxel space by first defining novel, robust and fast regional descriptors for supervoxels. Then, a hierarchical segmentation approach is adopted by training Contextual Extremely Random Forests in a user-defined label hierarchy, where the classification output of the previous layer is used as additional features to train a new classifier that refines more detailed label information. This hierarchical model yields final class likelihoods for supervoxels, which are then refined by an MRF model for 3D segmentation. Results demonstrate the effectiveness on a challenging cryo-soft X-ray tomography dataset by segmenting cell areas with only a few user scribbles as the input for our algorithm. Further results demonstrate the effectiveness of our method to fully extract different organelles from the cell volume with another few seconds of user interaction. © (2016) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.

Nottingham ePrints

Nottingham eTheses

Repository@Nottingham