Hyperbolic Interaction Model For Hierarchical Multi-Label Classification
Different from the traditional classification tasks which assume mutual
exclusion of labels, hierarchical multi-label classification (HMLC) aims to
assign multiple labels to every instance with the labels organized under
hierarchical relations. Besides the labels, since linguistic ontologies are
intrinsic hierarchies, the conceptual relations between words can also form
hierarchical structures. Thus it can be a challenge to learn mappings from word
hierarchies to label hierarchies. We propose to model the word and label
hierarchies by embedding them jointly in the hyperbolic space. The main reason
is that the tree-likeness of the hyperbolic space matches the complexity of
symbolic data with hierarchical structures. A new Hyperbolic Interaction Model
(HyperIM) is designed to learn the label-aware document representations and
make predictions for HMLC. Extensive experiments are conducted on three
benchmark datasets. The results demonstrate that the new model can
realistically capture the complex data structures and further improve
performance on HMLC compared with state-of-the-art methods. To facilitate
future research, our code is publicly available.
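To make the idea of measuring similarity in hyperbolic space concrete, here is a minimal sketch of the geodesic distance in the Poincaré ball model. The choice of the Poincaré ball is an assumption (the abstract only says "hyperbolic space"), and `poincare_distance` is a hypothetical helper name, not part of the authors' released code.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Assumption: the Poincare ball model of hyperbolic space is used.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))
```

Distances grow rapidly as points approach the boundary of the ball, which is what lets tree-like hierarchies fit in few dimensions: general concepts can sit near the origin and specific ones near the boundary.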
Toward a New Approach in Fruit Recognition using Hybrid RGBD Features and Fruit Hierarchy Property
We present a hierarchical multi-feature classification (HMC) system for the multiclass fruit recognition problem. Our approach to HMC exploits the advantages of combining multimodal features with the fruit hierarchy property. To construct the hybrid features, we combine the color feature, which is known to be useful in fruit recognition, with a 3D shape feature extracted from the depth channel of RGBD (Red, Green, Blue, Depth) images. Meanwhile, given a set of fruit species and varieties with a preexisting hierarchy among them, we treat the problem of assigning an image to one of these fruit varieties from the point of view of that hierarchy. We report on computational experiments using this approach and show that using the hierarchy structure together with hybrid RGBD features can improve classification performance.
A MEDICAL X-RAY IMAGE CLASSIFICATION AND RETRIEVAL SYSTEM
Medical image retrieval systems have gained high interest in the scientific community due to the advances in medical imaging technologies. The semantic gap is one of the biggest challenges in retrieval from large medical databases. This paper presents a retrieval system that aims at addressing this challenge by learning the main concept of every image in the medical database. The proposed system contains two modules: a classification/annotation and a retrieval module. The first module aims at classifying and subsequently annotating all medical images automatically. SIFT (Scale Invariant Feature Transform) and LBP (Local Binary Patterns) are two descriptors used in this process. Image-based and patch-based features are used as approaches to build a bag of words (BoW) using these descriptors. The impact on the classification performance is also evaluated. The results show that the classification accuracy obtained incorporating image-based integration techniques is higher than the accuracy obtained by other techniques. The retrieval module enables the search based on text, visual and multimodal queries. The text-based query supports retrieval of medical images based on categories, as it is carried out via the category that the images were annotated with, within the classification module. The multimodal query applies a late fusion technique on the retrieval results obtained from text-based and image-based queries. This fusion is used to enhance the retrieval performance by incorporating the advantages of both text-based and content-based image retrieval
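As an illustration of the late fusion step described in this abstract, the sketch below merges two result lists at the score level. The abstract does not state the actual fusion rule, so the min-max normalisation, the weighted sum, and the names `min_max`, `late_fusion`, and `text_weight` are all assumptions made for the example.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def late_fusion(text_scores, image_scores, text_weight=0.5):
    """Rank documents by a weighted sum of normalised per-modality scores.

    Assumption: a document missing from one modality contributes 0 there.
    """
    t = min_max(text_scores)
    i = min_max(image_scores)
    fused = {
        doc: text_weight * t.get(doc, 0.0) + (1.0 - text_weight) * i.get(doc, 0.0)
        for doc in set(t) | set(i)
    }
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

Because each modality is queried independently and only the ranked outputs are merged, either retrieval component can be replaced without retraining the other.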
XMIAR: X-ray medical image annotation and retrieval
The rapid growth of digitized medical images has driven the
expansion of research on Content Based Image Retrieval (CBIR)
systems. These systems retrieve images by their low-level
features, such as texture, shape and color. However, such visual features do not allow
users to request images by their semantic meaning. Image annotation or
classification systems can be considered a solution to the limitations of
CBIR: to reduce the semantic gap, they annotate or classify
each image with a few controlled keywords. In this paper, we
propose a new hierarchical classification scheme for X-ray medical images using two
machine learning techniques, the Support Vector Machine
(SVM) and k-Nearest Neighbour (k-NN). The hierarchical classification design is
based on the main body region. Evaluation was conducted on the
ImageCLEF2005 database. The results obtained in this research improve on those
of previous related studies.
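A top-down hierarchical classifier of the kind described above can be sketched as follows: first predict the body region, then predict the fine-grained class using only training samples from that region. The abstract does not say which classifier is used at which level, so this sketch uses k-NN at both levels for brevity; the function names and the sample layout are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to the query.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def hierarchical_predict(samples, query, k=3):
    """Two-level prediction: body region first, then class within the region.

    samples: list of (feature_vector, region, fine_label) triples.
    """
    region = knn_predict([(f, r) for f, r, _ in samples], query, k)
    in_region = [(f, c) for f, r, c in samples if r == region]
    return region, knn_predict(in_region, query, k)
```

Restricting the second stage to one region keeps each decision small, at the cost that a wrong region prediction cannot be recovered further down the hierarchy.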
Active learning with applications to the diagnosis of parasites (Aprendizado ativo com aplicações ao diagnóstico de parasitos)
Advisors: Alexandre Xavier Falcão, Pedro Jussieu de Rezende. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação. Degree: Doctorate in Computer Science.
Abstract: Image datasets have grown large with the fast advance and variety of imaging technologies, demanding urgent solutions for information processing, organization, and retrieval. Processing here aims to annotate an image by assigning it a label that represents its semantic content. Annotation is crucial for the effective organization and retrieval of the information related to the images. However, manual annotation is infeasible for large datasets, and successful automatic annotation by a pattern classifier strongly depends on the quality of a much smaller training set. Active learning techniques have been proposed to select representative training samples from the large dataset together with a label suggestion, which the expert can either confirm or correct. Nevertheless, these techniques very often ignore the need for interactive response times during the active learning process. Therefore, this PhD thesis presents active learning methods that can reduce and/or organize the large dataset so that sample selection does not require reprocessing it entirely at every learning iteration. Moreover, selection can be interrupted as soon as the desired number of samples from the reduced and organized dataset has been identified. These methods show increasing progress, first with data reduction only, and then with subsequent organization of the reduced dataset. The thesis also addresses a real problem, the diagnosis of parasites, in which the existence of a diverse class (i.e., the impurity class) that is much larger and contains samples similar to some types of parasites makes data reduction considerably less effective. This problem is finally circumvented with a different type of data organization, which still allows interactive response times and yields a better, robust active learning approach for the diagnosis of parasites. The methods have been extensively assessed with different types of unsupervised and supervised classifiers, using datasets from distinct applications, against baseline approaches that rely on random sample selection and/or reprocess the entire dataset at each learning iteration. Finally, the thesis demonstrates that further improvements are obtained with semi-supervised learning.
Analyzing Granger causality in climate data with time series classification methods
Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while relying on traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
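For readers unfamiliar with the autoregressive baseline mentioned above, the following sketch shows the core comparison behind a Granger-style test: a series x is said to Granger-cause y if adding past values of x to an autoregressive model of y reduces the prediction error. The sketch uses a single lag and plain least squares, and it only returns the two residual sums of squares; a real analysis would add more lags and a significance test (typically an F-test). All function names are hypothetical, and this is not the classification-based procedure studied in the article.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_rss(x, y):
    """Return (restricted_rss, full_rss) for predicting y[t] with one lag.

    Restricted model: y[t] ~ 1 + y[t-1]
    Full model:       y[t] ~ 1 + y[t-1] + x[t-1]
    """
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    return rss(restricted, targets), rss(full, targets)
```

A full-model error that is much smaller than the restricted-model error is evidence that past values of x carry predictive information about y beyond what y's own history provides.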