9 research outputs found
Evaluation of Joint Multi-Instance Multi-Label Learning For Breast Cancer Diagnosis
Multi-instance multi-label (MIML) learning is a challenging problem in many
aspects. Such learning approaches might be useful for many medical diagnosis
applications including breast cancer detection and classification. In this
study, a subset of the digiPATH dataset (whole-slide digital breast cancer
histopathology images) is used for training and evaluation of six
state-of-the-art MIML methods.
Finally, the performance of these approaches is compared by means of
effective evaluation metrics. MIML-kNN achieves the best performance, with
65.3% average precision, while most of the other methods also attain
acceptable results.
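To make the reported metric concrete, the following is a minimal sketch of average precision as it is commonly defined for multi-label ranking: for each relevant label, take the fraction of labels ranked at or above it that are also relevant, then average over the relevant labels. The exact variant used in the study is not stated in the abstract, so this definition, the function name, and the example scores are assumptions for illustration only.

```python
def average_precision(scores, relevant):
    """Multi-label average precision for one example (illustrative sketch).

    scores:   dict mapping each label to a predicted score (hypothetical values)
    relevant: set of labels that truly apply to the example
    """
    if not relevant:
        raise ValueError("average precision is undefined without relevant labels")
    # Rank labels from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked, start=1):
        if label in relevant:
            hits += 1
            # Precision among the labels ranked at or above this relevant label.
            total += hits / rank
    return total / len(relevant)


# Hypothetical scores for three labels; "a" and "c" are the true labels.
example = average_precision({"a": 0.9, "b": 0.8, "c": 0.1}, {"a", "c"})
```

Averaging this value over all test examples gives a single dataset-level figure of the kind reported above.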
Multi-Label Classifier Chains for Bird Sound
Bird sound data collected with unattended microphones for automatic surveys,
or mobile devices for citizen science, typically contain multiple
simultaneously vocalizing birds of different species. However, few works have
considered the multi-label structure in birdsong. We propose to use an ensemble
of classifier chains combined with a histogram-of-segments representation for
multi-label classification of birdsong. The proposed method is compared with
binary relevance and three multi-instance multi-label learning (MIML)
algorithms from prior work (which focus more on structure in the sound, and
less on structure in the label sets). Experiments are conducted on two
real-world birdsong datasets, and show that the proposed method usually
outperforms binary relevance (using the same features and base-classifier), and
is better in some cases and worse in others compared to the MIML algorithms.
Comment: 6 pages, 1 figure, submission to ICML 2013 workshop on bioacoustics.
Note: this is a minor revision; the blind submission format has been replaced
with one that shows author names, and a few corrections have been made.
Multi-Instance Multilabel Learning with Weak-Label for Predicting Protein Function in Electricigens
Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities. In our previous study, we showed that protein function prediction is naturally and inherently a Multi-Instance Multilabel (MIML) learning task. Automated protein function prediction is typically implemented under the assumption that the functions of labeled proteins are complete; that is, there are no missing labels. In practice, however, only a subset of the functions of a protein is known, and whether the protein has other functions is unknown. Protein function prediction therefore suffers from the weak-label problem, and prediction with incomplete annotation matches the MIML with weak-label learning framework well. In this paper, we apply the state-of-the-art MIML with weak-label learning algorithm MIMLwel to predict protein functions in two typical real-world electricigen organisms that have been widely used in microbial fuel cell (MFC) research. Our experimental results validate the effectiveness of the MIMLwel algorithm in predicting protein functions with incomplete annotation.
A library for multi-instance multi-label learning
Extraordinary Master's Thesis Award, academic year 2019/2020. Master's in Computer Engineering. This project presents a library for solving multi-instance multi-label classification problems. It describes the data format, the software architecture, and the algorithmic proposals that the library incorporates. New algorithms can be added in a simple way, which helps researchers in this area develop, test, and compare new proposals. In addition, the library is free and open source and is implemented in Java, using the Weka and Mulan libraries. Users who already work with these libraries for multi-instance learning and multi-label learning, respectively, will therefore find a familiar development environment.
Human-in-the-Loop Learning From Crowdsourcing and Social Media
Computational social studies using public social media data have become more and more popular because of the large amount of user-generated data available. The richness of social media data, coupled with noise and subjectivity, raises significant challenges for computationally studying social issues in a feasible and scalable manner. Machine learning problems are, as a result, often subjective or ambiguous when humans are involved. That is, humans solving the same problems might come to legitimate but completely different conclusions, based on their personal experiences and beliefs. When building supervised learning models, particularly when using crowdsourced training data, multiple annotations per data item are usually reduced to a single label representing ground truth. This inevitably hides a rich source of diversity and subjectivity of opinions about the labels.
Label distribution learning associates with each data item a probability distribution over the labels for that item; it can thus preserve the diversity of opinions, beliefs, etc. that conventional learning hides or ignores. We propose a human-in-the-loop learning framework to model and study large volumes of unlabeled subjective social media data with less human effort. We study various annotation tasks given to crowdsourced annotators and methods for aggregating their contributions in a manner that preserves subjectivity and disagreement. We introduce a strategy for learning label distributions with only five to ten labels per item by aggregating human-annotated labels over multiple, semantically related data items. We conduct experiments using our learning framework on data related to two subjective social issues (work and employment, and suicide prevention) that touch many people worldwide. Our methods can be applied to a broad variety of problems, particularly social problems. Our experimental results suggest that specific label aggregation methods can help provide reliable representative semantics at the population level.
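The contrast between a single ground-truth label and a label distribution can be sketched in a few lines. The example below is only an illustration under stated assumptions: the label names, the annotations, and the simple pooling of annotations across related items are hypothetical and are not the aggregation methods evaluated in the work itself.

```python
from collections import Counter


def majority_label(annotations):
    """Conventional reduction: keep only the most frequent label."""
    return Counter(annotations).most_common(1)[0][0]


def label_distribution(annotations, labels):
    """Empirical probability of each label among the annotations."""
    counts = Counter(annotations)
    total = sum(counts[label] for label in labels)
    if total == 0:
        raise ValueError("no annotations for the given labels")
    return {label: counts[label] / total for label in labels}


def pooled_distribution(related_items, labels):
    """Assumed simplification: pool annotations over related items."""
    pooled = [a for item in related_items for a in item]
    return label_distribution(pooled, labels)


# Hypothetical annotations from five crowd workers for one post.
labels = ["positive", "negative", "neutral"]
votes = ["positive", "positive", "negative", "neutral", "positive"]
single = majority_label(votes)            # disagreement is discarded
dist = label_distribution(votes, labels)  # disagreement is preserved
```

The majority label keeps one answer per item, while the distribution records how the annotators split, which is the information that label distribution learning is meant to retain.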
Multi-instance multi-label learning: algorithms and applications to bird bioacoustics
We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts and improve our understanding of birds' interactions with the environment and other organisms. However, traditional observation methods are labor-intensive. Most prior work on machine learning for bird song is not applicable to real-world acoustic monitoring, because it assumes recordings contain only a single species of bird, while recordings typically contain multiple simultaneously vocalizing birds. We propose to use the multi-instance multi-label (MIML) framework in machine learning for the species classification problem, where the dataset is viewed as a collection of bags of instances paired with sets of labels. Furthermore, we formalize MIML instance annotation, where the goal is to predict instance labels while learning only from bag label sets. We develop the first MIML representation for audio, and several new algorithms for MIML instance annotation based on support vector machines or classifier chains. The proposed methods classify either the set of species present in a recording, or individual calls, while learning only from recordings paired with a set of species. This form of training data requires less human effort to obtain than individually labeled calls. These methods are successfully applied to audio collected in the field that includes multiple simultaneously vocalizing species. The proposed algorithms for MIML classification are general, and are also applied to object recognition in images.
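The "bags of instances paired with sets of labels" view can be sketched as a small data structure. This is a minimal illustration, not the representation developed in the thesis: the class and function names, the feature vectors, and the species names are hypothetical, and the sketch assumes that a bag's label set is the union of its hidden instance labels.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Bag:
    """One recording: instance feature vectors plus a bag-level label set."""
    instances: List[List[float]]  # hypothetical per-segment feature vectors
    labels: Set[str]              # species known to be present in the bag


def bag_labels_from_instances(instance_labels: List[str]) -> Set[str]:
    """Assumed link between levels: bag labels are the union of instance labels."""
    return set(instance_labels)


def candidate_labels(bag: Bag) -> List[Set[str]]:
    """For instance annotation, each instance's hidden label is only known
    to lie somewhere in the bag's label set."""
    return [set(bag.labels) for _ in bag.instances]


# Hypothetical recording with two segments and two species.
recording = Bag(instances=[[0.1, 0.2], [0.3, 0.4]], labels={"robin", "wren"})
candidates = candidate_labels(recording)
```

Training data in this form only says which species occur somewhere in a recording, which is why it is cheaper to collect than individually labeled calls.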