Multitask Learning for Fine-Grained Twitter Sentiment Analysis
Traditional sentiment analysis approaches tackle problems like ternary
(3-category) and fine-grained (5-category) classification by learning the tasks
separately. We argue that such classification tasks are correlated and we
propose a multitask approach based on a recurrent neural network that benefits
by jointly learning them. Our study demonstrates the potential of multitask
models on this type of problem and improves the state-of-the-art results in
the fine-grained sentiment classification problem.

Comment: International ACM SIGIR Conference on Research and Development in Information Retrieval 201
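The shared-representation idea behind such a multitask model can be sketched in a few lines. The sketch below uses plain feed-forward layers and toy dimensions, weights, and labels for illustration; the paper's actual model uses a recurrent encoder. The key point is that one shared representation feeds two softmax heads, so gradients from both the ternary and the fine-grained loss would update the shared weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
d_in, d_hid = 8, 16
W_shared = rng.normal(scale=0.1, size=(d_in, d_hid))
W_ternary = rng.normal(scale=0.1, size=(d_hid, 3))   # 3-category head
W_fine = rng.normal(scale=0.1, size=(d_hid, 5))      # 5-category head

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    h = np.tanh(x @ W_shared)            # shared sentence representation
    return softmax(h @ W_ternary), softmax(h @ W_fine)

x = rng.normal(size=(4, d_in))           # a batch of 4 "sentence" vectors
p3, p5 = forward(x)

# Joint training objective: sum of the two cross-entropies
# (labels here are dummies, purely for illustration)
y3, y5 = np.array([0, 1, 2, 1]), np.array([0, 2, 4, 3])
loss = (-np.mean(np.log(p3[np.arange(4), y3]))
        - np.mean(np.log(p5[np.arange(4), y5])))
```

Minimizing this joint loss is what couples the two tasks: the shared encoder must produce features useful for both granularities at once.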
Multi-Label Quantification
The work of A. Moreo and F. Sebastiani has been supported by the SoBigData++ project, funded
by the European Commission (Grant 871042) under the H2020 Programme INFRAIA-2019-1, by
the AI4Media project, funded by the European Commission (Grant 951911) under the H2020
Programme ICT-48-2020, and by the SoBigData.it and FAIR projects funded by the Italian Ministry
of University and Research under the NextGenerationEU program; the authors’ opinions do not
necessarily reflect those of the funding agencies. The work of M. Francisco has been supported by
the FPI 2017 predoctoral programme, from the Spanish Ministry of Economy and Competitiveness
(MINECO), grant BES-2017-081202.Quantification, variously called supervised prevalence estimation or learning to quantify, is the supervised
learning task of generating predictors of the relative frequencies (a.k.a. prevalence values) of the classes of
interest in unlabelled data samples. While many quantification methods have been proposed in the past
for binary problems and, to a lesser extent, single-label multiclass problems, the multi-label setting (i.e.,
the scenario in which the classes of interest are not mutually exclusive) remains by and large unexplored.
A straightforward solution to the multi-label quantification problem could simply consist of recasting the
problem as a set of independent binary quantification problems. Such a solution is simple but naïve, since
the independence assumption upon which it rests is, in most cases, not satisfied. In these cases, knowing
the relative frequency of one class could be of help in determining the prevalence of other related classes.
We propose the first truly multi-label quantification methods, i.e., methods for inferring estimators of class
prevalence values that strive to leverage the stochastic dependencies among the classes of interest in order
to predict their relative frequencies more accurately. We show empirical evidence that natively multi-label
solutions outperform the naïve approaches by a large margin. The code to reproduce all our experiments is
available online.
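The naïve baseline the abstract criticizes, recasting the task as independent binary problems, can be sketched as follows. The per-class predictions below are made-up stand-ins for the output of any binary classifier; the simplest binary quantifier, classify-and-count, then just reports the fraction of positive predictions per class.

```python
import numpy as np

def classify_and_count(binary_preds):
    """Naive binary quantifier: prevalence = fraction of positives."""
    return binary_preds.mean()

# Hypothetical hard predictions for 3 classes over 10 unlabelled items
preds = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],   # class A
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],   # class B
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],   # class C
])
prevalences = [classify_and_count(p) for p in preds]
# → [0.5, 0.2, 0.8]
```

Each estimate is computed in isolation, which is exactly the independence assumption the paper argues against: a truly multi-label quantifier would instead exploit the stochastic dependencies among classes when forming these estimates.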
A review on quantification learning
The task of quantification consists in providing an aggregate estimation (e.g. the class distribution in a classification problem) for unseen test sets, applying a model that is trained using a training set with a different data distribution. Several real-world applications demand this kind of method, one that does not require predictions for individual examples and just focuses on obtaining accurate estimates at an aggregate level. During the past few years, several quantification methods have been proposed from different perspectives and with different goals. This paper presents a unified review of the main approaches with the aim of serving as an introductory tutorial for newcomers in the field.
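One classic method from this literature is Adjusted Classify & Count (ACC), which corrects the raw rate of positive predictions using the classifier's true- and false-positive rates estimated on held-out data. The sketch below uses illustrative numbers, not values from any of the surveyed papers.

```python
def acc_prevalence(cc, tpr, fpr):
    """ACC estimate: p = (cc - fpr) / (tpr - fpr), clipped to [0, 1].

    cc  -- raw classify-and-count rate on the unlabelled test set
    tpr -- true-positive rate of the classifier (from validation data)
    fpr -- false-positive rate of the classifier (from validation data)
    """
    p = (cc - fpr) / (tpr - fpr)
    return min(max(p, 0.0), 1.0)

# e.g. raw classify-and-count reports 40% positives, but the classifier
# has tpr=0.80 and fpr=0.10 on validation data:
est = acc_prevalence(0.40, tpr=0.80, fpr=0.10)
# (0.40 - 0.10) / (0.80 - 0.10) ≈ 0.4286
```

The correction matters precisely in the setting the review describes: when the test distribution differs from the training distribution, the raw positive-prediction rate is a biased prevalence estimate, and ACC removes that bias under the assumption that tpr and fpr carry over to the test data.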
Machine learning for acquiring knowledge in astro-particle physics
This thesis explores the fundamental aspects of machine learning that are involved in acquiring knowledge in the research field of astro-particle physics. This research field substantially relies on machine learning methods, which reconstruct the properties of astro-particles from the raw data that specialized telescopes record. These methods are typically trained from resource-intensive simulations, which reflect the existing knowledge about the particles—knowledge that physicists strive to expand. We study three fundamental machine learning tasks, which emerge from this goal.
First, we address ordinal quantification, the task of estimating the prevalences of ordered classes in sets of unlabeled data. This task emerges from the need for testing the agreement of astro-physical theories with the class prevalences that a telescope observes. To this end, we unify existing methods on quantification, propose an alternative optimization process, and develop regularization techniques to address ordinality in quantification problems, both in and outside of astro-particle physics. These advancements provide more accurate reconstructions of the energy spectra of cosmic gamma ray sources and, hence, support physicists in drawing conclusions from their telescope data.
Second, we address learning under class-conditional label noise. More particularly, we focus on a novel setting, in which one of the class-wise noise rates is known and one is not. This setting emerges from a data acquisition protocol, through which astro-particle telescopes simultaneously observe a region of interest and several background regions. We enable learning under this type of label noise with algorithms for consistent, noise-aware decision thresholding. These algorithms yield binary classifiers, which outperform the existing state-of-the-art in gamma hadron classification with the FACT telescope. Moreover, unlike the state-of-the-art, our classifiers are entirely trained from the real telescope data and thus do not require any resource-intensive simulation.
Third, we address active class selection, the task of actively finding those proportions of classes which optimize the classification performance. In astro-particle physics, this task emerges from the simulation, which can produce training data in any desired class proportions. We clarify the implications of this setting from two theoretical perspectives, one of which provides us with bounds on the resulting classification performance. We employ these bounds in a certificate of model robustness, which declares a set of class proportions for which the model is accurate with high probability. We also employ these bounds in an active strategy for class-conditional data acquisition. Our strategy uniquely considers existing uncertainties about those class proportions that have to be handled during the deployment of the classifier, while being theoretically well-justified.