An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results

Author: Amigó Enrique
Carrillo-de-Albornoz Jorge
Gonzalo Julio
Mizzaro Stefano
Publication date: 01/01/2020

In Ordinal Classification tasks, items have to be assigned to classes that have a relative ordering, such as positive, neutral, negative in sentiment analysis. Remarkably, the most popular evaluation metrics for ordinal classification tasks either ignore relevant information (for instance, precision/recall on each of the classes ignores their relative ordering) or assume additional information (for instance, Mean Absolute Error assumes absolute distances between classes). In this paper we propose a new metric for Ordinal Classification, the Closeness Evaluation Measure, which is rooted in Measurement Theory and Information Theory. Our theoretical analysis and experimental results over both synthetic data and data from NLP shared tasks indicate that the proposed metric captures quality aspects from different traditional tasks simultaneously. In addition, it generalizes some popular classification (nominal scale) and error minimization (interval scale) metrics, depending on the measurement scale in which it is instantiated.
Comment: To appear in Proceedings of ACL 202

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Improving average ranking precision in user searches for biomedical research datasets

Author: Gaudinat Arnaud
Gobeill Julien
Mottin Luc
Ruch Patrick
Teodoro Douglas
Vachon Thérèse
Publication date: 01/01/2017

Availability of research datasets is a keystone for the reproducibility of health and life science studies and for scientific progress. Due to the heterogeneity and complexity of these data, a main challenge for research data management systems is to provide users with the best answers to their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus of 800k datasets and 21 annotated user queries. Our system provides competitive results compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, 22.3% higher than the median infAP of the participants' best submissions. Overall, it ranks in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method had a positive impact on the system's performance, improving on our baseline by up to 5.0% and 3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to the Divergence From Randomness framework, showing smaller performance variations under different training conditions. Finally, the result categorisation did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, data-driven query expansion methods could be an alternative to the complexity of biomedical terminologies.

arXiv.org e-Print Archive

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

The Novartis Repository

Archive ouverte UNIGE

REINA at RepLab2013 Topic Detection Task: Community Detection

Author: Alonso-Berrocal José-Luis
Figuerola Carlos G.
Zazo-Rodríguez Ángel-F.
Publication date: 23/09/2013

Social networks have become a large repository of comments from which many kinds of information can be extracted. Twitter is one of the largest and most widespread social networks, and is therefore an important source for detecting states of opinion, events and happenings even before the mainstream media do. Topic detection is important for discovering areas of interest that arise in the tweets. We used classical systems to build a similarity matrix and applied community detection techniques. The results have been good and allow us to study new possibilities.

Gestion del Repositorio Documental de la Universidad de Salamanca

Learning to classify software defects from crowds: a novel approach

Author: Hernández-González J.
Inza I.
Lozano J.A.
Rachel H.
Rodríguez D.
Publication date: 01/01/2017

In software engineering, associating each reported defect with a category allows, among many other things, for the appropriate allocation of resources. Although this classification task can be automated using standard machine learning techniques, the categorization of defects for model training requires expert knowledge, which is not always available. To circumvent this dependency, we propose to apply the learning from crowds paradigm, where training categories are obtained from multiple non-expert annotators (and so may be incomplete, noisy or erroneous) and classifiers are efficiently learnt while dealing with this subjective class information. To illustrate our proposal, we present two real applications of IBM's orthogonal defect classification on the issue tracking systems of two different real-world domains. Bayesian network classifiers, learnt using two state-of-the-art methodologies from data labeled by a crowd of annotators, are used to predict the category (impact) of reported software defects. The considered methodologies show enhanced performance with respect to the straightforward solution (majority voting) according to different metrics. This shows the potential of non-expert knowledge aggregation techniques when expert knowledge is unavailable.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

BCAM's Institutional Repository Data

Bagged ensemble of Fuzzy C-Means classifiers for nuclear transient identification

Author: Baraldi Piero
Razavi-Far Roozbeh
Zio Enrico
Publication venue: Elsevier BV
Publication date: 01/05/2011

This paper presents an ensemble-based scheme for nuclear transient identification. The ensemble of classifiers is constructed by bagging; the novelty consists in using supervised fuzzy C-means (FCM) classifiers as the base classifiers of the ensemble. The performance of the proposed classification scheme has been verified by comparison with a single supervised, evolutionary-optimized FCM classifier on the task of classifying artificial datasets. The results indicate that for datasets of large or very small size and/or with complex decision boundaries, the bagging ensembles can improve classification accuracy. The approach has then been applied to the identification of simulated transients in the feedwater system of a boiling water reactor (BWR).

HAL-CentraleSupelec

Scholarship at UWindsor

HAL-Rennes 1

Overview of RepLab 2012: Evaluating Online Reputation Management Systems

Author: Amigó E.
Corujo A.
de Rijke M.
Gonzalo J.
Meij E.
Publication venue: CEUR-WS
Publication date: 01/01/2012

International Migration, Integration and Social Cohesion online publications