Universal, Unsupervised (Rule-Based), Uncovered Sentiment Analysis
We present a novel unsupervised approach for multilingual sentiment analysis
driven by compositional syntax-based rules. On the one hand, we exploit some of
the main advantages of unsupervised algorithms: (1) the interpretability of
their output, in contrast with most supervised models, which behave as a black
box, and (2) their robustness across different corpora and domains. On the other
hand, by introducing the concept of compositional operations and exploiting
syntactic information in the form of universal dependencies, we tackle one of
their main drawbacks: their rigidity on data that are structured differently
depending on the language concerned. Experiments show an improvement both over
existing unsupervised methods and over state-of-the-art supervised models when
evaluating outside their corpus of origin. Experiments also show how the same
compositional operations can be shared across languages. The system is
available at http://www.grupolys.org/software/UUUSA/.
Comment: 19 pages, 5 tables, 6 figures. This is the authors' version of a work
that was accepted for publication in Knowledge-Based Systems.
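The idea of compositional operations over universal dependencies can be illustrated with a minimal sketch. It assumes a toy polarity lexicon, two hypothetical operations (negation and intensification) keyed on the UD relation `advmod`, and trees that are already parsed. It is not the UUUSA implementation; `Node`, `score`, `POLARITY`, `NEGATORS` and `INTENSIFIERS` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy resources, for illustration only (not the UUUSA lexicons).
POLARITY = {"good": 2.0, "bad": -2.0, "great": 3.0}
NEGATORS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


@dataclass
class Node:
    form: str
    deprel: str  # Universal Dependencies relation to the head
    children: list[Node] = field(default_factory=list)


def score(node: Node) -> float:
    """Compose a polarity score bottom-up over a dependency (sub)tree."""
    total = POLARITY.get(node.form.lower(), 0.0)
    factor = 1.0
    for child in node.children:
        word = child.form.lower()
        if child.deprel == "advmod" and word in NEGATORS:
            factor *= -1.0  # negation: flips the polarity of the whole subtree
        elif child.deprel == "advmod" and word in INTENSIFIERS:
            factor *= INTENSIFIERS[word]  # intensification: scales the subtree
        else:
            total += score(child)  # default: add the child's own composed score
    return total * factor


# "The food was not good": "good" is the root; "not" attaches to it as advmod.
tree = Node("good", "root", [
    Node("food", "nsubj", [Node("The", "det")]),
    Node("was", "cop"),
    Node("not", "advmod"),
])
print(score(tree))  # -2.0
```

Because the operations are triggered by UD relations rather than by language-specific tree shapes, only the word lists would need to change to apply the same rules to another language.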
One model, two languages: training bilingual parsers with harmonized treebanks
We introduce an approach to train lexicalized parsers using bilingual corpora
obtained by merging harmonized treebanks of different languages, producing
parsers that can analyze sentences in either of the learned languages, or even
sentences that mix both. We test the approach on the Universal Dependency
Treebanks, training with MaltParser and MaltOptimizer. The results show that
these bilingual parsers are more than competitive, as most combinations not
only preserve accuracy, but some even achieve significant improvements over the
corresponding monolingual parsers. Preliminary experiments also show the
approach to be promising on texts with code-switching and when more languages
are added.
Comment: 7 pages, 4 tables, 1 figure.
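The merging step can be sketched as follows, assuming both treebanks are already harmonized (same UD annotation scheme, CoNLL-U format). Shuffling the merged sentences and the function names `read_conllu_sentences` and `merge_treebanks` are illustrative assumptions; the resulting file would then be handed to a trainer such as MaltParser.

```python
import random


def read_conllu_sentences(text: str) -> list[str]:
    """Split CoNLL-U text into sentence blocks (sentences are separated by blank lines)."""
    return [block.strip("\n") for block in text.strip().split("\n\n") if block.strip()]


def merge_treebanks(treebank_a: str, treebank_b: str, seed: int = 0) -> str:
    """Concatenate the sentences of two harmonized treebanks and shuffle them,
    so that a single parser is trained on a mix of both languages."""
    sentences = read_conllu_sentences(treebank_a) + read_conllu_sentences(treebank_b)
    random.Random(seed).shuffle(sentences)  # interleave the two languages
    # CoNLL-U requires each sentence to be followed by a blank line.
    return "\n\n".join(sentences) + "\n\n"
```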
Towards Syntactic Iberian Polarity Classification
Lexicon-based methods using syntactic rules for polarity classification rely
on parsers that are dependent on the language and on treebank guidelines. Thus,
rules inherit this dependence and require adaptation, especially in multilingual
scenarios. We tackle this challenge in the context of the Iberian Peninsula,
releasing the first symbolic syntax-based Iberian system with rules shared
across five official languages: Basque, Catalan, Galician, Portuguese and
Spanish. The model is made available.
Comment: 7 pages, 5 tables. Contribution to the 8th Workshop on Computational
Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA-2017)
at EMNLP 2017.
The Fragility of Multi-Treebank Parsing Evaluation
Held in Gyeongju, Republic of Korea, October 12-17, 2022.
[Abstract]: Treebank selection for parsing evaluation, and the spurious effects that might arise from a biased choice, have not been explored in detail. This paper studies how evaluating on a single subset of treebanks can lead to weak conclusions. First, we take a few contrasting parsers and run them on subsets of treebanks proposed in previous work, whose use was justified (or not) on criteria such as typology or data scarcity. Second, we run a large-scale version of this experiment: we create large numbers of random subsets of treebanks and compare on them many parsers whose scores are available. The results show substantial variability across subsets and that, although establishing guidelines for good treebank selection is hard, some inadequate strategies can be easily avoided.
This work was supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the FBBVA, as well as by the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150). The work is also supported by ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), by Xunta de Galicia (ED431C 2020/11), and by Centro de Investigación de Galicia “CITIC”, which is funded by Xunta de Galicia, Spain and the European Union (ERDF - Galicia 2014–2020 Program), by grant ED431G 2019/01.
Funding: Xunta de Galicia; ED431C 2020/11. Xunta de Galicia; ED431G 2019/01.
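The large-scale experiment can be illustrated with a minimal sketch, assuming a table of per-treebank scores (for example LAS) for each parser. It estimates how often one parser beats another on randomly drawn treebank subsets; the function name `win_rate` and the use of the mean score as the aggregate are assumptions, not the paper's exact protocol.

```python
import random
from statistics import mean


def win_rate(scores: dict[str, dict[str, float]], a: str, b: str,
             subset_size: int, n_samples: int, seed: int = 0) -> float:
    """Estimate how often parser `a` beats parser `b` (higher mean score)
    on random subsets of `subset_size` treebanks."""
    rng = random.Random(seed)
    shared = sorted(set(scores[a]) & set(scores[b]))  # treebanks scored for both
    wins = 0
    for _ in range(n_samples):
        subset = rng.sample(shared, subset_size)
        if mean(scores[a][t] for t in subset) > mean(scores[b][t] for t in subset):
            wins += 1
    return wins / n_samples
```

A win rate close to 0 or 1 means the ranking is stable; a value in between means the conclusion depends on which treebanks happen to be selected.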
How important is syntactic parsing accuracy? An empirical evaluation on rule-based sentiment analysis
This version of the article has been accepted for publication, after peer review, and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10462-017-9584-0
[Abstract]: Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and less accurate models which, however, require fewer computational resources. The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, without their accuracy having any relevant influence on the results.
Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, these results suggest that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser, and that parsing researchers should investigate models that improve speed further, even at some cost to accuracy.
Carlos Gómez-Rodríguez has received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, Grant Agreement No 714150), Ministerio de Economía y Competitividad (FFI2014-51978-C2-2-R), and the Oportunius Program (Xunta de Galicia). Iago Alonso-Alonso was funded by an Oportunius Program Grant (Xunta de Galicia). David Vilares has received funding from the Ministerio de Educación, Cultura y Deporte (FPU13/01180) and Ministerio de Economía y Competitividad (FFI2014-51978-C2-2-R).
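The practical recommendation can be expressed as a simple selection rule. In this sketch, which is an assumption rather than a procedure from the article, the chosen parser is the fastest one whose downstream sentiment accuracy lies within a tolerance of the best observed; `ParserResult`, `pick_parser` and all parser names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParserResult:
    name: str
    sentiment_accuracy: float  # downstream polarity accuracy obtained with this parser
    tokens_per_second: float   # parsing throughput


def pick_parser(results: list[ParserResult], tolerance: float) -> ParserResult:
    """Return the fastest parser whose downstream accuracy is within `tolerance`
    of the best downstream accuracy observed."""
    best = max(r.sentiment_accuracy for r in results)
    acceptable = [r for r in results if best - r.sentiment_accuracy <= tolerance]
    return max(acceptable, key=lambda r: r.tokens_per_second)


# Hypothetical measurements, for illustration only.
results = [
    ParserResult("accurate", 0.80, 1_000.0),
    ParserResult("fast", 0.795, 20_000.0),
    ParserResult("poor", 0.70, 50_000.0),
]
print(pick_parser(results, tolerance=0.01).name)  # fast
```

When downstream accuracy barely varies across parsers, as the article reports, almost any reasonable tolerance lets speed decide.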
LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools for Sentiment Analysis as Semantic Dependency Parsing
Held 14-15 July 2022, Online.
[Abstract]: This paper addressed the problem of structured sentiment analysis using a biaffine semantic dependency parser, large pre-trained language models, and publicly available translation models. For the monolingual setup, we considered: (i) training on a single treebank, and (ii) relaxing the setup by training on treebanks coming from different languages that can be adequately processed by cross-lingual language models. For the zero-shot setup and a given target treebank, we relied on: (i) a word-level translation of available treebanks in other languages to get noisy, likely ungrammatical, but annotated data (we release as much of it as licenses allow), and (ii) merging those translated treebanks to obtain training data. In the post-evaluation phase, we also trained cross-lingual models that simply merged all the English treebanks and did not use word-level translations, and yet obtained better results. According to the official results, we ranked 8th and 9th in the monolingual and cross-lingual setups, respectively.
This work is supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the FBBVA, as well as by the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150). The work is also supported by ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), by Xunta de Galicia (ED431C 2020/11), and by Centro de Investigación de Galicia “CITIC”, which is funded by Xunta de Galicia, Spain and the European Union (ERDF - Galicia 2014–2020 Program), by grant ED431G 2019/01.
Funding: Xunta de Galicia; ED431C 2020/11. Xunta de Galicia; ED431G 2019/01.
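Word-level translation keeps the annotation usable because the number of tokens does not change, so every head index and label still points at the right position. A minimal sketch, assuming a hypothetical bilingual dictionary and a simplified token representation (`Token`, `translate_word_level` are illustrative names; the actual system relied on publicly available translation models):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    form: str
    head: int    # 1-based index of the head token, 0 for the root
    deprel: str  # label of the arc to the head


def translate_word_level(sentence: list[Token], lexicon: dict[str, str]) -> list[Token]:
    """Replace each word form with its dictionary translation, keeping heads and
    labels untouched so the annotation carries over one-to-one."""
    # Unknown words are copied unchanged; source word order is preserved,
    # which is why the output is noisy and often ungrammatical.
    return [replace(tok, form=lexicon.get(tok.form.lower(), tok.form)) for tok in sentence]


sentence = [Token("good", 2, "amod"), Token("hotel", 0, "root"), Token("wifi", 2, "nmod")]
translated = translate_word_level(sentence, {"good": "bueno", "hotel": "hotel"})
print([t.form for t in translated])  # ['bueno', 'hotel', 'wifi']
```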
Retrieval of bilingual Spanish-English information by means of a standard automatic translation system
This paper describes our participation in bilingual retrieval (queries in Spanish on documents in English) by means of an information retrieval system based on the vector model. The queries, formulated in Spanish, were translated into English by means of a commercial machine translation system; the terms extracted from the resulting translations were filtered to remove stopwords and then normalised by stemming. Results are slightly more than 15% poorer than those obtained through monolingual retrieval with the original English queries.
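The retrieval side of this pipeline (stopword removal, stemming, then vector-model matching) can be sketched as follows, assuming the query has already been machine-translated into English. The tiny stopword list, the crude suffix-stripping stemmer and the TF-IDF weighting with cosine similarity are illustrative assumptions, not the exact configuration of the participating system.

```python
import math
from collections import Counter

# Hypothetical toy resources; a real system would use a full stopword list and stemmer.
STOPWORDS = {"the", "of", "in", "a", "and", "to", "for"}
SUFFIXES = ("ing", "es", "s")


def normalise(text: str) -> list[str]:
    """Lowercase, drop stopwords and strip one common suffix (a crude stemmer)."""
    terms = []
    for word in text.lower().split():
        if word in STOPWORDS:
            continue
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                break
        terms.append(word)
    return terms


def rank(query: str, documents: list[str]) -> list[int]:
    """Return document indices sorted by TF-IDF cosine similarity to the query."""
    doc_terms = [Counter(normalise(d)) for d in documents]
    n = len(documents)
    df = Counter(t for terms in doc_terms for t in terms)  # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vector(counts: Counter) -> dict[str, float]:
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cosine(u: dict[str, float], v: dict[str, float]) -> float:
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    q = vector(Counter(normalise(query)))
    scores = [cosine(q, vector(d)) for d in doc_terms]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```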
John's ellipsoid and the integral ratio of a log-concave function
We extend the notion of John’s ellipsoid to the setting of integrable
log-concave functions. This will allow us to define the integral ratio of a
log-concave function, which will extend the notion of volume ratio, and we
will find the log-concave function maximizing the integral ratio. A reverse
functional affine isoperimetric inequality will be given, written in terms of this
integral ratio. This can be viewed as a stability version of the functional affine
isoperimetric inequality.
Funding: Ministerio de Economía y Competitividad; Fondo Europeo de Desarrollo Regional; Consejería de Industria, Turismo, Empresa e Innovación (Comunidad Autónoma de la Región de Murcia); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; Instituto Nacional de Matemática Pura e Aplicada.
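For reference, the classical notion that the integral ratio extends is the volume ratio of a convex body, which among all convex bodies is maximized by the simplex (a result of K. Ball). The functional definitions of the paper itself are not reproduced here, and the notation below is an assumption.

```latex
% K: a convex body in R^n; E_K: its John ellipsoid, i.e. the ellipsoid of
% maximal volume contained in K; |.|: n-dimensional volume.
\[
  \operatorname{vr}(K) = \left( \frac{|K|}{|\mathcal{E}_K|} \right)^{1/n},
  \qquad
  \operatorname{vr}(K) \le \operatorname{vr}(\Delta_n),
\]
% where \Delta_n denotes a regular n-simplex.
```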
- …