Search CORE

10 research outputs found

Detecting and ordering adjectival scalemates

Author: van Miltenburg Emiel
Publication venue
Publication date: 01/01/2015
Field of study

This paper presents a pattern-based method that can be used to infer adjectival scales, such as , from a corpus. Specifically, the proposed method uses lexical patterns to automatically identify and order pairs of scalemates, followed by a filtering phase in which unrelated pairs are discarded. For the filtering phase, several different similarity measures are implemented and compared. The model presented in this paper is evaluated using the current standard, along with a novel evaluation set, and shown to be at least as good as the current state-of-the-art.Comment: Paper presented at MAPLEX 2015, February 9-10, Yamagata, Japan (http://lang.cs.tut.ac.jp/maplex2015/

arXiv.org e-Print Archive

VU Research Portal

“Was it good? It was provocative.” Learning the meaning of scalar adjectives

Author: Christopher D. Manning
Christopher Potts
de Marneffe Marie-Catherine
Publication venue
Publication date: 01/01/2010
Field of study

Texts and dialogues often express information indirectly. For instance, speakers’ answers to yes/no questions do not always straightforwardly convey a ‘yes’ or ‘no’ answer. The intended reply is clear in some cases (Was it good? It was great!) but uncertain in others (Was it acceptable? It was unprecedented.). In this paper, we present methods for interpreting the answers to questions like these which involve scalar modifiers. We show how to ground scalar modifier meaning based on data collected from the Web. We learn scales between modifiers and infer the extent to which a given answer conveys ‘yes’ or ‘no’. To evaluate the methods, we collected examples of question–answer pairs involving scalar modifiers from CNN transcripts and the Dialog Act corpus and use response distributions from Mechanical Turk workers to assess the degree to which each answer conveys ‘yes’ or ‘no’. Our experimental results closely match the Turkers’ response data, demonstrating that meanings can be learned from Web data and that such meanings can drive pragmatic inferenc

DIAL UCLouvain

Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers

Author: Riedel Sebastian
Spithourakis Georgios P.
Publication venue
Publication date: 01/01/2018
Field of study

Numeracy is the ability to understand and work with numbers. It is a necessary skill for composing and understanding documents in clinical, scientific, and other technical domains. In this paper, we explore different strategies for modelling numerals with language models, such as memorisation and digit-by-digit composition, and propose a novel neural architecture that uses a continuous probability density function to model numerals from an open vocabulary. Our evaluation on clinical and scientific datasets shows that using hierarchical models to distinguish numerals from words improves a perplexity metric on the subset of numerals by 2 and 4 orders of magnitude, respectively, over non-hierarchical models. A combination of strategies can further improve perplexity. Our continuous probability density function model reduces mean absolute percentage errors by 18% and 54% in comparison to the second best strategy for each dataset, respectively.Comment: accepted at ACL 201

arXiv.org e-Print Archive

UCL Discovery

Numeracy for language models: Evaluating and improving their ability to predict numbers

Author: Riedel S
Spithourakis GP
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2018
Field of study

UCL Discovery

The value of numbers in clinical text classification

Author: Corcoran Padraig
Miok Kristian
Spasic Irena
Publication venue: MDPI
Publication date: 07/07/2023
Field of study

Clinical text often includes numbers of various types and formats. However, most current text classification approaches do not take advantage of these numbers. This study aims to demonstrate that using numbers as features can significantly improve the performance of text classification models. This study also demonstrates the feasibility of extracting such features from clinical text. Unsupervised learning was used to identify patterns of number usage in clinical text. These patterns were analyzed manually and converted into pattern-matching rules. Information extraction was used to incorporate numbers as features into a document representation model. We evaluated text classification models trained on such representation. Our experiments were performed with two document representation models (vector space model and word embedding model) and two classification models (support vector machines and neural networks). The results showed that even a handful of numerical features can significantly improve text classification performance. We conclude that commonly used document representations do not represent numbers in a way that machine learning algorithms can effectively utilize them as features. Although we demonstrated that traditional information extraction can be effective in converting numbers into features, further community-wide research is required to systematically incorporate number representation into the word embedding process

Online Research @ Cardiff

MULDASA:Multifactor Lexical Sentiment Analysis of Social-Media Content in Nonstandard Arabic Social Media

Author: Alanazi Saad
Alwakid Ghadah
El-Haj Mahmoud
Humayun Mamoona
Osman Taha
Us Sama Najm
Publication venue: 'MDPI AG'
Publication date: 01/01/2022
Field of study

The semantically complicated Arabic natural vocabulary, and the shortage of available techniques and skills to capture Arabic emotions from text hinder Arabic sentiment analysis (ASA). Evaluating Arabic idioms that do not follow a conventional linguistic framework, such as contemporary standard Arabic (MSA), complicates an incredibly difficult procedure. Here, we define a novel lexical sentiment analysis approach for studying Arabic language tweets (TTs) from specialized digital media platforms. Many elements comprising emoji, intensifiers, negations, and other nonstandard expressions such as supplications, proverbs, and interjections are incorporated into the MULDASA algorithm to enhance the precision of opinion classifications. Root words in multidialectal sentiment LX are associated with emotions found in the content under study via a simple stemming procedure. Furthermore, a feature–sentiment correlation procedure is incorporated into the proposed technique to exclude viewpoints expressed that seem to be irrelevant to the area of concern. As part of our research into Saudi Arabian employability, we compiled a large sample of TTs in 6 different Arabic dialects. This research shows that this sentiment categorization method is useful, and that using all of the characteristics listed earlier improves the ability to accurately classify people’s feelings. The classification accuracy of the proposed algorithm improved from 83.84% to 89.80%. Our approach also outperformed two existing research projects that employed a lexical approach for the sentiment analysis of Saudi dialect

Multidisciplinary Digital Publishing Institute

Nottingham Trent Institutional Repository (IRep)

Directory of Open Access Journals

Lancaster E-Prints

HiER 2015. Proceedings des 9. Hildesheimer Evaluierungs- und Retrievalworkshop

Author: Elbeshausen Stefanie
Faaß Getrud
Friesbaum Joachim
Heuwing Ben
Jürgens Julia
Publication venue: Hildesheim : Universitätsverlag Hildesheim
Publication date: 01/01/2015
Field of study

Die Digitalisierung formt unsere Informationsumwelten. Disruptive Technologien dringen verstärkt und immer schneller in unseren Alltag ein und verändern unser Informations- und Kommunikationsverhalten. Informationsmärkte wandeln sich. Der 9. Hildesheimer Evaluierungs- und Retrievalworkshop HIER 2015 thematisiert die Gestaltung und Evaluierung von Informationssystemen vor dem Hintergrund der sich beschleunigenden Digitalisierung. Im Fokus stehen die folgenden Themen: Digital Humanities, Internetsuche und Online Marketing, Information Seeking und nutzerzentrierte Entwicklung, E-Learning

University of Hildesheim

HiER 2015 - Proceedings des 9. Hildesheimer Evaluierungs- und Retrievalworkshop

Author: Elbeshausen Stefanie
Faaß Gertrud
Griesbaum Joachim
Heuwing Ben
Jürgens Julia
Publication venue: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)
Publication date: 18/07/2023
Field of study

Dieser Band fasst die Vorträge des 9. Hildesheimer Evaluierungs- und Retrieval-Workshops (HIER) zusammen, der am 9. und 10. Juli 2015 an der Universität Hildesheim stattfand. Die HIER Workshop-Reihe begann im Jahr 2001 mit dem Ziel, die Forschungsergebnisse der Hildesheimer Informationswissenschaft zu präsentieren und zu diskutieren. Mittlerweile nehmen immer wieder Kooperationspartner von anderen Institutionen teil, was wir sehr begrüßen. HIER schafft auch ein Forum für Systemvorstellungen und praxisorientierte Beiträge

Publikationsserver des Instituts für Deutsche Sprache

Recommended from our members

Sentiment analysis of dialectical Arabic social media content using a hybrid linguistic-machine learning approach

Author: Alwakid GN
Publication venue
Publication date: 01/01/2020
Field of study

Despite the enormous increase in the number of Arabic posts on social networks, the sentiment analysis research into extracting opinions from these posts lags behind that for the English language. This is largely attributed to the challenges in processing the morphologically complex Arabic natural language and the scarcity of Arabic NLP tools and resources. This complex task is further exacerbated when analysing dialectal Arabic that do not abide by the formal grammatical structure. Based on the semantic modelling of the target domain’s knowledge and multi-factor lexicon-based sentiment analysis, the intent of this research is to use a hybrid approach, integrating linguistic and machine learning methods for sentiment analysis classification of dialectal Arabic. First, a dataset of dialectal Arabic tweets was collected focusing on the unemployment domain, which is annotated manually. The tweets cover different dialectal Arabic in Saudi Arabia for which a comprehensive Arabic sentiment lexicon was constructed. This approach to sentiment analysis also integrated a novel light stemming mechanism towards improved Saudi dialectal Arabic stemming. Subsequently, a novel multi-factor lexicon-based sentiment analysis algorithm was developed for domain-specific social media posts written in dialectal Arabic. The algorithm considers several factors (emoji, intensifiers, negations, supplications) to improve the accuracy of the classifications. Applying this model to a central problem of sentiment analysis in dialectical Arabic, these operational techniques were deployed in order to assess analytical performance across social media channels which are vulnerable to semantic and colloquial variations. Finally, this study presented a new hybrid approach to sentiment analysis where domain knowledge is utilised in two methods to combine computational linguistics and machine learning; the first method integrates the problem domain semantic knowledgebase in the machine learning training features set, while the second uses the outcome of the lexicon-based sentiment classification in the training of the machine learning methods. By integrating these techniques into a single, hybridised solution, a greater degree of accuracy and consistency was achieved than applying each approach independently, confirming a pragmatic solution to sentiment classification in dialectical Arabic text

Nottingham Trent Institutional Repository (IRep)