Dialogue-Oriented Review Summary Generation for Spoken Dialogue Recommendation Systems
In this paper we present an opinion summarization technique for spoken dialogue systems. Opinion mining has been well studied for years, but few studies have considered its application in spoken dialogue systems. Review summarization, when applied to real dialogue systems, is much more complicated than pure text-based summarization. We conduct a systematic study on dialogue-system-oriented review analysis and propose a three-level framework for a recommendation dialogue system. In previous work we explored a linguistic parsing approach to phrase extraction from reviews. In this paper we describe an approach using statistical models such as decision trees and SVMs to select the most representative phrases from the extracted phrase set. We also explain how to generate informative yet concise review summaries for dialogue purposes. Experimental results in the restaurant domain show that the proposed approach using decision tree algorithms outperforms SVM models by 13% and improves on a heuristic rule baseline by 36%. Experiments also show that the decision-tree-based phrase selection model can achieve fairly reliable predictions of the phrase label, comparable to human judgment. The proposed statistical approach is based on domain-independent learning features and can be extended to other domains effectively.
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment
We present a systematic study and comprehensive evaluation of large language
models for automatic multilingual readability assessment. In particular, we
construct ReadMe++, a multilingual multi-domain dataset with human annotations
of 9757 sentences in Arabic, English, French, Hindi, and Russian collected from
112 different data sources. ReadMe++ offers more domain and language diversity
than existing readability datasets, making it ideal for benchmarking
multilingual and non-English language models (including mBERT, XLM-R, mT5,
Llama-2, GPT-4, etc.) in the supervised, unsupervised, and few-shot prompting
settings. Our experiments reveal that models fine-tuned on ReadMe++ outperform
those trained on single-domain datasets, showcasing superior performance on
multi-domain readability assessment and cross-lingual transfer capabilities. We
also compare to traditional readability metrics (such as Flesch-Kincaid Grade
Level and Open Source Metric for Measuring Arabic Narratives), as well as the
state-of-the-art unsupervised metric RSRS (Martinc et al., 2021). We will make
our data and code publicly available at: https://github.com/tareknaous/readme.
Comment: We have added French and Russian as two new languages to the corpus.
Multilingual sentiment analysis in social media.
252 p.
This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language, a tool providing analysis only for Basque would not be enough for a real-world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.
The thesis addresses the following challenges to build such a system:
- Analysing methods for creating sentiment lexicons suitable for less-resourced languages.
- Analysing social media (specifically Twitter): tweets pose several challenges for understanding and extracting opinions from such messages. Language identification and microtext normalization are addressed.
- Researching the state of the art in polarity classification, and developing a supervised classifier that is tested against well-known social media benchmarks.
- Developing a social media monitor capable of analysing sentiment with respect to specific events, products or organizations.
A practical guide to conversation research: how to study what people say to each other
Conversation, a verbal interaction between two or more people, is a complex, pervasive, and consequential human behavior. Conversations have been studied across many academic disciplines. However, advances in recording and analysis techniques over the last decade have allowed researchers to more directly and precisely examine conversations in natural contexts and at a larger scale than ever before, and these advances open new paths to understand humanity and the social world. Existing reviews of text analysis and conversation research have focused on text generated by a single author (e.g., product reviews, news articles, and public speeches) and thus leave open questions about the unique challenges presented by interactive conversation data (i.e., dialogue). In this article, we suggest approaches to overcome common challenges in the workflow of conversation science, including recording and transcribing conversations, structuring data (to merge turn-level and speaker-level data sets), extracting and aggregating linguistic features, estimating effects, and sharing data. This practical guide is meant to shed light on current best practices and empower more researchers to study conversations more directly, expanding the community of conversation scholars and contributing to a greater cumulative scientific understanding of the social world.
Controlling Personality-Based Stylistic Variation with Neural Natural Language Generators
Natural language generators for task-oriented dialogue must effectively
realize system dialogue actions and their associated semantics. In many
applications, it is also desirable for generators to control the style of an
utterance. To date, work on task-oriented neural generation has primarily
focused on semantic fidelity rather than achieving stylistic goals, while work
on style has been done in contexts where it is difficult to measure content
preservation. Here we present three different sequence-to-sequence models and
carefully test how well they disentangle content and style. We use a
statistical generator, Personage, to synthesize a new corpus of over 88,000
restaurant domain utterances whose style varies according to models of
personality, giving us total control over both the semantic content and the
stylistic variation in the training data. We then vary the amount of explicit
stylistic supervision given to the three models. We show that our most explicit
model can simultaneously achieve high fidelity to both semantic and stylistic
goals: this model adds a context vector of 36 stylistic parameters as input to
the hidden state of the encoder at each time step, showing the benefits of
explicit stylistic supervision, even when the amount of training data is large.
Comment: To appear at SIGDIAL 201