How to improve TTS systems for emotional expressivity
Several experiments have revealed weaknesses of current Text-To-Speech (TTS) systems in their emotional expressivity. Although some TTS systems allow XML-based representations of prosodic and/or phonetic variables, few publications have considered, as a pre-processing stage, the use of intelligent text processing to detect affective information that can be used to tailor the parameters needed for emotional expressivity. This paper describes a technique for automatic prosodic parameterization based on affective cues. The technique recognizes the affective information conveyed in a text and, according to its emotional connotation, assigns appropriate pitch accents and other prosodic parameters by XML tagging. This pre-processing helps the TTS system generate synthesized speech that contains emotional cues. The experimental results are encouraging and suggest the possibility of suitable emotional expressivity in speech synthesis.
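The pre-processing stage described above — detect affect in the text, then emit prosody markup for the synthesizer — can be sketched as follows. This is a minimal illustration, not the paper's system: the emotion lexicon is a toy, and the pitch/rate values are arbitrary choices expressed in standard SSML `<prosody>` syntax.

```python
import re

def tag_affect(sentence, lexicon):
    """Wrap a sentence in an SSML <prosody> tag chosen from its detected affect.

    `lexicon` maps emotion names to word sets; the mappings from emotion to
    pitch/rate below are illustrative assumptions, not the paper's values.
    """
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    if words & lexicon["joy"]:
        pitch, rate = "+15%", "fast"       # brighter, quicker delivery
    elif words & lexicon["sadness"]:
        pitch, rate = "-10%", "slow"       # lower, slower delivery
    else:
        pitch, rate = "medium", "medium"   # neutral default
    return f'<prosody pitch="{pitch}" rate="{rate}">{sentence}</prosody>'

# toy affect lexicon (hypothetical)
LEXICON = {"joy": {"wonderful", "delighted", "happy"},
           "sadness": {"grief", "lonely", "miserable"}}
```

A sentence such as "What a wonderful day!" would thus be tagged for a faster, higher-pitched rendering before being passed to the synthesizer.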
A Machine Learning Approach For Opinion Holder Extraction In Arabic Language
Opinion mining aims at extracting useful subjective information from large
amounts of text. Opinion holder recognition is a task that has not yet been
addressed for the Arabic language. The task essentially requires a deep
understanding of clause structure. Unfortunately, the lack of a robust,
publicly available Arabic parser further complicates the research. This paper
presents pioneering research on opinion holder extraction in Arabic news,
independent of any lexical parser. We investigate constructing a
comprehensive feature set to compensate for the lack of parsing output. The
proposed feature set adapts features from previous work on English, coupled
with our proposed semantic-field and named-entity features. Our feature
analysis is based on Conditional Random Fields (CRF) and semi-supervised
pattern recognition techniques. Different research models are evaluated via
cross-validation experiments, achieving an F-measure of 54.03. We publicly
release our research corpus and lexicon to the opinion mining community to
encourage further research.
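A parser-free CRF tagger of the kind described works on per-token feature dictionaries combining surface form, named-entity labels, and lexicon cues. The sketch below shows one such feature function for BIO opinion-holder tagging; the feature names and the tiny communication-verb lexicon are illustrative assumptions, not the paper's exact feature set.

```python
def token_features(tokens, ner_tags, i):
    """Parser-free features for one token in a BIO opinion-holder tagger.

    The feature inventory here (surface, NE tag, neighbours, nearby
    communication verb) is a hypothetical simplification of the paper's set.
    """
    word = tokens[i]
    return {
        "word": word.lower(),
        "is_title": word.istitle(),
        "ner": ner_tags[i],                               # e.g. PER, ORG, O
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        # opinion holders tend to appear near verbs of saying
        "near_say_verb": any(t.lower() in {"said", "stated", "announced"}
                             for t in tokens[max(0, i - 2):i + 3]),
    }

tokens = ["Minister", "Kamal", "said", "prices", "will", "fall"]
ner = ["O", "PER", "O", "O", "O", "O"]
feats = token_features(tokens, ner, 1)   # features for "Kamal"
```

Such dictionaries would then be fed, token by token, to a linear-chain CRF in place of parse-tree features.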
Affect-LM: A Neural Language Model for Customizable Affective Text Generation
Human verbal communication includes affective messages which are conveyed
through use of emotionally colored words. There has been a lot of research in
this direction but the problem of integrating state-of-the-art neural language
models with affective information remains an area ripe for exploration. In this
paper, we propose an extension to an LSTM (Long Short-Term Memory) language
model for generating conversational text, conditioned on affect categories. Our
proposed model, Affect-LM, enables us to customize the degree of emotional
content in generated sentences through an additional design parameter.
Perception studies conducted using Amazon Mechanical Turk show that Affect-LM
generates natural-looking emotional sentences without sacrificing grammatical
correctness. Affect-LM also learns affect-discriminative word representations,
and perplexity experiments show that additional affective information in
conversational text can improve language model prediction.
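The core mechanism — an affect category biasing the language model's next-word distribution through a strength parameter — can be sketched numerically. This is a toy NumPy illustration under stated assumptions (random toy weights, one-hot affect categories, a scalar `beta` as the "design parameter"), not the trained LSTM model.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, AFFECTS = 50, 16, 5           # toy sizes (hypothetical)

W_out = rng.normal(size=(HIDDEN, VOCAB))     # ordinary LM output projection
W_aff = rng.normal(size=(AFFECTS, VOCAB))    # affect-category-to-word bias

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def next_word_probs(hidden, affect_onehot, beta):
    """Bias the LM logits toward affect-colored words.

    `beta` plays the role of the user-facing knob: beta=0 recovers the
    plain LM, larger beta strengthens the emotional coloring.
    """
    return softmax(hidden @ W_out + beta * (affect_onehot @ W_aff))

h = rng.normal(size=HIDDEN)                  # stand-in for an LSTM state
angry = np.eye(AFFECTS)[0]                   # one-hot affect category
p_plain = next_word_probs(h, angry, beta=0.0)
p_angry = next_word_probs(h, angry, beta=2.0)
```

Turning `beta` up shifts probability mass toward words associated with the chosen affect category while leaving the rest of the model untouched.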
Generating Music from Literature
We present a system, TransProse, that automatically generates musical pieces
from text. TransProse uses known relations between elements of music such as
tempo and scale, and the emotions they evoke. Further, it uses a novel
mechanism to determine sequences of notes that capture the emotional activity
in the text. The work has applications in information visualization, in
creating audio-visual e-books, and in developing music apps.
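The mapping the abstract describes — from the emotional profile of a text to musical elements such as tempo and scale — can be sketched as a small function. The joy/major/fast and sadness/minor/slow associations follow the abstract's premise; the specific numbers are made-up illustrations, not TransProse's actual parameters.

```python
def music_params(emotion_counts):
    """Map emotion word counts in a text to a tempo and a scale.

    The numeric mapping below is a hypothetical example of the kind of
    emotion-to-music rule the abstract mentions.
    """
    total = sum(emotion_counts.values()) or 1
    joy = emotion_counts.get("joy", 0) / total
    sadness = emotion_counts.get("sadness", 0) / total
    tempo_bpm = 60 + int(120 * joy)              # more joy -> faster tempo
    scale = "major" if joy >= sadness else "minor"
    return tempo_bpm, scale
```

For example, a chapter whose emotion words are mostly joyful would come out fast and in a major key, while a predominantly sad one would be slow and minor.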
Expressive speech synthesis using sentiment embeddings
In this paper we present a DNN-based speech synthesis system trained on an audiobook, including sentiment features predicted by the Stanford sentiment parser. The baseline system uses a DNN to predict acoustic parameters from conventional linguistic features, as used in statistical parametric speech synthesis. The predicted parameters are transformed into speech using a conventional high-quality vocoder. In this paper, the conventional linguistic features are enriched with sentiment features. Different sentiment representations have been considered, combining sentiment probabilities with hierarchical distance and context. After a preliminary analysis, a listening experiment is conducted in which participants evaluate the different systems. The results show the usefulness of the proposed features and reveal differences between expert and non-expert TTS users.
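Enriching linguistic features with sentiment probabilities amounts to concatenating per-sentence feature vectors. The sketch below shows one plausible scheme; the context-window layout and the zero-padding at text boundaries are assumptions about the setup, not the paper's exact representation.

```python
import numpy as np

def enrich(ling_feats, sent_probs, idx, context=1):
    """Concatenate sentence idx's linguistic features with the sentiment
    probabilities of itself and its +/-`context` neighbours (zero-padded).

    The windowing scheme is a hypothetical reading of the abstract's
    "sentiment probabilities with ... context".
    """
    parts = [ling_feats[idx]]
    n = len(sent_probs)
    for j in range(idx - context, idx + context + 1):
        parts.append(sent_probs[j] if 0 <= j < n
                     else np.zeros_like(sent_probs[0]))
    return np.concatenate(parts)

ling = np.ones((3, 4))            # 3 sentences, 4 linguistic features each
probs = np.full((3, 5), 0.2)      # 5-class sentiment probabilities per sentence
x = enrich(ling, probs, idx=0)    # 4 + 3 * 5 = 19 dimensions
```

The enriched vector `x` would then replace the plain linguistic input to the acoustic-parameter DNN.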
Semantics, Modelling, and the Problem of Representation of Meaning -- a Brief Survey of Recent Literature
Over the past 50 years many have debated what representation should be used
to capture the meaning of natural language utterances. Recently, new
requirements for such representations have arisen in research. Here I survey
some of the interesting representations suggested to meet these new needs.
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Neural networks equipped with self-attention have parallelizable computation,
light-weight structure, and the ability to capture both long-range and local
dependencies. Further, their expressive power and performance can be boosted by
using a vector to measure pairwise dependency, but this requires expanding the
alignment matrix to a tensor, which results in memory and computation
bottlenecks. In this paper, we propose a novel attention mechanism called
"Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as
memory-efficient as a CNN, but significantly outperforms previous
CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token)
and global (source2token) dependencies by a novel compatibility function
composed of dot-product and additive attentions, 2) uses a tensor to represent
the feature-wise alignment scores for better expressive power but only requires
parallelizable matrix multiplications, and 3) combines multi-head with
multi-dimensional attentions, and applies a distinct positional mask to each
head (subspace), so the memory and computation can be distributed to multiple
heads, each with sequential information encoded independently. The experiments
show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or
competitive performance on nine NLP benchmarks with compelling memory- and
time-efficiency.
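The abstract's compatibility function — a scalar dot-product (token2token) score combined with a feature-wise additive (source2token-style) score, yielding a tensor of alignment scores — can be sketched in NumPy. This is a single-head toy under stated assumptions (random weights, no masking, no multi-head split), intended only to show the tensorized shape arithmetic, not the full MTSA mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 8                                  # toy sequence length / feature dim
X = rng.normal(size=(n, d))
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
b = rng.normal(size=d)

# scalar token2token scores (scaled dot-product attention), shape (n, n)
dot = (X @ X.T) / np.sqrt(d)

# feature-wise additive scores, shape (n, n, d)
add = np.tanh(X @ W1)[:, None, :] + np.tanh(X @ W2)[None, :, :] + b

# tensorized alignment: broadcast the scalar score into every feature
scores = dot[:, :, None] + add               # (n, n, d)

# each output feature gets its own attention distribution over tokens
w = np.exp(scores - scores.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)
out = np.einsum('ijk,jk->ik', w, X)          # (n, d)
```

The (n, n, d) tensor is where the memory bottleneck the abstract mentions comes from; MTSA's contribution is reorganizing this computation into parallelizable matrix multiplications distributed across heads.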
Unveiling What is Written in The Stars: Analyzing Explicit, Implicit, and Discourse Patterns of Sentiment in Social Media
Deciphering consumers' sentiment expressions from big data (e.g., online reviews) has become a managerial priority to monitor product and service evaluations. However, sentiment analysis, the process of automatically distilling sentiment from text, provides little insight regarding the language granularities beyond the use of positive and negative words. Drawing on speech act theory, this study provides a fine-grained analysis of the implicit and explicit language used by consumers to express sentiment in text. An empirical text-mining study using more than 45,000 consumer reviews demonstrates the differential impacts of activation levels (e.g., tentative language), implicit sentiment expressions (e.g., commissive language), and discourse patterns (e.g., incoherence) on overall consumer sentiment (i.e., star ratings). In two follow-up studies, we demonstrate that these speech act features also influence the readers' behavior and are generalizable to other social media contexts, such as Twitter and Facebook. We contribute to research on consumer sentiment analysis by offering a more nuanced understanding of consumer sentiments and their implications.
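Features like the ones the study names — tentative (activation-lowering) language and commissive (implicit-sentiment) language — are typically operationalized as lexicon matches over a review's tokens. The sketch below is a crude illustration with toy lexicons; the study's actual operationalizations are richer.

```python
# toy lexicons (hypothetical); real speech-act lexicons are far larger
TENTATIVE = {"maybe", "perhaps", "might", "possibly", "somewhat"}
COMMISSIVE = {"will", "promise", "recommend", "guarantee"}

def speech_act_features(review):
    """Crude explicit/implicit sentiment signals from one review text."""
    toks = review.lower().replace(",", " ").replace(".", " ").split()
    n = max(len(toks), 1)
    return {
        "tentative_ratio": sum(t in TENTATIVE for t in toks) / n,
        "has_commissive": any(t in COMMISSIVE for t in toks),
    }

f = speech_act_features("The food was maybe a bit cold, but I will return.")
```

Aggregated over many reviews, such features can then be regressed against star ratings to estimate their differential impact, as the study does at scale.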