Paronyms for Accelerated Correction of Semantic Errors
* Work done under partial support of the Mexican Government (CONACyT, SNI), IPN (CGPI, COFAA), and the Korean Government (KIPA Professorship for Visiting Faculty Positions). The second author is currently on sabbatical leave at Chung-Ang University.

The errors usually made by authors during text preparation are classified. The notion of semantic
errors is elaborated, and malapropisms are singled out among them as words “similar” to the intended one but essentially distorting the meaning of the text. For any method of malapropism correction, we propose to compile beforehand dictionaries of paronyms, i.e., of words similar to each other in letters, sounds, or morphs. The proposed classification of errors and paronyms is illustrated with English and Russian examples and is valid for many languages. Specific dictionaries of literal and morphemic paronyms are compiled for Russian. It is shown that literal paronyms drastically cut down (by up to 360 times) the search for correction candidates, while morphemic paronyms permit the correction of errors not studied so far and characteristic of foreigners' writing.
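To make the idea of a precompiled paronym dictionary concrete, here is a minimal Python sketch. It assumes one simple notion of literal paronymy (words differing by a single substitution, insertion, deletion, or adjacent transposition); the vocabulary and function names are illustrative, not taken from the paper or its dictionaries.

```python
from itertools import combinations

def is_literal_paronym(a: str, b: str) -> bool:
    """One simple notion of literal paronymy: the words differ by a single
    substitution, insertion, deletion, or adjacent transposition."""
    if a == b:
        return False
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True                          # one substitution
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]]
                and a[diffs[1]] == b[diffs[0]])  # adjacent transposition
    if len(a) > len(b):
        a, b = b, a                              # make a the shorter word
    i = j = 0
    skipped = False
    while i < len(a):
        if a[i] == b[j]:
            i += 1
            j += 1
        elif not skipped:
            skipped = True                       # one inserted letter in b
            j += 1
        else:
            return False
    return True

def build_paronym_dict(vocab):
    """Precompile word -> set of literal paronyms, so that at correction time
    candidates are looked up directly instead of scanning the whole lexicon."""
    d = {w: set() for w in vocab}
    for a, b in combinations(vocab, 2):
        if is_literal_paronym(a, b):
            d[a].add(b)
            d[b].add(a)
    return d

vocab = ["angel", "angle", "anger", "ankle", "apple"]
paronyms = build_paronym_dict(vocab)
# paronyms["angel"] contains "angle" (transposition) and "anger" (substitution)
```

Once such a dictionary is compiled, a malapropism corrector only has to score the few entries listed for the suspect word, which is where the drastic reduction in the candidate search comes from.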
DialogueRNN: An Attentive RNN for Emotion Detection in Conversations
Emotion detection in conversations is a necessary step for a number of
applications, including opinion mining over chat history, social media threads,
debates, argumentation mining, understanding consumer feedback in live
conversations, etc. Currently, systems do not treat the parties in the
conversation individually by adapting to the speaker of each utterance. In this
paper, we describe a new method based on recurrent neural networks that keeps
track of the individual party states throughout the conversation and uses this
information for emotion classification. Our model outperforms the state of the
art by a significant margin on two different datasets.

Comment: AAAI 201
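The per-party bookkeeping described above can be sketched in a few lines. The following is a toy illustration only: the speaker names and feature vectors are made up, and the stand-in recurrent update is not the GRU-based architecture of the paper.

```python
import math

def recurrent_update(state, features, w=0.5):
    """Stand-in recurrent cell (NOT the paper's GRU): blend the speaker's
    previous state with the features of the new utterance."""
    return [math.tanh(w * s + (1 - w) * f) for s, f in zip(state, features)]

def track_party_states(utterances, dim=3):
    """Keep one state vector per speaker, updated only on that speaker's
    turns, and record the state available when each utterance is classified."""
    states = {}
    per_utterance = []
    for speaker, features in utterances:
        prev = states.get(speaker, [0.0] * dim)
        states[speaker] = recurrent_update(prev, features)
        per_utterance.append((speaker, states[speaker]))
    return states, per_utterance

dialogue = [
    ("alice", [1.0, 0.0, 0.0]),
    ("bob",   [0.0, 1.0, 0.0]),
    ("alice", [0.0, 0.0, 1.0]),
]
states, history = track_party_states(dialogue)
# states holds one evolving vector per party; history has one entry per utterance
```

The point of the structure is that each utterance is classified against the state of *its own speaker*, rather than a single undifferentiated conversation state.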
PolyHope: Two-Level Hope Speech Detection from Tweets
Hope is characterized as openness of spirit toward the future, a desire,
expectation, and wish for something to happen or to be true that remarkably
affects the human state of mind, emotions, behaviors, and decisions. Hope is
usually associated with concepts of desired expectations and
possibility/probability concerning the future. Despite its importance, hope has
rarely been studied as a social media analysis task. This paper presents a hope
speech dataset that classifies each tweet first into "Hope" and "Not Hope",
then into three fine-grained hope categories: "Generalized Hope", "Realistic
Hope", and "Unrealistic Hope" (along with "Not Hope"). English tweets in the
first half of 2022 were collected to build this dataset. Furthermore, we
describe our annotation process and guidelines in detail and discuss the
challenges of classifying hope and the limitations of the existing hope speech
detection corpora. In addition, we report several baselines based on
different learning approaches, such as traditional machine learning, deep
learning, and transformers, to benchmark our dataset. We evaluate our
baselines using weighted-averaged and macro-averaged F1-scores. Observations
show that a strict process for annotator selection and detailed annotation
guidelines enhanced the dataset's quality. This strict annotation process
resulted in promising performance for simple machine learning classifiers with
only bi-grams; however, binary and multiclass hope speech detection results
reveal that contextual embedding models have higher performance in this
dataset.

Comment: 20 pages, 9 figures
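As a rough sketch of the kind of bi-gram baseline the abstract mentions, the snippet below trains a tiny add-one-smoothed naive Bayes classifier over word bi-grams. The texts, labels, and scoring are illustrative toys, not the paper's actual classifiers or data.

```python
import math
from collections import Counter

def word_bigrams(text):
    """Extract word bi-gram features from a tweet-like string."""
    toks = text.lower().split()
    return list(zip(toks, toks[1:]))

def train_counts(samples):
    """samples: iterable of (text, label); returns per-class bigram counts."""
    counts = {}
    for text, label in samples:
        counts.setdefault(label, Counter()).update(word_bigrams(text))
    return counts

def predict(counts, text):
    """Pick the class with the best add-one-smoothed log-likelihood."""
    best_label, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values()) + len(c) + 1
        score = sum(math.log((c[bg] + 1) / total) for bg in word_bigrams(text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train = [
    ("i hope things get better soon", "Hope"),
    ("really hope we win the final", "Hope"),
    ("the meeting was moved to friday", "Not Hope"),
    ("traffic on the highway was terrible", "Not Hope"),
]
model = train_counts(train)
```

The same two-stage scheme extends to the fine-grained labels by replacing the binary label set with the four categories and scoring each the same way.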
Fractal Power Law in Literary English
We present in this paper a numerical investigation of literary texts by
various well-known English writers, covering the first half of the twentieth
century, based upon the results obtained through corpus analysis of the texts.
A fractal power law is obtained for the lexical wealth defined as the ratio
between the number of different words and the total number of words of a given
text. By considering as a signature of each author the exponent and the
amplitude of the power law, and the standard deviation of the lexical wealth,
it is possible to discriminate works of different genres and writers and show
that each writer has a very distinct signature, either considered among other
literary writers or compared with writers of non-literary texts. It is also
shown that, for a given author, the signature is able to discriminate between
short stories and novels.

Comment: 27 pages, 10 tables, 15 figures. Revised version accepted in Physica