Normalization of Solitude: The Task to Be Done
The pandemic showed us that separation from other people is an important part of our lives and has an even greater impact than we thought. Yet the dominant picture of this separation is hugely negative and has been for a long time. Solitude, being neither positive nor negative on its own, just like being with others, is not something that people as a whole acknowledge in their lives: it is perceived as a state to endure, something that can be useful, but not something we should simply live with, at least partially. This paper shows how important it is to create a new picture of solitude as something normal, not exceptional. It sets out the task of creating a new vocabulary around the phenomenon of solitude, free of its negative connotations, which will enable us to incorporate solitude back into our lives. I argue that such a change in vocabulary may enable the normalization of solitude and that such normalization should be our goal.
Holaaa!! Writin like u talk is kewl but kinda hard 4 NLP
We present work in progress aiming to build tools for the normalization of User-Generated Content (UGC). As we will see, the task requires revisiting the initial steps of NLP processing, since UGC (micro-blog, blog, and, generally, Web 2.0 user texts) presents a number of non-standard communicative and linguistic characteristics, and is in fact much closer to oral and colloquial language than to edited text. We present and characterize a corpus of UGC text in Spanish from three different sources: Twitter, consumer reviews and blogs. We motivate the need for UGC text normalization by analyzing the problems found when processing this type of text through a conventional language processing pipeline, particularly in the tasks of lemmatization and morphosyntactic tagging, and finally we propose a strategy for automatically normalizing UGC using a selector of correct forms on top of a pre-existing spell-checker. Postprint (published version)
Adapting Sequence to Sequence models for Text Normalization in Social Media
Social media offer an abundant source of valuable raw data; however, informal
writing can quickly become a bottleneck for many natural language processing
(NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot
explicitly handle noise found in short online posts. Moreover, the variety of
frequently occurring linguistic variations presents several challenges, even
for humans who might not be able to comprehend the meaning of such posts,
especially when they contain slang and abbreviations. Text Normalization aims
to transform online user-generated text to a canonical form. Current text
normalization systems rely on string or phonetic similarity and classification
models that work in a local fashion. We argue that processing contextual
information is crucial for this task and introduce a social media text
normalization hybrid word-character attention-based encoder-decoder model that
can serve as a pre-processing step for NLP applications to adapt to noisy text
in social media. Our character-based component is trained on synthetic
adversarial examples that are designed to capture errors commonly found in
online user-generated text. Experiments show that our model surpasses neural
architectures designed for text normalization and achieves comparable
performance with state-of-the-art related work. Comment: Accepted at the 13th
International AAAI Conference on Web and Social Media (ICWSM 2019)
Development of a speech recognition system for Spanish broadcast news
This paper reports on the development of a speech recognition system for Spanish broadcast news within the MESH FP6 project. The system uses the SONIC recognizer developed at the Center for Spoken Language Research (CSLR), University of Colorado. Acoustic and language models were trained using Hub4 broadcast news data. Experiments and evaluation results are reported.
A Large-Scale Comparison of Historical Text Normalization Systems
There is no consensus on the state-of-the-art approach to historical text
normalization. Many techniques have been proposed, including rule-based
methods, distance metrics, character-based statistical machine translation, and
neural encoder--decoder models, but studies have used different datasets,
different evaluation methods, and have come to different conclusions. This
paper presents the largest study of historical text normalization done so far.
We critically survey the existing literature and report experiments on eight
languages, comparing systems spanning all categories of proposed normalization
techniques, analysing the effect of training data quantity, and using different
evaluation methods. The datasets and scripts are made publicly available. Comment: Accepted at NAACL 201