27,894 research outputs found

    Normalization of Solitude: The Task to Be Done

    The pandemic showed us that separation from other people is an important part of our lives and has a greater impact than we thought. Yet the dominant picture of this separation is overwhelmingly negative, and has been for a long time. Solitude, which is neither positive nor negative in itself (just like being with others), is not something people generally acknowledge as part of their lives: it is perceived as a state to endure, something that can be useful, but not something we should simply live with, at least in part. This paper shows why it is important to create a new picture of solitude, as something normal rather than exceptional. It points to the task of creating a new vocabulary around the phenomenon of solitude, free of its negative connotations, which will enable us to incorporate solitude back into our lives. I argue that such a change in vocabulary may enable the normalization of solitude, and that such normalization should be our goal.

    Holaaa!! Writin like u talk is kewl but kinda hard 4 NLP

    We present work in progress aiming to build tools for the normalization of User-Generated Content (UGC). As we will see, the task requires revisiting the initial steps of NLP processing, since UGC (micro-blogs, blogs and, more generally, Web 2.0 user texts) presents a number of non-standard communicative and linguistic characteristics, and is in fact much closer to oral and colloquial language than to edited text. We present and characterize a corpus of UGC text in Spanish from three different sources: Twitter, consumer reviews and blogs. We motivate the need for UGC text normalization by analyzing the problems found when processing this type of text through a conventional language processing pipeline, particularly in the tasks of lemmatization and morphosyntactic tagging, and finally we propose a strategy for automatically normalizing UGC using a selector of correct forms on top of a pre-existing spell-checker. Postprint (published version).
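
    The idea of a selector of correct forms sitting on top of a spell-checker can be illustrated with a minimal Python sketch. It is not the authors' system: the toy LEXICON and its frequencies, the edit-distance-1 candidate generator standing in for the spell-checker, the letter-flood collapsing rule and the frequency-based selector are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon with corpus frequencies; a real system would use
# the word list behind an actual Spanish spell-checker.
LEXICON: Dict[str, int] = {"hola": 100, "que": 500, "casa": 50, "cosa": 80}

LETTERS = "abcdefghijklmnopqrstuvwxyzáéíóúñü"


def edits1(word: str) -> set:
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


def candidates(token: str, lexicon: Dict[str, int]) -> List[str]:
    """Spell-checker stand-in: in-lexicon forms within edit distance 1."""
    if token in lexicon:
        return [token]
    return sorted(w for w in edits1(token) if w in lexicon)


def normalize_token(token: str, lexicon: Dict[str, int]) -> str:
    """Collapse letter floods, then let the selector pick the most frequent candidate."""
    token = re.sub(r"(.)\1{2,}", r"\1", token.lower())  # "holaaa" -> "hola"
    cands = candidates(token, lexicon)
    if not cands:
        return token  # no candidate found: keep the (lowercased) token
    return max(cands, key=lambda w: lexicon[w])  # the "selector" step


def normalize(text: str, lexicon: Dict[str, int] = LEXICON) -> str:
    """Normalize a whitespace-tokenized string token by token."""
    return " ".join(normalize_token(t, lexicon) for t in text.split())
```

    With this toy lexicon, "csa" has two candidates ("casa" and "cosa"), and the frequency-based selector picks "cosa"; a real selector would presumably use richer context than a single frequency count.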

    Adapting Sequence to Sequence models for Text Normalization in Social Media

    Social media offer an abundant source of valuable raw data, but informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot explicitly handle the noise found in short online posts. Moreover, the variety of frequently occurring linguistic variations presents several challenges, even for humans, who might not be able to comprehend the meaning of such posts, especially when they contain slang and abbreviations. Text normalization aims to transform online user-generated text into a canonical form. Current text normalization systems rely on string or phonetic similarity and on classification models that work in a local fashion. We argue that processing contextual information is crucial for this task and introduce a hybrid word-character attention-based encoder-decoder model for social media text normalization that can serve as a pre-processing step, helping NLP applications adapt to noisy social media text. Our character-based component is trained on synthetic adversarial examples designed to capture errors commonly found in online user-generated text. Experiments show that our model surpasses neural architectures designed for text normalization and achieves performance comparable to state-of-the-art related work. Comment: Accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019).
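
    Training a character-level component on synthetic examples of typical online errors can be illustrated with a small noise generator that turns clean words into (noisy, clean) training pairs. This is a hypothetical sketch, not the paper's error model: the three noise functions (drop_vowels, repeat_last, swap_adjacent) and the noise probability p are assumptions, the random perturbations are a simplification of the paper's adversarial examples, and the encoder-decoder itself is not shown.

```python
import random
from typing import Callable, List, Tuple

VOWELS = set("aeiou")


def drop_vowels(word: str, rng: random.Random) -> str:
    """Keep the first letter, drop later vowels: 'tomorrow' -> 'tmrrw'."""
    return word[0] + "".join(c for c in word[1:] if c not in VOWELS)


def repeat_last(word: str, rng: random.Random) -> str:
    """Emphatic letter flooding: 'no' -> e.g. 'nooo'."""
    return word + word[-1] * rng.randint(2, 4)


def swap_adjacent(word: str, rng: random.Random) -> str:
    """Typo-style transposition of two neighbouring characters."""
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


# Hypothetical error model: one noise function is drawn uniformly per word.
NOISE: List[Callable[[str, random.Random], str]] = [drop_vowels, repeat_last, swap_adjacent]


def make_noisy_pair(word: str, rng: random.Random, p: float = 0.5) -> Tuple[str, str]:
    """Return (noisy source, clean target); short or unselected words stay clean."""
    if len(word) < 3 or rng.random() >= p:
        return word, word
    return rng.choice(NOISE)(word, rng), word


def make_training_pairs(words: List[str], seed: int = 0, p: float = 0.5) -> List[Tuple[str, str]]:
    """Seeded, so the same word list always yields the same synthetic pairs."""
    rng = random.Random(seed)
    return [make_noisy_pair(w, rng, p) for w in words]
```

    Keeping some words clean (p below 1) lets a model also learn to copy already-canonical input, which a normalizer has to do for most tokens.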

    Development of a speech recognition system for Spanish broadcast news

    This paper reports on the development of a speech recognition system for Spanish broadcast news within the MESH FP6 project. The system uses the SONIC recognizer developed at the Center for Spoken Language Research (CSLR), University of Colorado. Acoustic and language models were trained using Hub4 broadcast news data. Experiments and evaluation results are reported.

    A Large-Scale Comparison of Historical Text Normalization Systems

    There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder-decoder models, but studies have used different datasets and evaluation methods, and have reached different conclusions. This paper presents the largest study of historical text normalization done so far. We critically survey the existing literature and report experiments on eight languages, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods. The datasets and scripts are made publicly available. Comment: Accepted at NAACL 201
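
    Of the technique families listed, distance metrics are the simplest to illustrate. The following minimal Python sketch is a generic baseline, not any of the systems compared in the paper: it maps a historical spelling to the nearest entry in a hypothetical modern lexicon by plain Levenshtein distance, and leaves the word unchanged when no entry is within an assumed threshold max_dist.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def normalize_historical(word: str, lexicon: List[str], max_dist: int = 2) -> str:
    """Return the closest lexicon entry if it is within max_dist, else the word itself."""
    best = min(lexicon, key=lambda w: levenshtein(word, w))
    return best if levenshtein(word, best) <= max_dist else word
```

    For example, "vertue" is one substitution away from "virtue", so it normalizes to "virtue" given a lexicon containing that word. Ties go to the earlier lexicon entry, because `min` keeps the first minimum; a weighted cost scheme (for instance, making u/v substitutions cheap) would be a natural refinement.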