47 research outputs found

    Exploiting user-frequency information for mining regionalisms in Argentinian Spanish from Twitter

    The task of detecting regionalisms (expressions or words used in certain regions) has traditionally relied on the use of questionnaires and surveys, heavily depending on the expertise and intuition of the surveyor. The emergence of social media and microblogging services has produced an unprecedented wealth of content (mainly informal text generated by users), opening new opportunities for linguists to extend their studies of language variation. Previous work on the automatic detection of regionalisms depended mostly on word frequencies. In this work, we present a novel metric based on Information Theory that incorporates user frequency. We tested this metric on a corpus of Argentinian Spanish tweets in two ways: via manual annotation of the relevance of the retrieved terms, and also as a feature selection method for geolocation of users. In both cases, our metric outperformed other techniques based on word frequency, suggesting that the number of users who use a word is an informative feature. This tool has helped lexicographers discover several unregistered words of Argentinian Spanish, as well as different meanings assigned to registered words.

    Neural Sequence Labeling on Social Media Text

    As social media (SM) brings opportunities to study societies across the world, it also brings a variety of challenges to automate the processing of SM language. In particular, most of the textual content in SM is considered noisy; it does not always stick to the rules of the written language, and it tends to have misspellings, arbitrary abbreviations, orthographic inconsistencies, and flexible grammar. Additionally, SM platforms provide a unique space for multilingual content. This polyglot environment requires modern systems to adapt to a diverse range of languages, imposing another linguistic barrier to the processing and understanding of text from SM domains. This thesis aims to provide novel sequence labeling approaches to handle noise and linguistic code-switching (i.e., the alternation of languages in the same utterance) in SM text. In particular, the first part of this thesis focuses on named entity recognition for English SM text, where I propose linguistically-inspired methods to address phonological writing and flexible syntax. I also investigate whether the performance of current state-of-the-art models relies on memorization or contextual generalization of entities. In the second part of this thesis, I focus on three sequence labeling tasks for code-switched SM text: language identification, part-of-speech tagging, and named entity recognition. Specifically, I propose transfer learning methods from state-of-the-art monolingual and multilingual models, such as ELMo and BERT, to the code-switching setting for sequence labeling. These methods reduce the demand for code-switching annotations and resources while exploiting multilingual knowledge from large pre-trained unsupervised models. The methods presented in this thesis are meant to benefit higher-level NLP applications oriented to social media domains, including but not limited to question-answering, conversational systems, and information extraction.

    QUEER APPALACHIA: TOWARD GEOGRAPHIES OF POSSIBILITY

    Stereotypes about Appalachia abound through dubious and reductive representations of the ‘hillbilly’ icon. Sexuality and how it functions in Appalachia is usually cast from the outside as wild, violent, bestial, incestuous and generally base. Movies such as Deliverance and television shows such as The Beverly Hillbillies and The Dukes of Hazzard render images of Appalachian sexuality as hyper-sexual, both naive and violent. These images of Appalachian sexual ignorance and violence that permeate popular culture have had problematic and reductive implications for rural gay/trans Appalachian folk. Mainstream gay culture has often used the perceived meanings of these images to circumscribe and foreclose upon the possibility of rural queer life, rendering the rural as monolithically homophobic and impenetrable. This research attempts to destabilize this perspective and critique the impulse for mainstream gay culture to further marginalize rural gay/trans folk in Appalachia. The project reveals the possibility for rural queer life to exist in Appalachia to show not only its presence, but also its varying forms of visibility. To do this, experimental methodologies are employed, drawing on an autoethnography that locates my body as an active participant and research object in one particular Appalachian queer geography. Through active participation in a rural queer network, the possibility for Appalachian queer geographies to exist in ways that surpass popular representations emerges in a way that forces us to renegotiate our understandings of homophobia and what sets its conditions. This project begins to uncover and theorize the ways in which kinship as a ‘social technology’ mitigates social strangeness and operates as a means for social protection and intimacy within rural queer populations.
This research is presented in a way that neither dismisses nor emphasizes homophobic violence, but rather argues the imperative for strong political advocacy that recognizes both the struggles and accomplishments of rural gay/trans folk. Three interlinked approaches are used to highlight these possibilities and foreclosures: the exterior representation of Appalachian sexuality in American metropolitan gay cultures and its politico-cultural effects on rural gay/trans folk, a more nuanced interpretation of homophobia in Appalachia, and how ‘place’ is made through the operation of rural queer networks.

    Lexicography of coronavirus-related neologisms

    This volume brings together contributions by international experts reflecting on Covid19-related neologisms and their lexicographic processing and representation. The papers analyze new words, new meanings of existing words, and new multiword units, where they come from, how they are transmitted (or differ) across languages, and how their use and meaning are reflected in dictionaries of all sorts. Recent trends in as many as ten languages are considered, including general and specialized language, monolingual as well as bilingual dictionaries, and printed as well as online dictionaries.

    Lexicography of Coronavirus-related Neologisms

    This volume brings together contributions by international experts reflecting on Covid19-related neologisms and their lexicographic processing and representation. The papers analyze new words, new meanings of existing words, and new multiword units in as many as ten languages, considering both specialized and general language, monolingual as well as bilingual dictionaries, and printed as well as online dictionaries.

    The Urban Digital Platform: Unravelling Alternative Spatial Patterns


    Narrative motion on the two-dimensional plane: the “video-ization” of photography and characterization of reality

    "Art is not truth. Art is a lie that enables us to recognize truth." (Pablo Picasso)
    Time, as known to many, is an indispensable component of photography. Period(s) included in “single” photographs are usually and naturally much shorter than periods documented in video works. Yet, when it comes to combining photos taken at different times on one photographical surface, it becomes possible to see remnants of longer periods of time. Whatever method is used, the many traces left by different moments lead to the positive notion of timelessness (lack of time dependence) due to the plural presences of time at once. This concept of timelessness sometimes carries the content of the photo to anonymity; the substance becomes multi-layered and hierarchy disappears. This paper focuses on creating photographical narratives within the two-dimensional world. The possibility of working in layers with transparency within the computer environment enables us to overlay a succession of moments seized from time on top of each other, in order to create a storyline spread in time that is otherwise not possible to express in a single photograph, unless properly staged. Truth with the capital T is not taken as the departure point in this article; on the contrary, personal delineations of temporary yet experienced smaller realities are suggested.