192 research outputs found

    A semi-automatic approach to identifying and unifying ambiguously encoded Arabic-based characters.

    In this study, we outline a potential problem in normalising texts that are based on a modified version of the Arabic alphabet. One of the main resources available for processing resource-scarce languages is raw text collected from the Internet. Many less-resourced languages, such as Kurdish, Farsi, Urdu, and Pashtu, use a modified version of the Arabic writing system. Many characters in data harvested from the Internet may have exactly the same form but be encoded with different Unicode values (ambiguous characters). The existence of ambiguous characters in words leads to word duplication, so it is important to identify and unify ambiguous characters during the normalisation stage. Here, we demonstrate cases related to ambiguous Kurdish and Farsi characters and propose a semi-automatic approach to identifying and unifying them.
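
    The abstract above does not give the authors' procedure or character table, so the following is only a minimal sketch of the general idea, under stated assumptions. The mapping `UNIFY_MAP`, the choice of which code point counts as canonical, and the helper names `unify` and `report_unexpected` are all hypothetical. The sketch shows a review step (listing characters outside an expected alphabet so a person can decide how to map them) and a unification step (replacing look-alike code points with one chosen value).

```python
import unicodedata
from collections import Counter

# Hypothetical mapping: variant code point -> code point chosen as canonical.
# Which form is canonical is a per-language decision and is assumed here
# purely for illustration; it is not taken from the paper.
UNIFY_MAP = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF          -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH          -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
}
_TABLE = str.maketrans(UNIFY_MAP)


def unify(text: str) -> str:
    """Replace every mapped variant character with its canonical form."""
    return text.translate(_TABLE)


def report_unexpected(text: str, expected: set) -> list:
    """List non-space characters outside `expected`, most frequent first.

    This is the manual-review half of a semi-automatic workflow: a person
    reads the report and decides which entries belong in UNIFY_MAP.
    """
    counts = Counter(ch for ch in text if not ch.isspace() and ch not in expected)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in counts.most_common()
    ]


# Two spellings of the same word that differ only in the first code point.
tokens = ["\u0643\u0648\u0631\u062F", "\u06A9\u0648\u0631\u062F"]
types_before = len(set(tokens))                   # 2 distinct strings
types_after = len({unify(t) for t in tokens})     # 1 after unification
```

    In this toy example the two tokens render almost identically but count as two word types until `unify` is applied, which is the word duplication the abstract describes.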

    A Simple Approach to Unify Ambiguously Encoded Kurdish Characters

    In this study, we outline a potential problem in the normalisation stage of processing texts that are based on a modified version of the Arabic alphabet. The main resource available for processing resource-scarce languages is raw text. We have identified an interesting challenge that must be addressed when normalising certain natural language texts. Many less-resourced languages, such as Kurdish, Farsi, Urdu, and Pashtu, use a modified version of the Arabic writing system. Many characters in data harvested from the Internet may have exactly the same form but be encoded with different Unicode values (ambiguous characters). It is important to identify ambiguous characters during the normalisation stage of most text processing tasks. We demonstrate cases related to ambiguous Kurdish and Farsi characters and propose a semi-automatic approach to identifying and unifying ambiguously encoded characters.

    Digital Classical Philology

    The buzzwords “Information Society” and “Age of Access” suggest that information is now universally accessible without any form of hindrance. Indeed, the German constitution calls for all citizens to have open access to information. Yet in reality, there are multifarious hurdles to information access, whether physical, economic, intellectual, linguistic, political, or technical. Thus, while new methods and practices for making information accessible arise on a daily basis, we are nevertheless confronted by limitations to information access in various domains. This new book series assembles academics and professionals in various fields in order to illuminate the various dimensions of information's inaccessibility. While the series discusses principles and techniques for transcending the hurdles to information access, it also addresses necessary boundaries to accessibility.

    This book describes the state of the art of digital philology with a focus on ancient Greek and Latin. It addresses problems such as the accessibility of information about Greek and Latin sources and the data entry, collection, and analysis of Classical texts, and it describes the fundamental role of libraries in building digital catalogs and developing machine-readable citation systems.

    The Impact of Ideology on Lexical Borrowing in Arabic: A Synergy of Corpus Linguistics and CDA

    Lexical borrowing is a natural outcome of language contact and one source of neologisms. The traditional view explains lexical borrowing as motivated mainly by lexical need or prestige, where loans in the recipient language have more or less similar, if not identical, meanings to those in the source language. Linguistic adaptation has often been seen as grammatically based, with grammarians or linguists assuming the major task of nativizing foreign terms. This is typical of many studies on linguistic borrowing in Arabic, while only secondary attention is given to semantic, sociolinguistic, and educational perspectives. The present study approached lexical borrowing as more of a language users’ task, emphasizing their role in meaning construction. Three English loanwords in Arabic (agenda, liberal, lobby) were studied in naturally occurring language to see whether their meanings and co-occurrence patterns correspond to those of their equivalents in English and thus agree with the notion of lexical need as the motive for linguistic borrowing. Some of the meanings of the loans fall under the domain of sociopolitics, which is a fertile site believed to show ideological impact. Using the analytical frameworks of Sinclair (2005, 1998) and Van Dijk (2014, 2016b, 2016a), the three loanwords were investigated from corpus linguistics and CDA angles. The findings revealed co-occurrence patterns in Arabic that differ from those in English and are characterized by negative associations. These negative associations were motivated by religious, political, and linguistic ideological stances, often implied in the connotations and attitudinal meanings of real language use. Ideological influence was also reproduced in Arabic dictionaries, where some loanwords or their meanings are absent or excluded even though they are used in formal settings. The connection between dictionary making and learning as influenced by dominant ideology was also explored.

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the 2020 edition, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 was the first opportunity for the Italian Computational Linguistics research community to meet in person after more than one year of full or partial lockdown.

    K + K = 120 : Papers dedicated to László Kálmán and András Kornai on the occasion of their 60th birthdays


    Affective Computing

    This book provides an overview of state-of-the-art research in Affective Computing. It presents new ideas, original results, and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since one of the most important means of human communication is facial expression, the first section of this book (Chapters 1 to 7) presents research on the synthesis and recognition of facial expressions. Given that we use not only the face but also body movements to express ourselves, the second section (Chapters 8 to 11) presents research on the perception and generation of emotional expressions using full-body motions. The third section of the book (Chapters 12 to 16) presents computational models of emotion, as well as findings from neuroscience research. The last section of the book (Chapters 17 to 22) presents applications related to affective computing.

    Imagining Global Amsterdam

    Imagining Global Amsterdam brings together new essays on the image of Amsterdam as articulated in film, literature, art, and urban discourse, considered within the context of globalization and its impact on urban culture. Subjects include: Amsterdam’s place in global cultural memory; expressions of global consciousness in Amsterdam in the ‘Golden Age’; articulations of Amsterdam as a tolerant, multicultural, and permissive ‘global village’; and globalization’s impact ‘on the ground’ through city branding, the cultural heritage industry, and cultural production in the city. Written by an interdisciplinary team of scholars, and united by a broad humanities approach, this collection forms a multifaceted inquiry into the dynamic relationship between Amsterdam, globalization, and the urban imaginary.

    Imagining Global Amsterdam is about the image of Amsterdam in film, literature, visual art, and modern urban discourse, particularly in the context of globalization. Among other things, the essays examine Amsterdam as a lieu de mémoire of early modern world trade. What does this memory mean in contemporary culture? Why do so many contemporary films and novels refer back to this past? The (inter)national image of Amsterdam as a multicultural and ultra-tolerant ‘global village’ is also addressed. Why is this image so persistent, and how has it developed over recent decades? Finally, the collection considers how processes of globalization intervene in urban culture, for example in the prostitution district of De Wallen and through the heritage industry. How does globalization manifest itself in the city, and what role does image-making play in this? This collection forms a richly varied study of the relationship between Amsterdam, globalization, and urban image-making. Marco de Waard is a lecturer in literary studies at Amsterdam University College.