7 research outputs found

    Homograph Disambiguation Through Selective Diacritic Restoration

    Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic. Omitting diacritics leads to an increase in the number of homographs: different words with the same spelling. Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications. In this paper, we propose approaches for automatically marking a subset of words for diacritic restoration, which leads to selective homograph disambiguation. Compared to full or no diacritic restoration, these approaches yield selectively-diacritized datasets that balance sparsity and lexical disambiguation. We evaluate the various selection strategies extrinsically on several downstream applications: neural machine translation, part-of-speech tagging, and semantic textual similarity. Our experiments on Arabic show promising results, where our devised strategies on selective diacritization lead to a more balanced and consistent performance in downstream applications. Comment: accepted in WANLP 201
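The homograph effect described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the selection rule (keep diacritics only on words whose bare spelling maps to more than one diacritized form in a small word list) is an assumption made for illustration, and the function names `strip_diacritics`, `find_homographs`, and `selectively_diacritize` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word):
    """Drop combining marks (Unicode category Mn), which include Arabic short-vowel signs."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def find_homographs(diacritized_words):
    """Group diacritized forms by bare spelling; keep spellings with more than one form."""
    groups = defaultdict(set)
    for word in diacritized_words:
        groups[strip_diacritics(word)].add(word)
    return {bare: forms for bare, forms in groups.items() if len(forms) > 1}


def selectively_diacritize(tokens, ambiguous):
    """Assumed selection rule: keep diacritics only on tokens whose bare form is ambiguous."""
    return [
        tok if strip_diacritics(tok) in ambiguous else strip_diacritics(tok)
        for tok in tokens
    ]


# Illustrative vocabulary, written with escapes so the marks are explicit.
ALIMA = "\u0639\u064E\u0644\u0650\u0645\u064E"  # 'alima, "he knew"
ILM = "\u0639\u0650\u0644\u0652\u0645"          # 'ilm, "knowledge"
KITAB = "\u0643\u0650\u062A\u064E\u0627\u0628"  # kitab, "book"

homographs = find_homographs([ALIMA, ILM, KITAB])
# ALIMA and ILM collapse to the same three letters once diacritics are removed,
# so only that spelling is flagged; KITAB has a single reading in this list.
selected = selectively_diacritize([ALIMA, KITAB], homographs)
```

In this toy setting `selected` keeps the marks on the ambiguous word and strips them from the unambiguous one, which is the sparsity versus disambiguation trade-off the abstract refers to.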

    Efficient Convolutional Neural Networks for Diacritic Restoration

    Diacritic restoration has gained importance with the growing need for machines to understand written texts. The task is typically modeled as a sequence labeling problem, and currently Bidirectional Long Short-Term Memory (BiLSTM) models provide state-of-the-art results. Recently, Bai et al. (2018) showed the advantages of Temporal Convolutional Neural Networks (TCN) over Recurrent Neural Networks (RNN) for sequence modeling in terms of performance and computational resources. As diacritic restoration benefits from both previous and subsequent timesteps, we further apply and evaluate a variant of TCN, Acausal TCN (A-TCN), which incorporates context from both directions (previous and future) rather than strictly incorporating previous context as in the case of TCN. A-TCN yields significant improvement over TCN for diacritization in three different languages: Arabic, Yoruba, and Vietnamese. Furthermore, A-TCN and BiLSTM have comparable performance, making A-TCN an efficient alternative to BiLSTM since convolutions can be trained in parallel. A-TCN is significantly faster than BiLSTM at inference time (270%-334% improvement in the amount of text diacritized per minute). Comment: accepted in EMNLP 201
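The difference between causal and acausal context mentioned in this abstract can be sketched with a single dilated 1D convolution in plain Python. This is only an illustration of the padding idea, not the A-TCN architecture from the paper; the function `conv1d` and its parameters are hypothetical, and the acausal branch assumes an odd kernel size so the window is symmetric.

```python
def conv1d(x, kernel, dilation=1, causal=True):
    """Same-length 1D convolution over a list of floats.

    causal=True pads only on the left, so output[t] depends on x[t] and earlier steps.
    causal=False pads on both sides, so output[t] also depends on later steps.
    """
    k = len(kernel)
    span = (k - 1) * dilation                 # total reach of the kernel
    left = span if causal else span // 2      # assumption: odd k for a symmetric window
    right = span - left
    padded = [0.0] * left + list(x) + [0.0] * right
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]


impulse = [0.0, 0.0, 1.0, 0.0, 0.0]   # a single "event" at timestep 2
kernel = [1.0, 1.0, 1.0]

causal_out = conv1d(impulse, kernel, causal=True)
# → [0.0, 0.0, 1.0, 1.0, 1.0]: the event influences only timesteps 2 and later
acausal_out = conv1d(impulse, kernel, causal=False)
# → [0.0, 1.0, 1.0, 1.0, 0.0]: timestep 1 already "sees" the future event
```

Because a character's diacritic can depend on letters that follow it, the symmetric window is the property the abstract credits for the improvement of A-TCN over TCN.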

    Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation

    In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. Feed-Forward Neural Network (FFNN) and Recurrent Neural Network (RNN), with several enhancements such as 100-hot encoding, embeddings, Conditional Random Field (CRF) and Block-Normalized Gradient (BNG). The models are tested on the only freely available benchmark dataset, and the results show that our models are either better than or on par with other models, which, unlike ours, require language-dependent post-processing steps. Moreover, we show that diacritics in Arabic can be used to enhance the models of NLP tasks such as Machine Translation (MT) by proposing the Translation over Diacritization (ToD) approach. Comment: 18 pages, 17 figures, 14 tables

    The Future of Information Sciences : INFuture2009 : Digital Resources and Knowledge Sharing


    Ethno-Religious Conflict in Northern Nigeria: The Latency of Episodic Genocide

    This dissertation explores the ethnic and religious dimensions of the northern Nigeria conflict, in which gruesome killings have intermittently occurred, to determine whether there are genocidal inclinations to the episodic killings. The literature review provides the contextual framework for examining the conflict parties and causation factors to address the research questions: Are there genocidal inclinations to the ethno-religious conflict in northern Nigeria? To what extent does the interplay between ethnicity and religion help to foment and escalate the conflict in northern Nigeria? The study employs a mixed content analysis and grounded theory methodology based on the Strauss and Corbin (1990) approach. Data were drawn from 197 newspaper articles on the conflict over the study period, spanning from the 1966 northern Nigeria massacres of thousands of Ibos up to the present, ongoing killings between Muslims and Christians or non-Muslims in the region. Available texts of the conflict cases over the research period were content-analyzed using Nvivo qualitative data analysis software, involving processes of categorizing, coding and evaluation of the textual themes. The study structures a theoretical model for determining proclivity to genocide, and finds that there are genocidal inclinations to the northern Nigeria conflict, involving the specific intent to ‘cleanse’ the north through the exclusionary ideology of imposition of the Sharia law through enforced assimilation or extermination of Christians and other non-Muslims who do not assimilate or adopt the Muslim ideology. The study also suggests that there is latency in the recognition of these genocidal manifestations due to their episodic nature and intermittency of occurrence. The study provides further understanding of factors underlying and sustaining the violent conflict between Muslims and Christians in northern Nigeria. It contributes new perspectives and a theoretical model for determining genocidal proclivity to the field of conflict analysis and resolution, and proffers alternative strategies for relationship building and peaceful coexistence among different religious groups. The findings will guide recommendations on policy formulations for eliminating religious intolerance in northern Nigeria. The study creates further awareness of the need for global intervention on the region’s sporadic killings to avert full-blown Rwandan-type genocide in Nigeria.

    Dicionário de Biblioteconomia e Arquivologia

    The aim of this dictionary is to define, clearly, succinctly, and simply, the terms used by librarians, archivists, and other professionals in the broad and multifaceted field of information science, helping them expand their knowledge. The basic criterion for including a term was its potential use in the professional practice of these specialists. Many entries include supporting quotations drawn from the technical and scientific literature and from general and specialized lexicons. The systematic compilation of terminology is vital to the development of any technical or scientific field, since clarity and precision are impossible without uniform language among its practitioners. Broad in scope, with more than four thousand entries, the dictionary covers not only the terminology of the various specializations within librarianship, archival science, documentation, and information studies, but also the main terms of copyright, publishing, the book trade, graphic arts, book history, bibliography, scholarly communication, telecommunications, and computing. It will therefore serve librarians, archivists, publishers, booksellers, students, researchers, and other professionals who work in collecting, storing, processing, retrieving, and disseminating information, whether in traditional print format or in electronic media. It will also help meet the needs of scholars who require the technical terminology in English.