    Arabic Query Expansion Using WordNet and Association Rules

    Query expansion is the process of adding relevant terms to the original query to improve the performance of information retrieval systems. However, previous studies have shown that automatic query expansion using WordNet does not improve performance. One of the main challenges of query expansion is selecting appropriate terms. In this paper, we revisit this problem using Arabic WordNet and association rules in the context of the Arabic language. The results confirm that, with an appropriate selection method, Arabic WordNet can be exploited to improve retrieval performance. Our empirical results on a sub-corpus of the Xinhua collection show that our automatic selection method achieves a significant improvement in mean average precision (MAP) and recall, and better precision among the top retrieved documents.
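    The abstract does not spell out how association rules filter the WordNet candidates. As a minimal sketch, assuming documents are treated as transactions (sets of terms) and a candidate is kept only when the rule `query term -> candidate` reaches a minimum confidence, the selection step might look like this. The `synonyms` dictionary is a hypothetical stand-in for Arabic WordNet lookups, and the threshold `min_conf` and all toy data are illustrative assumptions, not values from the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions (documents as sets of terms) containing every item."""
    if not transactions:
        return 0.0
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(A -> B) = support(A | B) / support(A); 0.0 if A never occurs."""
    s_a = support(antecedent, transactions)
    if s_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / s_a


def expand_query(query, synonyms, transactions, min_conf=0.5):
    """Add a synonym candidate only if the rule term -> candidate is confident enough.

    `synonyms` is a hypothetical stand-in for Arabic WordNet, and `min_conf`
    is an illustrative threshold, not a value taken from the paper.
    """
    expanded = list(query)
    for term in query:
        for cand in synonyms.get(term, []):
            if cand not in expanded and confidence({term}, {cand}, transactions) >= min_conf:
                expanded.append(cand)
    return expanded


# Toy data (hypothetical): each document is the set of terms it contains.
docs = [
    {"car", "vehicle", "road"},
    {"car", "vehicle", "engine"},
    {"car", "automobile"},
    {"car", "road"},
]
synonyms = {"car": ["vehicle", "automobile", "cable_car"]}

print(expand_query(["car"], synonyms, docs))  # ['car', 'vehicle']
```

    In this sketch, raising `min_conf` makes expansion more conservative: "automobile" (confidence 0.25) and "cable_car" (never co-occurring with "car") are rejected, while "vehicle" (confidence 0.5) is kept.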

    Language contact in West Africa

    Not peer reviewed

    Towards a typology of predicative demonstratives

    Although there has been growing interest in the study of demonstratives, a number of demonstrative categories remain largely unexplored. This article addresses one such gap by presenting a preliminary typological overview of predicative demonstratives, a type of demonstrative used primarily in non-verbal predication constructions. The morphosyntax of predicative demonstratives is briefly examined first, followed by a typological characterization based primarily on semantic and morphosyntactic grounds. Predicative demonstratives focus on the immediately surrounding spatial, temporal, or textual environment of the speech act, and they are restricted from occurring in negated clauses or questions. In terms of lexical categorization, predicative demonstratives most commonly belong to a small closed class of non-verbal predicators. Four types of predicative demonstratives are proposed: presentatives, identifiers, localizers, and the rare copular demonstratives.

    Peer reviewed

    Grammatical gender and linguistic complexity, Volume I: General issues and specific studies

    Peer reviewed

    On Scripturology

    In this contribution, we present the principles and parameters of a discipline that, in the sense we intend, remains largely to be established: scripturology. This discipline concerns the study of the different facets of writing, considered in its generality as the semiotic apparatus that articulates linguistic facts and spatial facts. We refer at the outset to the definition proposed in this volume: “script is a pluricode apparatus having a general usage within a situated human community; its plane..

    Grammatical gender and linguistic complexity, Volume 1

    The many facets of grammatical gender remain one of the most fruitful areas of linguistic research and pose fascinating questions about the origins and development of complexity in language. The present work is a two-volume collection of 13 chapters on grammatical gender seen through the prism of linguistic complexity. The contributions discuss what counts as complex and/or simple in grammatical gender systems, and whether the distribution of gender systems across the world’s languages relates to the language ecology and social history of speech communities. This volume is complemented by Volume 2, which consists of three chapters providing diachronic and typological case studies, followed by a final chapter discussing old and new theoretical and empirical challenges in the study of the dynamics of gender complexity.

    Reconstructing Syntax

    Contributing to the vigorous discussion of the viability of syntactic reconstruction, this volume offers methods for identifying (i) cognates in syntax and (ii) the directionality of syntactic change, thus providing historical syntacticians with evidence that syntactic reconstruction is indeed both theoretically and practically feasible. Readership: This volume is of interest to all historical syntacticians and historical linguists, as well as to specialists in Indo-European, Semitic, Austronesian, and Native American languages.

    Dynamic language modeling for European Portuguese

    Doctoral thesis in Informatics Engineering.
    Most current methods for transcribing and indexing broadcast audio are manual. Broadcasters process thousands of hours of audio and video every day in order to transcribe it, extract semantic information, and interpret and summarize its content. Providing efficient automatic support for these manual tasks has been a major challenge, and over the last decade there has been growing interest in using automatic speech recognition to transcribe and index broadcast news and to provide random and relevant access to large broadcast news databases. However, because the topics covered by news change constantly, new events lead to high out-of-vocabulary (OOV) word rates and, consequently, to degraded recognition performance. This is especially true for highly inflected languages such as European Portuguese. Several techniques can be exploited to reduce these errors: show-specific information, such as topic-based lexicons and the working scripts prepared in advance by the news anchor and other journalists, and other sources, such as written news published daily on the Internet, can be added to the information sources used by the speech recognizer.
    In this thesis, we explore the use of additional information sources for vocabulary optimization and language model adaptation in a European Portuguese broadcast news transcription system. The thesis makes three main contributions: a novel vocabulary selection approach that uses part-of-speech (POS) tags to compensate for differences in word usage across the training corpora; language model adaptation frameworks, run daily and unsupervised, for both single-stage and multistage (multi-pass) recognition; and a new method for adding words to the system vocabulary without additional adaptation data or global language model retraining.
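    The OOV problem described above can be made concrete by measuring the OOV rate of a vocabulary on new text. The sketch below is a simplified stand-in, not the thesis's method: it selects the vocabulary by raw word frequency rather than the POS-based approach the thesis proposes, it does not touch the language model, and `add_daily_words` with its `min_count` threshold is a hypothetical illustration of updating the vocabulary from daily online news.

```python
from collections import Counter


def select_vocabulary(training_tokens, size):
    """Keep the `size` most frequent words (plain frequency baseline, not POS-based)."""
    return {word for word, _ in Counter(training_tokens).most_common(size)}


def oov_rate(test_tokens, vocabulary):
    """Fraction of test tokens that are out of vocabulary (OOV)."""
    if not test_tokens:
        return 0.0
    return sum(1 for w in test_tokens if w not in vocabulary) / len(test_tokens)


def add_daily_words(vocabulary, daily_tokens, min_count=2):
    """Hypothetical daily update: add words seen at least `min_count` times in today's news."""
    counts = Counter(daily_tokens)
    return vocabulary | {w for w, c in counts.items() if c >= min_count}


# Toy data (hypothetical, lowercased Portuguese tokens).
training = ["o", "o", "o", "governo", "governo", "anunciou"]
test = ["o", "governo", "anunciou", "eleições"]
daily_news = ["eleições", "eleições", "ministro"]

vocab = select_vocabulary(training, size=2)   # contains 'o' and 'governo'
print(oov_rate(test, vocab))                  # 0.5
vocab = add_daily_words(vocab, daily_news)    # adds 'eleições' only
print(oov_rate(test, vocab))                  # 0.25
```

    In this toy run, the daily update halves the OOV rate because a word that is frequent in the day's news also appears in the text to be transcribed.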

    Semantic and Contextual Knowledge Representation for Lexical Disambiguation: Case of Arabic-French Query Translation

    In this paper, we present an automatic query translation system for Arabic-French cross-language information retrieval. For lexical disambiguation, our system combines two resources: a bilingual dictionary and a parallel corpus. To select the best translation, our method relies on a correspondence measure between two semantic networks. The first represents the senses of the ambiguous query terms. The second is a contextually enriched semantic network representing the collection of sentences that respond to the query. This collection forms the knowledge base of our disambiguation method and is obtained by alignment with the relevant Arabic sentences. The evaluation shows the benefit of contextual enrichment for translation quality. We obtained high precision, roughly proportional to the precision of the alignment used. Finally, we demonstrate the potential of our translations by comparing their BLEU score with that of Google Translate.
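    The abstract does not define its correspondence measure between the two semantic networks. As a minimal sketch, assuming the sense network maps each candidate French translation to a set of related terms, and the contextual network is a co-occurrence graph built from the aligned French sentences, each candidate could be scored by the Jaccard overlap of its two neighborhoods, keeping the best one. Jaccard overlap is a swapped-in measure, not the authors' method, and all function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(sentences):
    """Undirected graph linking terms that co-occur in the same sentence."""
    graph = defaultdict(set)
    for sentence in sentences:
        for a, b in combinations(set(sentence), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def choose_translation(sense_network, context_graph):
    """Pick the candidate whose sense neighborhood best matches its context neighborhood.

    Jaccard overlap is an assumed stand-in for the paper's correspondence measure.
    """
    return max(
        sense_network,
        key=lambda cand: jaccard(sense_network[cand], context_graph.get(cand, set())),
    )


# Toy data (hypothetical): two French candidates for the ambiguous Arabic
# term عين, which can mean "eye" or "spring" (a water source).
sense_network = {
    "œil": {"vision", "regard", "visage"},
    "source": {"eau", "rivière", "puits"},
}
aligned_sentences = [["source", "eau", "village"], ["puits", "eau", "source"]]

graph = cooccurrence_network(aligned_sentences)
print(choose_translation(sense_network, graph))  # source
```

    Here "source" wins because two of its sense-related terms ("eau", "puits") also co-occur with it in the aligned sentences, while "œil" never appears in the context at all.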