
    Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

    This paper proposes to advance the current state of the art in automatic Language Resource (LR) building by taking three elements into consideration: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NEs) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important problem currently affecting Computational Linguistics, by using the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system’s accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, the number of NEs acquired and the richness of the information represented.
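
    A minimal sketch of the NE-identification idea described above, not the authors' actual pipeline: a Wikipedia page title whose form does not look like a common noun, but whose category head ("Italian painters" -> "painter") maps to a WordNet synset, is treated as a named entity attached to that synset. The function name and heuristics below are illustrative assumptions.

```python
# Requires: pip install nltk; then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def link_candidate(title: str, category_head: str):
    """Return (title, synset) if the title looks like an NE for the category."""
    head = title.split()[-1].lower()
    # Crude heuristic: all-lowercase titles whose head is a WordNet noun
    # are treated as common nouns, not named entities.
    looks_common = bool(wn.synsets(head, pos=wn.NOUN)) and title.islower()
    cat_synsets = wn.synsets(category_head, pos=wn.NOUN)
    if not looks_common and cat_synsets:
        # Attach the entity to the first (most frequent) sense of the category head.
        return title, cat_synsets[0]
    return None

print(link_candidate("Leonardo da Vinci", "painter"))
# e.g. ('Leonardo da Vinci', Synset('painter.n.01'))
```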

    Creating a bilingual dictionary of collocations: A learner-oriented approach

    Considering the lack of specialised dictionaries in certain fields, a creative way of teaching through corpus-based work was proposed in a seminar for master’s students of translation studies (University of Ljubljana, Slovenia). Since phraseology and terminology play an important role both in specialised translation and in the learning path of students of translation studies, this article presents an active approach aimed at creating an online lexicographic resource in languages for specific purposes by using the didactic tool and database ARTES (Aide à la Rédaction de TExtes Scientifiques / Dictionary-assisted writing tool for scientific communication) previously developed at the Université de Paris (France). About thirty Slovene students enrolled in the first year of master’s study have been participating in the bilateral project since 2018. The aims of such an activity are multiple: students learn in a practical way how to compile comparable corpora from the internet using the online corpus software Sketch Engine, and how to find similar linguistic constructions in the source and target languages. They also learn to create an online bilingual phraseological and terminological dictionary to facilitate the translation of specialised texts. In this way, they acquire skills and develop knowledge in translation, terminology, and discourse phraseology. The article first describes the ARTES online database. Then, we present the teaching methodology and the students’ work, which consists of compiling corpora, extracting and translating collocations for the language pair French-Slovene, and entering them in the ARTES database. Finally, we propose an analysis of the most frequent collocation structures in both languages. The language pair considered here is French and Slovene, but the methodology can be applied to any other language pair.
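
    A minimal sketch, not the ARTES or Sketch Engine workflow itself, of extracting candidate collocations from a small specialised corpus with pointwise mutual information, as students might do before entering pairs into a dictionary database. The sample sentences are invented for illustration.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

corpus = (
    "the experimental protocol was approved by the ethics committee "
    "the ethics committee reviewed the experimental protocol in detail"
)
tokens = corpus.split()

finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)                      # keep pairs seen at least twice
top = finder.nbest(BigramAssocMeasures.pmi, 5)   # rank surviving pairs by PMI
print(top)  # e.g. [('ethics', 'committee'), ('experimental', 'protocol'), ...]
```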

    Knowledge Representation and WordNets

    Knowledge itself is a representation of “real facts”. Knowledge is a logical model that presents facts from “the real world” which can be expressed in a formal language. Representation means the construction of a model of some part of reality. Knowledge representation is relevant to both cognitive science and artificial intelligence. In cognitive science it expresses the way people store and process information. In the AI field the goal is to store knowledge in such a way that it permits intelligent programs to represent information as closely as possible to human intelligence. Knowledge Representation refers to the formal representation of knowledge intended to be processed and stored by computers, and used to draw conclusions from this knowledge. Examples of applications are expert systems, machine translation systems, computer-aided maintenance systems and information retrieval systems (including database front-ends). Keywords: knowledge, representation, AI models, databases, CAMs.
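
    A small illustration, assuming NLTK's English WordNet, of how a wordnet encodes knowledge as a graph of synsets: climbing the hypernym chain of "car" recovers the is-a facts that the abstract calls a model of "real facts".

```python
# Requires: pip install nltk; then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

synset = wn.synset('car.n.01')
chain = [synset]
while chain[-1].hypernyms():
    # Follow the first hypernym at each level up to the top of the hierarchy.
    chain.append(chain[-1].hypernyms()[0])

print(" -> ".join(s.name() for s in chain))
# e.g. car.n.01 -> motor_vehicle.n.01 -> self-propelled_vehicle.n.01 -> ... -> entity.n.01
```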

    From GLÀFF to PsychoGLÀFF: a large psycholinguistics-oriented French lexical resource

    In this paper, we present two French lexical resources, GLÀFF and PsychoGLÀFF. The former, automatically extracted from the collaborative online dictionary Wiktionary, is a large-scale versatile lexicon exploitable in Natural Language Processing applications and linguistic studies. The latter, based on GLÀFF, is a lexicon specifically designed for psycholinguistic research. GLÀFF, counting more than 1.4 million entries, features an unprecedented size. It reports lemmas, main syntactic categories, inflectional features and phonemic transcriptions. PsychoGLÀFF contains additional information related to formal aspects of the lexicon and its distribution. It contains about 340,000 corpus-attested entries (120,000 lemmas). We explain how the resources have been created and compare them to other known resources in terms of coverage and quality. Regarding PsychoGLÀFF, the comparison shows that it has an exceptionally large repertoire while being of comparable quality.
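
    A minimal sketch of loading a GLÀFF-style lexicon for lemma-based lookup. The tab-separated column layout (wordform, lemma, category, phonemic form) and the file name are assumptions for illustration, not the resource's actual format.

```python
import csv
from collections import defaultdict

def load_lexicon(path: str):
    """Index entries by lemma: lemma -> list of (wordform, category, phonemes)."""
    by_lemma = defaultdict(list)
    with open(path, encoding="utf-8", newline="") as f:
        for wordform, lemma, category, phonemes in csv.reader(f, delimiter="\t"):
            by_lemma[lemma].append((wordform, category, phonemes))
    return by_lemma

# Usage (hypothetical file name and entry):
# lexicon = load_lexicon("glaff.tsv")
# print(lexicon["chanter"])   # all inflected forms recorded for the verb "chanter"
```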

    Lexikos at eighteen: an analysis

    At eighteen, Lexikos became a major player in the field of linguistics by being awarded an Impact Factor. This article presents a double analysis of the foundation that led to this success. On the one hand, a thorough statistical study is undertaken with regard to all contributors and their contributions to Lexikos. To this end a metadata database was designed, with the aim of answering the question: 'Who publishes what type of material from where and when?' On the other hand, a content analysis is carried out which focuses on the actual topics (i.e. 'keywords') in Lexikos. To this end an all-inclusive text corpus containing all the Lexikos material was built, with the aim of answering the question: 'What are the major trends in Lexikos?'
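
    A toy sketch of the kind of metadata query behind "Who publishes what type of material from where and when?". The records below are invented; the actual study used a purpose-built metadata database covering all Lexikos contributions.

```python
from collections import Counter

records = [
    {"year": 2006, "country": "South Africa", "type": "article"},
    {"year": 2006, "country": "Denmark",      "type": "review"},
    {"year": 2007, "country": "South Africa", "type": "article"},
]

# Count contributions per (country, type) pair, most frequent first.
by_country_type = Counter((r["country"], r["type"]) for r in records)
for (country, kind), n in by_country_type.most_common():
    print(f"{country:15s} {kind:8s} {n}")
```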

    Automatic construction of lexical typological Questionnaires

    Questionnaires constitute a crucial tool in linguistic typology and language description. By nature, a Questionnaire is both an instrument and a result of typological work: its purpose is to help the study of a particular phenomenon cross-linguistically or in a particular language, but the creation of a Questionnaire is in turn based on the analysis of cross-linguistic data. We attempt to alleviate linguists’ work by constructing lexical Questionnaires automatically, prior to any manual analysis. A convenient Questionnaire format for revealing fine-grained semantic distinctions includes pairings of words with diagnostic contexts that trigger different lexicalizations across languages. Our method to construct this type of Questionnaire relies on distributional vector representations of words and phrases, which serve as input to a clustering algorithm. As output, our system produces a compact prototype Questionnaire for cross-linguistic exploration of contextual equivalents of lexical items, with groups of three homogeneous contexts illustrating each usage. We provide examples of automatically generated Questionnaires based on 100 frequent adjectives of Russian, including veselyj ‘funny’, ploxoj ‘bad’, dobryj ‘kind’, bystryj ‘quick’, ogromnyj ‘huge’, krasnyj ‘red’, byvšij ‘former’, etc. Quantitative and qualitative evaluation of the Questionnaires confirms the viability of our method.
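
    A minimal sketch of the clustering step, under the assumption that contexts of a target adjective are represented as vectors (here crude tf-idf vectors rather than the paper's distributional embeddings) and grouped so that each cluster yields a few homogeneous diagnostic contexts. The example contexts are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

contexts = [
    "a funny joke that made everyone laugh",
    "a funny story told at the party",
    "a funny feeling in my stomach",
    "a funny smell coming from the fridge",
]

# Vectorize the contexts and partition them into candidate usage groups.
X = TfidfVectorizer().fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for cluster in sorted(set(labels)):
    group = [c for c, l in zip(contexts, labels) if l == cluster]
    print(f"usage {cluster}: {group[:3]}")   # keep up to three contexts per usage
```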

    English-sourced direct and indirect borrowings in a new lexicon of Polish Anglicisms

    In recent decades, Polish has experienced an unprecedented influx of English-sourced borrowings, both overt (loanwords) and covert (calques). This linguistic influence echoes social, technological, environmental and ideological transformations, and these changes are reflected in the Polish lexicon. The paper describes a lexicographic project aimed at updating the Słownik zapożyczeń angielskich w polszczyźnie (A Dictionary of Anglicisms in Polish) published in 2010. We discuss the theoretical assumptions, the content and the sources of the data for a new, corpus-based dictionary that is in the making, and illustrate the lexicographic solutions we adopted with regard to both well-established and the most recent direct and indirect Anglicisms. We also address the frequency and usage of the latter in present-day Polish.

    A Language-Independent Approach to Extracting Derivational Relations from an Inflectional Lexicon

    In this paper, we describe and evaluate an unsupervised method for acquiring pairs of lexical entries belonging to the same morphological family, i.e., derivationally related words, starting from a purely inflectional lexicon. Our approach relies on transformation rules that relate lexical entries to one another and which are automatically extracted from the inflected lexicon based on surface-form analogies and on part-of-speech information. It is generic enough to be applied to any language with a mainly concatenative derivational morphology. Results were obtained and evaluated on English, French, German and Spanish. Precision results are satisfying, and our French results compare favorably with another resource, although that resource's construction relied on manually developed lexicographic information whereas our approach only requires an inflectional lexicon.
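
    A simplified sketch of the surface-analogy idea, not the paper's actual rule-extraction procedure: lemmas that share a long common prefix propose a rule (suffix1, pos1) -> (suffix2, pos2), and rules attested several times are kept as candidate derivational patterns. The toy lexicon and thresholds are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

lexicon = [("national", "ADJ"), ("nationalise", "V"), ("nationalisation", "N"),
           ("normal", "ADJ"), ("normalise", "V"), ("normalisation", "N")]

def common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

rules = Counter()
for (w1, p1), (w2, p2) in combinations(lexicon, 2):
    k = common_prefix_len(w1, w2)
    if k >= 4:                                   # analogy needs a solid shared stem
        rules[((w1[k:], p1), (w2[k:], p2))] += 1

for rule, freq in rules.most_common():
    if freq >= 2:                                # keep rules attested more than once
        print(rule, freq)
# e.g. (('', 'ADJ'), ('ise', 'V')) 2
```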