Evaluation of automatic hypernym extraction from technical corpora in English and Dutch
In this research, we evaluate different approaches to the automatic extraction of hypernym relations from English and Dutch technical text. The detected hypernym relations should enable us to semantically structure automatically obtained term lists from domain- and user-specific data. We investigated three hypernymy extraction approaches for Dutch and English: a lexico-syntactic pattern-based approach, a distributional model, and a morpho-syntactic method. To test the performance of the approaches on domain-specific data, we collected and manually annotated English and Dutch data from two technical domains, viz. the dredging and financial domains. The experimental results show that the morpho-syntactic approach in particular obtains good results for automatic hypernym extraction from technical and domain-specific texts.
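The lexico-syntactic pattern-based approach mentioned above can be sketched with a single Hearst-style pattern. The regex below and the dredging-domain example sentence are illustrative assumptions, not the authors' actual pattern set or data:

```python
import re

# One illustrative Hearst-style pattern: "HYPERNYM such as HYPONYM, HYPONYM and HYPONYM".
PATTERN = re.compile(
    r"(?P<hyper>\w+)\s+such as\s+"
    r"(?P<hypos>\w+(?:\s*,\s*\w+)*(?:\s*(?:,\s*)?(?:and|or)\s+\w+)?)"
)

def extract_hypernyms(text):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for m in PATTERN.finditer(text):
        hyper = m.group("hyper")
        # Split the coordinated hyponym list on commas and 'and'/'or'.
        hypos = re.split(r"\s*,\s*|\s+(?:and|or)\s+", m.group("hypos"))
        pairs.extend((h, hyper) for h in hypos if h)
    return pairs

print(extract_hypernyms("vessels such as dredgers, barges and tugboats"))
# → [('dredgers', 'vessels'), ('barges', 'vessels'), ('tugboats', 'vessels')]
```

A real system would use many such patterns per language plus part-of-speech constraints; a single surface regex like this over-matches on nested noun phrases.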
Part of Speech Tagging of Marathi Text Using Trigram Method
In this paper we present a part-of-speech tagger for Marathi, a morphologically rich language spoken by the native people of Maharashtra. The tagger is statistical and uses the trigram method: the most likely POS tag for a token is chosen on the basis of the previous two tags, by computing the probabilities of candidate tag sequences and selecting the best one. We describe the development of the tagger and also report its evaluation.
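The trigram idea can be illustrated in a few lines: score every candidate tag sequence by the product of trigram transition probabilities and word-given-tag emission probabilities, and keep the best. The tag set, words, and all probabilities below are hypothetical toy values, not estimates from a real Marathi corpus, and exhaustive search stands in for the usual Viterbi decoding:

```python
from itertools import product

TAGS = ("N", "V")  # toy tag set

def trans(prev2, prev1, tag):
    # Hypothetical P(tag | prev2, prev1): mildly penalise repeating a tag.
    return 0.2 if prev1 == tag else 0.8

# Hypothetical P(word | tag) emission table.
emit = {("mulga", "N"): 0.9, ("mulga", "V"): 0.1,
        ("khelto", "N"): 0.2, ("khelto", "V"): 0.8}

def tag_sentence(words):
    """Pick the tag sequence maximising trigram * emission probability."""
    best, best_p = None, -1.0
    for seq in product(TAGS, repeat=len(words)):  # exhaustive, not Viterbi
        p = 1.0
        hist = ("<s>", "<s>")  # two start-padding tags
        for w, t in zip(words, seq):
            p *= trans(hist[0], hist[1], t) * emit.get((w, t), 0.01)
            hist = (hist[1], t)
        if p > best_p:
            best, best_p = seq, p
    return list(best)

print(tag_sentence(["mulga", "khelto"]))  # → ['N', 'V']
```

A practical tagger estimates the trigram and emission tables from an annotated corpus, smooths unseen trigrams, and decodes with Viterbi so the cost stays linear in sentence length.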
Identification of Fertile Translations in Medical Comparable Corpora: a Morpho-Compositional Approach
This paper defines a method for lexicon extraction in the biomedical domain from comparable corpora. The method is based on compositional translation and exploits morpheme-level translation equivalences. It can generate translations for a large variety of morphologically constructed words and can also generate 'fertile' translations. We show that fertile translations increase the overall quality of the extracted lexicon for English-to-French translation.
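The compositional idea can be sketched as: split the source word into known morphemes, translate each morpheme, and recompose; a 'fertile' translation arises when a bound morpheme is rendered by a free multi-word gloss. The tiny morpheme dictionary and glosses below are hypothetical stand-ins for the paper's actual bilingual resources:

```python
# Hypothetical English->French morpheme resources.
MORPH_TRANS = {"cyto": "cyto", "toxic": "toxique"}
FERTILE_GLOSS = {"cyto": "pour les cellules"}  # free-phrase gloss of a bound morpheme

def decompose(word):
    """Greedy left-to-right split of `word` into known source morphemes."""
    parts, rest = [], word
    while rest:
        for m in sorted(MORPH_TRANS, key=len, reverse=True):  # longest match first
            if rest.startswith(m):
                parts.append(m)
                rest = rest[len(m):]
                break
        else:
            return None  # unanalysable residue: give up
    return parts

def translate(word):
    parts = decompose(word)
    if parts is None:
        return []
    # Plain compositional candidate: concatenate morpheme translations.
    cands = ["".join(MORPH_TRANS[p] for p in parts)]
    # Fertile candidate: replace a bound morpheme by its free-phrase gloss,
    # yielding a target term with more words than the source word.
    for i, p in enumerate(parts):
        if p in FERTILE_GLOSS:
            rest = [MORPH_TRANS[q] for j, q in enumerate(parts) if j != i]
            cands.append(" ".join(rest + [FERTILE_GLOSS[p]]))
    return cands

print(translate("cytotoxic"))  # → ['cytotoxique', 'toxique pour les cellules']
```

Candidate generation like this overproduces, so a full system ranks or filters the candidates against the target side of the comparable corpus before adding them to the lexicon.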
Determination: a universal dimension for inter-language comparison (preliminary version)
The basic idea I want to develop and substantiate in this paper consists in replacing – where necessary – the traditional concept of a linguistic category or linguistic relation understood as a 'thing', a reified hypostasis, by the more dynamic concept of a dimension. A dimension of language structure is not coterminous with one single category or relation but, instead, accommodates several of them. It corresponds to certain well-circumscribed purposive functions of linguistic activity as well as to certain definite principles and techniques for satisfying these functions. The true universals of language are represented by these dimensions, principles, and techniques, which constitute the true basis for non-historical inter-language comparison. The categories and relations used in grammar are condensations – hypostases, as it were – of such dimensions, principles, and techniques. Elsewhere I have outlined the theory which I want to test here in a case study.
Macro- and microstructural issues in Mazuna lexicography
All the works in Mazuna lexicography have a common denominator: they are translation dictionaries biased towards French, compiled by Catholic and Protestant missionaries or colonial administrators. These dictionaries have both strong and weak points. The macrostructure, although it does not display features of sophistication (i.e. the use of niching and nesting procedures), tends to survey the full lexicon of the language, which makes these dictionaries real reservoirs of knowledge. The microstructure contains many useful entries. However, no metalexicographic discussion is provided in the user's guide to make it accessible to the target reader. There are also some shortcomings, especially in the areas of suprasegmental phonology (absence of tonal indications) and orthography.
Automatic Discovery of Non-Compositional Compounds in Parallel Data
Automatic segmentation of text into minimal content-bearing units is an
unsolved problem even for languages like English. Spaces between words offer an
easy first approximation, but this approximation is not good enough for machine
translation (MT), where many word sequences are not translated word-for-word.
This paper presents an efficient automatic method for discovering sequences of
words that are translated as a unit. The method proceeds by comparing pairs of
statistical translation models induced from parallel texts in two languages. It
can discover hundreds of non-compositional compounds on each iteration, and
constructs longer compounds out of shorter ones. Objective evaluation on a
simple machine translation task has shown the method's potential to improve the
quality of MT output. The method makes few assumptions about the data, so it
can be applied to parallel data other than parallel texts, such as word
spellings and pronunciations.
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
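One common distributional similarity measure is the cosine between terms' context-count vectors. The sketch below uses invented dependency-context counts for a few GENIA-style terms; in the paper the contexts come from Pro3Gres parses and several similarity measures are compared, of which cosine is only one:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (dicts)."""
    num = sum(u[k] * v[k] for k in u if k in v)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

# Hypothetical syntactic-context counts (relation:lemma -> frequency).
contexts = {
    "IL-2":      Counter({"obj:activate": 3, "mod:human": 2}),
    "IL-4":      Counter({"obj:activate": 2, "mod:human": 1}),
    "apoptosis": Counter({"obj:induce": 4}),
}

# Terms sharing contexts score high; disjoint contexts score zero.
print(round(cosine(contexts["IL-2"], contexts["IL-4"]), 3))   # → 0.992
print(cosine(contexts["IL-2"], contexts["apoptosis"]))        # → 0.0
```

Raw counts are usually replaced by association weights such as pointwise mutual information before computing similarity, which downweights frequent but uninformative contexts.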