10 research outputs found

    Training and Scaling Preference Functions for Disambiguation

    We present an automatic method for weighting the contributions of preference functions used in disambiguation. Initial scaling factors are derived as the solution to a least-squares minimization problem, and improvements are then made by hill-climbing. The method is applied to disambiguating sentences in the ATIS (Air Travel Information System) corpus, and the performance of the resulting scaling factors is compared with hand-tuned factors. We then focus on one class of preference function, those based on semantic lexical collocations. Experimental results are presented showing that such functions vary considerably in selecting correct analyses. In particular, we define a function that performs significantly better than ones based on mutual information and likelihood ratios of lexical associations. Comment: To appear in Computational Linguistics (probably volume 20, December 94). LaTeX, 21 pages.
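    The two-step scheme described above (least-squares initialization followed by hill-climbing) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data, the three preference functions, and the 0.5 decision threshold are all invented for the example.

    ```python
    import numpy as np

    # Toy data: each row of F holds the scores that three hypothetical
    # preference functions assign to one candidate analysis; y marks which
    # analyses count as "correct" in this synthetic setup.
    rng = np.random.default_rng(0)
    F = rng.random((50, 3))
    y = (F @ np.array([0.6, 0.3, 0.1]) > 0.5).astype(float)

    # Step 1: initial scaling factors as a least-squares solution.
    w, *_ = np.linalg.lstsq(F, y, rcond=None)

    def accuracy(w):
        # Fraction of analyses whose weighted score lands on the right side of 0.5.
        return float(((F @ w > 0.5) == y).mean())

    init = accuracy(w)

    # Step 2: greedy hill-climbing on the scaling factors, shrinking the
    # step size whenever no single-factor move improves accuracy.
    best, step = init, 0.1
    for _ in range(100):
        improved = False
        for i in range(len(w)):
            for delta in (step, -step):
                cand = w.copy()
                cand[i] += delta
                a = accuracy(cand)
                if a > best:
                    w, best, improved = cand, a, True
        if not improved:
            step /= 2
    ```

    Because the climber only ever accepts strict improvements, the tuned factors can never score worse than the least-squares starting point on the same data.
    
    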

    Lexical typology : a programmatic sketch

    The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.

    Kamusi ya Kiswahili Sanifu in test: A computer system for analyzing dictionaries and for retrieving lexical data

    The paper describes a computer system for testing the coherence and adequacy of dictionaries. The system is also well suited to retrieving lexical material in context from computerized text archives. Results are presented from a series of tests made with Kamusi ya Kiswahili Sanifu (KKS), a monolingual Swahili dictionary. The test of the internal coherence of KKS shows that the text itself contains several hundred words for which there is no entry in the dictionary. Examples and frequency counts of the most often occurring words are given. The adequacy of KKS was also tested with a corpus of nearly one million words: 1.32% of the words in book texts were not recognized by KKS, and for newspaper texts the figure was 2.24%. The higher number for newspaper texts is partly due to the numerous names occurring in news articles. Some statistical results are given on the frequencies of wordforms not recognized by KKS. The tests show that although KKS covers the modern vocabulary quite well, there are several areas where the dictionary should be improved. The internal coherence is far from satisfactory, and there are more than a thousand rather common words in prose texts which are not included in KKS. The system described in this article is an effective tool for detecting problems and for retrieving lexical data in context for missing words.
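    The adequacy test described above amounts to measuring what share of corpus tokens lack a dictionary entry and listing the most frequent missing wordforms. A minimal sketch, with an invented mini-lexicon and toy token list standing in for KKS and the test corpus:

    ```python
    from collections import Counter

    def coverage_report(corpus_tokens, dictionary):
        """Return the fraction of tokens missing from `dictionary` and the
        missing wordforms sorted by corpus frequency (most frequent first)."""
        counts = Counter(t.lower() for t in corpus_tokens)
        missing = {w: c for w, c in counts.items() if w not in dictionary}
        total = sum(counts.values())
        missing_rate = sum(missing.values()) / total if total else 0.0
        return missing_rate, sorted(missing.items(), key=lambda x: -x[1])

    # Toy example: a four-word lexicon tested against a seven-token "corpus".
    lexicon = {"kamusi", "ya", "kiswahili", "sanifu"}
    tokens = ["Kamusi", "ya", "Kiswahili", "sanifu", "ni", "kitabu", "ni"]
    rate, missing = coverage_report(tokens, lexicon)
    ```

    Here three of seven tokens are unrecognized, so `rate` is about 0.43; on a real corpus the same ratio yields figures like the 1.32% and 2.24% reported above.
    
    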

    Multilingual collocation extraction with a syntactic parser

    An impressive amount of work has been devoted over the past few decades to collocation extraction. The state of the art shows that there is a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow parsing). This article presents a collocation extraction system based on the full parsing of source corpora, which supports four languages: English, French, Spanish, and Italian. The performance of the system is compared against that of the standard mobile-window method. The evaluation experiment investigates several levels of the significance lists, uses a fine-grained annotation schema, and covers all the languages supported. Consistent results were obtained for these languages: parsing, even if imperfect, leads to a significant improvement in the quality of results, in terms of collocational precision (between 16.4 and 29.7%, depending on the language; 20.1% overall), MWE precision (between 19.9 and 35.8%; 26.1% overall), and grammatical precision (between 47.3 and 67.4%; 55.6% overall). This positive result is particularly important in view of the subsequent integration of extraction results into other NLP applications.
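    The mobile-window baseline the article compares against can be sketched roughly as follows: every pair of words co-occurring within a fixed window is a candidate, ranked here by pointwise mutual information. This is an illustrative reconstruction of the standard method, not the article's system, and the window size, frequency cutoff, and scoring choice are assumptions.

    ```python
    import math
    from collections import Counter

    def window_collocations(tokens, window=3, min_count=2):
        """Rank word pairs co-occurring within `window` tokens by PMI."""
        unigrams = Counter(tokens)
        pairs = Counter()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + window]:
                pairs[(w, v)] += 1
        n = len(tokens)
        scored = []
        for (w, v), c in pairs.items():
            if c < min_count:
                continue
            # Pointwise mutual information of the ordered pair (w, v).
            pmi = math.log2((c / n) / ((unigrams[w] / n) * (unigrams[v] / n)))
            scored.append(((w, v), pmi))
        return sorted(scored, key=lambda x: -x[1])

    # Toy corpus in which "heavy rain" recurs as an adjacent pair.
    tokens = "heavy rain x heavy rain y heavy rain".split()
    ranked = window_collocations(tokens, window=3, min_count=2)
    ```

    A full-parsing system replaces the blind window with syntactic relations, which is precisely where the precision gains reported above come from: candidates like the spurious reversed pair survive the window method but not a parser.
    
    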

    A MWE Acquisition and Lexicon Builder Web Service

    This paper describes the development of a web-service tool for the automatic extraction of multi-word expression lexicons, which has been integrated into a distributed platform for the automatic creation of linguistic resources. The main purpose of the work described is thus to provide a (computationally "light") tool that produces a full lexical resource: multi-word terms/items with relevant and useful attached information that can be used for more complex processing tasks and applications (e.g. parsing, MT, IE, query expansion, etc.). The output of our tool is a MW lexicon formatted and encoded in XML according to the Lexical Mark-up Framework. The tool is already functional and available as a service. Evaluation experiments show that the tool's precision is about 80%.
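    The paper's output format, an XML lexicon following the Lexical Markup Framework, can be sketched as below. The element and attribute names here (`LexicalResource`, `LexicalEntry`, `Lemma`, `writtenForm`) follow the general LMF layout but are a simplified approximation, not the exact schema the described service emits.

    ```python
    import xml.etree.ElementTree as ET

    def mwe_lexicon_xml(entries):
        """Serialise (lemma, part-of-speech) pairs into a simplified
        LMF-style XML skeleton."""
        root = ET.Element("LexicalResource")
        lexicon = ET.SubElement(root, "Lexicon")
        for lemma, pos in entries:
            entry = ET.SubElement(lexicon, "LexicalEntry")
            entry.set("partOfSpeech", pos)
            lem = ET.SubElement(entry, "Lemma")
            lem.set("writtenForm", lemma)
        return ET.tostring(root, encoding="unicode")

    # Two invented multi-word entries for illustration.
    xml_out = mwe_lexicon_xml([("take into account", "VP"), ("hot dog", "NP")])
    ```

    Encoding the lexicon this way is what lets downstream consumers (parsers, MT systems, query expanders) pick up the extracted MWEs without knowing anything about the extraction service itself.
    
    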

    El corpus como herramienta para la traducción especializada italiano/español: una experiencia con textos de la industria cosmética

    This article examines some of the possibilities of corpora as sources of documentary, terminological, and textual information for specialised Italian/Spanish translation. It provides guidelines for compiling and exploiting an ad hoc corpus in class, so that students learn to quickly gather reliable documentation that helps them approach an Italian/Spanish translation assignment in a specialised field such as cosmetics with greater confidence and better chances of success. In this language combination, moreover, the scarcity of printed and electronic lexicographic resources further justifies the need to learn to build a flexible, low-cost tool of great value to the professional translator. Universidad de Málaga PIE13-05.

    Words and their secrets


    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated into the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization Frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs) for both nouns and verbs, and Multi-Word Expressions (MWEs).
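    Of the four acquisition areas listed above, subcategorization frame acquisition is the most mechanical to illustrate: for each verb, count the argument patterns it is observed with. The sketch below is an invented illustration of that idea only; the input format of (verb, dependency labels) tuples and the label set are assumptions, not PANACEA's actual pipeline.

    ```python
    from collections import Counter, defaultdict

    ARG_LABELS = {"subj", "obj", "iobj", "comp"}  # assumed argument relations

    def scf_counts(parsed_sentences):
        """Count, per verb, the subcategorization frames observed in
        (verb, dependency-label list) tuples from a parsed corpus."""
        frames = defaultdict(Counter)
        for verb, deps in parsed_sentences:
            # A frame is the sorted multiset of argument labels on the verb.
            frame = "+".join(sorted(d for d in deps if d in ARG_LABELS))
            frames[verb][frame] += 1
        return frames

    # Toy parsed corpus: "give" seen twice ditransitively, once transitively.
    data = [("give", ["subj", "obj", "iobj"]), ("give", ["subj", "obj"]),
            ("sleep", ["subj"]), ("give", ["subj", "obj", "iobj"])]
    f = scf_counts(data)
    ```

    From such counts, a real acquisition system would then filter noise with a frequency or hypothesis-test threshold before admitting a frame into the lexicon.
    
    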