10 research outputs found

    Training and Scaling Preference Functions for Disambiguation

    We present an automatic method for weighting the contributions of preference functions used in disambiguation. Initial scaling factors are derived as the solution to a least-squares minimization problem, and improvements are then made by hill-climbing. The method is applied to disambiguating sentences in the ATIS (Air Travel Information System) corpus, and the performance of the resulting scaling factors is compared with hand-tuned factors. We then focus on one class of preference function, those based on semantic lexical collocations. Experimental results are presented showing that such functions vary considerably in selecting correct analyses. In particular, we define a function that performs significantly better than ones based on mutual information and likelihood ratios of lexical associations. Comment: To appear in Computational Linguistics (probably volume 20, December 94). LaTeX, 21 pages.
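    The two-step scheme described above (least-squares initialization followed by hill-climbing) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data, the three preference functions, and the 0.5 decision threshold are all invented for the example.

    ```python
    import numpy as np

    # Toy data: each row of F holds the scores that three hypothetical
    # preference functions assign to one candidate analysis; y marks which
    # analyses count as "correct" in this synthetic setup.
    rng = np.random.default_rng(0)
    F = rng.random((50, 3))
    y = (F @ np.array([0.6, 0.3, 0.1]) > 0.5).astype(float)

    # Step 1: initial scaling factors as a least-squares solution.
    w, *_ = np.linalg.lstsq(F, y, rcond=None)

    def accuracy(w):
        # Fraction of analyses whose weighted score lands on the right side of 0.5.
        return float(((F @ w > 0.5) == y).mean())

    init = accuracy(w)

    # Step 2: greedy hill-climbing on the scaling factors, shrinking the
    # step size whenever no single-factor move improves accuracy.
    best, step = init, 0.1
    for _ in range(100):
        improved = False
        for i in range(len(w)):
            for delta in (step, -step):
                cand = w.copy()
                cand[i] += delta
                a = accuracy(cand)
                if a > best:
                    w, best, improved = cand, a, True
        if not improved:
            step /= 2
    ```

    Because the climber only ever accepts strict improvements, the tuned factors can never score worse than the least-squares starting point on the same data.
    
    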

    Lexical typology : a programmatic sketch

    The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.

    Kamusi ya Kiswahili Sanifu in test: A computer system for analyzing dictionaries and for retrieving lexical data

    The paper describes a computer system for testing the coherence and adequacy of dictionaries. The system is also well suited to retrieving lexical material in context from computerized text archives. Results are presented from a series of tests made with Kamusi ya Kiswahili Sanifu (KKS), a monolingual Swahili dictionary. The test of the internal coherence of KKS shows that the text itself contains several hundred words for which there is no entry in the dictionary. Examples and frequency counts of the most often occurring words are given. The adequacy of KKS was also tested with a corpus of nearly one million words: 1.32% of the words in book texts were not recognized by KKS, and for newspaper texts the figure was 2.24%. The higher number for newspaper texts is partly due to the numerous names occurring in news articles. Some statistical results are given on the frequencies of wordforms not recognized by KKS. The tests show that although KKS covers the modern vocabulary quite well, there are several areas where the dictionary should be improved. The internal coherence is far from satisfactory, and there are more than a thousand rather common words in prose texts which are not included in KKS. The system described in this article is an effective tool for detecting problems and for retrieving lexical data in context for missing words.
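    The adequacy test described above amounts to measuring what share of corpus tokens lack a dictionary entry and listing the most frequent missing wordforms. A minimal sketch, with an invented mini-lexicon and toy token list standing in for KKS and the test corpus:

    ```python
    from collections import Counter

    def coverage_report(corpus_tokens, dictionary):
        """Return the fraction of tokens missing from `dictionary` and the
        missing wordforms sorted by corpus frequency (most frequent first)."""
        counts = Counter(t.lower() for t in corpus_tokens)
        missing = {w: c for w, c in counts.items() if w not in dictionary}
        total = sum(counts.values())
        missing_rate = sum(missing.values()) / total if total else 0.0
        return missing_rate, sorted(missing.items(), key=lambda x: -x[1])

    # Toy example: a four-word lexicon tested against a seven-token "corpus".
    lexicon = {"kamusi", "ya", "kiswahili", "sanifu"}
    tokens = ["Kamusi", "ya", "Kiswahili", "sanifu", "ni", "kitabu", "ni"]
    rate, missing = coverage_report(tokens, lexicon)
    ```

    Here three of seven tokens are unrecognized, so `rate` is about 0.43; on a real corpus the same ratio yields figures like the 1.32% and 2.24% reported above.
    
    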

    Multilingual collocation extraction with a syntactic parser

    An impressive amount of work has been devoted over the past few decades to collocation extraction. The state of the art shows that there is a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow parsing). This article presents a collocation extraction system based on the full parsing of source corpora, which supports four languages: English, French, Spanish, and Italian. The performance of the system is compared against that of the standard mobile-window method. The evaluation experiment investigates several levels of the significance lists, uses a fine-grained annotation schema, and covers all the languages supported. Consistent results were obtained for these languages: parsing, even if imperfect, leads to a significant improvement in the quality of results, in terms of collocational precision (between 16.4 and 29.7%, depending on the language; 20.1% overall), MWE precision (between 19.9 and 35.8%; 26.1% overall), and grammatical precision (between 47.3 and 67.4%; 55.6% overall). This positive result is particularly important in view of the subsequent integration of extraction results into other NLP applications.
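    The mobile-window baseline the article compares against can be sketched roughly as follows: every pair of words co-occurring within a fixed window is a candidate, ranked here by pointwise mutual information. This is an illustrative reconstruction of the standard method, not the article's system, and the window size, frequency cutoff, and scoring choice are assumptions.

    ```python
    import math
    from collections import Counter

    def window_collocations(tokens, window=3, min_count=2):
        """Rank word pairs co-occurring within `window` tokens by PMI."""
        unigrams = Counter(tokens)
        pairs = Counter()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + window]:
                pairs[(w, v)] += 1
        n = len(tokens)
        scored = []
        for (w, v), c in pairs.items():
            if c < min_count:
                continue
            # Pointwise mutual information of the ordered pair (w, v).
            pmi = math.log2((c / n) / ((unigrams[w] / n) * (unigrams[v] / n)))
            scored.append(((w, v), pmi))
        return sorted(scored, key=lambda x: -x[1])

    # Toy corpus in which "heavy rain" recurs as an adjacent pair.
    tokens = "heavy rain x heavy rain y heavy rain".split()
    ranked = window_collocations(tokens, window=3, min_count=2)
    ```

    A full-parsing system replaces the blind window with syntactic relations, which is precisely where the precision gains reported above come from: candidates like the spurious reversed pair survive the window method but not a parser.
    
    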

    A MWE Acquisition and Lexicon Builder Web Service

    This paper describes the development of a web-service tool for the automatic extraction of multi-word expression lexicons, which has been integrated into a distributed platform for the automatic creation of linguistic resources. The main purpose of the work described is thus to provide a (computationally "light") tool that produces a full lexical resource: multi-word terms/items with relevant and useful attached information that can be used for more complex processing tasks and applications (e.g. parsing, MT, IE, query expansion, etc.). The output of our tool is a MW lexicon formatted and encoded in XML according to the Lexical Mark-up Framework. The tool is already functional and available as a service. Evaluation experiments show that the tool's precision is about 80%.
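    The paper's output format, an XML lexicon following the Lexical Markup Framework, can be sketched as below. The element and attribute names here (`LexicalResource`, `LexicalEntry`, `Lemma`, `writtenForm`) follow the general LMF layout but are a simplified approximation, not the exact schema the described service emits.

    ```python
    import xml.etree.ElementTree as ET

    def mwe_lexicon_xml(entries):
        """Serialise (lemma, part-of-speech) pairs into a simplified
        LMF-style XML skeleton."""
        root = ET.Element("LexicalResource")
        lexicon = ET.SubElement(root, "Lexicon")
        for lemma, pos in entries:
            entry = ET.SubElement(lexicon, "LexicalEntry")
            entry.set("partOfSpeech", pos)
            lem = ET.SubElement(entry, "Lemma")
            lem.set("writtenForm", lemma)
        return ET.tostring(root, encoding="unicode")

    # Two invented multi-word entries for illustration.
    xml_out = mwe_lexicon_xml([("take into account", "VP"), ("hot dog", "NP")])
    ```

    Encoding the lexicon this way is what lets downstream consumers (parsers, MT systems, query expanders) pick up the extracted MWEs without knowing anything about the extraction service itself.
    
    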

    El corpus como herramienta para la traducción especializada italiano/español: una experiencia con textos de la industria cosmética

    This article examines some of the possibilities of corpora as sources of documentary, terminological, and textual information for specialised Italian/Spanish translation. It provides guidelines for compiling and exploiting an ad hoc corpus in class, so that students learn to quickly gather reliable documentation that helps them approach an Italian/Spanish translation assignment in a specialised field such as cosmetics with greater confidence and better chances of success. In this language combination, moreover, the scarcity of printed and electronic lexicographic resources further justifies the need to learn to build a flexible, low-cost tool of great value to the professional translator. Universidad de Málaga PIE13-05.

    Words and their secrets


    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated into the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization Frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs) for both nouns and verbs, and Multi-Word Expressions (MWEs).
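    Of the four acquisition areas listed above, subcategorization frame acquisition is the most mechanical to illustrate: for each verb, count the argument patterns it is observed with. The sketch below is an invented illustration of that idea only; the input format of (verb, dependency labels) tuples and the label set are assumptions, not PANACEA's actual pipeline.

    ```python
    from collections import Counter, defaultdict

    ARG_LABELS = {"subj", "obj", "iobj", "comp"}  # assumed argument relations

    def scf_counts(parsed_sentences):
        """Count, per verb, the subcategorization frames observed in
        (verb, dependency-label list) tuples from a parsed corpus."""
        frames = defaultdict(Counter)
        for verb, deps in parsed_sentences:
            # A frame is the sorted multiset of argument labels on the verb.
            frame = "+".join(sorted(d for d in deps if d in ARG_LABELS))
            frames[verb][frame] += 1
        return frames

    # Toy parsed corpus: "give" seen twice ditransitively, once transitively.
    data = [("give", ["subj", "obj", "iobj"]), ("give", ["subj", "obj"]),
            ("sleep", ["subj"]), ("give", ["subj", "obj", "iobj"])]
    f = scf_counts(data)
    ```

    From such counts, a real acquisition system would then filter noise with a frequency or hypothesis-test threshold before admitting a frame into the lexicon.
    
    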