6 research outputs found

    A framework for lexical representation

    Full text link
    In this paper we present a unification-based lexical platform designed for highly inflected languages (like Roman ones). A formalism is proposed for encoding a lemma-based lexical source, well suited for linguistic generalizations. From this source, we automatically generate an allomorph indexed dictionary, adequate for efficient processing. A set of software tools have been implemented around this formalism: access libraries, morphological processors, etc.Comment: 9 page

    GRAMPAL: A Morphological Processor for Spanish implemented in Prolog

    Full text link
    A model for the full treatment of Spanish inflection for verbs, nouns and adjectives is presented. This model is based on feature unification and it relies upon a lexicon of allomorphs both for stems and morphemes. Word forms are built by the concatenation of allomorphs by means of special contextual features. We make use of standard Definite Clause Grammars (DCG) included in most Prolog implementations, instead of the typical finite-state approach. This allows us to take advantage of the declarativity and bidirectionality of Logic Programming for NLP. The most salient feature of this approach is simplicity: A really straightforward rule and lexical components. We have developed a very simple model for complex phenomena. Declarativity, bidirectionality, consistency and completeness of the model are discussed: all and only correct word forms are analysed or generated, even alternative ones and gaps in paradigms are preserved. A Prolog implementation has been developed for both analysis and generation of Spanish word forms. It consists of only six DCG rules, because our {\em lexicalist\/} approach --i.e. most information is in the dictionary. Although it is quite efficient, the current implementation could be improved for analysis by using the non logical features of Prolog, especially in word segmentation and dictionary access.Comment: 11 page

    The ARIES toolbox: a continuing R+D effort

    Full text link
    The effort under the ARIES toolbox spans through the last six years. The core of the toolbox is its lexical platform, including a large Spanish lexicon, lexical maintenance and access tools and morphological analyser/generator. Upon this platform a set of tools have been implemented, including tokenizers, spell checker, unification-based parser and grammar, stochastic and neural morphosyntactic taggers, etc. On the side of applications, the current work is oriented towards offering networking linguistic services for the publishing industry

    Una propuesta y un etiquetador de codificación morfosintáctica para corpus de referencia en lengua española

    Get PDF
    Este trabajo presenta una propuesta de codificación morfosintáctica para corpus de referencia en lengua española basada en los estándares de la Text Encoding Initiative (TEI), The Network of European Reference Corpora (NERC) y The Expert Advisory Group on Language Engineering Standards (EAGLES) tal y como se presenta en (Martín de Santa Olalla, 1994). Presentamos también el trabajo de creación de etiquetador morfosintáctico que utiliza el conjunto de etiquetas que ésta contiene

    Unsupervised Language Acquisition

    Full text link
    This thesis presents a computational theory of unsupervised language acquisition, precisely defining procedures for learning language from ordinary spoken or written utterances, with no explicit help from a teacher. The theory is based heavily on concepts borrowed from machine learning and statistical estimation. In particular, learning takes place by fitting a stochastic, generative model of language to the evidence. Much of the thesis is devoted to explaining conditions that must hold for this general learning strategy to arrive at linguistically desirable grammars. The thesis introduces a variety of technical innovations, among them a common representation for evidence and grammars, and a learning strategy that separates the ``content'' of linguistic parameters from their representation. Algorithms based on it suffer from few of the search problems that have plagued other computational approaches to language acquisition. The theory has been tested on problems of learning vocabularies and grammars from unsegmented text and continuous speech, and mappings between sound and representations of meaning. It performs extremely well on various objective criteria, acquiring knowledge that causes it to assign almost exactly the same structure to utterances as humans do. This work has application to data compression, language modeling, speech recognition, machine translation, information retrieval, and other tasks that rely on either structural or stochastic descriptions of language.Comment: PhD thesis, 133 page
    corecore