4 research outputs found

    Italian Lemmatization by Rules with GETARUNS

    We present an approach to lemmatization based on exhaustive morphological analysis and on external knowledge sources that help with disambiguation, which is the most relevant issue to cope with. Our system, GETARUNS, was not originally concerned with lemmatization directly and used morphological analysis only as a backoff solution when a word was not found in the available wordform dictionaries. We found that both the rules and the root dictionary needed amending. This work was started during development, before the test set was distributed, but was not completed for lack of time. The final task results therefore reflect an incomplete system, which has since been completed, with a rather different outcome. We moved from 98.42 to 99.82 on the test set and from 99.82 to 99.91 on the dev set. These results are produced by rules and are therefore not subject to statistical evaluation, which may change with different training sets. In this version of the paper we perform additional experiments with wordform dictionaries of Italian that are freely available online.
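    The lookup order described in this abstract (wordform dictionary first, morphological rules only as a backoff) can be sketched as follows. This is a minimal illustration and not the GETARUNS implementation: the dictionary entries, the root dictionary, the suffix list, and the function name `lemmatize` are all hypothetical.

```python
# Hypothetical toy resources; none of these entries or rules come from GETARUNS.
WORDFORM_DICT = {"andiamo": "andare", "case": "casa"}
ROOT_DICT = {"parl": "parlare", "libr": "libro"}
SUFFIXES = ["iamo", "ano", "i", "o", "a", "e"]  # tried in order, longest first


def lemmatize(word):
    """Return (lemma, source): dictionary lookup first, suffix rules as backoff."""
    token = word.lower()
    # Step 1: direct lookup in the wordform dictionary.
    if token in WORDFORM_DICT:
        return WORDFORM_DICT[token], "dictionary"
    # Step 2: backoff, strip a known suffix and look the root up.
    for suffix in SUFFIXES:
        if token.endswith(suffix):
            root = token[: -len(suffix)]
            if root in ROOT_DICT:
                return ROOT_DICT[root], "rules"
    # Step 3: nothing matched, return the token unchanged.
    return token, "unknown"


print(lemmatize("Andiamo"))
print(lemmatize("parliamo"))
```

    In this sketch a form that no suffix rule can analyse is returned unchanged and tagged "unknown", which makes gaps in the rules or the root dictionary easy to spot.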

    Can humain association norm evaluate latent semantic analysis?

    This paper compares a word association norm created in a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations.

    Implementation of a lemmatizer for a low-resource language: the case of Shipibo-Konibo

    Since the Ministry of Education made the Shipibo-Konibo alphabet official, there has been a need to produce a large number of educational and official documents for speakers of this language, which at present are produced only with the help of translators or bilingual people. However, the field of computational linguistics offers tools that can ease this work, such as a lemmatizer, which obtains the lemma or base form of a word from its inflected form. Lemmatizers are commonly built with two methods: morphological rules and dictionaries. Accordingly, the main objective of this project is to develop a lemmatization tool for Shipibo-Konibo using a corpus of words, one that is based on the annotation standards used in other languages and is easy to use through a function library and a web service. The final tool was built mainly with the k-nearest neighbors classification method, which estimates the class of a new case by comparing its features with those of previously classified cases and returning the most frequent class among similar values. Finally, the lemmatization tool achieved a precision of 0.736, thereby outperforming tools used in other languages.
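    The k-nearest neighbors step described in this abstract can be sketched as below. This is only an illustration of the general method, not the thesis code: the numeric feature vectors, the labels "A" and "B", and the function name `knn_predict` are hypothetical, and the features actually used for Shipibo-Konibo are not given in the abstract.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the most frequent label among the k training cases closest to query.

    train is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy training cases: two well-separated classes.
TRAIN = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
    ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B"),
]

print(knn_predict(TRAIN, (0.05, 0.1)))
print(knn_predict(TRAIN, (1.0, 0.95)))
```

    For lemmatization, the label would presumably encode a transformation from the inflected form to its lemma, but that encoding is an assumption here.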

    Italian Lemmatization by Rules with Getaruns

    We present an approach to lemmatization based on exhaustive morphological analysis and on external knowledge sources that help with disambiguation, which is the most relevant issue to cope with. Our system, GETARUNS, was not originally concerned with lemmatization directly and used morphological analysis only as a backoff solution when a word was not found in the available wordform dictionaries. We found that both the rules and the root dictionary needed amending. This work was started during development, before the test set was distributed, but was not completed for lack of time. The final task results therefore reflect an incomplete system, which has since been completed, with a rather different outcome. We moved from 98.42 to 99.82 on the test set and from 99.85 to 99.91 on the dev set. These results are produced by rules and are therefore not subject to statistical evaluation, which may change with different training sets.