23 research outputs found

    Exploring the automatic selection of basic level concepts

    We present a very simple method for selecting Base Level Concepts using basic structural properties of WordNet. We also empirically demonstrate that this automatically derived set of Base Level Concepts groups senses at a level of abstraction adequate for class-based Word Sense Disambiguation. In fact, a very naive Most Frequent classifier using the selected classes achieves semantic tagging accuracy of over 75%. This work was supported by the European Union under the QALL-ME project (FP6 IST-033860) and by the Spanish Government under the Text-Mess (TIN2006-15265-C06-01) and KNOW (TIN2006-15049-C03-01) projects.
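
A "Most Frequent" classifier over semantic classes, as mentioned in the abstract above, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the class labels, and the toy data are all assumptions made for the example, and the real system derives its classes from WordNet.

```python
from collections import Counter, defaultdict


def train_most_frequent_class(tagged_tokens):
    """Map each word to the semantic class it was tagged with most often."""
    counts = defaultdict(Counter)
    for word, semantic_class in tagged_tokens:
        counts[word][semantic_class] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(words, model, fallback=None):
    """Tag each word with its most frequent class; unseen words get `fallback`."""
    return [model.get(word, fallback) for word in words]


def accuracy(predicted, gold):
    """Fraction of positions where the predicted class equals the gold class."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical toy training data: (word, semantic class) pairs.
train = [
    ("bank", "institution"),
    ("bank", "institution"),
    ("bank", "geological_formation"),
    ("plant", "organism"),
    ("plant", "building"),
    ("plant", "organism"),
]
model = train_most_frequent_class(train)

# Hypothetical toy test data.
test_words = ["bank", "plant", "bank"]
test_gold = ["institution", "building", "institution"]
predictions = tag(test_words, model)
score = accuracy(predictions, test_gold)
```

The classifier ignores context entirely, which is why it serves as a baseline: any gain from grouping senses into classes shows up directly in how often the most frequent class is the right one.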

    Word vs. Class-Based Word Sense Disambiguation

    As empirically demonstrated by the Word Sense Disambiguation (WSD) tasks of the last SensEval/SemEval exercises, assigning the appropriate meaning to words in context has resisted all attempts at a successful solution. Many authors argue that one possible reason could be the use of inappropriate sets of word meanings. In particular, WordNet has been used as a de facto standard repository of word meanings in most of these tasks. Thus, instead of using the word senses defined in WordNet, some approaches have derived semantic classes representing groups of word senses. However, the meanings represented by WordNet have only been used for WSD at a very fine-grained sense level or at a very coarse-grained semantic class level (also called SuperSenses). We suspect that an appropriate level of abstraction could lie between these two levels. The contributions of this paper are manifold. First, we propose a simple method to automatically derive semantic classes at intermediate levels of abstraction covering all nominal and verbal WordNet meanings. Second, we empirically demonstrate that our automatically derived semantic classes outperform classical approaches based on word senses and more coarse-grained sense groupings. Third, we demonstrate that our supervised WSD system benefits from using these new semantic classes as additional semantic features while reducing the number of training examples needed. Finally, we demonstrate the robustness of our supervised semantic class-based WSD system when tested on an out-of-domain corpus. This work has been partially supported by the NewsReader project (ICT-2011-316404) and the Spanish project SKaTer (TIN2012-38584-C06-02).

    Una aproximación a la desambiguación del sentido de las palabras basada en clases semánticas y aprendizaje automático [An approach to word sense disambiguation based on semantic classes and machine learning]

    Ph.D. thesis in Computer Science, specifically in the field of Computational Linguistics, written by Rubén Izquierdo at the University of Alicante (UA) under the supervision of Dr. Armando Suárez Cueto (UA) and Dr. German Rigau Claramunt (EHU/UPV). The thesis was defended in Alicante on 17 September 2010 before a panel formed by Dr. Manuel Palomar (UA), Dr. Paloma Moreda (UA), Dr. María Teresa Martín (UJA), Dr. Lluís Padró (UPC) and Dr. Irene Castellón (UB). The grade obtained was Sobresaliente Cum Laude, awarded unanimously. This work was co-funded by the Ministry of Science and Innovation (project TIN2009-13391-C04-01) and the Conselleria de Educación of the Generalitat Valenciana (projects PROMETEO/2009/119 and ACOMP/2011/001).

    A proposal of automatic selection of coarse-grained semantic classes for WSD

    We present a very simple method for selecting Base Level Concepts using some basic structural properties of WordNet. We also empirically demonstrate that this automatically derived set of Base Level Concepts groups senses at a level of abstraction adequate for class-based Word Sense Disambiguation. In fact, a very naive Most Frequent classifier using the selected classes achieves semantic tagging accuracy of over 75%. This paper has been supported by the European Union under the project QALL-ME (FP6 IST-033860) and by the Spanish Government under the projects Text-Mess (TIN2006-15265-C06-01) and KNOW (TIN2006-15049-C03-01).

    A probabilistic, text and knowledge-based image retrieval system

    This paper describes the development of an image retrieval system that combines probabilistic and ontological information. The process is divided into two stages: indexing and retrieval. Three information streams were created, each carrying a different kind of information: word forms, stems, and stemmed bigrams. The final result combines the results obtained in the three streams. Knowledge is added to the system by means of an ontology created automatically from the St. Andrews Corpus. The system was evaluated on the CLEF05 image retrieval task. This work has been partially supported by the Spanish Government (CICYT) with grant TIC2003-07158-c04-01.

    Modelado de Categorías y Desambiguación del Sentido de las Palabras en el corpus Ancora [Topic Modeling and Word Sense Disambiguation on the Ancora corpus]

    In this paper we present an approach to Word Sense Disambiguation based on Topic Modeling (LDA). Our approach consists of two steps: first, a binary classifier decides whether the most frequent sense applies; then, a second classifier handles the cases where it does not. An exhaustive evaluation is performed on the Spanish corpus Ancora to analyze the performance of our two-step system and the impact of the context and of the different system parameters. Our best experiment reaches an accuracy of 74.53, which is 6 points above the highest baseline. All the software developed for these experiments has been made freely available, to enable reproducibility and allow reuse of the software.
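
The two-step decision described in the abstract above can be illustrated with a short sketch. The control flow follows the abstract; everything else is an assumption made for the example. In particular, the two classifiers here are hypothetical keyword rules standing in for the paper's LDA-based models, and the sense labels are invented.

```python
def two_step_disambiguate(context, most_frequent_sense, mfs_applies, pick_other_sense):
    """Step 1: ask a binary classifier whether the most frequent sense applies.
    Step 2: if it does not, let a second classifier choose among the other senses."""
    if mfs_applies(context):
        return most_frequent_sense
    return pick_other_sense(context)


# Hypothetical stand-ins for the two classifiers (simple keyword rules, NOT LDA).
SEAT_CUES = {"parque", "sentarse"}


def toy_mfs_applies(context):
    """Return True unless the context contains a cue for a less frequent sense."""
    return not (SEAT_CUES & set(context))


def toy_pick_other_sense(context):
    """Pick among the remaining senses; the toy setup has only one."""
    return "banco:asiento"


# Hypothetical sense labels for the Spanish word "banco".
financial = two_step_disambiguate(
    ["dinero", "cuenta"], "banco:entidad", toy_mfs_applies, toy_pick_other_sense
)
seat = two_step_disambiguate(
    ["parque", "sentarse"], "banco:entidad", toy_mfs_applies, toy_pick_other_sense
)
```

Splitting the decision this way lets the first classifier concentrate on a single yes/no question, while the second one is only consulted for the cases where the most frequent sense has been ruled out.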

    An user-centred ontology- and entailment-based Question Answering system

    This paper presents a user-centred, ontology- and entailment-based Question Answering system. A methodology is proposed for building a user knowledge database, which bridges the gap between natural language questions and formal representations such as database queries. The core of the system relies on a textual entailment engine capable of detecting entailments between questions and the knowledge database. The system has been developed for Spanish and covers the cinema domain, obtaining promising results for use in real environments. This research has been partially funded by the QALL-ME project, within the Sixth Framework Programme of the European Union (reference FP6-IST-033860), and by the Spanish Government under CICyT project TIN2006-15265-C06-01.

    X-Not@rial : sistema de recuperación y extracción de información notarial [X-Not@rial: a notarial information retrieval and extraction system]

    The X-Not@rial system performs information retrieval and information extraction tasks. The information extraction tasks are carried out in the notarial domain, specifically on deeds of sale. The system selects the documents related to deeds of sale from a heterogeneous text collection and then applies information extraction techniques to identify the relevant information.

    Spanish all-words semantic class disambiguation using Cast3LB corpus

    In this paper, an approach to semantic disambiguation for Spanish based on machine learning and semantic classes is presented. A critical issue in corpus-based approaches to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which to automatically learn linguistic information. In particular, all-words sense-annotated corpora such as SemCor do not have enough examples for many senses when used in a machine learning method. Using semantic classes instead of senses makes it possible to collect a larger number of examples for each class while reducing polysemy, which improves the accuracy of semantic disambiguation. Cast3LB, a SemCor-like corpus manually annotated with Spanish WordNet 1.5 senses, has been used in this paper to perform semantic disambiguation based on several sets of classes: the lexicographer files of WordNet, WordNet Domains, and the SUMO ontology. This paper has been supported by the Spanish Government under projects CESS-ECE (HUM2004-21127-E) and R2D2 (TIC2003-07158-C04-01).

    Influencia de los estilos de aprendizaje en el uso de redes sociales para docencia [póster] [The influence of learning styles on the use of social networks for teaching (poster)]

    Poster presented at the IX Jornadas de Redes de Investigación en Docencia Universitaria, Alicante, 16-17 June 2011. Analyse students' learning styles. Study how students interact when using social networks. Determine how learning style influences the success of collaborative tasks.