
    Multilingual knowledge resources for wide–coverage semantic processing

    This report presents a wide survey of publicly available multilingual Knowledge Resources that could be of interest for wide–coverage semantic processing tasks. We also include an empirical evaluation in a multilingual scenario of the relative quality of some of these large-scale knowledge resources. The study includes a wide range of manually and automatically derived large-scale knowledge resources for English and Spanish. In order to establish a fair and neutral comparison, the quality of each knowledge resource is indirectly evaluated using the same method on a Word Sense Disambiguation task (Senseval-3 English Lexical Sample Task). This work has been partially funded by the IXA group of the UPV/EHU and by the KNOW (TIN2006-15049-C03-01) and ADIMEN (EHU06/113) projects.

    SemEval-2007 Task 16: evaluation of wide coverage knowledge resources

    This task aims to establish the relative quality of available semantic resources (derived by manual or automatic means). The quality of each large-scale knowledge resource is indirectly evaluated on a Word Sense Disambiguation task. In particular, we use the Senseval-3 and SemEval-2007 English Lexical Sample tasks as evaluation benchmarks to assess the relative quality of each resource. Furthermore, trying to be as neutral as possible with respect to the knowledge bases studied, we systematically apply the same disambiguation method to all the resources. A completely different behaviour is observed on the two lexical data sets (Senseval-3 and SemEval-2007).

    Multilingual evaluation of KnowNet

    This paper presents a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method uses a wide-coverage and accurate knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate senses to large sets of topically related words acquired from the web. KnowNet, the resulting knowledge base, which connects large sets of semantically related concepts, is a major step towards the autonomous acquisition of knowledge from raw corpora. In fact, KnowNet is several times larger than any available knowledge resource encoding relations between synsets, and the knowledge KnowNet contains outperforms any other resource when it is empirically evaluated in a common multilingual framework.

    KnowNet: A proposal for building highly connected and dense knowledge bases from the web

    This paper presents a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method uses a wide-coverage and accurate knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate senses to large sets of topically related words acquired from the web. KnowNet, the resulting knowledge base, which connects large sets of semantically related concepts, is a major step towards the autonomous acquisition of knowledge from raw corpora. In fact, KnowNet is several times larger than any available knowledge resource encoding relations between synsets, and the knowledge KnowNet contains outperforms any other resource when it is empirically evaluated in a common multilingual framework.

    Highlighting relevant concepts from Topic Signatures

    This paper presents deepKnowNet, a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method applies a knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate WordNet sense to large sets of topically related words acquired from the web, named TSWEB. This Word Sense Disambiguation algorithm is the personalized PageRank algorithm implemented in UKB. The new method improves the current content of WordNet by automatic means, creating large volumes of new and accurate semantic relations between synsets. KnowNet was our first attempt towards the acquisition of large volumes of semantic relations. However, KnowNet had some limitations that have been overcome with deepKnowNet. deepKnowNet disambiguates the first hundred words of all Topic Signatures from the web (TSWEB). In this case, the method highlights the most relevant word senses of each Topic Signature and filters out the ones that are not closely related to the topic. In fact, the knowledge it contains outperforms any other resource when it is empirically evaluated in a common framework based on a similarity task annotated with human judgements.
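    The personalized PageRank idea mentioned above (the algorithm implemented in UKB) can be sketched in a few lines: power iteration over a word-sense graph, with the teleport mass restricted to context "seed" nodes so that rank concentrates on senses related to the context. The tiny graph and node names below are invented for illustration; UKB itself runs over the full WordNet graph.

    ```python
    # Toy sketch of personalized PageRank over an undirected sense graph.
    # The graph and seed nodes are hypothetical; a real system (e.g. UKB)
    # uses WordNet synsets and relations instead.

    def personalized_pagerank(edges, seeds, damping=0.85, iters=50):
        """Power iteration with teleport probability concentrated on seed nodes."""
        nodes = sorted({n for e in edges for n in e})
        out = {n: [] for n in nodes}
        for a, b in edges:              # undirected: add both directions
            out[a].append(b)
            out[b].append(a)
        rank = {n: 1.0 / len(nodes) for n in nodes}
        teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
        for _ in range(iters):
            new = {n: (1 - damping) * teleport[n] for n in nodes}
            for n in nodes:
                share = damping * rank[n] / len(out[n])
                for m in out[n]:
                    new[m] += share
            rank = new
        return rank

    # Tiny synset-like graph: the finance sense of "bank" links to the money cluster.
    edges = [("bank#finance", "money"), ("bank#finance", "loan"),
             ("bank#river", "water"), ("money", "loan")]
    ranks = personalized_pagerank(edges, seeds={"money", "loan"})
    # With context mass on "money"/"loan", the finance sense outranks the river sense.
    assert ranks["bank#finance"] > ranks["bank#river"]
    ```

    The disambiguation step then simply picks, for each ambiguous word, its highest-ranked sense node.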

    Evaluating large-scale knowledge resources across languages

    This paper presents an empirical evaluation in a multilingual scenario of the semantic knowledge present in publicly available large-scale knowledge resources. The study covers a wide range of manually and automatically derived large-scale knowledge resources for English and Spanish. In order to establish a fair and neutral comparison, the knowledge resources are evaluated using the same method on two Word Sense Disambiguation tasks (Senseval-3 English and Spanish Lexical Sample Tasks). First, this study empirically demonstrates that the combination of the knowledge contained in these resources surpasses the most frequent sense classifier for English. Second, we also show that large-scale topical knowledge acquired from one language can be successfully ported to other languages.

    Unsupervised polarity tagging of words using continuous word representations

    Sentiment analysis is the area of Natural Language Processing that aims to determine the polarity (positive, negative, neutral) contained in an opinionated text. A resource commonly employed in many of these approaches is the so-called polarity lexicon. A polarity lexicon acts as a dictionary that assigns a sentiment polarity value to words. In this work we explore the possibility of automatically generating domain-adapted polarity lexicons employing continuous word representations, in particular the popular tool Word2Vec. First we show a qualitative evaluation on a small set of words, and then we present our results on SemEval-2015 Task 12 using the presented method. This work has been supported by Vicomtech-IK4.
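    A common way to turn word vectors into a polarity lexicon, consistent with the approach described above, is to score each word by its similarity to a handful of positive and negative seed words. The 3-dimensional vectors below are hand-made toys standing in for real Word2Vec embeddings trained on domain text; the seed lists are likewise illustrative assumptions.

    ```python
    # Sketch: seed-based polarity scoring with cosine similarity.
    # Vectors and seed words are invented for illustration; a real system
    # would load embeddings trained with Word2Vec on domain corpora.
    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    def polarity(word, vectors, pos_seeds, neg_seeds):
        """Average similarity to positive seeds minus average similarity to negative seeds."""
        pos = sum(cosine(vectors[word], vectors[s]) for s in pos_seeds) / len(pos_seeds)
        neg = sum(cosine(vectors[word], vectors[s]) for s in neg_seeds) / len(neg_seeds)
        return pos - neg

    vectors = {  # toy embeddings, hand-placed so related words point the same way
        "good":  (0.9, 0.1, 0.0), "great": (0.8, 0.2, 0.0),
        "bad":   (-0.9, 0.1, 0.0), "awful": (-0.8, 0.2, 0.0),
        "tasty": (0.7, 0.3, 0.1),
    }
    score = polarity("tasty", vectors, pos_seeds=["good", "great"], neg_seeds=["bad", "awful"])
    assert score > 0  # "tasty" sits near the positive seeds
    ```

    Swapping in embeddings trained on restaurant reviews versus laptop reviews is what makes the resulting lexicon domain-adapted: the same word can land on different sides of the seeds in different domains.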

    Multilingual knowledge bases for wide-coverage semantic processing

    This report presents a wide survey of publicly available multilingual Knowledge Resources that could be of interest for wide–coverage semantic processing tasks. We also include an empirical evaluation in a multilingual scenario of the relative quality of some of these large-scale knowledge resources. The study includes a wide range of manually and automatically derived large-scale knowledge resources for English and Spanish. In order to establish a fair and neutral comparison, the quality of each knowledge resource is indirectly evaluated using the same method on a Word Sense Disambiguation task (Senseval-3 English Lexical Sample Task).

    Unsupervised acquisition of domain aspects for Aspect-Based Opinion Mining

    The automatic analysis of opinions, usually called opinion mining or sentiment analysis, has gained great importance during the last decade, mainly due to the rapid growth of online content on the Internet. So-called aspect-based opinion mining systems aim to detect the sentiment at the "aspect" level (i.e. the precise feature being opinionated in a clause or sentence). In order to detect such aspects, some knowledge about the domain under analysis is required: the vocabulary varies from one domain to another, and different words are interesting features in different domains. We aim to generate a list of domain-related words and expressions from unlabeled domain texts, in a completely unsupervised way, as a first step towards a more complex opinion mining system. This work has been partially funded by OpeNER (FP7-ICT-2011-SME-DCL-296451) and SKaTer (TIN2012-38584-C06-02).
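    One simple unsupervised realization of the idea above is to rank candidate aspect terms by how much more frequent they are in domain text than in a generic reference corpus. The two tiny "corpora" and the smoothing choice below are illustrative assumptions, not the paper's actual method.

    ```python
    # Sketch: rank domain aspect candidates by relative-frequency salience.
    # Corpora are toy examples; a real system would use large unlabeled
    # domain text and a general reference corpus.
    from collections import Counter

    def domain_aspects(domain_tokens, generic_tokens, top_n=3):
        dom, gen = Counter(domain_tokens), Counter(generic_tokens)
        d_total, g_total = len(domain_tokens), len(generic_tokens)
        def salience(w):
            # relative frequency ratio, add-one smoothing on the reference side
            return (dom[w] / d_total) / ((gen[w] + 1) / (g_total + 1))
        return sorted(dom, key=salience, reverse=True)[:top_n]

    domain = "the battery life is short and the screen is bright battery screen".split()
    generic = "the life is what it is and the day is long".split()
    print(domain_aspects(domain, generic))  # domain terms like 'battery' and 'screen' rank first
    ```

    Common function words score low because they are frequent in both corpora, while domain-specific nouns float to the top; a fuller system would add part-of-speech filtering to keep only noun phrases as aspect candidates.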

    Word-sense disambiguated multilingual Wikipedia corpus

    This article presents a new freely available trilingual corpus (Catalan, Spanish, English) that contains large portions of the Wikipedia and has been automatically enriched with linguistic information. To our knowledge, this is the largest such corpus that is freely available to the community: in its present version, it contains over 750 million words. The corpora have been annotated with lemma and part-of-speech information using the open source library FreeLing. They have also been sense-annotated with the state-of-the-art Word Sense Disambiguation algorithm UKB. As UKB assigns WordNet senses, and WordNet has been aligned across languages via the InterLingual Index, this sort of annotation opens the way to massive explorations in lexical semantics that were not possible before. We present a first attempt at creating a trilingual lexical resource from the sense-tagged Wikipedia corpora, namely WikiNet. Moreover, we present two by-products of the project that are of use to the NLP community: an open source Java-based parser for Wikipedia pages, developed for the construction of the corpus, and the integration of the WSD algorithm UKB in FreeLing.