Search CORE

19 research outputs found

Methodology and evaluation of the Galician WordNet expansion with the WN-Toolkit

Author: Gómez Guinovart Xavier
Oliver González Antoni
Publication venue: Procesamiento del Lenguaje Natural
Publication date: 01/01/2014

This paper presents the methodology and a detailed evaluation of the results of the expansion of the Galician WordNet using the WN-Toolkit. This toolkit allows the creation and expansion of wordnets using the expand model. In our experiments we have used methodologies based on dictionaries and parallel corpora. The results have been evaluated both automatically and manually, allowing a comparison of the precision values obtained with the two evaluation procedures. The manual evaluation also provides details about the source of the errors. This information has been very useful for improving the toolkit and for correcting some errors in the reference WordNet for Galician.

The Oberta in open access

Metodología y evaluación de la expansión del WordNet del gallego con WN-Toolkit

Author: Gómez Guinovart Xavier
Oliver González Antoni
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2014

This paper presents the methodology and a detailed evaluation of the results of the expansion of the Galician WordNet using the WN-Toolkit. This toolkit allows the creation and expansion of wordnets using the expand model. In our experiments we have used methodologies based on dictionaries and parallel corpora. The results have been evaluated both automatically and manually, allowing a comparison of the precision values obtained with the two evaluation procedures. The manual evaluation also provides details about the source of the errors. This information has been very useful for improving the toolkit and for correcting some errors in the reference WordNet for Galician.

This research has been carried out thanks to the Project SKATeR (TIN2012-38584-C06-01 and TIN2012-38584-C06-04) supported by the Ministry of Economy and Competitiveness of the Spanish Government.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Extending the Galician Wordnet Using a Multilingual Bible Through Lexical Alignment and Semantic Annotation

Publication venue: OASIcs - OpenAccess Series in Informatics. 7th Symposium on Languages, Applications and Technologies (SLATE 2018)
Publication date: 01/01/2018

In this paper we describe the methodology and evaluation of the expansion of Galnet, the Galician wordnet, using a multilingual Bible through lexical alignment and semantic annotation. For this experiment we used the Galician, Portuguese, Spanish, Catalan and English versions of the Bible. They were annotated with part-of-speech and WordNet sense using FreeLing. The resulting synsets were aligned, and new variants for the Galician language were extracted. In a manual evaluation, the approach reached an accuracy of 96.8%.

Dagstuhl Research Online Publication Server

Bootstrapping a Portuguese WordNet from Galician, Spanish and English wordnets

Author: A. Fernández Montraveta
A. Simões
A.M. Simões
E.G. Maziero
F.J. Och
G. Melo de
G.A. Miller
H. Gonçalo Oliveira
L. Padró
V.I. Levenshtein
X. Gómez Guinovart
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Series: Lecture Notes in Computer Science, ISSN 0302-9743, vol. 8854

In this article we explore the possibility of bootstrapping a European Portuguese WordNet from the English, Spanish and Galician wordnets using Probabilistic Translation Dictionaries automatically created from parallel corpora. The process generated a total of 56 770 synsets and 97 058 variants. An evaluation of the results using the Brazilian OpenWordNet-PT as a gold standard yielded a precision ranging from 53% to 75%, depending on the cut-line. The results were satisfactory and comparable to similar experiments using the WN-Toolkit.

Funding: PEst-OE/EEI/UI0752/2014, TIN2012-38584-C06-01, TIN2012-38584-C06-0
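As a rough illustration of how precision can depend on a cut-line, the sketch below keeps only candidate variants whose translation probability reaches a threshold and scores them against a gold-standard wordnet. It is a minimal sketch under stated assumptions, not the procedure used in the article: the function name `precision_at_cutoff`, the synset identifiers, the probabilities and the choice to score only synsets present in the gold standard are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical candidates: (synset_id, word) -> translation probability.
Candidates = Dict[Tuple[str, str], float]
# Hypothetical gold standard: synset_id -> set of accepted variants.
Gold = Dict[str, Set[str]]


def precision_at_cutoff(candidates: Candidates, gold: Gold, cutoff: float) -> float:
    """Precision of the candidates whose probability is >= cutoff.

    Assumption: only synsets that exist in the gold standard are scored,
    since the gold standard cannot judge the others.
    """
    kept = [
        (synset, word)
        for (synset, word), prob in candidates.items()
        if prob >= cutoff and synset in gold
    ]
    if not kept:
        return 0.0
    correct = sum(1 for synset, word in kept if word in gold[synset])
    return correct / len(kept)


# Toy data, invented for illustration only.
candidates: Candidates = {
    ("00001-n", "cão"): 0.9,
    ("00001-n", "gato"): 0.3,
    ("00002-n", "casa"): 0.7,
    ("00002-n", "lar"): 0.5,
}
gold: Gold = {"00001-n": {"cão"}, "00002-n": {"casa", "lar"}}

for cutoff in (0.2, 0.6):
    print(cutoff, precision_at_cutoff(candidates, gold, cutoff))
```

On this toy data a higher cutoff discards the low-probability wrong candidate, so precision rises while the number of retained variants falls, which is the trade-off a cut-line controls.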

Universidade do Minho: RepositoriUM

Crossref

Acquiring Domain-Specific Knowledge for WordNet from a Terminological Database

Publication venue: OASIcs - OpenAccess Series in Informatics. 8th Symposium on Languages, Applications and Technologies (SLATE 2019)
Publication date: 01/01/2019

In this research we explore a terminological database (Termoteca) in order to expand the Portuguese and Galician wordnets (PULO and Galnet) with the addition of new synset variants (word forms for a concept), usage examples for the variants, and synset glosses or definitions. The methodology applied in this experiment is based on the alignment between concepts of WordNet (synsets) and concepts described in Termoteca (terminological records), taking into account the lexical forms in both resources, their morphological category and their knowledge domains, using the information provided by the WordNet Domains Hierarchy and the Termoteca field domains to reduce the incidence of polysemy and homography in the results of the experiment. The results obtained confirm our hypothesis that the combined use of the semantic domain information included in both resources makes it possible to minimise the problem of lexical ambiguity and to obtain a very acceptable index of precision in terminological information extraction tasks, attaining a precision above 89% when there are two or more different languages sharing at least one lexical form between the synset in Galnet and the Termoteca record
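The alignment criteria described in this abstract (shared lexical forms, matching morphological category, compatible domains, and at least two languages sharing a form) can be sketched as a simple filter. This is an illustrative sketch only: the `Concept` class, the field names, the sample data and the assumption that WordNet Domains labels and Termoteca field domains have already been mapped to a common label set are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Concept:
    """A synset or a terminological record, reduced to the fields compared here."""
    pos: str                    # morphological category, e.g. "n"
    domains: Set[str]           # domain labels, assumed already mapped to a shared set
    forms: Dict[str, Set[str]]  # language code -> lexical forms


def shared_languages(synset: Concept, record: Concept) -> Set[str]:
    """Languages in which the synset and the record share at least one lexical form."""
    return {
        lang
        for lang, forms in synset.forms.items()
        if forms & record.forms.get(lang, set())
    }


def is_candidate(synset: Concept, record: Concept, min_languages: int = 2) -> bool:
    """Accept the pair only if category and domain agree and enough languages share a form."""
    if synset.pos != record.pos:
        return False
    if not (synset.domains & record.domains):
        return False
    return len(shared_languages(synset, record)) >= min_languages


# Toy data, invented for illustration only.
synset = Concept("n", {"medicine"},
                 {"gl": {"vacina"}, "pt": {"vacina"}, "en": {"vaccine"}})
record = Concept("n", {"medicine"},
                 {"gl": {"vacina"}, "pt": {"imunizante"}, "en": {"vaccine"}})
other_domain = Concept("n", {"economy"},
                       {"gl": {"vacina"}, "en": {"vaccine"}})

print(is_candidate(synset, record))        # shares forms in gl and en, same domain
print(is_candidate(synset, other_domain))  # rejected by the domain check
```

Requiring agreement in two or more languages is what lets the filter discard matches that rest on a single ambiguous or homographic form.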

Dagstuhl Research Online Publication Server

Expansión de wordnets mediante unidades pluriverbales extraídas de corpus paralelos

Author: Gómez Guinovart Xavier
Simões Alberto Manuel
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2020

In this paper we present a method for enlarging wordnets focusing on multi-word terms and utilising data from parallel corpora. Our approach is validated using the Galician and Portuguese wordnets. The multi-word candidates obtained in this experiment were manually validated, yielding an accuracy of 73.2% for Galician and 75.5% for Portuguese.

This research has been carried out thanks to the project DeepReading (RTI2018-096846-B-C21) supported by the Ministry of Science, Innovation and Universities of the Spanish Government and the European Fund for Regional Development (MCIU/AEI/FEDER), and was partially funded by Portuguese National funds (PIDDAC), through the FCT – Fundação para a Ciência e Tecnologia and FCT/MCTES under the scope of the project UIDB/05549/2020.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Termonet: Terminology construction from WordNet and technical corpora

Author: Gómez Guinovart Xavier
Solla Portela Miguel Anxo
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2015

In this presentation, we review the methodology and the resources used in the development of Termonet, a tool for querying and verifying in a corpus the specialised lexicons included in WordNet. Termonet identifies the synsets in WordNet that belong to a terminological domain on the basis of the lexical-semantic relations established among synsets, and validates the terms by identifying them in a semantically disambiguated specialised corpus. The construction of this tool is part of the tasks of the SKATeR-UVigo research project, aimed at the development and application of resources for Galician language processing.

This research is carried out within the framework of the project Adquisición de escenarios de conocimiento a través de la lectura de textos: Desarrollo y aplicación de recursos para el procesamiento lingüístico del gallego (SKATeR-UVigo), funded by the Ministry of Economy and Competitiveness, TIN2012-38584-C06-04.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Multilingual sentiment analysis in social media.

Author: San Vicente Roncal Iñaki
Publication date: 01/01/2019

252 p.

This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language, a tool providing only analysis for Basque would not be enough for a real-world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.

The thesis addresses the following challenges to build such a system:
- Analysing methods for creating sentiment lexicons suitable for less-resourced languages.
- Analysis of social media (specifically Twitter): tweets pose several challenges for understanding and extracting opinions from such messages. Language identification and microtext normalization are addressed.
- Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well-known social media benchmarks.
- Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Linguistic, lexicographic, and computational perspectives

Author: Barbu Mititelu Verginica
Giouli Voula
Publication date: 01/01/2024

Institutional Repository of the Freie Universität Berlin