58 research outputs found

Boosting terminology extraction through crosslingual resources

Author: Cajal Mariñosa Sergio
Rodríguez Hontoria Horacio
Publication date: 01/01/2014

Terminology Extraction is an important Natural Language Processing task with multiple applications in many areas. The task has been approached from different points of view using different techniques. Language- and domain-independent systems have been proposed as well. Our contribution in this paper focuses on improving Terminology Extraction using crosslingual resources, specifically Wikipedia, and on the use of a variant of PageRank for scoring the candidate terms. Peer reviewed. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Mejora de la extracción de terminología usando recursos translingües [Improving terminology extraction using crosslingual resources]

Author: Cajal Sergio
Rodríguez Hontoria Horacio
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2014

Terminology Extraction is an important Natural Language Processing task with multiple applications in many areas. The task has been approached from different points of view using different techniques. Language- and domain-independent systems have been proposed as well. Our contribution in this paper focuses on improving Terminology Extraction using crosslingual resources, specifically Wikipedia, and on the use of a variant of PageRank for scoring the candidate terms. The research described in this article has been partially funded by Spanish MINECO in the framework of project SKATER: Scenario Knowledge Acquisition by Textual Reading (TIN2012-38584-C06-01).

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

A Survey on Awesome Korean NLP Datasets

Author: Ban Byunghyun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 06/12/2021

English-based datasets are commonly available from Kaggle, GitHub, or recently published papers. Although benchmark tests with English datasets are sufficient to demonstrate the performance of new models and methods, a researcher still needs to train and validate models on Korean-based datasets to produce a technology or product suitable for Korean processing. This paper introduces 15 popular Korean-based NLP datasets with summarized details such as volume, license, repositories, and other research results inspired by the datasets. I also provide detailed instructions with samples or statistics of the datasets. The main characteristics of the datasets are presented in a single table to give researchers a rapid summary. Comment: 11 pages, 1 horizontal page for large table

arXiv.org e-Print Archive

Crosslingual Topic Transfer

Author: Hao Shudong
Publication venue: University of Colorado Boulder
Publication date: 16/11/2019

Probabilistic topic modeling has been used as an efficient tool for extracting high-level abstracts from large corpora, and is also commonly used as a feature extraction technique for many natural language processing tasks. As a natural extension, multilingual topic models extract language-consistent features from corpora in multiple languages, enabling knowledge transfer for crosslingual tasks. While many models have been proposed, they mostly require very specific crosslingual supervision data, which limits their generalization to languages without rich linguistic resources. In this thesis, we start by designing an efficient multilingual topic model evaluation as the foundation of subsequent work. We then formulate model training as a knowledge transfer process by defining a transfer operation. Based on this formulation, we are able to identify factors that actually affect the performance of crosslingual learning in topic models, and thus we introduce a new model that achieves competitive performance while using significantly fewer linguistic resources.

CU Scholar Institutional Repository

Extracción de una terminología multilingüe de Wikipedia [Extraction of a multilingual terminology from Wikipedia]

Author: Cajal Mariñosa Sergio
Publication venue: Universitat Politècnica de Catalunya
Publication date: 03/05/2014

Design and evaluation of an algorithm that extracts a multilingual terminology using Wikipedia as its source of information, and ranks the terms by termhood using a modified version of Google's PageRank algorithm.

UPCommons. Portal del coneixement obert de la UPC

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Author: Berzak Yevgeni
Korhonen Anna
O'Horan Helen
Poibeau Thierry
Ponti Edoardo Maria
Reichart Roi
Shutova Ekaterina
Vulić Ivan
Publication date: 27/02/2019

Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.

arXiv.org e-Print Archive

Edinburgh Research Explorer

Apollo (Cambridge)

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Language technologies for a multilingual Europe

Publication venue: Language Science Press
Publication date: 01/04/2020

This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as, among others, industry, administration and funding agencies. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu).

Directory of Open Access Books (DOAB)

Language technologies for a multilingual Europe

OAPEN Library

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Author: Berzak Yevgeni
Korhonen Anna
O'Horan Helen
Poibeau Thierry
Ponti Edoardo Maria
Reichart Roi
Shutova Ekaterina
Vulić Ivan
Publication venue: Computational Linguistics
Publication date: 09/08/2018

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.

arXiv.org e-Print Archive

Edinburgh Research Explorer

Apollo (Cambridge)

UvA-DARE

International Migration, Integration and Social Cohesion online publications