Boosting terminology extraction through crosslingual resources

Author: Cajal Mariñosa Sergio
Rodríguez Hontoria Horacio
Publication date: 01/01/2014

Terminology extraction is an important natural language processing task with applications in many areas. It has been approached from different points of view using a range of techniques, and language- and domain-independent systems have also been proposed. This paper focuses on improving terminology extraction using crosslingual resources, specifically Wikipedia, and on using a variant of PageRank to score the candidate terms. Peer reviewed. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Cross-Lingual Data Quality for Knowledge Base Acceleration across Wikipedia Editions

Author: Göbölös-Szabó Júlia
Prytkova N
Spaniol M
Weikum G
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2012

International audience

HAL - Normandie Université

SZTAKI Publication Repository

MPG.PuRe

Mejora de la extracción de terminología usando recursos translingües

Author: Cajal Sergio
Rodríguez Hontoria Horacio
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2014

Terminology extraction is an important natural language processing task with applications in many areas. It has been approached from different points of view using a range of techniques, and language- and domain-independent systems have also been proposed. This paper focuses on improving terminology extraction using crosslingual resources, specifically Wikipedia, and on using a variant of PageRank to score the candidate terms. The research described in this article has been partially funded by the Spanish MINECO in the framework of project SKATER: Scenario Knowledge Acquisition by Textual Reading (TIN2012-38584-C06-01).

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Evaluation of ILP-based approaches for partitioning into colorful components

Author: Bruckner S.
Hüffner F.
Komusiewicz Ch.
Niedermeier R.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2013

The NP-hard Colorful Components problem is a graph partitioning problem on vertex-colored graphs. We identify a new application of Colorful Components in the correction of Wikipedia interlanguage links, and describe and compare three exact and two heuristic approaches. In particular, we devise two ILP formulations, one based on Hitting Set and one based on Clique Partition. Furthermore, we use the recently proposed implicit hitting set framework [Karp, JCSS 2011; Chandrasekaran et al., SODA 2011] to solve Colorful Components. Finally, we study a move-based and a merge-based heuristic for Colorful Components. We can optimally solve Colorful Components for Wikipedia link correction data. While the Clique Partition-based ILP outperforms the other two exact approaches, the implicit hitting set approach is a simple and competitive alternative. The merge-based heuristic is very accurate and outperforms the move-based one. These results for Wikipedia data are confirmed by experiments with synthetic instances.

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

YAGO3: A Knowledge Base from Multilingual Wikipedias

Author: Biega J.
Mahdisoltani F.
Suchanek F.
Publication date: 01/01/2014

MPG.PuRe

Data mining and fusion

Author: Addis M. J.
Choi F.
Taylor S. J.
Upstill C.
Watkins E. R.
Publication venue: s.n.
Publication date: 01/04/2006

Southampton (e-Prints Soton)

Commonsense Knowledge in Sentiment Analysis of Ordinance Reactions for Smart Governance

Author: Puri Manish
Publication venue: Montclair State University Digital Commons
Publication date: 01/05/2019

Smart Governance is an emerging research area that has attracted both scientific and policy interest. It aims to improve collaboration between government, citizens, and other stakeholders. Our project aims to enable lawmakers to incorporate data-driven decision making when enacting ordinances. Our first objective is to create a mechanism for mapping ordinances (local laws) and tweets to Smart City Characteristics (SCC). The use of SCC has allowed us to create a mapping between a huge number of ordinances and tweets, and the use of Commonsense Knowledge (CSK) has allowed us to utilize human judgment in the mapping. We have then enhanced the mapping technique to link multiple tweets to SCC. To promote transparency in government through increased public participation, we have conducted sentiment analysis of tweets to evaluate public opinion on ordinances passed in a particular region. Our final objective is to develop a mapping algorithm that directly relates ordinances to tweets. To fulfill this objective, we have developed a mapping technique known as TOLCS (Tweets Ordinance Linkage by Commonsense and Semantics). This technique uses pragmatic aspects of Commonsense Knowledge as well as semantic aspects of domain knowledge. By reducing the sample space of big data to be processed, this method represents an efficient way to accomplish the task. The ultimate goal of the project is to see how closely a given region is approaching the concept of a Smart City.

Montclair State University Digital Commons

Searching to Translate and Translating to Search: When Information Retrieval Meets Machine Translation

Author: Ture Ferhan
Publication date: 01/01/2013

With the adoption of web services in daily life, people have access to tremendous amounts of information, beyond any human's reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information in multiple languages, introducing another barrier between people and information. Therefore, search technologies need to handle content written in multiple languages, which requires techniques to account for the linguistic differences. Information Retrieval (IR) is the study of search techniques, in which the task is to find material relevant to a given information need. Cross-Language Information Retrieval (CLIR) is a special case of IR when the search takes place in a multi-lingual collection. Of course, it is not helpful to retrieve content in languages the user cannot understand. Machine Translation (MT) studies the translation of text from one language into another efficiently (within a reasonable amount of time) and effectively (fluent and retaining the original meaning), which helps people understand what is being written, regardless of the source language. Putting these together, we observe that search and translation technologies are part of an important user application, calling for a better integration of search (IR) and translation (MT), since these two technologies need to work together to produce high-quality output. In this dissertation, the main goal is to build better connections between IR and MT, for which we present solutions to two problems: Searching to translate explores approximate search techniques for extracting bilingual data from multilingual Wikipedia collections to train better translation models. Translating to search explores the integration of a modern statistical MT system into the cross-language search processes. In both cases, our best-performing approach yielded improvements over strong baselines for a variety of language pairs. 
Finally, we propose a general architecture in which various components of IR and MT systems can be connected into a feedback loop, with potential improvements to both search and translation tasks. We hope that the ideas presented in this dissertation will spur more interest in the integration of search and translation technologies.

CiteSeerX

Digital Repository at the University of Maryland