52 research outputs found

SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task

Author: Agirre Eneko
Branco António
Gaudio Rosa
Gomes Luís
Labaka Gorka
Neale Steven
Oele Dieke
Osenova Petya
Popel Martin
Querido Andreia
Rendeiro Nuno
Rodrigues João
Silva João
Simov Kiril
van Noord Gertjan
Publication date: 01/01/2016

This paper describes 12 systems submitted to the WMT16 IT-task, covering six languages: Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All of these systems were developed within the QTLeap project and follow a common strategy. For each language, two systems were submitted: a phrase-based MT system built using Moses, and a system exploiting deep language engineering approaches, which for all languages except Bulgarian was implemented using TectoMT. For 4 of the 6 languages, the TectoMT-based system performs better than the Moses-based one.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Biblio at Institute of Formal and Applied Linguistics

Dissertations of the University of Groningen

Traducción automática basada en tectogramática para inglés-español e inglés-euskara (Tectogrammar-based machine translation for English-Spanish and English-Basque)

Author: Agirre Bengoa Eneko
Alegría Loinaz Iñaki
Aranberri Nora
Díaz de Ilarraza Sánchez Arantza
Jauregi Oneka
Labaka Intxauspe Gorka
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2016

We present the first attempt to build machine translation systems for the English-Spanish and English-Basque language pairs following the tectogrammar approach. Based on the existing English-Czech system, we describe the language-specific tools added in the analysis and synthesis steps, and the resources for bilingual transfer. Evaluation shows the potential of these systems to adapt to new languages and domains. The research leading to these results has received funding from FP7-ICT-2013-10-610516 (QTLeap project, qtleap.eu).

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation

Author: Dušek Ondřej
Fučíková Eva
Hajič Jan
Popel Martin
Urešová Zdeňka
Šindlerová Jana
Publication date: 01/01/2015

We present a system for verbal Word Sense Disambiguation (WSD) that is able to exploit additional information from parallel texts and lexicons. It is an extension of our previous WSD method, which gave promising results but used only monolingual features. In the follow-up work described here, we have explored two additional ideas: using English-Czech bilingual resources (as features only; the task itself remains a monolingual WSD task), and using a 'hybrid' approach, adding features extracted both from a parallel corpus and from manually aligned bilingual valency lexicon entries, which contain subcategorization information. Although not all types of features proved useful, both ideas and additions have led to significant improvements for both languages explored.

Biblio at Institute of Formal and Applied Linguistics

Extrakce znalostních grafů z projektové dokumentace (Knowledge Graph Extraction from Project Documentation)

Author: Helešic Tomáš
Publication venue: Univerzita Karlova, Matematicko-fyzikální fakulta
Publication date: 01/01/2014

Title: Knowledge Graph Extraction from Project Documentation
Author: Bc. Tomáš Helešic
Department: Department of Software Engineering
Supervisor: Mgr. Martin Nečaský, Ph.D.
Abstract: The goal of this thesis is to explore the possibilities of automatic information extraction from company project documentation using natural language processing tools, and to analyze the precision of the linguistic processing of these documents. It further proposes methods for acquiring key terms and the dependencies between them. These terms and dependencies are used to build knowledge graphs, which are stored in a suitable database with a search service. The work tries to interconnect existing technologies, implement them in a simple application, and test their readiness for practical use. The goal is to inspire future research in this field, identify critical points, and propose improvements. The main contribution lies in connecting natural language processing, information extraction methods, and semantic search with corporate documents. The contribution of the practical part lies in the way it identifies the key information that describes each individual document and uses it in search.
Keywords: Knowledge graphs, Information extraction, Natural language processing, Resource Description Framework
Department of Software Engineering, Faculty of Mathematics and Physics

CU Digital Repository

New Language Pairs in TectoMT

Author: Dušek Ondřej
Gomes Luís
Novák Michal
Popel Martin
Rosa Rudolf
Publication date: 01/01/2015

The TectoMT tree-to-tree machine translation system has been updated this year to support easier retraining for more translation directions. We use multilingual standards for morphology and syntax annotation and language-independent base rules. We include a simple, non-parametric way of combining TectoMT’s transfer model outputs.

Biblio at Institute of Formal and Applied Linguistics

The Weight Function in the Subtree Kernel is Decisive

Author: Azaïs Romain
Ingels Florian
Publication date: 12/04/2019

Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function from a theoretical perspective and in real data applications. We establish on a two-class stochastic model that the performance of the subtree kernel is improved when the weight of leaves vanishes, which motivates the definition of a new weight function, learned from the data rather than fixed by the user as is usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees, which is particularly suitable for tuning parameters. We show through eight real data classification problems the great efficiency of our approach, in particular for small datasets, which also underlines the high importance of the weight function. Finally, a visualization tool for the significant features is derived.
Comment: 36 pages

arXiv.org e-Print Archive

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Bayesian Logic Programs for plan recognition and machine reading

Author: Vijaya Raghavan Sindhu
Publication date: 22/02/2013

Several real-world tasks involve data that is uncertain and relational in nature. Traditional approaches like first-order logic and probabilistic models deal with either structured data or uncertainty, but not both. To address these limitations, statistical relational learning (SRL), a new area in machine learning integrating both first-order logic and probabilistic graphical models, has emerged in the recent past. The advantage of SRL models is that they can handle both uncertainty and structured/relational data. As a result, they are widely used in domains like social network analysis, biological data analysis, and natural language processing. Bayesian Logic Programs (BLPs), which integrate both first-order logic and Bayesian networks, are a powerful SRL formalism developed in the recent past. In this dissertation, we develop approaches using BLPs to solve two real-world tasks: plan recognition and machine reading. Plan recognition is the task of predicting an agent’s top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring cause from effect. In the first part of the dissertation, we develop an approach to abductive plan recognition using BLPs. Since BLPs employ logical deduction to construct the networks, they cannot be used effectively for abductive plan recognition as is. Therefore, we extend BLPs to use logical abduction to construct Bayesian networks and call the resulting model Bayesian Abductive Logic Programs (BALPs). In the second part of the dissertation, we apply BLPs to the task of machine reading, which involves automatic extraction of knowledge from natural language text. Most information extraction (IE) systems identify facts that are explicitly stated in text. However, much of the information conveyed in text must be inferred from what is explicitly stated, since easily inferable facts are rarely mentioned.
Human readers naturally use common sense knowledge and “read between the lines” to infer such implicit information from the explicitly stated facts. Since IE systems do not have access to common sense knowledge, they cannot perform deeper reasoning to infer implicitly stated facts. Here, we first develop an approach using BLPs to infer implicitly stated facts from natural language text. It involves learning uncertain common sense knowledge in the form of probabilistic first-order rules by mining a large corpus of automatically extracted facts using an existing rule learner. These rules are then used to derive additional facts from extracted information using BLP inference. We then develop an online rule learner that handles the concise, incomplete nature of natural-language text and learns first-order rules from noisy IE extractions. Finally, we develop a novel approach to calculate the weights of the rules using a curated lexical ontology like WordNet. Both tasks described above involve inference and learning from partially observed or incomplete data. In plan recognition, the underlying cause or the top-level plan that resulted in the observed actions is not known or observed. Further, only a subset of the executed actions can be observed by the plan recognition system, resulting in partially observed data. Similarly, in machine reading, since some information is only implicitly stated, it is rarely observed in the data. In this dissertation, we demonstrate the efficacy of BLPs for inference and learning from incomplete data. Experimental comparisons on various benchmark data sets for both tasks demonstrate the superior performance of BLPs over state-of-the-art methods.
Department: Computer Science

Texas ScholarWorks

Target-Side Context for Discriminative Models in Statistical Machine Translation

Author: Bojar Ondřej
Fraser Alexander
Junczys-Dowmunt Marcin
Tamchyna Aleš
Publication date: 01/01/2016

Discriminative translation models utilizing source context have been shown to help statistical machine translation performance. We propose a novel extension of this work using target context information. Surprisingly, we show that this model can be efficiently integrated directly in the decoding process. Our approach scales to large training data sizes and results in consistent improvements in translation quality on four language pairs. We also provide an analysis comparing the strengths of the baseline source-context model with our extended source-context and target-context model, and we show that our extension allows us to better capture morphological coherence. Our work is freely available as part of Moses.
Comment: Accepted as a long paper for ACL 201

arXiv.org e-Print Archive

Crossref

Biblio at Institute of Formal and Applied Linguistics