46 research outputs found

A Domain Agnostic Approach to Verbalizing n-ary Events without Parallel Corpora

Author: Cerisara Christophe
Gardent Claire
Gyawali Bikash
Publication venue: HAL CCSD
Publication date: 01/01/2015

We present a method for automatically generating descriptions of biological events encoded in the KB Bio 101 Knowledge base. We evaluate our approach on a corpus of 336 event descriptions, provide a qualitative and quantitative analysis of the results obtained and discuss possible directions for further work.

Crossref

INRIA a CCSD electronic archive server

User Interfaces to the Web of Data based on Natural Language Generation

Author: Ell Basil
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2017

We explore how Virtual Research Environments based on Semantic Web technologies support research interactions with RDF data in various stages of corpus-based analysis, analyze the Web of Data in terms of human readability, derive labels from variables in SPARQL queries, apply Natural Language Generation to improve user interfaces to the Web of Data by verbalizing SPARQL queries and RDF graphs, and present a method to automatically induce RDF graph verbalization templates via distant supervision.

KITopen

Directory of Open Access Books (DOAB)

Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications

Author: Gerber Daniel
Publication date: 07/06/2016

The Data Web has undergone a tremendous growth period. It currently consists of more than 3300 publicly available knowledge bases describing millions of resources from various domains, such as life sciences, government or geography, with over 89 billion facts. In the same way, the Document Web has grown to a state where approximately 4.55 billion websites exist, 300 million photos are uploaded to Facebook, and 3.5 billion Google searches are performed on average every day. However, there is a gap between the Document Web and the Data Web, since, for example, knowledge bases available on the Data Web are most commonly extracted from structured or semi-structured sources, but the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos, forum discussions, etc. As a result, data on the Data Web not only misses a significant fragment of information but also suffers from a lack of timeliness, since typical extraction methods are time-consuming and can only be carried out periodically. Furthermore, provenance information is rarely taken into consideration and therefore gets lost in the transformation process. In addition, users are accustomed to entering keyword queries to satisfy their information needs. With the availability of machine-readable knowledge bases, lay users could be empowered to issue more specific questions and get more precise answers. In this thesis, we address the problem of Relation Extraction, one of the key challenges pertaining to closing the gap between the Document Web and the Data Web, by four means. First, we present a distant supervision approach that allows finding multilingual natural language representations of formal relations already contained in the Data Web. We use these natural language representations to find sentences on the Document Web that contain unseen instances of this relation between two entities. Second, we address the problem of data timeliness by presenting a real-time data stream RDF extraction framework and utilize this framework to extract RDF from RSS news feeds. Third, we present a novel fact validation algorithm, based on natural language representations, able not only to verify or falsify a given triple, but also to find trustworthy sources for it on the Web and to estimate a time scope in which the triple holds true. The features used by this algorithm to determine if a website is indeed trustworthy are used as provenance information and thereby help to create metadata for facts in the Data Web. Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to make use of the large amounts of data available on the Data Web to satisfy their information needs.

Qucosa - Publikationsserver der Universität Leipzig

Advances in formal Slavic linguistics 2021

Publication date: 01/01/2023

Synopsis: Advances in formal Slavic linguistics 2021 offers a selection of articles that were prepared on the basis of talks given at the conference Formal Description of Slavic Languages 14 or at the satellite workshop on secondary imperfectives in Slavic, which were held on June 2–5, 2021, at the University of Leipzig. The volume covers all branches of Slavic languages and features synchronic as well as diachronic analyses. It comprises a wide array of topics, such as degree achievements, clitic climbing in Czech and Polish, typology of Slavic l-participles, aspectual markers in Russian and Czech, doubling in South Slavic relative clauses, congruence and case-agreement in close apposition in Russian, cataphora in Slovenian, Russian and Polish participles, prefixation and telicity in Serbo-Croatian, Bulgarian adjectives, negative questions in Russian and German, and imperfectivity in discourse. The numerous topics addressed demonstrate the importance of Slavic data, and the analyses presented in this collection make a significant contribution to Slavic linguistics as well as to linguistics in general.

Institutional Repository of the Freie Universität Berlin

Active Learning for Reducing Labeling Effort in Text Classification Tasks

Author: Jacobs Pieter Floris
Maillette De Buy Wenniger Gideon
Schomaker Lambert
Wiering Marco
Publication venue: Springer Science and Business Media LLC
Publication date: 10/09/2021

Labeling data can be an expensive task as it is usually performed manually by domain experts. This is cumbersome for deep learning, as it is dependent on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by only using the data which the used model deems most informative. Little research has been done on AL in a text classification setting and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT$_{base}$ as the used classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL; namely, that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. Whereas it was found that the proposed heuristics for AL did not improve performance of AL, our results show that using uncertainty-based AL with BERT$_{base}$ outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger.

Comment: Accepted as a conference paper at the joint 33rd Benelux Conference on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021). This camera-ready version submitted to BNAIC/BENELEARN adds several improvements, including a more thorough discussion of related work plus an extended discussion section. 28 pages including references and appendices.

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Engineering Background Knowledge for Social Robots

Author: Asprino Luigi
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 03/04/2019

Social robots are embodied agents that continuously perform knowledge-intensive tasks involving several kinds of information coming from different heterogeneous sources. Providing a framework for engineering robots' knowledge raises several problems, such as identifying sources of information and modeling solutions suitable for robots' activities, integrating knowledge coming from different sources, evolving this knowledge with information learned during robots' activities, grounding perceptions on robots' knowledge, assessing robots' knowledge with respect to that of humans, and so on. In this thesis we investigated the feasibility and benefits of engineering background knowledge of Social Robots with a framework based on Semantic Web technologies and Linked Data. This research has been supported and guided by a case study that provided a proof of concept through a prototype tested in a real socially assistive context.

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

AMS Tesi di Dottorato

On looking into words (and beyond): Structures, Relations, Analyses

Author: Aissen Judith
Aronoff Mark
Bat-El Outi
Blevins Juliette
Bowern Claire
Chung Sandra
de Chene Brent
Deo Ashwini
Goldstein Louis
Hale Mark
Hammond Michael
Hendrick Randall
Horn Laurence
Horvath Julia
Hyman Larry M.
Inkelas Sharon
Jenga Fred
Kaisse Ellen
Kavitskaya Darya
Kiparsky Paul
Lepic Ryan
Maiden Martin
Napoli Donna Jo
Newmeyer Frederick J.
Padden Carol
Round Erich R.
Siloni Tal
Spencer Andrew
Stump Gregory
Thráinsson Höskuldur
Timberlake Alan
Yang Charles
Zanuttini Raffaella
Publication venue: Language Science Press
Publication date: 18/05/2017

On Looking into Words is a wide-ranging volume spanning current research into word structure and morphology, with a focus on historical linguistics and linguistic theory. The papers are offered as a tribute to Stephen R. Anderson, the Dorothy R. Diebold Professor of Linguistics at Yale, who is retiring at the end of the 2016-2017 academic year. The contributors are friends, colleagues, and former students of Professor Anderson, all important contributors to linguistics in their own right. As is typical for such volumes, the contributions span a variety of topics relating to the interests of the honorand. In this case, the central contributions that Anderson has made to so many areas of linguistics and cognitive science, drawing on synchronic and diachronic phenomena in diverse linguistic systems, are represented through the papers in the volume. The 26 papers that constitute this volume are unified by their discussion of the interplay between synchrony and diachrony, theory and empirical results, and the role of diachronic evidence in understanding the nature of language. Central concerns of the volume include morphological gaps, learnability, increases and declines in productivity, and the interaction of different components of the grammar. The papers deal with a range of linked synchronic and diachronic topics in phonology, morphology, and syntax (in particular, cliticization), and their implications for linguistic theory.

Language Science Press