134 research outputs found
A model for verbalising relations with roles in multiple languages
Natural language renderings of ontologies facilitate communication with domain experts. While this is fairly straightforward for ontologies with terms in English, it is problematic for grammatically richer languages due to verb conjugation, articles that depend on the preposition, or prepositions that modify the noun.
There is no systematic way to deal with such `complex' names of OWL object properties, or to verbalise them with existing language models for annotating ontologies.
The modifications occur only when the object performs some {\em role} in a relation, so we propose a conceptual model that can handle this. It requires reconciling the standard view of relational expressions with a positionalist view, which is included in the model and in the formalisation of the mapping between the two. This eases verbalisation and allows for a more precise representation of the knowledge, yet remains compatible with existing technologies. We have implemented it as a Prot\'eg\'e plugin and validated its adequacy with several languages that need it, such as German and isiZulu.
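The article/preposition interplay the abstract describes can be illustrated with a toy sketch in Python. This is not the paper's model; the role inventory, the lexicon, and the masculine-only article table are invented for illustration, and the uncontracted "in dem" (often "im" in practice) is a deliberate simplification.

```python
# Hypothetical sketch of role-aware verbalisation: the role an object plays
# in a relation determines the preposition and, in German, the case that
# governs the article. All lexical entries below are illustrative only.

# Each role maps to a preposition and the grammatical case it governs.
ROLE_LEXICON = {
    "location": ("in", "dative"),
    "destination": ("in", "accusative"),
}

# Definite article for a masculine noun, by case (simplified; no contraction).
ARTICLE = {"dative": "dem", "accusative": "den"}

def verbalise(subject: str, verb: str, obj: str, role: str) -> str:
    """Render 'subject verb <preposition> <article> object' for a masculine noun."""
    preposition, case = ROLE_LEXICON[role]
    return f"{subject} {verb} {preposition} {ARTICLE[case]} {obj}"

print(verbalise("Der Student", "wohnt", "Park", "location"))    # Der Student wohnt in dem Park
print(verbalise("Der Student", "geht", "Park", "destination"))  # Der Student geht in den Park
```

The same surface preposition "in" yields different articles depending on the role, which is exactly the kind of modification a flat object-property name cannot capture.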
Natural Language-based Approach for Helping in the Reuse of Ontology Design Patterns
Experiments in the reuse of Ontology Design Patterns (ODPs) have
revealed that users with different levels of expertise in ontology modelling face
difficulties when reusing ODPs. With the aim of tackling this problem we propose
a method and a tool for supporting a semi-automatic reuse of ODPs that
takes as input formulations in natural language (NL) of the domain aspect to be
modelled, and obtains as output a set of ODPs for solving the initial ontological
needs. The correspondence between ODPs and NL formulations is established
through Lexico-Syntactic Patterns, linguistic constructs that convey the semantic
relations present in ODPs, and which constitute the main contribution of this
paper. The main benefit of the proposed approach is the use of unrestricted
NL formulations in various languages for obtaining ODPs. The use of full NL
poses challenges in the disambiguation of linguistic expressions, which we
expect to address through user interaction, among other strategies.
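The core matching step can be sketched as follows. The patterns below are invented stand-ins, not the paper's actual Lexico-Syntactic Pattern catalogue: each maps a surface construction in a user's NL formulation to the ODP that models the underlying semantic relation.

```python
import re

# Illustrative Lexico-Syntactic Patterns (hypothetical, not from the paper):
# a regex over the NL formulation paired with the ODP it suggests.
LS_PATTERNS = [
    # "<X> is (a) part of <Y>"  ->  PartOf ODP
    (re.compile(r"(?P<part>\w+) is (?:a )?part of (?P<whole>\w+)", re.I), "PartOf"),
    # "<X> participates in <Y>"  ->  Participation ODP
    (re.compile(r"(?P<obj>\w+) participates in (?P<event>\w+)", re.I), "Participation"),
]

def suggest_odps(formulation: str) -> list[str]:
    """Return candidate ODPs whose lexico-syntactic pattern matches the input."""
    return [odp for pattern, odp in LS_PATTERNS if pattern.search(formulation)]

print(suggest_odps("An engine is a part of a car"))  # ['PartOf']
```

A real system would of course need many patterns per ODP and per language, plus the disambiguation step the abstract mentions, since full NL rarely matches a single pattern cleanly.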
User Interfaces to the Web of Data based on Natural Language Generation
We explore how Virtual Research Environments based on Semantic Web technologies support research interactions with RDF data in various stages of corpus-based analysis, analyze the Web of Data in terms of human readability, derive labels from variables in SPARQL queries, apply Natural Language Generation to improve user interfaces to the Web of Data by verbalizing SPARQL queries and RDF graphs, and present a method to automatically induce RDF graph verbalization templates via distant supervision.
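One of the steps listed, deriving labels from SPARQL variables, can be sketched in a few lines. This is a hedged illustration of the general idea (splitting camelCase and underscore-separated variable names into readable words), not the paper's actual labeling procedure.

```python
import re

# Sketch: derive human-readable labels from SPARQL variable names,
# e.g. ?birthPlace -> "birth place". Purely illustrative heuristics.

def variable_labels(sparql: str) -> dict[str, str]:
    """Map each SPARQL variable to a label split on camelCase and underscores."""
    labels = {}
    for var in set(re.findall(r"\?(\w+)", sparql)):
        words = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", var).replace("_", " ")
        labels[var] = words.lower()
    return labels

query = "SELECT ?person ?birthPlace WHERE { ?person dbo:birthPlace ?birthPlace }"
print(variable_labels(query)["birthPlace"])  # birth place
```

Such labels give a verbalizer readable anchors for query variables that would otherwise surface as raw identifiers in the generated text.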
Developments in classroom-based research on L2 writing
This paper reviews and reflects on developments in classroom-based research on second or foreign language (L2) writing from 2001 to 2020, based on scholarship drawn from the Journal of Second Language Writing, the flagship journal of the field. The review covers a total of 75 classroom-based studies and examines the major research themes and key findings under three research strands: (1) students and student learning of writing; (2) teachers and teaching of writing; and (3) classroom assessment and feedback, as well as the key theories and research methodologies adopted in extant classroom-based studies on L2 writing. The article ends with a discussion of the practical implications arising from the review, as well as potential research gaps that inform future directions for L2 writing classroom-based research. By providing a state-of-the-art review of developments in classroom-based research on L2 writing, this article contributes to a nuanced understanding of salient issues about learning, teaching and assessment of writing that take place in naturalistic classroom contexts, with relevant implications for both L2 writing practitioners and researchers.
Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text
The recent advances in large language models (LLM) and foundation models with
emergent capabilities have been shown to improve the performance of many NLP
tasks. LLMs and Knowledge Graphs (KG) can complement each other such that LLMs
can be used for KG construction or completion while existing KGs can be used
for different tasks such as making LLM outputs explainable or fact-checking in
a Neuro-Symbolic manner. In this paper, we present Text2KGBench, a benchmark to
evaluate the capabilities of language models to generate KGs from natural
language text guided by an ontology. Given an input ontology and a set of
sentences, the task is to extract facts from the text while complying with the
given ontology (concepts, relations, domain/range constraints) and being
faithful to the input sentences. We provide two datasets (i) Wikidata-TekGen
with 10 ontologies and 13,474 sentences and (ii) DBpedia-WebNLG with 19
ontologies and 4,860 sentences. We define seven evaluation metrics to measure
fact extraction performance, ontology conformance, and hallucinations by LLMs.
Furthermore, we provide results for two baseline models, Vicuna-13B and
Alpaca-LoRA-13B using automatic prompt generation from test cases. The baseline
results show that there is room for improvement using both Semantic Web and
Natural Language Processing techniques.

Comment: 15 pages, 3 figures, 4 tables. Accepted at ISWC 2023 (Resources Track).
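The ontology-conformance side of the task can be made concrete with a minimal sketch. This is not Text2KGBench's actual metric code; the toy ontology, the type assignments, and the single-type assumption are all invented for illustration.

```python
# Minimal sketch of an ontology conformance check: an extracted triple
# conforms if its relation exists in the ontology and the subject/object
# types satisfy the relation's domain/range constraints.

# Toy ontology: relation -> (domain concept, range concept). Hypothetical.
ONTOLOGY = {"bornIn": ("Person", "City"), "capitalOf": ("City", "Country")}

def conforms(triple, types) -> bool:
    """Check relation membership and domain/range constraints for one triple."""
    subj, rel, obj = triple
    if rel not in ONTOLOGY:
        return False  # relation hallucinated or outside the given ontology
    domain, range_ = ONTOLOGY[rel]
    return types.get(subj) == domain and types.get(obj) == range_

types = {"Ada_Lovelace": "Person", "London": "City", "UK": "Country"}
print(conforms(("Ada_Lovelace", "bornIn", "London"), types))  # True
print(conforms(("London", "bornIn", "UK"), types))            # False
```

A benchmark metric along these lines can separate ontology-conformance errors from faithfulness errors, since a triple may respect the ontology yet still misstate what the input sentence says.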
Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications
The Data Web has undergone a tremendous growth period.
It currently consists of more than 3,300 publicly available knowledge bases describing millions of resources from various domains, such as life sciences, government or geography, with over 89 billion facts.
Similarly, the Document Web has grown to the point where approximately 4.55 billion websites exist, 300 million photos are uploaded to Facebook, and 3.5 billion Google searches are performed on average every day.
However, there is a gap between the Document Web and the Data Web: knowledge bases on the Data Web are most commonly extracted from structured or semi-structured sources, while the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos, and forum discussions.
As a result, the Data Web not only misses a significant fraction of the available information but also suffers from a lack of timeliness, since typical extraction methods are time-consuming and can only be carried out periodically.
Furthermore, provenance information is rarely taken into consideration and therefore gets lost in the transformation process.
In addition, users are accustomed to entering keyword queries to satisfy their information needs.
With the availability of machine-readable knowledge bases, lay users could be empowered to issue more specific questions and get more precise answers.
In this thesis, we address the problem of Relation Extraction, one of the key challenges pertaining to closing the gap between the Document Web and the Data Web by four means.
First, we present a distant supervision approach that allows finding multilingual natural language representations of formal relations already contained in the Data Web.
We use these natural language representations to find sentences on the Document Web that contain unseen instances of this relation between two entities.
Second, we address the problem of data timeliness by presenting a framework for real-time RDF extraction from data streams and use it to extract RDF from RSS news feeds.
Third, we present a novel fact validation algorithm, based on natural language representations, that can not only verify or falsify a given triple but also find trustworthy sources for it on the Web and estimate a time scope in which the triple holds true.
The features this algorithm uses to determine whether a website is trustworthy serve as provenance information and thereby help to create metadata for facts in the Data Web.
Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to exploit the large amounts of data available on the Data Web to satisfy their information needs.
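The first contribution, distant supervision over known KB facts, can be sketched compactly. This is an invented simplification, not the thesis's algorithm: given a fact (subject, predicate, object) already in the knowledge base, it harvests the text between the two entity mentions in a sentence as a candidate natural language pattern for that predicate.

```python
import re

# Sketch of distant supervision: for a KB fact (s, p, o), any sentence that
# mentions both s and o is assumed to express p, and the infix between the
# mentions becomes a candidate pattern ("?D <infix> ?R"). Illustrative only.

def extract_pattern(sentence: str, subject: str, obj: str):
    """Return the '?D <infix> ?R' pattern between the two entity mentions, or None."""
    match = re.search(re.escape(subject) + r"\s+(.*?)\s+" + re.escape(obj), sentence)
    return f"?D {match.group(1)} ?R" if match else None

# Known KB fact assumed here: (Einstein, birthPlace, Ulm).
print(extract_pattern("Einstein was born in Ulm in 1879.", "Einstein", "Ulm"))
# ?D was born in ?R
```

Patterns harvested this way over many sentences can then be ranked statistically and used in reverse: matching "?D was born in ?R" against new text yields unseen instances of the birthPlace relation.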
Advancing Long-Term Care Science Through Using Common Data Elements: Candidate Measures for Care Outcomes of Personhood, Well-Being, and Quality of Life
To support the development of internationally comparable common data elements (CDEs) that can be used to measure essential aspects of long-term care (LTC) across low-, middle-, and high-income countries, a group of researchers in medicine, nursing, behavioral, and social sciences from 21 different countries have joined forces and launched the Worldwide Elements to Harmonize Research in LTC Living Environments (WE-THRIVE) initiative. This initiative aims to develop a common data infrastructure for international use across the domains of organizational context, workforce and staffing, person-centered care, and care outcomes, as these are critical to LTC quality, experiences, and outcomes. This article reports measurement recommendations for the care outcomes domain, focusing on previously prioritized care outcomes concepts of well-being, quality of life (QoL), and personhood for residents in LTC. Through literature review and expert ranking, we recommend nine measures of well-being, QoL, and personhood as a basis for developing CDEs for long-term care outcomes across countries. Data in LTC have often included deficit-oriented measures; while these are important, reductions in deficits do not necessarily mean that residents are concurrently experiencing well-being. Enhancing measurement efforts with the inclusion of these positive LTC outcomes across countries would facilitate international LTC research and align with global shifts toward healthy aging and person-centered LTC models.
Compounding in Namagowab and English: (exploring meaning creation in compounds)
This essay investigates compounding in Namagowab and English, which belong to two widely divergent groups of languages, the Khoesan and Indo-European, respectively. The first motive is to investigate how and why new words are created from existing ones. The reading and data interpretation seek an understanding of word formation and an overview of semantic compositionality, structure and productivity, within the broad context of cognitive, lexicalist and distributed morphology paradigms. This, coupled with historical reading about the languages and their peoples, is used to speculate about why compounds feature in lexical creation. Compounding is prevalent in both languages, and their phylogenetic distance should allow limited generalizing about these processes of formation. Word lists taken from dictionaries in both languages were analyzed by entering the words in Excel spreadsheets so that various attributes of these words, such as word type, compound class (Noun, Verb, Preposition, Adjective and Adverb) and constituent class, could be counted and described with formulae, and compound and constituent meaning analyzed. The conclusion was that socio-historical factors such as language contact, and aspects of cognition such as memory and transparency, account for compounding in a language in addition to typology.
Active Learning for Reducing Labeling Effort in Text Classification Tasks
Labeling data can be an expensive task, as it is usually performed manually by
domain experts. This is cumbersome for deep learning, which depends on
large labeled datasets. Active learning (AL) is a paradigm that aims to reduce
labeling effort by using only the data that the model deems most
informative. Little research has been done on AL in a text classification
setting, and next to none has involved the more recent, state-of-the-art Natural
Language Processing (NLP) models. Here, we present an empirical study that
compares different uncertainty-based algorithms with BERT as the
classifier. We evaluate the algorithms on two NLP classification datasets:
Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore
heuristics that aim to solve presupposed problems of uncertainty-based AL;
namely, that it is unscalable and that it is prone to selecting outliers.
Furthermore, we explore the influence of the query-pool size on the performance
of AL. While the proposed heuristics did not improve the performance of AL,
our results show that uncertainty-based AL with BERT outperforms random
sampling of data. This difference in performance can decrease as the
query-pool size grows.

Comment: Accepted as a conference paper at the joint 33rd Benelux Conference
on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine
Learning (BNAIC/BENELEARN 2021). This camera-ready version, submitted to
BNAIC/BENELEARN, adds several improvements, including a more thorough
discussion of related work and an extended discussion section. 28 pages
including references and appendices.
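The selection step in uncertainty-based AL can be illustrated with a small sketch. The paper uses BERT; here the classifier is abstracted away and only its predicted class probabilities matter, with entropy as one common uncertainty measure (the paper compares several).

```python
import math

# Illustrative uncertainty-based active learning: rank unlabeled examples by
# the entropy of the model's predictive distribution and request labels for
# the most uncertain ones. The pool below is invented toy data.

def entropy(probs) -> float:
    """Shannon entropy of a class-probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool, k: int):
    """pool: list of (example_id, class_probabilities). Return the k most uncertain ids."""
    ranked = sorted(pool, key=lambda item: entropy(item[1]), reverse=True)
    return [example_id for example_id, _ in ranked[:k]]

pool = [("a", [0.98, 0.02]), ("b", [0.55, 0.45]), ("c", [0.70, 0.30])]
print(select_for_labeling(pool, 2))  # ['b', 'c']
```

The outlier-proneness the abstract mentions is visible even here: a single noisy example can sit near the decision boundary forever and keep being selected, which is what the studied heuristics attempt to counteract.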