134 research outputs found
A model for verbalising relations with roles in multiple languages
Natural language renderings of ontologies facilitate communication with domain experts. While this is fairly straightforward for ontologies with terms in English, it is problematic for grammatically richer languages due to verb conjugation, articles that depend on the preposition, or prepositions that modify the noun.
There is no systematic way to deal with such `complex' names of OWL object properties, or to verbalise them with existing language models for annotating ontologies.
The modifications occur only when the object performs some {\em role} in a relation, so we propose a conceptual model that can handle this. It requires reconciling the standard view of relational expressions with a positionalist view, which is included in the model and in the formalisation of the mapping between the two. This eases verbalisation and allows for a more precise representation of the knowledge, yet remains compatible with existing technologies. We have implemented it as a Prot\'eg\'e plugin and validated its adequacy with several languages that need it, such as German and isiZulu.
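The article/preposition interplay the abstract describes can be illustrated with a toy sketch in Python. This is not the paper's model; the role inventory, the lexicon, and the masculine-only article table are invented for illustration, and the uncontracted "in dem" (often "im" in practice) is a deliberate simplification.

```python
# Hypothetical sketch of role-aware verbalisation: the role an object plays
# in a relation determines the preposition and, in German, the case that
# governs the article. All lexical entries below are illustrative only.

# Each role maps to a preposition and the grammatical case it governs.
ROLE_LEXICON = {
    "location": ("in", "dative"),
    "destination": ("in", "accusative"),
}

# Definite article for a masculine noun, by case (simplified; no contraction).
ARTICLE = {"dative": "dem", "accusative": "den"}

def verbalise(subject: str, verb: str, obj: str, role: str) -> str:
    """Render 'subject verb <preposition> <article> object' for a masculine noun."""
    preposition, case = ROLE_LEXICON[role]
    return f"{subject} {verb} {preposition} {ARTICLE[case]} {obj}"

print(verbalise("Der Student", "wohnt", "Park", "location"))    # Der Student wohnt in dem Park
print(verbalise("Der Student", "geht", "Park", "destination"))  # Der Student geht in den Park
```

The same surface preposition "in" yields different articles depending on the role, which is exactly the kind of modification a flat object-property name cannot capture.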
Natural Language-based Approach for Helping in the Reuse of Ontology Design Patterns
Experiments in the reuse of Ontology Design Patterns (ODPs) have
revealed that users with different levels of expertise in ontology modelling face
difficulties when reusing ODPs. With the aim of tackling this problem we propose
a method and a tool for supporting a semi-automatic reuse of ODPs that
takes as input formulations in natural language (NL) of the domain aspect to be
modelled, and obtains as output a set of ODPs for solving the initial ontological
needs. The correspondence between ODPs and NL formulations is established
through Lexico-Syntactic Patterns, linguistic constructs that convey the semantic
relations present in ODPs, and which constitute the main contribution of this
paper. The main benefit of the proposed approach is the use of unrestricted
NL formulations in various languages for obtaining ODPs. The use of full NL
poses challenges in the disambiguation of linguistic expressions, which we
expect to address through user interaction, among other strategies.
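The core matching step can be sketched as follows. The patterns below are invented stand-ins, not the paper's actual Lexico-Syntactic Pattern catalogue: each maps a surface construction in a user's NL formulation to the ODP that models the underlying semantic relation.

```python
import re

# Illustrative Lexico-Syntactic Patterns (hypothetical, not from the paper):
# a regex over the NL formulation paired with the ODP it suggests.
LS_PATTERNS = [
    # "<X> is (a) part of <Y>"  ->  PartOf ODP
    (re.compile(r"(?P<part>\w+) is (?:a )?part of (?P<whole>\w+)", re.I), "PartOf"),
    # "<X> participates in <Y>"  ->  Participation ODP
    (re.compile(r"(?P<obj>\w+) participates in (?P<event>\w+)", re.I), "Participation"),
]

def suggest_odps(formulation: str) -> list[str]:
    """Return candidate ODPs whose lexico-syntactic pattern matches the input."""
    return [odp for pattern, odp in LS_PATTERNS if pattern.search(formulation)]

print(suggest_odps("An engine is a part of a car"))  # ['PartOf']
```

A real system would of course need many patterns per ODP and per language, plus the disambiguation step the abstract mentions, since full NL rarely matches a single pattern cleanly.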
User Interfaces to the Web of Data based on Natural Language Generation
We explore how Virtual Research Environments based on Semantic Web technologies support research interactions with RDF data in various stages of corpus-based analysis, analyze the Web of Data in terms of human readability, derive labels from variables in SPARQL queries, apply Natural Language Generation to improve user interfaces to the Web of Data by verbalizing SPARQL queries and RDF graphs, and present a method to automatically induce RDF graph verbalization templates via distant supervision.
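One of the steps listed, deriving labels from SPARQL variables, can be sketched in a few lines. This is a hedged illustration of the general idea (splitting camelCase and underscore-separated variable names into readable words), not the paper's actual labeling procedure.

```python
import re

# Sketch: derive human-readable labels from SPARQL variable names,
# e.g. ?birthPlace -> "birth place". Purely illustrative heuristics.

def variable_labels(sparql: str) -> dict[str, str]:
    """Map each SPARQL variable to a label split on camelCase and underscores."""
    labels = {}
    for var in set(re.findall(r"\?(\w+)", sparql)):
        words = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", var).replace("_", " ")
        labels[var] = words.lower()
    return labels

query = "SELECT ?person ?birthPlace WHERE { ?person dbo:birthPlace ?birthPlace }"
print(variable_labels(query)["birthPlace"])  # birth place
```

Such labels give a verbalizer readable anchors for query variables that would otherwise surface as raw identifiers in the generated text.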
Developments in classroom-based research on L2 writing
This paper reviews and reflects on developments in classroom-based research on second or foreign language (L2) writing from 2001 to 2020, based on scholarship drawn from the Journal of Second Language Writing, the flagship journal of the field. The review covers a total of 75 classroom-based studies and examines the major research themes and key findings under three research strands: (1) students and student learning of writing; (2) teachers and teaching of writing; and (3) classroom assessment and feedback, as well as the key theories and research methodologies adopted in extant classroom-based studies on L2 writing. The article ends with a discussion of the practical implications arising from the review, as well as potential research gaps that inform future directions for L2 writing classroom-based research. By providing a state-of-the-art review of developments in classroom-based research on L2 writing, this article contributes to a nuanced understanding of salient issues about learning, teaching and assessment of writing that take place in naturalistic classroom contexts, with relevant implications for both L2 writing practitioners and researchers.
Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text
The recent advances in large language models (LLM) and foundation models with
emergent capabilities have been shown to improve the performance of many NLP
tasks. LLMs and Knowledge Graphs (KG) can complement each other such that LLMs
can be used for KG construction or completion while existing KGs can be used
for different tasks such as making LLM outputs explainable or fact-checking in
a Neuro-Symbolic manner. In this paper, we present Text2KGBench, a benchmark to
evaluate the capabilities of language models to generate KGs from natural
language text guided by an ontology. Given an input ontology and a set of
sentences, the task is to extract facts from the text while complying with the
given ontology (concepts, relations, domain/range constraints) and being
faithful to the input sentences. We provide two datasets (i) Wikidata-TekGen
with 10 ontologies and 13,474 sentences and (ii) DBpedia-WebNLG with 19
ontologies and 4,860 sentences. We define seven evaluation metrics to measure
fact extraction performance, ontology conformance, and hallucinations by LLMs.
Furthermore, we provide results for two baseline models, Vicuna-13B and
Alpaca-LoRA-13B using automatic prompt generation from test cases. The baseline
results show that there is room for improvement using both Semantic Web and
Natural Language Processing techniques.

Comment: 15 pages, 3 figures, 4 tables. Accepted at ISWC 2023 (Resources Track).
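The ontology-conformance side of the task can be made concrete with a minimal sketch. This is not Text2KGBench's actual metric code; the toy ontology, the type assignments, and the single-type assumption are all invented for illustration.

```python
# Minimal sketch of an ontology conformance check: an extracted triple
# conforms if its relation exists in the ontology and the subject/object
# types satisfy the relation's domain/range constraints.

# Toy ontology: relation -> (domain concept, range concept). Hypothetical.
ONTOLOGY = {"bornIn": ("Person", "City"), "capitalOf": ("City", "Country")}

def conforms(triple, types) -> bool:
    """Check relation membership and domain/range constraints for one triple."""
    subj, rel, obj = triple
    if rel not in ONTOLOGY:
        return False  # relation hallucinated or outside the given ontology
    domain, range_ = ONTOLOGY[rel]
    return types.get(subj) == domain and types.get(obj) == range_

types = {"Ada_Lovelace": "Person", "London": "City", "UK": "Country"}
print(conforms(("Ada_Lovelace", "bornIn", "London"), types))  # True
print(conforms(("London", "bornIn", "UK"), types))            # False
```

A benchmark metric along these lines can separate ontology-conformance errors from faithfulness errors, since a triple may respect the ontology yet still misstate what the input sentence says.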
Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications
The Data Web has undergone a tremendous growth period.
It currently consists of more than 3,300 publicly available knowledge bases describing millions of resources from various domains, such as life sciences, government or geography, with over 89 billion facts.
Similarly, the Document Web has grown to the point where approximately 4.55 billion websites exist, 300 million photos are uploaded to Facebook, and 3.5 billion Google searches are performed on average every day.
However, there is a gap between the Document Web and the Data Web: knowledge bases on the Data Web are most commonly extracted from structured or semi-structured sources, while the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos, and forum discussions.
As a result, the Data Web not only misses a significant fraction of the available information but also suffers from a lack of timeliness, since typical extraction methods are time-consuming and can only be carried out periodically.
Furthermore, provenance information is rarely taken into consideration and therefore gets lost in the transformation process.
In addition, users are accustomed to entering keyword queries to satisfy their information needs.
With the availability of machine-readable knowledge bases, lay users could be empowered to issue more specific questions and get more precise answers.
In this thesis, we address the problem of Relation Extraction, one of the key challenges pertaining to closing the gap between the Document Web and the Data Web by four means.
First, we present a distant supervision approach that allows finding multilingual natural language representations of formal relations already contained in the Data Web.
We use these natural language representations to find sentences on the Document Web that contain unseen instances of this relation between two entities.
Second, we address the problem of data timeliness by presenting a framework for real-time RDF extraction from data streams and use it to extract RDF from RSS news feeds.
Third, we present a novel fact validation algorithm, based on natural language representations, that can not only verify or falsify a given triple but also find trustworthy sources for it on the Web and estimate a time scope in which the triple holds true.
The features this algorithm uses to determine whether a website is trustworthy serve as provenance information and thereby help to create metadata for facts in the Data Web.
Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to exploit the large amounts of data available on the Data Web to satisfy their information needs.
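The first contribution, distant supervision over known KB facts, can be sketched compactly. This is an invented simplification, not the thesis's algorithm: given a fact (subject, predicate, object) already in the knowledge base, it harvests the text between the two entity mentions in a sentence as a candidate natural language pattern for that predicate.

```python
import re

# Sketch of distant supervision: for a KB fact (s, p, o), any sentence that
# mentions both s and o is assumed to express p, and the infix between the
# mentions becomes a candidate pattern ("?D <infix> ?R"). Illustrative only.

def extract_pattern(sentence: str, subject: str, obj: str):
    """Return the '?D <infix> ?R' pattern between the two entity mentions, or None."""
    match = re.search(re.escape(subject) + r"\s+(.*?)\s+" + re.escape(obj), sentence)
    return f"?D {match.group(1)} ?R" if match else None

# Known KB fact assumed here: (Einstein, birthPlace, Ulm).
print(extract_pattern("Einstein was born in Ulm in 1879.", "Einstein", "Ulm"))
# ?D was born in ?R
```

Patterns harvested this way over many sentences can then be ranked statistically and used in reverse: matching "?D was born in ?R" against new text yields unseen instances of the birthPlace relation.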
Advancing Long-Term Care Science Through Using Common Data Elements: Candidate Measures for Care Outcomes of Personhood, Well-Being, and Quality of Life
To support the development of internationally comparable common data elements (CDEs) that can be used to measure essential aspects of long-term care (LTC) across low-, middle-, and high-income countries, a group of researchers in medicine, nursing, behavioral, and social sciences from 21 different countries have joined forces and launched the Worldwide Elements to Harmonize Research in LTC Living Environments (WE-THRIVE) initiative. This initiative aims to develop a common data infrastructure for international use across the domains of organizational context, workforce and staffing, person-centered care, and care outcomes, as these are critical to LTC quality, experiences, and outcomes. This article reports measurement recommendations for the care outcomes domain, focusing on previously prioritized care outcomes concepts of well-being, quality of life (QoL), and personhood for residents in LTC. Through literature review and expert ranking, we recommend nine measures of well-being, QoL, and personhood as a basis for developing CDEs for long-term care outcomes across countries. Data in LTC have often included deficit-oriented measures; while these are important, reductions in deficits do not necessarily mean that residents are concurrently experiencing well-being. Enhancing measurement efforts with the inclusion of these positive LTC outcomes across countries would facilitate international LTC research and align with global shifts toward healthy aging and person-centered LTC models.
Compounding in Namagowab and English: (exploring meaning creation in compounds)
This essay investigates compounding in Namagowab and English, which belong to two widely divergent groups of languages, the Khoesan and Indo-European, respectively. The first motive is to investigate how and why new words are created from existing ones. The reading and data interpretation seek an understanding of word formation and an overview of semantic compositionality, structure and productivity, within the broad context of cognitive, lexicalist and distributed morphology paradigms. This, coupled with historical reading about the languages and their peoples, is used to speculate about why compounds feature in lexical creation. Compounding is prevalent in both languages, and their phylogenetic distance should allow limited generalizing about these processes of formation. Word lists taken from dictionaries in both languages were analyzed by entering the words in Excel spreadsheets so that various attributes of these words, such as word type, compound class (Noun, Verb, Preposition, Adjective and Adverb) and constituent class, could be counted and described with formulae, and compound and constituent meaning analyzed. The conclusion was that socio-historical factors such as language contact, and aspects of cognition such as memory and transparency, account for compounding in a language in addition to typology.
Active Learning for Reducing Labeling Effort in Text Classification Tasks
Labeling data can be an expensive task, as it is usually performed manually by
domain experts. This is cumbersome for deep learning, which depends on
large labeled datasets. Active learning (AL) is a paradigm that aims to reduce
labeling effort by using only the data that the model deems most
informative. Little research has been done on AL in a text classification
setting, and next to none has involved the more recent, state-of-the-art Natural
Language Processing (NLP) models. Here, we present an empirical study that
compares different uncertainty-based algorithms with BERT as the
classifier. We evaluate the algorithms on two NLP classification datasets:
Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore
heuristics that aim to solve presupposed problems of uncertainty-based AL;
namely, that it is unscalable and that it is prone to selecting outliers.
Furthermore, we explore the influence of the query-pool size on the performance
of AL. While the proposed heuristics did not improve the performance of AL,
our results show that uncertainty-based AL with BERT outperforms random
sampling of data. This difference in performance can decrease as the
query-pool size grows.

Comment: Accepted as a conference paper at the joint 33rd Benelux Conference
on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine
Learning (BNAIC/BENELEARN 2021). This camera-ready version, submitted to
BNAIC/BENELEARN, adds several improvements, including a more thorough
discussion of related work and an extended discussion section. 28 pages
including references and appendices.
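The selection step in uncertainty-based AL can be illustrated with a small sketch. The paper uses BERT; here the classifier is abstracted away and only its predicted class probabilities matter, with entropy as one common uncertainty measure (the paper compares several).

```python
import math

# Illustrative uncertainty-based active learning: rank unlabeled examples by
# the entropy of the model's predictive distribution and request labels for
# the most uncertain ones. The pool below is invented toy data.

def entropy(probs) -> float:
    """Shannon entropy of a class-probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool, k: int):
    """pool: list of (example_id, class_probabilities). Return the k most uncertain ids."""
    ranked = sorted(pool, key=lambda item: entropy(item[1]), reverse=True)
    return [example_id for example_id, _ in ranked[:k]]

pool = [("a", [0.98, 0.02]), ("b", [0.55, 0.45]), ("c", [0.70, 0.30])]
print(select_for_labeling(pool, 2))  # ['b', 'c']
```

The outlier-proneness the abstract mentions is visible even here: a single noisy example can sit near the decision boundary forever and keep being selected, which is what the studied heuristics attempt to counteract.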