
    A model for verbalising relations with roles in multiple languages

    Natural language renderings of ontologies facilitate communication with domain experts. While this is fairly straightforward for ontologies with terms in English, it is problematic for grammatically richer languages due to conjugation of verbs, an article that may depend on the preposition, or a preposition that modifies the noun. There is no systematic way to deal with such 'complex' names of OWL object properties, or to verbalise them with existing language models for annotating ontologies. The modifications occur only when the object performs some role in a relation, so we propose a conceptual model that can handle this. This requires reconciling the standard view of relational expressions with a positionalist view, which is included in the model and in the formalisation of the mapping between the two. This eases verbalisation and allows for a more precise representation of the knowledge, yet remains compatible with existing technologies. We have implemented it as a Protégé plugin and validated its adequacy with several languages that need it, such as German and isiZulu.
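
    To make the positionalist idea concrete, here is a minimal Python sketch (not the authors' Protégé plugin) in which the surface fragment for each language attaches to the role an argument plays in the relation, rather than to a flat property name; the Role and Relation classes and the German "beim" example are illustrative assumptions.

        # Minimal sketch: verbalisation patterns attach to the *role* an
        # argument plays in a positionalist relation, not to the relation
        # name alone. All names here are illustrative assumptions.
        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Role:
            name: str      # the argument place in the relation
            surface: dict  # language -> surface fragment for this role

        @dataclass(frozen=True)
        class Relation:
            name: str
            roles: tuple   # ordered argument places

        # 'works for': in German the preposition and article fuse ("beim"),
        # which a flat OWL property name like "worksFor" cannot capture.
        works_for = Relation(
            name="works-for",
            roles=(
                Role("employee", {"en": "{x} works", "de": "{x} arbeitet"}),
                Role("employer", {"en": "for {y}", "de": "beim {y}"}),
            ),
        )

        def verbalise(rel, lang, x, y):
            parts = [role.surface[lang] for role in rel.roles]
            return " ".join(parts).format(x=x, y=y)

        print(verbalise(works_for, "en", "Anna", "UKZN"))  # Anna works for UKZN
        print(verbalise(works_for, "de", "Anna", "UKZN"))  # Anna arbeitet beim UKZN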

    Natural Language-based Approach for Helping in the Reuse of Ontology Design Patterns

    Experiments in the reuse of Ontology Design Patterns (ODPs) have revealed that users with different levels of expertise in ontology modelling face difficulties when reusing ODPs. To tackle this problem we propose a method and a tool for supporting semi-automatic reuse of ODPs, which takes as input natural language (NL) formulations of the domain aspect to be modelled and produces as output a set of ODPs for solving the initial ontological needs. The correspondence between ODPs and NL formulations is established through Lexico-Syntactic Patterns: linguistic constructs that convey the semantic relations present in ODPs, and which constitute the main contribution of this paper. The main benefit of the proposed approach is the use of unrestricted NL formulations in various languages for obtaining ODPs. The use of full NL poses challenges in the disambiguation of linguistic expressions, which we expect to solve with user interaction, among other strategies.
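
    As an illustration of how Lexico-Syntactic Patterns could mediate between NL formulations and ODPs, the following Python sketch matches regex-style patterns against a sentence and returns candidate ODPs; the two patterns and ODP names are invented for illustration and are not the paper's catalogue.

        # Minimal sketch, assuming regex-style Lexico-Syntactic Patterns.
        import re

        # Each LSP links a surface construction to a candidate ODP.
        LSPS = [
            (re.compile(r"(?:every|all) (\w+?)s? (?:are|is) (?:a |an )?(\w+)", re.I),
             "SubClassOf pattern"),
            (re.compile(r"(\w+?)s? (?:are|is) (?:a )?part of (?:a |the )?(\w+)", re.I),
             "PartOf pattern"),
        ]

        def suggest_odps(sentence):
            """Return (ODP, matched arguments) pairs for an NL formulation."""
            hits = []
            for pattern, odp in LSPS:
                m = pattern.search(sentence)
                if m:
                    hits.append((odp, m.groups()))
            return hits

        print(suggest_odps("Every dog is an animal"))
        # [('SubClassOf pattern', ('dog', 'animal'))]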

    User Interfaces to the Web of Data based on Natural Language Generation

    We explore how Virtual Research Environments based on Semantic Web technologies support research interactions with RDF data in the various stages of corpus-based analysis. We analyze the Web of Data in terms of human readability, derive labels from variables in SPARQL queries, apply Natural Language Generation to improve user interfaces to the Web of Data by verbalizing SPARQL queries and RDF graphs, and present a method to automatically induce RDF graph verbalization templates via distant supervision.
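
    One of the steps above, deriving labels from variables in SPARQL queries, can be pictured with a small Python sketch; the splitting rules (underscores and camelCase) are assumptions rather than the work's exact algorithm.

        # Minimal sketch: derive a human-readable label from a SPARQL variable.
        import re

        def label_from_variable(var):
            """?dateOfBirth -> 'date of birth'; ?person_name -> 'person name'."""
            name = var.lstrip("?$")                            # drop the variable sigil
            name = name.replace("_", " ")                      # split snake_case
            name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)   # split camelCase
            return name.lower()

        for v in ["?dateOfBirth", "?person_name", "$populationTotal"]:
            print(v, "->", label_from_variable(v))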

    Developments in classroom-based research on L2 writing

    This paper reviews and reflects on developments in classroom-based research on second or foreign language (L2) writing from 2001 to 2020, based on scholarship drawn from the Journal of Second Language Writing, the flagship journal of the field. The review covers a total of 75 classroom-based studies and examines the major research themes and key findings under three research strands: (1) students and student learning of writing; (2) teachers and the teaching of writing; and (3) classroom assessment and feedback, as well as the key theories and research methodologies adopted in extant classroom-based studies on L2 writing. The article ends with a discussion of the practical implications arising from the review, as well as potential research gaps that inform future directions for classroom-based research on L2 writing. By providing a state-of-the-art review of developments in this area, the article contributes to a nuanced understanding of salient issues in the learning, teaching and assessment of writing that take place in naturalistic classroom contexts, with relevant implications for both L2 writing practitioners and researchers.

    Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text

    Recent advances in large language models (LLMs) and foundation models with emergent capabilities have been shown to improve the performance of many NLP tasks. LLMs and Knowledge Graphs (KGs) can complement each other: LLMs can be used for KG construction or completion, while existing KGs can be used for tasks such as making LLM outputs explainable or fact-checking them in a neuro-symbolic manner. In this paper, we present Text2KGBench, a benchmark to evaluate the capabilities of language models to generate KGs from natural language text guided by an ontology. Given an input ontology and a set of sentences, the task is to extract facts from the text while complying with the given ontology (concepts, relations, domain/range constraints) and remaining faithful to the input sentences. We provide two datasets: (i) Wikidata-TekGen, with 10 ontologies and 13,474 sentences, and (ii) DBpedia-WebNLG, with 19 ontologies and 4,860 sentences. We define seven evaluation metrics to measure fact-extraction performance, ontology conformance, and hallucinations by LLMs. Furthermore, we provide results for two baseline models, Vicuna-13B and Alpaca-LoRA-13B, using automatic prompt generation from test cases. The baseline results show that there is room for improvement using both Semantic Web and Natural Language Processing techniques.
    Comment: 15 pages, 3 figures, 4 tables. Accepted at ISWC 2023 (Resources Track).
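
    The ontology-conformance part of the task can be sketched as a filter that keeps only extracted triples whose relation and domain/range types occur in the input ontology; the Python below uses toy data structures that are assumptions, not the benchmark's file format.

        # Minimal sketch of the conformance idea: drop triples whose relation
        # (or domain/range types) are not licensed by the given ontology.
        ontology = {
            # relation -> (domain concept, range concept)
            "author":    ("Book", "Person"),
            "publisher": ("Book", "Organisation"),
        }
        entity_types = {"Dracula": "Book", "Bram Stoker": "Person"}

        def conformant(triples):
            kept = []
            for s, p, o in triples:
                if p not in ontology:
                    continue                  # relation not in the ontology
                dom, rng = ontology[p]
                if entity_types.get(s) == dom and entity_types.get(o) == rng:
                    kept.append((s, p, o))
            return kept

        extracted = [("Dracula", "author", "Bram Stoker"),
                     ("Dracula", "wonAward", "Hugo")]   # hallucinated relation
        print(conformant(extracted))
        # [('Dracula', 'author', 'Bram Stoker')]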

    Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications

    The Data Web has undergone a tremendous growth period. It currently consists of more than 3,300 publicly available knowledge bases describing millions of resources from various domains, such as life sciences, government or geography, with over 89 billion facts. Likewise, the Document Web has grown to the point where approximately 4.55 billion websites exist, 300 million photos are uploaded to Facebook and 3.5 billion Google searches are performed on an average day. However, there is a gap between the Document Web and the Data Web: knowledge bases available on the Data Web are most commonly extracted from structured or semi-structured sources, while the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos and forum discussions. As a result, data on the Data Web not only misses a significant fraction of the available information, but also suffers from a lack of timeliness, since typical extraction methods are time-consuming and can only be carried out periodically. Furthermore, provenance information is rarely taken into consideration and therefore gets lost in the transformation process. In addition, users are accustomed to entering keyword queries to satisfy their information needs; with the availability of machine-readable knowledge bases, lay users could be empowered to issue more specific questions and get more precise answers.

    In this thesis, we address the problem of Relation Extraction, one of the key challenges in closing the gap between the Document Web and the Data Web, by four means. First, we present a distant supervision approach that finds multilingual natural language representations of formal relations already contained in the Data Web. We use these natural language representations to find sentences on the Document Web that contain unseen instances of those relations between two entities. Second, we address the problem of data timeliness by presenting a real-time RDF extraction framework for data streams, and utilize this framework to extract RDF from RSS news feeds. Third, we present a novel fact validation algorithm, based on natural language representations, that can not only verify or falsify a given triple, but also find trustworthy sources for it on the Web and estimate a time scope in which the triple holds true. The features this algorithm uses to determine whether a website is trustworthy serve as provenance information and thereby help to create metadata for facts in the Data Web. Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to exploit the large amounts of data available on the Data Web to satisfy their information needs.
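
    The first contribution, distant supervision of natural language patterns for RDF predicates, can be pictured with a toy Python sketch that harvests the text between two KB-linked entity mentions as a candidate pattern; the data and the naive substring matching are illustrative assumptions, not the thesis's actual pipeline.

        # Minimal sketch of distant supervision: given KB triples and raw
        # sentences, collect the text between the two entity mentions as a
        # candidate NL pattern for the RDF predicate.
        from collections import Counter

        kb = [("Barack Obama", "spouse", "Michelle Obama"),
              ("Marie Curie", "spouse", "Pierre Curie")]
        sentences = ["Barack Obama is married to Michelle Obama.",
                     "Marie Curie was married to Pierre Curie."]

        patterns = Counter()
        for subj, pred, obj in kb:
            for sent in sentences:
                if subj in sent and obj in sent:
                    start = sent.index(subj) + len(subj)
                    end = sent.index(obj)
                    if start < end:
                        patterns[(pred, sent[start:end].strip())] += 1

        print(patterns.most_common())
        # [(('spouse', 'is married to'), 1), (('spouse', 'was married to'), 1)]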

    Advancing Long-Term Care Science Through Using Common Data Elements: Candidate Measures for Care Outcomes of Personhood, Well-Being, and Quality of Life

    To support the development of internationally comparable common data elements (CDEs) that can be used to measure essential aspects of long-term care (LTC) across low-, middle-, and high-income countries, a group of researchers in medicine, nursing, and the behavioral and social sciences from 21 different countries have joined forces and launched the Worldwide Elements to Harmonize Research in LTC Living Environments (WE-THRIVE) initiative. This initiative aims to develop a common data infrastructure for international use across the domains of organizational context, workforce and staffing, person-centered care, and care outcomes, as these are critical to LTC quality, experiences, and outcomes. This article reports measurement recommendations for the care outcomes domain, focusing on the previously prioritized care-outcome concepts of well-being, quality of life (QoL), and personhood for residents in LTC. Through literature review and expert ranking, we recommend nine measures of well-being, QoL, and personhood as a basis for developing CDEs for long-term care outcomes across countries. Data in LTC have often included deficit-oriented measures; while important, reductions in deficits do not necessarily mean that residents are concurrently experiencing well-being. Enhancing measurement efforts with the inclusion of these positive LTC outcomes across countries would facilitate international LTC research and align with global shifts toward healthy aging and person-centered LTC models.

    Compounding in Namagowab and English: (exploring meaning creation in compounds)

    This essay investigates compounding in Namagowab and English, which belong to two widely divergent groups of languages, the Khoesan and Indo-European families, respectively. The first motive is to investigate how and why new words are created from existing ones. The reading and data interpretation seek an understanding of word formation and an overview of semantic compositionality, structure and productivity, within the broad context of the cognitive, lexicalist and distributed morphology paradigms. This, coupled with historical reading about the languages and their people, is used to speculate about why compounds feature in lexical creation. Compounding is prevalent in both languages, and their distance in terms of phylogenetic relationships should permit limited generalizing about these processes of formation. Word lists taken from dictionaries in both languages were analyzed by entering the words in Excel spreadsheets, so that various attributes of these words, such as word type, compound class (Noun, Verb, Preposition, Adjective and Adverb) and constituent class, could be counted and described with formulae, and compound and constituent meanings analyzed. The conclusion was that socio-historical factors such as language contact, and aspects of cognition such as memory and transparency, account for compounding in a language in addition to typology.
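
    The counting step described above can be pictured with a short Python sketch that tallies compound classes and constituent-class patterns from a word list; the sample entries are invented for illustration and are not drawn from the essay's data.

        # Minimal sketch: tally compound classes and constituent-class
        # patterns from a word list (illustrative entries only).
        from collections import Counter

        # (compound, compound class, constituent classes)
        entries = [
            ("blackboard", "Noun", ("Adjective", "Noun")),
            ("rainfall",   "Noun", ("Noun", "Verb")),
            ("overthrow",  "Verb", ("Preposition", "Verb")),
        ]

        compound_classes = Counter(cls for _, cls, _ in entries)
        constituent_patterns = Counter(parts for _, _, parts in entries)

        print(compound_classes)                    # Counter({'Noun': 2, 'Verb': 1})
        print(constituent_patterns.most_common(1))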

    Active Learning for Reducing Labeling Effort in Text Classification Tasks

    Labeling data can be an expensive task, as it is usually performed manually by domain experts. This is cumbersome for deep learning, which depends on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by only using the data which the model in use deems most informative. Little research has been done on AL in a text classification setting, and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT-base as the classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL, namely that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. While the proposed heuristics did not improve the performance of AL, our results show that using uncertainty-based AL with BERT-base outperforms random sampling of data, although this difference in performance can decrease as the query-pool size grows.
    Comment: Accepted as a conference paper at the joint 33rd Benelux Conference on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021). This camera-ready version adds several improvements, including a more thorough discussion of related work and an extended discussion section. 28 pages including references and appendices.
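
    The uncertainty-based AL loop studied here can be pictured with a minimal Python sketch using least-confidence sampling on synthetic data; the logistic-regression classifier stands in for the paper's BERT-base setup, and the query-pool size is a placeholder.

        # Minimal sketch of an uncertainty-based AL loop (least confidence).
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 20))
        y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

        labeled = list(range(20))                 # small labeled seed set
        pool = [i for i in range(1000) if i not in labeled]
        QUERY_POOL_SIZE = 10                      # queries per AL round

        clf = LogisticRegression()
        for _ in range(5):                        # five AL rounds
            clf.fit(X[labeled], y[labeled])
            probs = clf.predict_proba(X[pool])
            uncertainty = 1.0 - probs.max(axis=1)         # least confidence
            picks = np.argsort(uncertainty)[-QUERY_POOL_SIZE:]
            newly = [pool[i] for i in picks]
            labeled += newly                      # the "oracle" labels the queries
            pool = [i for i in pool if i not in newly]
            print(f"labeled={len(labeled)}  acc={clf.score(X, y):.3f}")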