205 research outputs found
Integration of heterogeneous data sources and automated reasoning in healthcare and domotic IoT systems
In recent years, IoT technology has radically transformed many crucial industrial and service sectors such as healthcare. The multi-faceted heterogeneity of the devices and of the collected information provides important opportunities to develop innovative systems and services. However, the ubiquitous presence of data silos and the poor semantic interoperability in the IoT landscape constitute a significant obstacle in the pursuit of this goal. Moreover, achieving actionable knowledge from the collected data requires IoT information sources to be analysed using appropriate artificial intelligence techniques such as automated reasoning. In this thesis work, Semantic Web technologies have been investigated as an approach to address both the data-integration and the reasoning aspects of modern IoT systems. In particular, the contributions presented in this thesis are the following: (1) the IoT Fitness Ontology, an OWL ontology developed to overcome the issue of data silos and enable semantic interoperability in the IoT fitness domain; (2) a Linked Open Data web portal for collecting and sharing IoT health datasets with the research community; (3) a novel methodology for embedding knowledge in rule-defined IoT smart home scenarios; and (4) a knowledge-based IoT home automation system that supports a seamless integration of heterogeneous devices and data sources.
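The combination of data integration and rule-based reasoning described above can be illustrated with a minimal sketch: heterogeneous vendor payloads are normalised into a shared triple vocabulary, and a simple forward-chaining rule derives new facts. The device formats, property names, and rule are invented for illustration and are not the thesis ontology.

```python
# Minimal sketch: integrating heterogeneous IoT payloads into a shared
# triple-based model, then applying one forward-chaining rule.
# Vendor formats and vocabulary below are illustrative, not the IoT Fitness
# Ontology or the thesis home-automation system.

def to_triples(reading):
    """Map vendor-specific payloads onto one shared vocabulary."""
    if "temp_c" in reading:                       # hypothetical vendor A format
        return [(reading["id"], "hasTemperature", reading["temp_c"])]
    if "temperature" in reading:                  # hypothetical vendor B format
        return [(reading["device"], "hasTemperature",
                 reading["temperature"]["value"])]
    return []

def infer(triples, threshold=30.0):
    """Rule: hasTemperature > threshold => device is in state 'overheated'."""
    inferred = []
    for s, p, o in triples:
        if p == "hasTemperature" and o > threshold:
            inferred.append((s, "inState", "overheated"))
    return triples + inferred

graph = []
graph += to_triples({"id": "sensor-1", "temp_c": 34.5})
graph += to_triples({"device": "sensor-2", "temperature": {"value": 21.0}})
graph = infer(graph)
print(graph)
```

Once both sources speak the same vocabulary, the rule never needs to know which vendor produced a reading; that separation is what semantic interoperability buys.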
The automatic processing of multiword expressions in Irish
It is well-documented that Multiword Expressions (MWEs) pose a unique challenge to a variety of NLP tasks such as machine translation, parsing, information retrieval, and more. For low-resource languages such as Irish, these challenges can be exacerbated by the scarcity of data and a lack of research on this topic. In order to improve the handling of MWEs in various NLP tasks for Irish, this thesis addresses the lack of resources specifically targeting MWEs in Irish and examines how these resources can be applied to said NLP tasks.
We report on the creation and analysis of a number of lexical resources as part of this PhD research. Ilfhocail, a lexicon of Irish MWEs, is created through extracting MWEs from other lexical resources such as dictionaries. A corpus annotated with verbal MWEs in Irish is created for the inclusion of Irish in the PARSEME Shared Task 1.2. Additionally, MWEs were tagged in a bilingual EN-GA corpus for inclusion in machine translation experiments. For the purposes of annotation, a categorisation scheme for nine categories of MWEs in Irish is created, combining linguistic analysis of these types of constructions with cross-lingual frameworks for defining MWEs.
A case study in applying MWEs to NLP tasks is undertaken, exploring the incorporation of MWE information while training Neural Machine Translation systems. Finally, the topic of automatic identification of Irish MWEs is explored, documenting the training of a system capable of automatically identifying Irish MWEs from a variety of categories, and the challenges associated with developing such a system.
This research contributes towards a greater understanding of Irish MWEs and their applications in NLP, and provides a foundation for future work in exploring other methods for the automatic discovery and identification of Irish MWEs, and further developing the MWE resources described above.
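The simplest use of a lexicon such as Ilfhocail is dictionary-based identification: scan a tokenised sentence for the longest spans that match lexicon entries. The sketch below shows this with a toy three-entry lexicon of common Irish MWEs; it is not the thesis's identification system, which handles many categories and discontinuous expressions.

```python
# Sketch of lexicon-based MWE identification by greedy longest match.
# The toy lexicon holds a few common Irish MWEs ("ar ais" = back,
# "os comhair" = in front of, "i ndiaidh" = after); a real resource such
# as Ilfhocail would hold thousands of entries of varying lengths.

LEXICON = {("ar", "ais"), ("os", "comhair"), ("i", "ndiaidh")}
MAX_LEN = max(len(entry) for entry in LEXICON)   # longest entry, here 2 tokens

def tag_mwes(tokens):
    """Return (start, end) spans of the longest lexicon matches."""
    spans, i = [], 0
    while i < len(tokens):
        # try the longest window first; only multi-token spans qualify as MWEs
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in LEXICON:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1
    return spans

print(tag_mwes("Tháinig sé ar ais inné".split()))  # → [(2, 4)]
```

Greedy matching of this kind misses discontinuous and inflected MWEs, which is precisely why the thesis explores trained identification systems rather than lexicon lookup alone.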
Facilitating Information Access for Heterogeneous Data Across Many Languages
Information access, which enables people to identify, retrieve, and use information freely and effectively, has attracted interest from academia and industry. Systems for document retrieval and question answering have helped people access information in powerful and useful ways. Recently, natural language technologies based on neural networks have been applied to various tasks for information access. Specifically, transformer-based pre-trained models have pushed tasks such as document and passage retrieval to new state-of-the-art effectiveness. (1) Most of the research has focused on helping people access passages and documents on the web. However, there is abundant information stored in other formats, such as semi-structured tables and domain-specific relational databases in companies. Developing models and frameworks that support access to information in these formats is also essential. (2) Moreover, most of the advances in information access research are based on English, leaving other languages less explored. It is insufficient and inequitable in our globalized and connected world to serve only speakers of English.
In this thesis, we explore and develop models and frameworks that could alleviate the aforementioned challenges. This dissertation consists of three parts. We begin with a discussion of developing models designed for accessing data in formats other than passages and documents, focusing on two data formats: semi-structured tables and relational databases. In the second part, we discuss methods that can enhance the experience of non-English speakers when using information access systems. Specifically, we first introduce model development for multilingual knowledge graph integration, which can benefit many information access applications such as cross-lingual question answering systems and other knowledge-driven cross-lingual NLP applications. We further focus on multilingual document dense retrieval and reranking, which boost the effectiveness of search engines for non-English information access. Last but not least, we build on the two preceding parts by investigating models and frameworks that help non-English speakers access structured data. Specifically, we present cross-lingual Text-to-SQL semantic parsing systems that enable non-English speakers to query relational databases with questions in their own languages.
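The dense retrieval component mentioned above reduces, at query time, to ranking documents by the similarity of their vectors to the query vector. The sketch below shows that ranking step with hand-made three-dimensional vectors; a real system would obtain the vectors from a multilingual transformer encoder, and the vectors and document ids here are invented.

```python
# Toy sketch of the scoring step in dense retrieval: rank documents by
# dot-product similarity between query and document embeddings.
# Vectors are hand-made stand-ins for encoder output.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank(query_vec, doc_vecs):
    """Return document ids sorted by descending similarity to the query."""
    scored = [(dot(query_vec, v), doc_id) for doc_id, v in doc_vecs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

docs = {"d1": [0.9, 0.1, 0.0],
        "d2": [0.2, 0.8, 0.1],
        "d3": [0.1, 0.1, 0.9]}
query = [0.8, 0.2, 0.0]            # closest to d1 in this toy space
print(rank(query, docs))           # → ['d1', 'd2', 'd3']
```

Because queries and documents meet only in the shared vector space, the same machinery serves cross-lingual retrieval: a multilingual encoder maps a query in one language near documents in another.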
Digital 3D reconstruction as a research environment in art and architecture history: uncertainty classification and visualisation
The dissertation addresses the still unsolved challenges of source-based digital 3D reconstruction, visualisation and documentation in the domains of archaeology and art and architecture history.
The emerging BIM methodology and the exchange data format IFC are changing the way of collaboration, visualisation and documentation in the planning, construction and facility management process. The introduction and development of the Semantic Web (Web 3.0), spreading the idea of structured, formalised and linked data, offers semantically enriched human- and machine-readable data.
In contrast to civil engineering and cultural heritage, object-oriented academic disciplines such as archaeology and art and architecture history are acting as outside spectators.
Since the 1990s, it has been argued that a 3D model is not likely to be considered a scientific reconstruction unless it is grounded on accurate documentation and visualisation. However, these standards are still missing, and validation of the outcomes remains unfulfilled. Meanwhile, the digital research data remain ephemeral and continue to fill growing digital cemeteries.
This study therefore focuses on the evaluation of source-based digital 3D reconstructions and, especially, on the assessment of uncertainty in hypothetical reconstructions of destroyed or never-built artefacts according to scientific principles, making the models shareable and reusable by a potentially wide audience.
The work initially focuses on terminology and on the definition of a workflow especially related to the classification and visualisation of uncertainty. The workflow is then applied to specific cases of 3D models uploaded to the DFG repository of the AI Mainz. In this way, the available methods of documenting, visualising and communicating uncertainty are analysed.
In the end, this process leads to a validation or correction of the workflow and the initial assumptions and, where different hypotheses are in play, to a better definition of the levels of uncertainty.
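An uncertainty-classification workflow of this kind can be sketched as a mapping from the evidence behind each reconstructed element to a level on an ordinal scale, and from each level to a display colour. The scale, evidence categories, and colours below are hypothetical stand-ins, not the scheme defined in the dissertation.

```python
# Illustrative sketch of uncertainty classification for 3D reconstruction:
# each element gets the level of its *best* available evidence, and each
# level maps to a colour for visualisation. Scale and colours are invented.

LEVELS = {1: "surviving structure", 2: "measured survey", 3: "primary source",
          4: "secondary source / analogy", 5: "pure hypothesis"}
COLOURS = {1: "#1a9850", 2: "#91cf60", 3: "#fee08b",
           4: "#fc8d59", 5: "#d73027"}

# hypothetical ranking of evidence types by reliability
EVIDENCE_RANK = {"extant": 1, "survey": 2, "plan": 3, "photo": 3,
                 "analogy": 4, "hypothesis": 5}

def classify(element):
    """Assign the best (lowest-numbered) level among an element's evidence."""
    return min(EVIDENCE_RANK[e] for e in element["evidence"])

portal = {"name": "west portal", "evidence": ["photo", "analogy"]}
level = classify(portal)
print(level, LEVELS[level], COLOURS[level])  # → 3 primary source #fee08b
```

Making the scale explicit in this way is what allows different reconstructions in a shared repository to be compared and their uncertainty communicated consistently.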
Bibliographic Control in the Digital Ecosystem
With the contributions of international experts, the book aims to explore the new boundaries of universal bibliographic control. Bibliographic control is radically changing because the bibliographic universe is radically changing: resources, agents, technologies, standards and practices. Among the main topics addressed: library cooperation networks; legal deposit; national bibliographies; new tools and standards (IFLA LRM, RDA, BIBFRAME); authority control and new alliances (Wikidata, Wikibase, Identifiers); new ways of indexing resources (artificial intelligence); institutional repositories; the new book supply chain; “discoverability” in the IIIF digital ecosystem; the role of thesauri and ontologies in the digital ecosystem; and bibliographic control and search engines.
BIM-Based Life Cycle Sustainability Assessment for Buildings
In recent years, the progress of digitization in the architecture and construction sectors has produced enormous advances in the automation of analysis and evaluation processes. This is the case with environmental analysis systems such as life cycle analysis. Practitioners of the methodology have found a fundamental ally in building information modeling platforms, which allow tasks that conventionally consume large amounts of energy and time to be carried out more automatically and efficiently. In this publication, the reader will find some of the latest advances in this area.
Thinking outside the graph: scholarly knowledge graph construction leveraging natural language processing
Despite improved digital access to scholarly knowledge in recent decades, scholarly communication remains exclusively document-based.
The document-oriented workflows in science publication have reached the limits of adequacy, as highlighted by recent discussions on the increasing proliferation of scientific literature, the deficiencies of peer review, and the reproducibility crisis.
In this form, scientific knowledge remains locked in representations that are inadequate for machine processing.
As long as scholarly communication remains in this form, we cannot take advantage of all the advancements taking place in machine learning and natural language processing techniques.
Such techniques would facilitate the transformation from purely text-based representations into (semi-)structured semantic descriptions that are interlinked in a collection of big federated graphs.
We are in dire need of a new age of semantically enabled infrastructure adept at storing, manipulating, and querying scholarly knowledge.
Equally important is a suite of machine assistance tools designed to populate, curate, and explore the resulting scholarly knowledge graph.
In this thesis, we address the issue of constructing a scholarly knowledge graph using natural language processing techniques.
First, we tackle the issue of developing a scholarly knowledge graph for structured scholarly communication that can be populated and constructed automatically.
We co-design and co-implement the Open Research Knowledge Graph (ORKG), an infrastructure capable of modeling, storing, and automatically curating scholarly communications.
Then, we propose a method to automatically extract information into knowledge graphs.
With Plumber, we create a framework to dynamically compose open information extraction pipelines based on the input text.
Such pipelines are composed from community-created information extraction components in an effort to consolidate individual research contributions under one umbrella.
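Dynamic pipeline composition of this kind can be sketched as selecting which extraction components to run from simple properties of the input text, then applying them in sequence over a shared state. The component names and selection rules below are invented for illustration and are much simpler than Plumber's community-contributed components.

```python
# Minimal sketch of dynamic information-extraction pipeline composition:
# inspect the input, pick a set of components, run them in order.
# Components and selection logic are illustrative, not Plumber's.

def detect_entities(text, state):
    """Toy component: treat capitalised tokens as entity mentions."""
    state["entities"] = [w for w in text.split() if w[0].isupper()]
    return state

def detect_numbers(text, state):
    """Toy component: collect numeric tokens."""
    state["numbers"] = [w for w in text.split() if w.isdigit()]
    return state

def compose(text):
    """Choose components based on properties of the input text."""
    pipeline = [detect_entities]
    if any(w.isdigit() for w in text.split()):   # only add when useful
        pipeline.append(detect_numbers)
    return pipeline

text = "ORKG stores 42 comparisons"
state = {}
for step in compose(text):
    state = step(text, state)
print(state)  # → {'entities': ['ORKG'], 'numbers': ['42']}
```

Keeping components behind a uniform (text, state) interface is what lets independently contributed extractors be recombined per input, which is the consolidation idea behind the framework.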
We further present MORTY, a more targeted approach that leverages automatic text summarization to create structured summaries from a scholarly article's text, containing all required information.
In contrast to the pipeline approach, MORTY only extracts the information it is instructed to, making it a more valuable tool for various curation and contribution use cases.
Moreover, we study the problem of knowledge graph completion.
We propose exBERT, which is able to perform knowledge graph completion tasks, such as relation and entity prediction, on scholarly knowledge graphs by means of textual triple classification.
Lastly, we use the structured descriptions collected from manual and automated sources alike with a question answering approach that builds on the machine-actionable descriptions in the ORKG.
We propose JarvisQA, a question answering interface operating on tabular views of scholarly knowledge graphs, i.e., ORKG comparisons.
JarvisQA is able to answer a variety of natural language questions, and retrieve complex answers on pre-selected sub-graphs.
These contributions are key to the broader agenda of studying the feasibility of natural language processing methods on scholarly knowledge graphs, and lay the foundation for determining which methods can be used in which cases.
Our work identifies the challenges and issues involved in automatically constructing scholarly knowledge graphs, and opens up future research directions.
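Question answering over a tabular comparison view can be sketched as follows: rows describe contributions, columns describe shared properties, and a handler resolves a "which scores highest" style question against a column. The table contents and the rule-based handler are invented; the actual JarvisQA system uses a neural QA model over ORKG comparisons.

```python
# Toy sketch of question answering over a tabular view of a scholarly
# knowledge graph. Rows, property names, and scores are invented; a real
# system would answer free-form natural language questions with a QA model.

table = [
    {"contribution": "System A", "task": "NER", "f1": 0.81},
    {"contribution": "System B", "task": "NER", "f1": 0.87},
    {"contribution": "System C", "task": "NER", "f1": 0.79},
]

def best_by(rows, column):
    """Answer 'which contribution scores highest on <column>?'"""
    return max(rows, key=lambda r: r[column])["contribution"]

print(best_by(table, "f1"))  # → System B
```

The point of the tabular view is exactly this: once contributions share machine-actionable properties, comparative questions become simple operations over columns rather than a search through prose.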