6 research outputs found

    Publish-Time Data Integration for Open Data Platforms

    Platforms for the publication and collaborative management of data, such as Data.gov or Google Fusion Tables, are a new trend on the web. They manage very large corpora of datasets, but often lack an integrated schema, ontology, or even common publication standards. This results in inconsistent names for attributes of the same meaning, which constrains the discovery of relationships between datasets as well as their reusability. Existing data integration techniques focus on reuse-time, i.e., they are applied when a user wants to combine a specific set of datasets or integrate them with an existing database. In contrast, this paper investigates a novel method of data integration at publish-time, where the publisher is provided with suggestions on how to integrate the new dataset with the corpus as a whole, without resorting to a manually created mediated schema or ontology for the platform. We propose data-driven algorithms that suggest alternative attribute names for a newly published dataset, based on attribute and instance statistics maintained on the corpus. We evaluate the proposed algorithms using real-world corpora based on the Open Data Platform opendata.socrata.com and on relational data extracted from Wikipedia. We report on the system's response time and on the results of an extensive crowdsourcing-based evaluation of the quality of the generated attribute name alternatives.
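The publish-time suggestion idea can be sketched as follows. This is a minimal, hypothetical simplification, not the paper's actual algorithms: candidate attribute names from the corpus are scored by the overlap of their previously observed instance values with the values of the newly published column (the `suggest_names` helper, the Jaccard scoring, and the toy corpus are all illustrative assumptions):

```python
# Hypothetical sketch: score corpus attribute names by the Jaccard overlap
# of their observed instance values with the new column's values.

def jaccard(a, b):
    """Jaccard similarity of two value sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def suggest_names(new_values, corpus_stats, k=3):
    """Rank corpus attribute names by value overlap with the new column.

    corpus_stats: dict mapping attribute name -> set of instance values
    seen for that name across the corpus (the maintained statistics).
    """
    scored = [(jaccard(set(new_values), vals), name)
              for name, vals in corpus_stats.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]

# Toy corpus: the same concept published under inconsistent names.
corpus = {
    "country":    {"Germany", "France", "USA", "Japan"},
    "nation":     {"Germany", "France", "Brazil"},
    "population": {"83000000", "67000000"},
}
print(suggest_names(["Germany", "USA", "Italy"], corpus))
```

A publisher naming a column "land" with those values would thus be offered "country" and "nation" as corpus-consistent alternatives.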

    Query-Time Data Integration

    Today, data is collected at ever increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources preclude up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established. This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods is compensated by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need. To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state of the art by producing a set of individually consistent but mutually diverse alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database. The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we study the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort. Collectively, these three contributions are the foundation of a Query-time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
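The top-k augmentation goal of covering all query entities with few sources while keeping the k solutions mutually diverse can be sketched, very roughly, as a greedy set cover that penalizes sources already used in earlier solutions. All names, the toy data, and the scoring heuristic below are illustrative assumptions, not the DrillBeyond implementation:

```python
# Hypothetical sketch of top-k entity augmentation: each source provides
# values for a subset of the query entities; a solution is a small set of
# sources covering all entities, and later solutions avoid earlier sources.

def greedy_cover(entities, sources, banned):
    """Greedy set cover that prefers sources not in 'banned'."""
    uncovered, solution = set(entities), []
    while uncovered:
        useful = {s: vals for s, vals in sources.items() if vals & uncovered}
        if not useful:
            return None  # some entities cannot be covered at all
        # Prefer unused sources (diversity), then maximum coverage (minimality).
        best = max(useful, key=lambda s: (s not in banned,
                                          len(useful[s] & uncovered)))
        solution.append(best)
        uncovered -= sources[best]
    return solution

def top_k_augmentations(entities, sources, k=3):
    results, used = [], set()
    for _ in range(k):
        sol = greedy_cover(entities, sources, banned=used)
        if sol is None or sol in results:
            break
        results.append(sol)
        used.update(sol)  # penalize these sources in later rounds
    return results

entities = {"Berlin", "Paris", "Rome"}
sources = {
    "src_a": {"Berlin", "Paris", "Rome"},
    "src_b": {"Berlin", "Paris"},
    "src_c": {"Rome"},
    "src_d": {"Berlin", "Paris", "Rome"},
}
print(top_k_augmentations(entities, sources))
```

Each returned solution is internally consistent (one small group of sources covers everything), while the diversity penalty drives the k alternatives apart, mirroring the trade-off described above.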

    Closing Information Gaps with Need-driven Knowledge Sharing

    Systems for asynchronous knowledge sharing, such as intranets, wikis, or file servers, often suffer from a lack of user contributions. A major reason is that information providers are decoupled from information seekers and therefore have little awareness of their information needs. Central questions of knowledge management are thus which knowledge is particularly valuable and by which means knowledge holders can be motivated to share it. This thesis proposes the approach of need-driven knowledge sharing (NKS), which consists of three elements. First, indicators of information need are collected, in particular search queries, and their aggregation yields a continuous forecast of the organizational information need (OIN). By matching this forecast against the information available in personal and shared information spaces, organizational information gaps (OIG) are derived, which point to missing information. These gaps are made transparent by means of so-called mediation services and mediation spaces, which help to create awareness of organizational information needs and to steer knowledge sharing. The concrete realization of NKS is illustrated by three different applications, all of which build on established knowledge management systems. Inverse Search is a tool that suggests to knowledge holders documents from their personal information space to share, in order to close organizational information gaps. Woogle extends conventional wiki systems with steering instruments for detecting and prioritizing missing information, so that the evolution of wiki content can be shaped in a demand-oriented way. In a similar way, Semantic Need, an extension for Semantic MediaWiki, steers the capture of structured semantic data based on information needs expressed as structured queries. The implementation and evaluation of the three tools show that need-driven knowledge sharing is technically feasible and can be an important complement to knowledge management. Moreover, the concept of mediation services and mediation spaces provides a framework for analyzing and designing tools according to the NKS principles. Finally, the approach presented here also offers impulses for the further development of Internet services and infrastructures such as Wikipedia or the Semantic Web.
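The core step of deriving information gaps from aggregated search queries can be sketched as follows; the `information_gaps` function, the demand threshold, and the toy data are illustrative assumptions, not the thesis' actual mediation services:

```python
# Hypothetical sketch: aggregate the query log into a demand profile,
# check each demanded topic against the shared information space, and
# report unmet demand (gaps) ranked by frequency.

from collections import Counter

def information_gaps(query_log, documents, min_demand=2):
    """Queries asked at least min_demand times with no matching document."""
    demand = Counter(q.lower().strip() for q in query_log)

    def covered(query):
        # Crude coverage test: substring match against document texts.
        return any(query in doc.lower() for doc in documents)

    gaps = [(count, q) for q, count in demand.items()
            if count >= min_demand and not covered(q)]
    return [q for count, q in sorted(gaps, reverse=True)]

log = ["VPN setup", "vpn setup", "expense policy", "expense policy",
       "office plants"]
docs = ["How to file your expense policy claims", "Onboarding guide"]
print(information_gaps(log, docs))
```

Here "vpn setup" is demanded repeatedly but undocumented, so it surfaces as a gap that a mediation service could show to potential knowledge providers; "expense policy" is already covered and "office plants" falls below the demand threshold.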

    User satisfaction model to measure open government data usage

    The open government data (OGD) initiative is undertaken by governments to promote transparency, social control, and citizen participation in policy making. The use of OGD in Malaysia is still at an early stage and faces problems such as low participation, security issues, and lack of awareness. While most research in Information and Communication Technology (ICT) underpinned by Expectation Confirmation Theory (ECT) focuses on user satisfaction and on determining users' reuse intention, this study focuses on the direct antecedents of OGD users' intention to use and its influence on OGD users' satisfaction, an area where research is still scarce. This research aims to examine an ECT model of users' satisfaction mediated by the intention to use open government data (OGD). The objectives of this research are threefold: (1) to design an integrated ECT and TAM model for explaining OGD satisfaction, (2) to examine the mediating role of citizens' behavioural intention between expectation, confirmation, perceived performance, incentive on usage, perceived risk, and citizens' satisfaction with open government data, and (3) to validate the impact of incentive on usage and perceived risk in explaining the new ECT model in the OGD context. Data were collected from a sample of 250 OGD users in Malaysia. Empirical evidence was gathered through self-administered questionnaires using a Likert scale. The data were analysed using Partial Least Squares Structural Equation Modelling (PLS-SEM) to test the model, and the final model was verified by experts in the area. Results revealed that expectation has a significant relationship with confirmation, whereas perceived performance showed an insignificant relationship with confirmation, which is a unique finding. Additionally, confirmation, expectation, perceived performance, incentive on usage, and perceived risk have significant relationships with the intention to use OGD. Meanwhile, the analysis showed that intention to use mediates the relationship between confirmation, expectation, perceived performance, incentive on usage, perceived risk, and satisfaction with the use of OGD. This study suggests that users' expectations of OGD must be met to create stronger intention and satisfaction. The implications of the study are to improve data service quality, support the development of innovative services, increase data transparency, and boost potential investment.
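For illustration only, the kind of mediation effect tested here (X influencing Y through a mediator M, e.g. expectation influencing satisfaction through intention to use) can be approximated as the product of two ordinary least-squares slopes. This simplified sketch ignores the direct path, is not the PLS-SEM procedure used in the study, and all data below are synthetic:

```python
# Illustrative two-regression mediation sketch (Baron-Kenny style product
# of slopes), NOT the PLS-SEM analysis of the study; all values are made up.

def slope(x, y):
    """OLS slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

def indirect_effect(x, m, y):
    """Indirect effect of x on y through mediator m (product a * b)."""
    a = slope(x, m)  # path X -> M (e.g. expectation -> intention to use)
    b = slope(m, y)  # path M -> Y (e.g. intention to use -> satisfaction)
    return a * b

# Synthetic Likert-style scores, constructed so the mediation is perfect.
expectation  = [1, 2, 3, 4, 5]
intention    = [2, 4, 6, 8, 10]
satisfaction = [6, 12, 18, 24, 30]
print(indirect_effect(expectation, intention, satisfaction))  # -> 6.0
```

A non-zero product of the two paths is what "intention to use mediates the relationship" amounts to at its simplest; PLS-SEM additionally handles latent constructs and significance testing.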