10 research outputs found

    MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities

    Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address them, we propose the MinoanER framework, which simultaneously achieves full automation, support for highly heterogeneous entities, and massive parallelization of the ER process. MinoanER leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations, as indicated solely by statistics. A composite blocking method is employed to capture different sources of matching evidence from the content, neighbors, or names of entities. The search space of candidate pairs for comparison is compactly abstracted by a novel disjunctive blocking graph and processed by a non-iterative, massively parallel matching algorithm that consists of four generic, schema-agnostic matching rules that are quite robust with respect to their internal configuration. We demonstrate that the effectiveness of MinoanER is comparable to that of existing ER tools over real KBs exhibiting low Variety, but that it outperforms them significantly when matching KBs with high Variety. Comment: Presented at EDBT 2019.
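    A minimal sketch of the kind of token-based, schema-agnostic candidate generation the abstract refers to, assuming plain whitespace tokenisation over attribute values (the function and the toy data are illustrative, not MinoanER's actual implementation):

```python
from collections import defaultdict
from itertools import combinations

def token_blocking(descriptions):
    """Group entity descriptions into blocks keyed by the tokens of their
    attribute values, then emit every entity pair that shares a block."""
    blocks = defaultdict(set)
    for entity_id, values in descriptions.items():
        for value in values:
            for token in str(value).lower().split():
                blocks[token].add(entity_id)

    candidate_pairs = set()
    for entities in blocks.values():
        for a, b in combinations(sorted(entities), 2):
            candidate_pairs.add((a, b))
    return candidate_pairs

# Two KBs describing overlapping real-world entities (illustrative data).
descriptions = {
    "kb1/e1": ["Restaurant Plaka", "Heraklion Crete"],
    "kb2/e7": ["Plaka Restaurant", "Crete"],
    "kb2/e9": ["Knossos Hotel", "Heraklion"],
}
print(token_blocking(descriptions))
```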

    Automatic Table Extension with Open Data

    With thousands of data sources available on the web as well as within organisations, data scientists increasingly spend more time searching for data than analysing it. To ease the task of finding and integrating relevant data for data mining projects, this dissertation presents two new methods for automatic table extension. Automatic table extension systems take over the task of data discovery and data integration by adding new columns with new information (new attributes) to any table. The data values in the new columns are extracted from a given corpus of tables.
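    A minimal sketch of the table-extension idea under simplifying assumptions: tables are lists of dicts keyed by an entity label, and conflicting values found in different corpus tables are resolved by majority vote (the function and data are illustrative, not the dissertation's actual methods):

```python
from collections import Counter

def extend_table(query_rows, corpus_tables, new_attribute):
    """Add `new_attribute` to each query row by looking up its label in a corpus
    of tables and taking a majority vote over the values that are found."""
    for row in query_rows:
        votes = Counter()
        for table in corpus_tables:
            for corpus_row in table:
                if corpus_row.get("label") == row["label"] and new_attribute in corpus_row:
                    votes[corpus_row[new_attribute]] += 1
        row[new_attribute] = votes.most_common(1)[0][0] if votes else None
    return query_rows

query = [{"label": "Mannheim"}, {"label": "Karlsruhe"}]
corpus = [
    [{"label": "Mannheim", "population": 309370}],
    [{"label": "Karlsruhe", "population": 313092}, {"label": "Mannheim", "population": 309370}],
]
print(extend_table(query, corpus, "population"))
```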

    Explaining differences between unaligned table snapshots

    We study the problem of explaining differences between two snapshots of the same database table, including record insertions, deletions and, in particular, record updates. Unlike existing alternatives, our solution induces transformation functions and does not require knowledge of the correct alignment between the record sets. This allows profiling snapshots of tables with unspecified or modified primary keys. In such a problem setting, there are always multiple explanations for the differences. Our goal is to find the simplest explanation. We propose to measure the complexity of explanations on the basis of minimum description length in order to formulate the task as an optimization problem. We show that the problem is NP-hard and propose a heuristic search algorithm to solve practical problem instances. We implement a prototype called Affidavit to assess the explanatory qualities of our approach in experiments based on different real-world data sets. We show that it scales to large numbers of both records and attributes and is able to reliably provide correct explanations under practical levels of modification.
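    The minimum-description-length intuition can be illustrated with a toy cost model in which an explanation's complexity is simply the number of characters needed to write it down (this cost model and the example are illustrative, not Affidavit's actual measure):

```python
def description_length(explanation):
    """Toy MDL cost: total number of characters needed to state the explanation."""
    return sum(len(str(part)) for part in explanation)

old_snapshot = [("a", 10), ("b", 20), ("c", 30)]
new_snapshot = [("a", 11), ("b", 21), ("c", 31)]

# Explanation 1: enumerate every record update explicitly.
explicit = [f"set {key} from {old} to {new}"
            for (key, old), (_, new) in zip(old_snapshot, new_snapshot)]

# Explanation 2: a single transformation function covering all records.
rule = ["add 1 to the second attribute of every record"]

# The explanation with the smaller description length is preferred.
print(description_length(explicit), description_length(rule))
```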

    Monitor Newsletter January 12, 1998

    Official Publication of Bowling Green State University for Faculty and Staff
    https://scholarworks.bgsu.edu/monitor/1481/thumbnail.jp

    Automating Industrial Event Stream Analytics: Methods, Models, and Tools

    Industrial event streams are an important cornerstone of Industrial Internet of Things (IIoT) applications. For instance, in the manufacturing domain, such streams are typically produced by distributed industrial assets at high frequency on the shop floor. To add business value and extract the full potential of the data (e.g. through predictive quality assessment or maintenance), industrial event stream analytics is an essential building block. One major challenge is the distribution of the required technical and domain knowledge across several roles, which makes the realization of analytics projects time-consuming and error-prone. For instance, accessing industrial data sources requires a high level of technical skill due to the large heterogeneity of protocols and formats. To reduce the technical overhead of current approaches, several problems must be addressed. The goal is to enable so-called "citizen technologists" to evaluate event streams through a self-service approach. This requires new methods and models that cover the entire data analytics cycle. This thesis answers the research question of how citizen technologists can be enabled to perform industrial event stream analytics independently. The first step is to investigate how the technical complexity of modeling and connecting industrial data sources can be reduced. Subsequently, it is analyzed how event streams can be automatically adapted, directly at the edge, to meet the requirements of data consumers and the infrastructure. Finally, this thesis examines how machine learning models for industrial event streams can be trained in an automated way to evaluate previously integrated data. The main research contributions of this work are: 1. A semantics-based adapter model to describe industrial data sources and to automatically generate adapter instances on edge nodes. 2. An extension for publish-subscribe systems that dynamically reduces event streams while considering the requirements of downstream algorithms. 3. A novel AutoML approach that enables citizen data scientists to train and deploy supervised ML models for industrial event streams. The developed approaches are fully implemented in various high-quality software artifacts, which have been integrated into a large open-source project, enabling rapid adoption of the novel concepts in real-world environments. For the evaluation, two user studies investigating usability were conducted, along with performance and accuracy tests of the individual components.
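    The second contribution, dynamically reducing event streams in a publish-subscribe setting, can be sketched under the assumption that a consumer declares which fields it needs and the maximum event rate it can handle (the requirement format and function below are illustrative, not the thesis's actual model):

```python
def reduce_stream(events, requirements):
    """Reduce an event stream at the edge according to a consumer's requirements:
    drop events that arrive faster than the requested rate and project away
    fields the consumer does not need."""
    needed = set(requirements["fields"]) | {"timestamp"}
    min_interval = 1.0 / requirements["max_events_per_second"]
    last_forwarded = float("-inf")
    for event in events:
        if event["timestamp"] - last_forwarded >= min_interval:
            last_forwarded = event["timestamp"]
            yield {key: value for key, value in event.items() if key in needed}

# A sensor emitting at 10 Hz, a consumer that only needs temperature at 2 Hz.
raw = [{"timestamp": t * 0.1, "temperature": 20 + t, "vibration": 0.01 * t}
       for t in range(50)]
consumer = {"fields": ["temperature"], "max_events_per_second": 2}
print(list(reduce_stream(raw, consumer)))
```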

    Final Report of the Research Project "Broker für Dynamische Produktionsnetzwerke" (Broker for Dynamic Production Networks)

    The Broker für Dynamische Produktionsnetzwerke (DPNB) is a research project funded by the German Federal Ministry of Education and Research (BMBF) and supervised by the project management agency Projektträger Karlsruhe (PTKA), carried out by seven partners from academia and industry over a term from January 2019 through December 2021. By means of cloud manufacturing as well as hardware and software components deployed at the participating companies, capacity providers are to be connected with capacity seekers. The tradable capacities in this case are machine, transport and assembly capacities, so that supply chains can be modelled as comprehensively as possible based on the use case of the sheet-metal industry. This final report summarises the state of the art as well as the findings from the project. It also gives an overview of the project structure and the project partners.

    Web table integration and profiling for knowledge base augmentation

    HTML tables on web pages ("web tables") have been used successfully as a data source for several applications. They can be extracted from web pages at large scale, resulting in corpora of millions of web tables. However, until today little is known about the general distribution of topics and the specific types of data that are contained in the tables that can be found on the Web. Yet this knowledge is essential for understanding the potential application areas and topical coverage of web tables as a data source. Such knowledge can be obtained through the integration of web tables with a knowledge base, which enables the semantic interpretation of their content and allows for their topical profiling. In turn, the knowledge base can be augmented by adding new statements from the web tables. This is challenging, because the data volume and variety are much larger than in traditional data integration scenarios, in which only a small number of data sources is integrated. The contributions of this thesis are methods for the integration of web tables with a knowledge base and the profiling of large-scale web table corpora through the application of these methods. For this profiling, two corpora of 147 million and 233 million web tables, respectively, are created and made publicly available. These corpora are two of only three that are openly available for research on web tables. Their data profile reveals that most web tables have only very few rows, with a median of 6 rows per web table, and that between 35% and 52% of all columns contain non-textual values, such as numbers or dates. These two characteristics have been mostly ignored in the literature about web tables and are addressed by the methods presented in this thesis. The first method, T2K Match, is an algorithm for semantic table interpretation that annotates web tables with classes, properties, and entities from a knowledge base. Unlike most algorithms for these tasks, it is not limited to the annotation of columns that contain the names of entities. Its application to a large-scale web table corpus results in the most fine-grained topical data profile of web tables at the time of writing, but also reveals that small web tables cannot be processed with high quality. For such small web tables, a method that stitches them into larger tables is presented and shown to drastically improve the quality of the results. The data profile further shows that the majority of the columns in web tables for which classes and entities can be recognised have no corresponding properties in the knowledge base. This makes them candidates for new properties that can be added to the knowledge base. The current methods for this task, however, suffer from the oversimplified assumption that web tables only contain binary relations. This results in the extraction of incomplete relations from the web tables as new properties and makes their correct interpretation impossible. To increase completeness, a method is presented that generates additional data from the context of the web tables and synthesizes n-ary relations from all web tables of a web site. The application of this method to the second large-scale web table corpus shows that web tables contain a large number of n-ary relations. This means that the data contained in web tables is of higher complexity than previously assumed.
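    The stitching step mentioned above can be sketched under the simplifying assumption that web tables from the same web site may be combined whenever their column headers are identical (the representation and the matching criterion are simplified relative to the thesis):

```python
from collections import defaultdict

def stitch_tables(tables):
    """Combine web tables from the same web site that share identical headers,
    so that very small tables (the median web table has only a handful of rows)
    become large enough to be matched reliably."""
    stitched = defaultdict(list)
    for table in tables:
        stitched[(table["site"], tuple(table["header"]))].extend(table["rows"])
    return [{"site": site, "header": list(header), "rows": rows}
            for (site, header), rows in stitched.items()]

small_tables = [
    {"site": "example.org", "header": ["country", "capital"],
     "rows": [("France", "Paris"), ("Spain", "Madrid")]},
    {"site": "example.org", "header": ["country", "capital"],
     "rows": [("Italy", "Rome")]},
]
print(stitch_tables(small_tables))
```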

    An analysis of the education potential of sites in the Cape Peninsula for secondary school fieldwork in environmental studies

    In South African secondary schools much less fieldwork is undertaken than in a number of other countries, despite fieldwork being required by some school syllabuses and the fact that, in many areas, suitable sites are readily at hand. In an attempt to assess the nature of future demands for fieldwork sites, this study reviews the developments in education that have led to an increasing emphasis on teaching outside the classroom, and analyses the reasons why so little fieldwork is being done. A methodology for selecting fieldwork sites is developed that takes into account educational priorities and practical constraints. This is worked out in practice by drawing up a fieldwork syllabus for a particular school and selecting sites in the Cape Peninsula for field studies. Finally, the educational potential of a sample of these sites is indicated by means of exercises prepared for secondary school children.

    The nuclear-conventional nexus in Western military planning for European contingencies

    The nuclear-conventional nexus is central to many peacetime intra-Alliance debates, and it is a critical reference point for military planners. The linkage between nuclear and conventional military power also provides a distinctive dimension to the control of military operations during crisis and war. The management of this nexus depends on evolving political and operational factors such as: trans-Atlantic diplomacy and European political developments; the modernisation of theatre nuclear forces and doctrine; and the prospects for nuclear proliferation. It is argued that planning for nuclear and conventional military units in and around Europe should be reviewed within the context of a shift in doctrine that more clearly addresses the requirements of crisis management. For this to occur, strategic analysis should recognise how regional political factors both reflect, and help to mould, the juxtaposition of nuclear and conventional military power. Such analysis would show that, within Europe, nuclear and conventional forces have acquired overlapping but not coterminous roles. These ideas are developed within an analytical framework which brings together: a discussion of the nature of strategy; a history of the nuclear-conventional nexus; and an examination of factors affecting the character of the linkage between nuclear and conventional forces in Europe.

    WInte.r - a web data integration framework

    The Web provides a plethora of structured data, such as semantic annotations in web pages, data from HTML tables, datasets from open data portals, or linked data from the Linked Open Data Cloud. For many use cases, it is necessary to integrate such web data with existing local datasets. This integration entails schema matching, identity resolution, and data fusion. As an alternative to using a combination of partial or ad hoc solutions, this poster presents the Web Data Integration Framework (WInte.r), which supports end-to-end data integration by providing algorithms and building blocks for data pre-processing, schema matching, and identity resolution, as well as data fusion. While being fully usable out of the box, the framework is highly customisable and allows for the composition of sophisticated integration architectures such as T2K Match, which is used to match millions of web tables against DBpedia. A second use case for which WInte.r was employed is the task of stitching (combining) web tables from the same web site into larger tables as a preprocessing step before matching. The WInte.r framework is written in Java and is available as open source under the Apache 2.0 license.
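    The end-to-end pipeline described above (schema matching, identity resolution, data fusion) can be illustrated with a short conceptual sketch. Note that this is not WInte.r's actual Java API; all functions and data below are simplified, hypothetical illustrations of the three stages:

```python
def match_schemas(attributes_a, attributes_b):
    """Toy schema matching: align attributes whose lowercased names are equal."""
    return {a: b for a in attributes_a for b in attributes_b if a.lower() == b.lower()}

def resolve_identities(records_a, records_b, key_a, key_b):
    """Toy identity resolution: pair records with equal (case-insensitive) key values."""
    index = {r[key_b].lower(): r for r in records_b}
    return [(r, index[r[key_a].lower()]) for r in records_a if r[key_a].lower() in index]

def fuse(record_pairs, correspondences):
    """Toy data fusion: rename the second source's attributes via the schema
    correspondences, then prefer the first source's values on conflicts."""
    rename = {b: a for a, b in correspondences.items()}
    return [{**{rename.get(k, k): v for k, v in b.items()}, **a} for a, b in record_pairs]

local = [{"Name": "Mannheim", "Population": 309370}]
web = [{"name": "Mannheim", "country": "Germany"}]

correspondences = match_schemas(local[0].keys(), web[0].keys())  # {'Name': 'name'}
pairs = resolve_identities(local, web, "Name", "name")
print(fuse(pairs, correspondences))
# [{'Name': 'Mannheim', 'country': 'Germany', 'Population': 309370}]
```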