10 research outputs found

Query-Time Data Integration

Author: Eberius Julian
Publication date: 10/12/2015

Today, data is collected at ever-increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources preclude up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established. This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods are compensated for by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need. To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state of the art by producing a set of individually consistent but mutually diverse alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database.
The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we study the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort. Collectively, these three contributions are the foundation of a Query-time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.

Technische Universität Dresden: Qucosa

Enabling automatic provenance-based trust assessment of web content

Author: De Nies Tom
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2016

Ghent University Academic Bibliography

Usable and Scalable Querying of Scientific Datasets

Author: McCamish Benjamin J.
Publication venue: 'Oregon State University'

Scientists and engineers have to analyze and query multiple large databases. Analysis over databases created by phasor measurement units (PMUs) can provide insight into the health of the grid, thereby improving control over operations. Realizing this data-driven control, however, requires validating, processing and storing massive amounts of PMU data efficiently, which is not always achieved with modern systems. Furthermore, users must know formal query languages, such as SQL, and the structure and content of the database to use these systems. However, scientists usually do not know formal query languages or the content and structure of the databases. Finally, the information related to most queries is spread across multiple data sources, each of which represents information in a distinct form. Traditionally, users have to write programming rules to integrate the data in these data sources into one database with a homogeneous structure. This, however, takes a great deal of time and effort. Moreover, end-users often do not have the required programming background and expertise to write and maintain these rules. To address these challenges, we propose novel methods to query multiple large databases easily and efficiently. We also describe a PMU data management system that supports input from multiple PMU data streams, features an event-detection algorithm, and provides an efficient method for retrieving archival data. To be more usable, database systems offer keyword query interfaces, where users do not need to know formal query languages or the content and structure of the schema. As keyword queries are inherently ambiguous, it is challenging for database systems to answer them precisely. Using extensive empirical studies, we show that users explore and learn to formulate more precise keyword queries in the course of their interaction with the database system.
We propose an effective and efficient online learning algorithm that adapts to this user learning during the interaction and comes with convergence guarantees. Furthermore, we set forth a novel approach to progressively learning rules to integrate and query multiple databases using end-user feedback. In our framework, each data source learns to translate its information to a form compatible with other data sources. We show that our method delivers effective rules using a modest number of interactions with the end-user.

ScholarsArchive@OSU

European Language Grid

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/11/2022

This open access book provides an in-depth description of the EU project European Language Grid (ELG). Its motivation lies in the fact that Europe is a multilingual society with 24 official European Union Member State languages and dozens of additional languages including regional and minority languages. The only meaningful way to enable multilingualism and to benefit from this rich linguistic heritage is through Language Technologies (LT) including Natural Language Processing (NLP), Natural Language Understanding (NLU), Speech Technologies and language-centric Artificial Intelligence (AI) applications. The European Language Grid provides a single umbrella platform for the European LT community, including research and industry, effectively functioning as a virtual home, marketplace, showroom, and deployment centre for all services, tools, resources, products and organisations active in the field. Today the ELG cloud platform already offers access to more than 13,000 language processing tools and language resources. It enables all stakeholders to deposit, upload and deploy their technologies and datasets. The platform also supports the long-term objective of establishing digital language equality in Europe by 2030 – to create a situation in which all European languages enjoy equal technological support. This is the very first book dedicated to Language Technology and NLP platforms. Cloud technology has only recently matured enough to make the development of a platform like ELG feasible on a larger scale. The book comprehensively describes the results of the ELG project. Following an introduction, the content is divided into four main parts: (I) ELG Cloud Platform; (II) ELG Inventory of Technologies and Resources; (III) ELG Community and Initiative; and (IV) ELG Open Calls and Pilot Projects.

Directory of Open Access Books (DOAB)

Advances in database technology - EDBT 2016: 19th International Conference on Extending Database Technology, Bordeaux, France, March 15-18, 2016 : proceedings

Publication venue: University of Konstanz, University Library
Publication date: 01/01/2016

Digitale Bibliothek Thüringen

European Language Grid

Publication venue: 'Springer Science and Business Media LLC'

OAPEN Library

Ubiquitous Computing

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

The aim of this book is to give a treatment of the actively developed domain of Ubiquitous computing. Originally proposed by Mark D. Weiser, the concept of Ubiquitous computing enables real-time global sensing, context-aware information retrieval, multi-modal interaction with the user and enhanced visualization capabilities. In effect, Ubiquitous computing environments offer new and futuristic ways to look at and interact with our habitat at any time and from anywhere. In that domain, researchers are confronted with many foundational, technological and engineering issues which were not known before. Detailed cross-disciplinary coverage of these issues is needed for further progress and a wider range of applications. This book collects twelve original works by researchers from eleven countries, which are clustered into four sections: Foundations, Security and Privacy, Integration and Middleware, Practical Applications.

Directory of Open Access Books (DOAB)

Experimental Evaluation of Growing and Pruning Hyper Basis Function Neural Networks Trained with Extended Information Filter

Author: Miljković Zoran
Mitić Marko
Petronijević Jelena
Petrović Milica
Vuković Najdan
Publication venue: Society for Information Systems and Computer Networks
Publication date: 01/01/2015

In this paper we test the Extended Information Filter (EIF) for sequential training of Hyper Basis Function Neural Networks with growing and pruning ability (HBF-GP). The HBF neuron allows different scaling of input dimensions to provide better generalization when dealing with complex nonlinear problems in engineering practice. The main intuition behind HBF is a generalization of the Gaussian type of neuron that applies a Mahalanobis-like distance as the distance metric between an input training sample and a prototype vector. We exploit the concept of a neuron’s significance and allow growing and pruning of HBF neurons during the sequential learning process. From an engineer’s perspective, EIF is attractive for training neural networks because it allows a designer to have scarce initial knowledge of the system/problem. An extensive experimental study shows that an HBF neural network trained with EIF achieves the same prediction error and compactness of network topology as one trained with the Extended Kalman Filter (EKF), but without the need to know the initial state uncertainty, which is its main advantage over EKF.
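
The HBF neuron described in this abstract can be illustrated with a minimal sketch. It assumes a diagonal weighting, that is, one scale per input dimension, as the simplest Mahalanobis-like distance; the function name `hbf_activation` and its parameters are hypothetical and are not taken from the paper, which may use a full weighting matrix.

```python
import math


def hbf_activation(x, prototype, scales):
    """Gaussian-type activation with a per-dimension scaled distance.

    Assumption: the Mahalanobis-like distance uses a diagonal weighting,
    so each input dimension i has its own non-negative scale scales[i].
    """
    if not (len(x) == len(prototype) == len(scales)):
        raise ValueError("x, prototype and scales must have the same length")
    # Weighted squared distance between the input sample and the prototype.
    squared_distance = sum(
        s * (xi - pi) ** 2 for xi, pi, s in zip(x, prototype, scales)
    )
    # Gaussian response: 1.0 at the prototype, decaying with distance.
    return math.exp(-squared_distance)


if __name__ == "__main__":
    prototype = [0.0, 0.0]
    scales = [4.0, 0.25]  # dimension 0 is weighted far more than dimension 1
    print(hbf_activation([1.0, 0.0], prototype, scales))
    print(hbf_activation([0.0, 1.0], prototype, scales))
```

With equal scales the neuron reduces to an ordinary isotropic Gaussian unit; unequal scales make the response fall off faster along the heavily weighted dimensions.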

Machinery - Repository of the Faculty of Mechanical Engineering, University of Belgrade

Bioinspired metaheuristic algorithms for global optimization

Author: Diryag Ali
Miljković Zoran
Mitić Marko
Petronijević Jelena
Petrović Milica
Vuković Najdan
Publication venue: Society for Information Systems and Computer Networks
Publication date: 01/01/2015

This paper presents a concise comparison study of newly developed bioinspired algorithms for global optimization problems. Three different metaheuristic techniques, namely Accelerated Particle Swarm Optimization (APSO), the Firefly Algorithm (FA), and the Grey Wolf Optimizer (GWO), are investigated and implemented in the MATLAB environment. These methods are compared on four unimodal and multimodal nonlinear functions in order to find global optimum values. Computational results indicate that GWO outperforms the other intelligent techniques, and that all of the aforementioned algorithms can be successfully used for optimization of continuous functions.
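
As a rough illustration of one of the compared techniques, the following is a minimal Python sketch of the commonly published Grey Wolf Optimizer update rule. It is not the paper's MATLAB implementation: the sphere test function, the search bounds, the population size, and the iteration count are all assumptions chosen for illustration, and the abstract does not name the four benchmark functions that were actually used.

```python
import random


def sphere(x):
    """Unimodal test function (assumed here) with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def grey_wolf_optimizer(objective, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimize `objective` over a box using the standard GWO position update."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")

    for t in range(n_iter):
        # The three best wolves (alpha, beta, delta) guide the rest of the pack.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        leader_val = objective(leaders[0])
        if leader_val < best_val:
            best_pos, best_val = list(leaders[0]), leader_val

        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        for wolf in wolves:
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    coeff_a = 2.0 * a * rng.random() - a
                    coeff_c = 2.0 * rng.random()
                    distance = abs(coeff_c * leader[d] - wolf[d])
                    estimate += leader[d] - coeff_a * distance
                # New coordinate: average of the three leader-guided estimates,
                # clipped to the search bounds.
                wolf[d] = min(hi, max(lo, estimate / 3.0))

    # Account for the positions produced by the final iteration.
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val


if __name__ == "__main__":
    position, value = grey_wolf_optimizer(sphere, dim=2, bounds=(-10.0, 10.0))
    print(position, value)
```

The fixed seed only makes runs repeatable; a comparison like the one in the paper would need repeated runs with different seeds and the same evaluation budget for every algorithm.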

Machinery - Repository of the Faculty of Mechanical Engineering, University of Belgrade

Knowledge Co-production on Air Quality – The Role of Planning Research in Participatory, Healthy, and People-Centered Cities

Author: Lissandrello Enza
Nørgaard Lasse Schytt
Steffansen Rasmus Nedergård
Publication date: 01/06/2022

VBN