
    Cardinality estimation in ETL processes

    Cardinality estimation in ETL processes is particularly difficult. Besides the well-known SQL operators, which also appear in ETL processes, there is a variety of operators without exact counterparts in the relational world, as well as operators that support very specific data integration aspects. For such operators there are no well-examined statistical approaches to cardinality estimation. We therefore propose a black-box approach and estimate the cardinality using a set of statistical models for each operator. We discuss different model granularities and develop an adaptive cardinality estimation framework for ETL processes. We map the abstract model operators to specific statistical learning approaches (regression, decision trees, support vector machines, etc.) and evaluate our cardinality estimates in an extensive experimental study.
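
    As a rough sketch of the black-box idea described above, the snippet below fits one statistical model per ETL operator, mapping observed input features to observed output cardinalities. The operator names, feature choices, and training data are illustrative assumptions, not the paper's actual setup.

```python
# Hedged sketch: one statistical model per ETL operator, trained on logged runs.
# Feature names and numbers below are hypothetical illustrations.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

# Hypothetical training log: (input cardinality, no. of distinct keys) -> output cardinality
X_train = np.array([[1_000, 120], [5_000, 480], [20_000, 1_900], [80_000, 7_400]])
y_train = np.array([130, 510, 2_050, 7_900])

# One model per operator; different operators may favour different model classes.
models = {
    "surrogate_key_lookup": DecisionTreeRegressor(max_depth=4),
    "dedup_merge": SVR(kernel="rbf", C=10.0),
}

for name, model in models.items():
    model.fit(X_train, y_train)

# Estimate the output cardinality for a new run of the operator.
estimate = models["surrogate_key_lookup"].predict(np.array([[40_000, 3_800]]))
print(f"estimated output cardinality: {estimate[0]:.0f}")
```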

    Business Intelligence Technology, Applications, and Trends

    Enterprises are considering substantial investments in Business Intelligence (BI) theories and technologies to maintain their competitive advantage. BI allows massive, diverse data collected from various sources to be transformed into useful information, enabling more effective and efficient operations. This paper briefly and broadly explores business intelligence technology, applications, and trends while offering a few stimulating and innovative theories and practices. The authors also examine several contemporary studies related to the future of BI and surrounding fields.

    Graph-Based ETL Processes For Warehousing Statistical Open Data

    Warehousing is a promising means to cross and analyse Statistical Open Data (SOD). But extracting structures, integrating, and defining a multidimensional schema from several scattered and heterogeneous tables in the SOD are major problems that challenge traditional ETL (Extract-Transform-Load) processes. In this paper, we present a three-step ETL process that relies on RDF graphs to address these problems. In the first step, we automatically extract table structures and values using a table anatomy ontology. This phase converts structurally heterogeneous tables into a unified RDF graph representation. The second step performs a holistic integration of several semantically heterogeneous RDF graphs. The optimal integration is computed through an Integer Linear Program (ILP). In the third step, the system interacts with users to incrementally transform the integrated RDF graph into a multidimensional schema.
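
    To make the first step more concrete, the sketch below shows one plausible way to turn a flat open-data table into a unified RDF graph with rdflib. The EX namespace, predicate names, and sample rows are assumptions for illustration and do not reproduce the paper's table anatomy ontology.

```python
# Hedged sketch: flat table rows -> RDF observations in a single graph.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/sod#")  # illustrative namespace
g = Graph()
g.bind("ex", EX)

# Hypothetical open-data table: (region, year, population)
rows = [("North", 2014, 120_000), ("South", 2014, 95_000)]

for i, (region, year, population) in enumerate(rows):
    obs = EX[f"observation/{i}"]
    g.add((obs, RDF.type, EX.Observation))
    g.add((obs, EX.region, Literal(region)))
    g.add((obs, EX.year, Literal(year)))
    g.add((obs, EX.population, Literal(population)))

# The unified graph can then be matched and integrated with graphs from other tables.
print(g.serialize(format="turtle"))
```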

    A Framework for Real-time Analysis in OLAP Systems

    OLAP systems are designed to quickly answer multi-dimensional queries against large data warehouse systems. Constructing data cubes and their associated indexes is time-consuming and computationally expensive, and for this reason data cubes are only refreshed periodically. Increasingly, organizations demand both historical and predictive analysis based on the most current data. This trend has placed the requirement on OLAP systems to merge updates at a much faster rate than before. In this thesis, we propose a framework for OLAP systems that enables updates to be merged with data cubes in soft real time. We apply a strategy of local partitioning of the data cube and maintain a "hot" partition for each materialized view to merge update data. We augment this strategy with multi-core processing using the OpenMP library to accelerate data cube construction and query resolution. Experiments using a data cube with 10,000,000 tuples and an update set of 100,000 tuples show that our framework achieves a 99% performance improvement when updating the data cube, a 76% performance improvement when constructing a new data cube, and a 72% performance improvement when resolving a range query against a data cube with 1,000,000 tuples.
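
    The sketch below illustrates the "hot" partition strategy in a language-agnostic way: a small, frequently updated partition absorbs incoming updates while the large partition is rebuilt only periodically, and queries combine both. It is an illustrative Python sketch, not the thesis's OpenMP-accelerated implementation; the class and the dimension values are assumptions.

```python
# Hedged sketch of the hot/cold partition idea for soft real-time cube updates.
from collections import defaultdict

class PartitionedCube:
    def __init__(self):
        self.cold = defaultdict(float)   # large, periodically rebuilt partition
        self.hot = defaultdict(float)    # small partition absorbing new updates

    def merge_update(self, dims, measure):
        # Soft real-time path: updates only touch the hot partition.
        self.hot[dims] += measure

    def query(self, dims):
        # Queries combine both partitions, so fresh data is visible immediately.
        return self.cold[dims] + self.hot[dims]

    def consolidate(self):
        # Periodic maintenance folds the hot partition into the main cube.
        for dims, measure in self.hot.items():
            self.cold[dims] += measure
        self.hot.clear()

cube = PartitionedCube()
cube.cold[("2024", "EU", "books")] = 1_250.0
cube.merge_update(("2024", "EU", "books"), 75.0)
print(cube.query(("2024", "EU", "books")))   # 1325.0
```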

    Beiträge zu Business Intelligence und IT-Compliance (Contributions to Business Intelligence and IT Compliance)

    [no abstract]

    Multi-Objective Materialized View Selection in Data-Intensive Flows

    In this thesis we present Forge, a tool for automating the multi-objective materialization of intermediate results in data-intensive flows, driven by a set of different quality objectives. We report initial evaluation results, showing the feasibility and efficiency of our approach.
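
    As one hedged illustration of comparing candidate materializations under several quality objectives, the sketch below uses simple Pareto dominance; the objective names, candidate views, and values are hypothetical and are not taken from Forge.

```python
# Hedged sketch: Pareto-dominance comparison of candidate intermediate results.
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one.
    Objectives here are (storage_cost, refresh_cost, -reuse_benefit): lower is better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Hypothetical candidates and objective values.
candidates = {
    "join_customers_orders":  (40, 12, -30),
    "agg_orders_by_month":    (5, 3, -25),
    "redundant_copy_of_join": (45, 15, -28),
}

pareto_front = [
    name for name, score in candidates.items()
    if not any(dominates(other, score) for n, other in candidates.items() if n != name)
]
print(pareto_front)  # the non-dominated candidates
```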

    1st doctoral symposium of the international conference on software language engineering (SLE) : collected research abstracts, October 11, 2010, Eindhoven, The Netherlands

    The first Doctoral Symposium to be organised by the series of International Conferences on Software Language Engineering (SLE) will be held on October 11, 2010 in Eindhoven, as part of the third instance of SLE. This conference series aims to integrate the different sub-communities of the software-language engineering community to foster cross-fertilisation and strengthen research overall. The Doctoral Symposium at SLE 2010 aims to contribute towards these goals by providing a forum for both early- and late-stage Ph.D. students to present their research and get detailed feedback and advice from researchers both in and out of their particular research area. Consequently, the main objectives of this event are: to give Ph.D. students an opportunity to write about and present their research; to provide Ph.D. students with constructive feedback from their peers and from established researchers in their own and in different SLE sub-communities; to build bridges for potential research collaboration; and to foster integrated thinking about SLE challenges across sub-communities. All Ph.D. students participating in the Doctoral Symposium submitted an extended abstract describing their doctoral research. Based on a good set of submissions, we were able to accept 13 submissions for participation in the Doctoral Symposium. These proceedings present final revised versions of the accepted research abstracts. We are particularly happy to note that submissions to the Doctoral Symposium covered a wide range of SLE topics drawn from all SLE sub-communities. In selecting submissions for the Doctoral Symposium, we were supported by the members of the Doctoral-Symposium Selection Committee (SC), representing senior researchers from all areas of the SLE community. We would like to thank them for their substantial effort, without which this Doctoral Symposium would not have been possible. Throughout, they have provided reviews that go beyond the normal format of a review, being extra careful in pointing out potential areas of improvement of the research or its presentation. Hopefully, these reviews themselves will already contribute substantially towards the goals of the symposium and help students improve and advance their work. Furthermore, all submitting students were also asked to provide two reviews for other submissions. The members of the SC went out of their way to comment on the quality of these reviews, helping students improve their reviewing skills.

    Business Intelligence on Non-Conventional Data

    The revolution in digital communications witnessed over the last decade has had a significant impact on the world of Business Intelligence (BI). In the big data era, the amount and diversity of data that can be collected and analyzed for the decision-making process transcends the restricted and structured set of internal data that BI systems are conventionally limited to. This thesis investigates the unique challenges imposed by three specific categories of non-conventional data: social data, linked data and schemaless data. Social data comprises the user-generated content published through websites and social media, which can provide a fresh and timely perception of people’s tastes and opinions. In Social BI (SBI), the analysis focuses on topics, meant as specific concepts of interest within the subject area. In this context, this thesis proposes the meta-star, an alternative strategy to the traditional star schema for modeling hierarchies of topics to enable OLAP analyses. The thesis also presents an architectural framework of a real SBI project and a cross-disciplinary benchmark for SBI. Linked data employs the Resource Description Framework (RDF) to provide a public network of interlinked, structured, cross-domain knowledge. In this context, this thesis proposes an interactive and collaborative approach to build aggregation hierarchies from linked data. Schemaless data refers to the storage of data in NoSQL databases that do not force a predefined schema, but let database instances embed their own local schemata. In this context, this thesis proposes an approach to determine the schema profile of a document-based database; the goal is to facilitate users in a schema-on-read analysis process by understanding the rules that drove the usage of the different schemata. A final and complementary contribution of this thesis is an innovative technique in the field of recommendation systems to overcome user disorientation in the analysis of a large and heterogeneous wealth of data.
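
    As an illustration of the schema-profile idea for schemaless data, the sketch below counts which local schemata (field combinations) actually occur in a small document collection. The documents are hypothetical, and the rule-based profiling described in the thesis is not reproduced here.

```python
# Hedged sketch: summarize the local schemata used across schemaless documents.
from collections import Counter

documents = [
    {"user": "a", "rating": 5, "text": "great"},
    {"user": "b", "rating": 4},
    {"user": "c", "text": "ok", "photos": ["p1.jpg"]},
    {"user": "d", "rating": 3},
]

# Each distinct sorted field combination is one local schema.
schema_counts = Counter(tuple(sorted(doc.keys())) for doc in documents)

for schema, count in schema_counts.most_common():
    print(f"{count} document(s) with fields {schema}")
```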

    Knowledge visualizations: a tool to achieve optimized operational decision making and data integration

    The overabundance of data created by modern information systems (IS) has led to a breakdown in cognitive decision-making. Without authoritative source data, commanders’ decision-making processes are hindered as they attempt to paint an accurate shared operational picture (SOP). Further impeding the decision-making process is the lack of proper interface interaction to provide a visualization that aids in the extraction of the most relevant and accurate data. Utilizing the decision support system (DSS) to present visualizations based on OLAP-cube-integrated data allows decision-makers to rapidly glean information and build their situation awareness (SA). This yields a competitive advantage to the organization, whether in garrison or in combat. Additionally, OLAP cube data integration enables analysis to be performed on an organization’s data flows. This analysis is used to identify the critical path of data throughout the organization. Linking a decision-maker to the authoritative data along this critical path eliminates the many decision layers in a hierarchical command structure that can introduce latency or error into the decision-making process. Furthermore, the organization gains an integrated SOP from which to rapidly build SA and make effective and efficient decisions.