Search CORE

A unified view of data-intensive flows in business intelligence systems : a survey

Author: Abelló Gamazo Alberto
Jovanovic Petar
Romero Moral Óscar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data from a multitude of data sources in user-preferred, analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of traditional batched extract-transform-load (ETL) processes, which populate a data warehouse (DW) from integrated data sources, and more real-time, operational data flows that integrate source data at runtime. Both academia and industry therefore need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, identifying the challenges that remain to be addressed and how current solutions can be applied to address them.
Peer Reviewed. Postprint (author's final draft)
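To make the batch-plus-runtime combination concrete, here is a minimal Python sketch under loudly stated assumptions: the warehouse is an in-memory list, the high-water-mark load policy and every name (`transform`, `Warehouse`, `batch_etl`, `revenue_by_region`) are hypothetical illustrations, not the survey's architecture. A periodic ETL run loads rows past a high-water mark, while an operational query merges warehouse history with source rows the next batch has not loaded yet.

```python
from dataclasses import dataclass, field


def transform(record: dict) -> dict:
    # Hypothetical cleaning step: normalize the region code, store amounts in cents.
    return {
        "id": record["id"],
        "region": record["region"].strip().upper(),
        "amount_cents": round(record["amount"] * 100),
    }


@dataclass
class Warehouse:
    # In-memory stand-in for a DW fact table.
    rows: list = field(default_factory=list)
    loaded_up_to: int = 0  # high-water mark: largest source id already loaded


def batch_etl(source: list, dw: Warehouse) -> int:
    """Batched ETL run: extract rows past the high-water mark, transform, load."""
    new_rows = [transform(r) for r in source if r["id"] > dw.loaded_up_to]
    dw.rows.extend(new_rows)
    if new_rows:
        dw.loaded_up_to = max(r["id"] for r in new_rows)
    return len(new_rows)


def revenue_by_region(source: list, dw: Warehouse, region: str) -> int:
    """Hybrid read: warehouse history plus source rows the next batch has not
    loaded yet, transformed and integrated at query time."""
    fresh = [transform(r) for r in source if r["id"] > dw.loaded_up_to]
    wanted = region.strip().upper()
    return sum(r["amount_cents"] for r in dw.rows + fresh if r["region"] == wanted)


source = [
    {"id": 1, "region": "eu ", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 5.5},
]
dw = Warehouse()
batch_etl(source, dw)                                     # batch load of ids 1 and 2
source.append({"id": 3, "region": "eu", "amount": 2.25})  # arrives after the batch
print(revenue_by_region(source, dw, "eu"))                # 1225: 1000 from the DW + 225 fresh
```

The design choice worth noting is that both paths reuse the same `transform`, so the runtime flow and the batch flow cannot disagree about how a row is cleaned.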

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Heterogeneous Relational Databases for a Grid-enabled Analysis Environment

Author: Ali Arshad
Anjum Ashiq
Azim Tahir
Bunn Julian
Iqbal Saima
McClatchey Richard
Newman Harvey
Shah S. Yousaf
Solomonides Tony
Steenberg Conrad
Thomas Michael
van Lingen Frank
Willers Ian
Publication date: 01/01/2005

Grid-based systems require a database access mechanism that can provide seamless, homogeneous access to the requested data through a virtual data access system, i.e. a system that tracks data stored in geographically distributed, heterogeneous databases. This system should provide an integrated view of the data stored in the different repositories through a virtual data access mechanism, i.e. a mechanism that hides the heterogeneity of the backend databases from client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schemas. A web service interface lets applications access this middleware in a language- and platform-independent way. A prototype implementation was built on Clarens [4], Unity [7] and POOL [8]. The ability to access data stored in distributed relational databases transparently is likely to be very valuable for Grid users, especially for the scientific community wishing to collate and analyze data distributed over the Grid.
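The virtual data access idea, where clients name a logical table and the middleware hides where the data lives and what each backend schema looks like, can be sketched with Python's built-in `sqlite3`. This is a hypothetical illustration only: the two in-memory databases stand in for remote sites, the `VirtualDataAccess` class and its `register`/`query` methods are invented for this sketch, and the web service layer and the Clarens, Unity and POOL components are not modeled.

```python
import sqlite3


def make_site(ddl: str, insert_sql: str, rows: list) -> sqlite3.Connection:
    # Each in-memory database stands in for one remote, independently designed site.
    con = sqlite3.connect(":memory:")
    con.execute(ddl)
    con.executemany(insert_sql, rows)
    return con


class VirtualDataAccess:
    """Hypothetical mediator: clients ask for a logical table; the mediator knows
    which backends hold it and how to project each local schema onto the
    shared (global) schema."""

    def __init__(self) -> None:
        self._sources: dict[str, list[tuple[sqlite3.Connection, str]]] = {}

    def register(self, logical_table: str, con: sqlite3.Connection, sql: str) -> None:
        # `sql` must return columns in the global schema's order.
        self._sources.setdefault(logical_table, []).append((con, sql))

    def query(self, logical_table: str) -> list[tuple]:
        rows: list[tuple] = []
        for con, sql in self._sources[logical_table]:
            rows.extend(con.execute(sql).fetchall())
        return sorted(rows)


site_a = make_site("CREATE TABLE runs (run_id INTEGER, n_events INTEGER)",
                   "INSERT INTO runs VALUES (?, ?)", [(1, 100), (2, 250)])
site_b = make_site("CREATE TABLE RunInfo (RUN INTEGER, EVENTS INTEGER)",
                   "INSERT INTO RunInfo VALUES (?, ?)", [(3, 75)])

vda = VirtualDataAccess()
vda.register("run", site_a, "SELECT run_id, n_events FROM runs")
vda.register("run", site_b, "SELECT RUN, EVENTS FROM RunInfo")
print(vda.query("run"))  # [(1, 100), (2, 250), (3, 75)]
```

The client only ever sees the logical name `"run"`; the per-site SQL carries the schema mapping, which is where a real middleware would also handle location lookup and connection management.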

arXiv.org e-Print Archive

Caltech Authors

GeocentraleApps : Practical Approaches to Data Integration for Spatially Enabled Apps.

Author: Hofer Simon
Schaller Christoph
Publication venue: 'Osterreichische Akademie der Wissenschaften'
Publication date: 01/01/2016

With modern mobile and internet technologies, spatial data is becoming ubiquitous. To realize the vision of a spatially enabled society, however, decision makers in government, public administration, business and society need to understand location as a driver of their decisions and actively take it into account. This calls for decision-support tools that integrate the required spatial and textual data from a variety of sources. GeocentraleApps is a platform for building such modern spatially enabled applications. These applications and the underlying spatial data infrastructures face challenges such as data discovery, matching between disparate data sources, issues with data and service quality, as well as the need for appropriate visualization and presentation. GeocentraleApps meets these challenges by flexibly combining a number of different mechanisms for data integration. This paper presents lessons learned from an analysis of the platform’s data integration approaches with regard to their individual architectures. It points out the advantages and disadvantages of the current solutions and gives an outlook on future developments.
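One of the challenges listed above, matching between disparate data sources, can be illustrated for spatial records with a simple rule: two records refer to the same place if their normalized names agree and their coordinates lie within a distance threshold. This is a hypothetical Python sketch, not GeocentraleApps' method; the function names, the 50 m default threshold and the sample records are invented, and distances use the haversine formula on a spherical Earth.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance in metres on a spherical Earth (mean radius 6,371 km).
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def normalize(name: str) -> str:
    # Case- and whitespace-insensitive comparison key for place names.
    return " ".join(name.lower().split())


def match_places(source_a: list, source_b: list, max_distance_m: float = 50.0) -> list:
    """Pair records that share a normalized name and lie within max_distance_m."""
    pairs = []
    for a in source_a:
        for b in source_b:
            same_name = normalize(a["name"]) == normalize(b["name"])
            close = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_distance_m
            if same_name and close:
                pairs.append((a["id"], b["id"]))
    return pairs


# Invented sample records from two hypothetical sources.
registry = [{"id": "a1", "name": "Town Hall", "lat": 46.9490, "lon": 7.4390}]
poi_feed = [
    {"id": "b7", "name": "town  hall", "lat": 46.9491, "lon": 7.4391},  # ~14 m away
    {"id": "b8", "name": "Town Hall", "lat": 47.0000, "lon": 7.4390},   # ~5.7 km away
]
print(match_places(registry, poi_feed))  # [('a1', 'b7')]
```

The nested loop is quadratic; with larger sources, a spatial index or grid bucketing would be the usual next step before comparing names.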

Berner Fachhochschule: ARBOR

Conceptual Workflow for Complex Data Integration using AXML

Author: Boussaid Omar
Darmont Jérôme
Salem Rashed
Publication venue: HAL CCSD
Publication date: 03/10/2010

International audience
Relevant data for decision support systems are available everywhere and in various formats, and must be integrated into a unified format. Traditional data integration approaches are not suited to handling complex data. We therefore exploit the Active XML language for integrating complex data. Its XML part allows complex data to be unified, modeled and stored, while its services part addresses the distribution of data sources. Accordingly, the different integration tasks are offered as services. These services are managed through a set of active rules built upon the metadata and events of the integration system. In this paper, we design an architecture for integrating complex data autonomously. We have also designed the workflow for the data integration tasks.
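Services driven by active rules follow the event-condition-action pattern: when an event occurs and a rule's condition holds over the event's metadata, the rule invokes an integration service. The following Python sketch shows that pattern with the standard `xml.etree.ElementTree` module as the unified XML store. It is an assumption-laden illustration, not Active XML: the `ActiveRule` and `IntegrationSystem` classes, the `source_updated` event and the `integrate_record` service are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActiveRule:
    event: str                          # event type that triggers the rule
    condition: Callable[[dict], bool]   # checked against event payload/metadata
    action: Callable[[dict], None]      # integration service to invoke


class IntegrationSystem:
    """Hypothetical ECA-style manager for integration services (not AXML itself)."""

    def __init__(self) -> None:
        self.rules: list[ActiveRule] = []
        self.repository = ET.Element("repository")  # unified XML store

    def on(self, rule: ActiveRule) -> None:
        self.rules.append(rule)

    def emit(self, event: str, payload: dict) -> int:
        # Fire every rule whose event matches and whose condition holds.
        fired = 0
        for rule in self.rules:
            if rule.event == event and rule.condition(payload):
                rule.action(payload)
                fired += 1
        return fired


system = IntegrationSystem()


def integrate_record(payload: dict) -> None:
    # Hypothetical service: wrap one source record as an XML element.
    record = ET.SubElement(system.repository, "record", source=payload["source"])
    for key, value in payload["data"].items():
        ET.SubElement(record, key).text = str(value)


system.on(ActiveRule("source_updated", lambda p: p["format"] == "csv", integrate_record))
system.emit("source_updated", {"source": "crm", "format": "csv",
                               "data": {"customer": "ACME", "score": 7}})
print(ET.tostring(system.repository, encoding="unicode"))
```

Keeping conditions as plain predicates over event metadata means new sources can be handled by registering rules rather than by changing the engine.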

HAL

HAL-Rennes 1

Personalized Biomedical Data Integration

Author: Ian Foster
Olufunmilayo Olopade
Xiaoming Wang
Publication venue: 'IntechOpen'
Publication date: 08/01/2011

IntechOpen

Design and management of data warehouses - Report on the DMDW '99 workshop.

Author: Gatziu S.
Jeusfeld M.A.
Staudt M.
Vassiliou Y.

Research Papers in Economics

Container-Managed ETL Applications for Integrating Data in Near Real-Time

Author: Bruckner Robert
Schiefer Josef
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2003

As the analytical capabilities and applications of e-business systems expand, providing real-time access to critical business performance indicators to improve the speed and effectiveness of business operations has become crucial. Monitoring business activities requires focused yet incremental enterprise application integration (EAI) efforts, balancing real-time information requirements with historical perspectives. Decision making in traditional data warehouse environments is often delayed because data cannot be propagated from the source systems to the data warehouse in a timely manner. In this paper, we present an architecture for a container-based ETL (extraction, transformation, loading) environment that supports continuous, near real-time data integration, with the aim of shortening the time it takes to make business decisions and minimizing the latency between the cause and effect of a business decision. Instead of using proprietary vendor ETL solutions, we use an ETL container that manages ETLets (pronounced “et-lets”) for the ETL processing tasks. The architecture takes full advantage of existing J2EE (Java 2 Platform, Enterprise Edition) technology and enables the implementation of a distributed, scalable, near real-time ETL environment. We have fully implemented the proposed architecture. Furthermore, we compare the ETL container to alternative continuous data integration approaches.
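The container idea, where small ETL components are deployed into a runtime that routes each incoming event to the right component and loads the result immediately rather than in a batch window, can be sketched in Python. This is a hypothetical stand-in, not the paper's J2EE implementation: the `ETLet` base class, `OrderETLet`, `ETLContainer` and the event fields are invented for illustration, and the queue is drained on demand rather than by container-managed threads.

```python
import queue
from abc import ABC, abstractmethod


class ETLet(ABC):
    """Hypothetical processing unit deployed into the container. Python stand-in,
    not the paper's J2EE interfaces."""
    event_type: str = ""

    @abstractmethod
    def process(self, event: dict) -> dict:
        """Transform one source event into a warehouse row."""


class OrderETLet(ETLet):
    event_type = "order"

    def process(self, event: dict) -> dict:
        return {"order_id": event["id"], "total": event["qty"] * event["unit_price"]}


class ETLContainer:
    """Routes incoming events to deployed ETLets and loads results as they arrive,
    instead of waiting for a periodic batch window."""

    def __init__(self, warehouse: list) -> None:
        self._etlets: dict[str, ETLet] = {}
        self._inbox: queue.Queue = queue.Queue()
        self.warehouse = warehouse

    def deploy(self, etlet: ETLet) -> None:
        self._etlets[etlet.event_type] = etlet

    def publish(self, event: dict) -> None:
        # Source systems push events; nothing is polled in bulk.
        self._inbox.put(event)

    def drain(self) -> int:
        # Process every queued event; return how many were loaded.
        processed = 0
        while True:
            try:
                event = self._inbox.get_nowait()
            except queue.Empty:
                return processed
            etlet = self._etlets.get(event["type"])
            if etlet is not None:  # events without a deployed ETLet are skipped
                self.warehouse.append(etlet.process(event))
                processed += 1


warehouse: list = []
container = ETLContainer(warehouse)
container.deploy(OrderETLet())
container.publish({"type": "order", "id": 7, "qty": 3, "unit_price": 2.5})
container.publish({"type": "refund", "id": 8})
print(container.drain(), warehouse)
```

Separating the container (routing, queuing, loading) from the ETLets (per-event transformation) mirrors the division of labor the abstract describes; in a production setting the container would also own concurrency and fault handling.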

AIS Electronic Library (AISeL)