1,803 research outputs found
A unified view of data-intensive flows in business intelligence systems : a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that are still to be addressed, and how the current solutions can be applied for addressing these challenges. Peer Reviewed. Postprint (author's final draft)
Repetitive querying of large random heterogeneous datasets in RDBMS using materialized views
A methodology has been developed to improve the time efficiency of repeatedly querying large heterogeneous datasets by applying materialized views to recurring complex queries. Additionally, a simple user interface is provided to demonstrate the utility of this research methodology. The programs demonstrate sufficiently that the core design can be used to deploy a complete system which could be used in different domains. The methodology as developed in this research is presented as an experimental proof-of-concept prototype based on an abstract design.
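The general idea of answering a repeated complex query from a stored result can be sketched as follows. This is a minimal illustration and not the prototype described in the abstract: it assumes SQLite, which has no native materialized views, so the view is emulated with a plain table that is rebuilt on demand. The schema, the table name `mv_sales_by_region`, and the helper names are hypothetical.

```python
import sqlite3

# Hypothetical schema and query, chosen only for illustration.
EXPENSIVE_QUERY = (
    "SELECT region, SUM(amount) AS total, COUNT(*) AS n "
    "FROM sales GROUP BY region"
)

def refresh_view(conn: sqlite3.Connection) -> None:
    """Re-run the expensive query and store its result as a plain table."""
    conn.execute("DROP TABLE IF EXISTS mv_sales_by_region")
    conn.execute("CREATE TABLE mv_sales_by_region AS " + EXPENSIVE_QUERY)

def totals(conn: sqlite3.Connection) -> dict:
    """Answer the repeated query from the stored result, not the base table."""
    return dict(conn.execute("SELECT region, total FROM mv_sales_by_region"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)
refresh_view(conn)
first = totals(conn)

# New base data is not visible in the stored result until it is refreshed.
conn.execute("INSERT INTO sales VALUES ('south', 1.0)")
stale = totals(conn)
refresh_view(conn)
fresh = totals(conn)
```

The trade-off shown here is the usual one: reads from the stored table avoid recomputing the aggregate, at the cost of results that lag behind the base data until the next refresh.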
The Application of Object-Oriented Views to an Engineering Environment.
With the increasing popularity of object-oriented technology, object-oriented database systems are being used in design environments as central repositories. In this thesis, we investigate the role of versioning and the characteristics of design databases in design environments. In an effort to improve the configuration management scheme in a design environment, we also investigate the use of database views as a possible configuration tool.
We propose a unified version management scheme that facilitates cooperative teamwork and show that the use of database views provides a powerful configuration management scheme for a design environment.
A role-based access control schema for materialized views
This thesis research presents a framework that enhances security at the level of materialized views. Materialized views can be used for performance reasons in very large systems such as data warehouses or distributed systems, or for providing a filtered selection of data from a more general database. Existing proposed techniques provide rule-based access control for materialized views; however, the administration of such systems is time consuming and cumbersome in a large environment. This thesis presents a role-based access control schema for materialized views in which data authorization rules are associated with roles and defined in Datalog syntax in plain text files, a column-level restriction is imposed on a materialized view based on a user's assigned role, and a role conflict strategy is defined in which a priority is given to each conflicting role in order to resolve conflicts when a user gains authorization for permissions associated with conflicting roles at the same time. Keywords: Materialized Views, Authorization Views, Session Roles, Role Conflict
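A column-level restriction with priority-based conflict resolution might look roughly like the sketch below. It is an illustration under stated assumptions, not the thesis's schema: the rules are plain Python objects rather than Datalog rules in text files, a larger `priority` number is assumed to win a conflict, and all role names, column names, and helper functions are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    name: str
    priority: int        # assumed convention: the larger number wins a conflict
    columns: frozenset   # columns of the view this role may read

def resolve_roles(roles, conflicts):
    """Drop the lower-priority role of every conflicting pair the user holds."""
    active = {r.name: r for r in roles}
    for pair in conflicts:
        a, b = sorted(pair)  # sorted for a deterministic tie-break
        if a in active and b in active:
            loser = a if active[a].priority < active[b].priority else b
            del active[loser]
    return list(active.values())

def visible_columns(roles, conflicts):
    """Union of the columns granted by the roles that survive resolution."""
    allowed = set()
    for role in resolve_roles(roles, conflicts):
        allowed |= role.columns
    return allowed

def restrict(rows, roles, conflicts):
    """Project each row of the view onto the columns the user may see."""
    allowed = visible_columns(roles, conflicts)
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]

clerk = Role("clerk", 1, frozenset({"id", "name"}))
auditor = Role("auditor", 2, frozenset({"id", "salary"}))
conflicts = {frozenset({"clerk", "auditor"})}
view_rows = [{"id": 1, "name": "Ann", "salary": 100}]

as_clerk = restrict(view_rows, [clerk], conflicts)
as_both = restrict(view_rows, [clerk, auditor], conflicts)
```

In this example a user holding both roles sees only the auditor's columns, because the two roles are declared as conflicting and the auditor role has the higher assumed priority.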
Data modeling with NoSQL : how, when and why
Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 201
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey of a family of approaches and mechanisms for
large scale data processing that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.
Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
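The programming model mentioned in the abstract above can be illustrated with a small single-process sketch. It shows only the map, shuffle, and reduce steps on a word-count example; the distribution, scheduling, and fault tolerance that a real framework handles are deliberately left out, and every function name here is hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the user-supplied mapper to each record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the user-supplied reducer once per key."""
    return {key: reducer(key, values) for key, values in groups.items()}

def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)

# User code: only these two functions are specific to word counting.
def word_mapper(line):
    for word in line.split():
        yield word, 1

def count_reducer(word, ones):
    return sum(ones)

counts = run_job(["map reduce map", "reduce data"], word_mapper, count_reducer)
```

The point of the split is that the application supplies only `word_mapper` and `count_reducer`, while everything in `run_job` stands in for what the framework would do across a cluster.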