862 research outputs found
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The overwhelmingly increasing amount of stored data has spurred researchers
seeking different methods in order to optimally take advantage of it which
mostly have faced a response time problem as a result of this enormous size of
data. Most of solutions have suggested materialization as a favourite solution.
However, such a solution cannot attain Real- Time answers anyhow. In this paper
we propose a framework illustrating the barriers and suggested solutions in the
way of achieving Real-Time OLAP answers that are significantly used in decision
support systems and data warehouses
Concurrency Lock Issues in Relational Cloud Computing
The widespread popularity of Cloud computing as a preferred platform for the deployment of web applications has resulted in an enormous number of applications moving to the cloud, and the huge success of cloud service providers. Due to the increasing number of web applications being hosted in the cloud, and the growing scale of data which these applications store, process, and serve – scalable data management systems form a critical part of cloud infrastructures. There are issues related to the database security while database is on cloud. The major challenging issues are multi-tenancy, scalability and the privacy. This paper focuses on the problems faced in the data security of Relational Cloud. The problems faced by various types of tenants and the type of access into the database makes a rework on the security of data, by analyzing proper locking strategies on the records accessed from the database. Data security in cloud computing addresses the type of access mode by the users (for analytical or transaction purpose) and the frequency of data access from the physical location (in shared or no-shared disk mode). Accordingly, the various data locking strategies are studied and appropriate locking mechanism will be implemented for real-time applications as in e-commerce. Keywords: Relational Cloud, Multi-tenant, two-phase locking, concurrency control, data management
A Business Intelligence Solution, based on a Big Data Architecture, for processing and analyzing the World Bank data
The rapid growth in data volume and complexity has needed the adoption of advanced technologies to extract valuable insights for decision-making. This project aims to address this need by developing a comprehensive framework that combines Big Data processing, analytics, and visualization techniques to enable effective analysis of World Bank data. The problem addressed in this study is the need for a scalable and efficient Business Intelligence solution that can handle the vast amounts of data generated by the World Bank. Therefore, a Big Data architecture is implemented on a real use case for the International Bank of Reconstruction and Development. The findings of this project demonstrate the effectiveness of the proposed solution. Through the integration of Apache Spark and Apache Hive, data is processed using Extract, Transform and Load techniques, allowing for efficient data preparation. The use of Apache Kylin enables the construction of a multidimensional model, facilitating fast and interactive queries on the data. Moreover, data visualization techniques are employed to create intuitive and informative visual representations of the analysed data. The key conclusions drawn from this project highlight the advantages of a Big Data-driven Business Intelligence solution in processing and analysing World Bank data. The implemented framework showcases improved scalability, performance, and flexibility compared to traditional approaches. In conclusion, this bachelor thesis presents a Business Intelligence solution based on a Big Data architecture for processing and analysing the World Bank data. The project findings emphasize the importance of scalable and efficient data processing techniques, multidimensional modelling, and data visualization for deriving valuable insights. The application of these techniques contributes to the field by demonstrating the potential of Big Data Business Intelligence solutions in addressing the challenges associated with large-scale data analysis
The End of Slow Networks: It's Time for a Redesign
Next generation high-performance RDMA-capable networks will require a
fundamental rethinking of the design and architecture of modern distributed
DBMSs. These systems are commonly designed and optimized under the assumption
that the network is the bottleneck: the network is slow and "thin", and thus
needs to be avoided as much as possible. Yet this assumption no longer holds
true. With InfiniBand FDR 4x, the bandwidth available to transfer data across
network is in the same ballpark as the bandwidth of one memory channel, and it
increases even further with the most recent EDR standard. Moreover, with the
increasing advances of RDMA, the latency improves similarly fast. In this
paper, we first argue that the "old" distributed database design is not capable
of taking full advantage of the network. Second, we propose architectural
redesigns for OLTP, OLAP and advanced analytical frameworks to take better
advantage of the improved bandwidth, latency and RDMA capabilities. Finally,
for each of the workload categories, we show that remarkable performance
improvements can be achieved
AnyDB: An Architecture-less DBMS for Any Workload
In this paper, we propose a radical new approach for scale-out distributed
DBMSs. Instead of hard-baking an architectural model, such as a shared-nothing
architecture, into the distributed DBMS design, we aim for a new class of
so-called architecture-less DBMSs. The main idea is that an architecture-less
DBMS can mimic any architecture on a per-query basis on-the-fly without any
additional overhead for reconfiguration. Our initial results show that our
architecture-less DBMS AnyDB can provide significant speed-ups across varying
workloads compared to a traditional DBMS implementing a static architecture.Comment: Submitted to 11th Annual Conference on Innovative Data Systems
Research (CIDR 21
The Data Lakehouse: Data Warehousing and More
Relational Database Management Systems designed for Online Analytical
Processing (RDBMS-OLAP) have been foundational to democratizing data and
enabling analytical use cases such as business intelligence and reporting for
many years. However, RDBMS-OLAP systems present some well-known challenges.
They are primarily optimized only for relational workloads, lead to
proliferation of data copies which can become unmanageable, and since the data
is stored in proprietary formats, it can lead to vendor lock-in, restricting
access to engines, tools, and capabilities beyond what the vendor offers. As
the demand for data-driven decision making surges, the need for a more robust
data architecture to address these challenges becomes ever more critical. Cloud
data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but
they present their own set of challenges. More recently, organizations have
often followed a two-tier architectural approach to take advantage of both
these platforms, leveraging both cloud data lakes and RDBMS-OLAP systems.
However, this approach brings additional challenges, complexities, and
overhead. This paper discusses how a data lakehouse, a new architectural
approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake
combined, while also providing additional advantages. We take today's data
warehousing and break it down into implementation independent components,
capabilities, and practices. We then take these aspects and show how a
lakehouse architecture satisfies them. Then, we go a step further and discuss
what additional capabilities and benefits a lakehouse architecture provides
over an RDBMS-OLAP
Simplifying Big Data Analytics System with A Reference Architecture
The internet and pervasive technology like the Internet of Things (i.e. sensors and smart devices) have exponentially increased the scale of data collection and availability. This big data not only challenges the structure of existing enterprise analytics systems but also offer new opportunities to create new knowledge and competitive advantage. Businesses have been exploiting these opportunities by implementing and operating big data analytics capabilities. Social network companies such as Facebook, LinkedIn, Twitter and Video streaming company like Netflix have implemented big data analytics and subsequently published related literatures. However, these use cases did not provide a simplified and coherent big data analytics reference architecture as well as currently, there still remains limited reference architecture of big data analytics. This paper aims to simplify big data analytics by providing a reference architecture based on existing four use cases and subsequently verified the reference architecture with Amazon and Google analytics services
- …