254 research outputs found
Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads
We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large-scale analytics over real-time data. This platform permits high-performance OLAP with massive scale-out capabilities, while concurrently allowing OLTP workloads. This dual capability enables analytics over real-time changing data and allows fine-grained user-specified service level agreements (SLAs) on data freshness. We advocate the decoupling of core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation, while not requiring synchronous updates of replicas. Instead, we use asynchronous update propagation, guaranteeing consistency with timestamp validation. We provide a view into the design and development of a large-scale data management platform for real-time analytics, driven by the needs of modern enterprise customers.
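The logical-timestamp mechanism described above can be sketched as a standard MVCC visibility rule: each row version carries the commit timestamp of the transaction that created it (and, if applicable, the one that deleted it), and a reader sees exactly the versions committed at or before its snapshot timestamp. This is a minimal, generic illustration of that rule; the `Version` and `read` names are hypothetical and do not reflect SAP HANA's actual interfaces.

```python
# Minimal sketch of MVCC snapshot visibility driven by logical
# timestamps. Names are illustrative, not SAP HANA APIs.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    value: str
    created_ts: int                   # commit timestamp of the writing txn
    deleted_ts: Optional[int] = None  # commit timestamp of the deleting txn

def visible(version: Version, snapshot_ts: int) -> bool:
    """A version is visible if it was committed at or before the
    snapshot's logical timestamp and was not yet deleted then."""
    if version.created_ts > snapshot_ts:
        return False
    return version.deleted_ts is None or version.deleted_ts > snapshot_ts

def read(versions: list[Version], snapshot_ts: int) -> Optional[str]:
    """Return the newest version visible at snapshot_ts; the version
    chain is assumed ordered oldest to newest."""
    for v in reversed(versions):
        if visible(v, snapshot_ts):
            return v.value
    return None

chain = [Version("a", created_ts=5), Version("b", created_ts=12)]
chain[0].deleted_ts = 12  # "b" superseded "a" at logical time 12
print(read(chain, 10))  # a reader at ts 10 still sees "a"
print(read(chain, 15))  # a reader at ts 15 sees "b"
```

Because visibility is decided purely by timestamp comparison, replicas can apply updates asynchronously: a reader on a lagging replica simply observes an older but still consistent snapshot, which is what makes the freshness SLAs in the abstract expressible.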
Positional Delta Trees to reconcile updates with read-optimized data storage
We investigate techniques that marry the high read-only analytical query performance of compressed, replicated column storage ("read-optimized" databases) with the ability to handle a high-throughput update workload. Today's large RAM sizes and the growing gap between sequential and random IO disk throughput bring this once elusive goal within reach, as it has become possible to buffer enough updates in memory to allow background migration of these updates to disk, where efficient sequential IO is amortized among many updates. Our key goal is that read-only queries always see the latest database state, yet are not (significantly) slowed down by the update processing. To this end, we propose the Positional Delta Tree (PDT), which is designed to minimize the overhead of on-the-fly merging of differential updates into (index) scans on stale disk-based data. We describe the PDT data structure and its basic operations (lookup, insert, delete, modify) and provide an in-depth study of their performance. Further, we propose a storage architecture called Replicated Mirrors, which replicates tables in multiple orders, storing each table copy mirrored in both column- and row-wise data formats, and uses PDTs to handle updates. Experiments in the MonetDB/X100 system show that this integrated architecture is able to achieve our main goals.
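The core idea of on-the-fly merging can be illustrated with a flat list of position-keyed deltas standing in for the actual tree. The sketch below, whose names and delta encoding are hypothetical rather than the paper's, merges inserts, deletes, and modifications into a sequential scan of the stale on-disk image by comparing stale positions:

```python
# Illustrative sketch of merging positional differential updates into a
# scan over stale, read-optimized data, in the spirit of the PDT. A flat
# sorted list stands in for the tree; the encoding is an assumption.
def merge_scan(stale, deltas):
    """stale: list of row values as stored on disk.
    deltas: list of (pos, op, value) sorted by position in the *stale*
    image, where op is 'ins' (insert before pos), 'del', or 'mod'."""
    out = []
    di = 0
    for pos, row in enumerate(stale):
        # emit any inserts that land before this stale position
        while di < len(deltas) and deltas[di][0] == pos and deltas[di][1] == "ins":
            out.append(deltas[di][2]); di += 1
        if di < len(deltas) and deltas[di][0] == pos and deltas[di][1] == "del":
            di += 1
            continue  # this row was deleted since the last checkpoint
        if di < len(deltas) and deltas[di][0] == pos and deltas[di][1] == "mod":
            out.append(deltas[di][2]); di += 1
        else:
            out.append(row)
    # inserts positioned after the end of the stale image
    while di < len(deltas) and deltas[di][1] == "ins":
        out.append(deltas[di][2]); di += 1
    return out

print(merge_scan(["a", "b", "c"],
                 [(1, "mod", "B"), (2, "del", None), (3, "ins", "d")]))
# → ['a', 'B', 'd']
```

Keying deltas by position rather than by value is what keeps the merge a single sequential pass alongside the scan; organizing the deltas as a tree (as the PDT does) additionally keeps insert and lookup costs logarithmic as updates accumulate.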
GraphScope Flex: LEGO-like Graph Computing Stack
Graph computing has become increasingly crucial in processing large-scale
graph data, with numerous systems developed for this purpose. Two years ago, we
introduced GraphScope as a system addressing a wide array of graph computing
needs, including graph traversal, analytics, and learning in one system. Since
its inception, GraphScope has achieved significant technological advancements
and gained widespread adoption across various industries. However, one key
lesson from this journey has been understanding the limitations of a
"one-size-fits-all" approach, especially when dealing with the diversity of
programming interfaces, applications, and data storage formats in graph
computing. In response to these challenges, we present GraphScope Flex, the
next iteration of GraphScope. GraphScope Flex is designed to be both
resource-efficient and cost-effective, while also providing flexibility and
user-friendliness through its LEGO-like modularity. This paper explores the
architectural innovations and fundamental design principles of GraphScope Flex,
all of which are direct outcomes of the lessons learned during our ongoing
development process. We validate the adaptability and efficiency of GraphScope
Flex with extensive evaluations on synthetic and real-world datasets. The
results show that GraphScope Flex achieves 2.4X the throughput of other
systems on the LDBC Social Network benchmark and up to a 55.7X speedup on
the Graphalytics benchmark. Furthermore, GraphScope Flex accomplishes up to a
2,400X performance gain in real-world applications, demonstrating its
proficiency across a wide range of graph computing scenarios with increased
effectiveness.
WiSer: A Highly Available HTAP DBMS for IoT Applications
In a classic transactional distributed database management system (DBMS),
write transactions invariably synchronize with a coordinator before final
commitment. While enforcing serializability, this model has long been
criticized for not satisfying the applications' availability requirements. When
entering the era of Internet of Things (IoT), this problem has become more
severe, as an increasing number of applications call for the capability of
hybrid transactional and analytical processing (HTAP), where aggregation
constraints need to be enforced as part of transactions. Current systems work
around this by creating escrows, allowing occasional overshoots of constraints,
which are handled via compensating application logic.
The WiSer DBMS targets consistency with availability, by splitting the
database commit into two steps. First, a PROMISE step, which corresponds to
what users commonly understand by commitment, and runs without talking to a coordinator.
Second, a SERIALIZE step, that fixes transactions' positions in the
serializable order, via a consensus procedure. We achieve this split via a
novel data representation that embeds read-sets into transaction deltas, and
serialization sequence numbers into table rows. WiSer does no sharding (all
nodes can run transactions that modify the entire database), and yet enforces
aggregation constraints. Both read-write conflicts and aggregation constraint
violations are resolved lazily in the serialized data. WiSer also covers node
joins and departures as database tables, thus simplifying correctness and
failure handling. We present the design of WiSer as well as experiments
suggesting that this approach has promise.
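The PROMISE/SERIALIZE split described above can be sketched as two decoupled operations: a local, coordinator-free step that makes a transaction's delta (with its embedded read-set) durable, and a later consensus-backed step that assigns it a serialization sequence number. The single-process "sequencer" below is a stand-in for a real consensus procedure, and the class and field names are illustrative, not WiSer's actual interfaces.

```python
# Hypothetical sketch of a two-step commit: PROMISE runs locally with no
# coordinator; SERIALIZE later fixes the global order. The Sequencer is
# a stand-in for consensus (a real system would run Paxos/Raft here).
import itertools

class Node:
    def __init__(self):
        self.promised = []  # deltas durable locally, order not yet fixed

    def promise(self, delta, read_set):
        """PROMISE: record the delta together with its read-set, without
        talking to any coordinator. The client sees this as commitment."""
        txn = {"delta": delta, "read_set": read_set, "seq": None}
        self.promised.append(txn)
        return txn

class Sequencer:
    """Stand-in for the consensus step that assigns each promised
    transaction its position in the serializable order."""
    def __init__(self):
        self._next = itertools.count(1)

    def serialize(self, txn):
        txn["seq"] = next(self._next)
        return txn["seq"]

node, seq = Node(), Sequencer()
t1 = node.promise({"x": 1}, read_set={"y"})
t2 = node.promise({"y": 2}, read_set={"x"})
# SERIALIZE fixes the order after the fact; read-write conflicts between
# t1 and t2 would then be detected lazily from the embedded read-sets.
print(seq.serialize(t1), seq.serialize(t2))  # 1 2
```

Embedding the read-set in the delta is what makes the lazy step possible: once sequence numbers are fixed, any transaction whose read-set was overwritten by an earlier-sequenced delta can be identified and resolved after the fact, rather than blocking the commit path on coordination.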
Time Series Management Systems: A Survey
The collection of time series data increases as more monitoring and
automation are being deployed. These deployments range in scale from an
Internet of things (IoT) device located in a household to enormous distributed
Cyber-Physical Systems (CPSs) producing large volumes of data at high velocity.
To store and analyze these vast amounts of data, specialized Time Series
Management Systems (TSMSs) have been developed to overcome the limitations of
general-purpose Database Management Systems (DBMSs) for time series
management. In this paper, we present a thorough analysis and classification of
TSMSs developed through academic or industrial research and documented through
publications. Our classification is organized into categories based on the
architectures observed during our analysis. In addition, we provide an overview
of each system with a focus on the motivational use case that drove the
development of the system, the functionality for storage and querying of time
series a system implements, the components the system is composed of, and the
capabilities of each system with regard to Stream Processing and Approximate
Query Processing (AQP). Last, we provide a summary of research directions
proposed by other researchers in the field and present our vision for a next
generation TSMS.
Comment: 20 pages, 15 figures, 2 tables. Accepted for publication in IEEE TKD
Optimization Research of the OLAP Query Technology Based on P2P
As application systems accumulate ever more data, fast and efficient access to the information that supports decision-making analysis has become increasingly difficult, and the original OLAP technology has also revealed many shortcomings. Combining P2P network techniques with OLAP storage and query methods, the paper constructs a distributed P2P-OLAP network model and puts forward a storage and sharing scheme for multidimensional data, together with an OLAP query scheme based on collaboration support. Finally, the paper shows by experiment that the scheme can effectively improve the performance of decision analysis.