
80 research outputs found

Vectorwise: Beyond Column Stores

Authors: Boncz P.A., Zukowski M.
Publication date: 01/01/2012

This paper tells the story of Vectorwise, a high-performance analytical database system, from multiple perspectives: its history from academic project to commercial product, the evolution of its technical architecture, customer reactions to the product and its future research and development roadmap. One take-away from this story is that the novelty in Vectorwise is much more than just column-storage: it boasts many query processing innovations in its vectorized execution model, and an adaptive mixed row/column data storage model with indexing support tailored to analytical workloads. Another one is that there is a long road from research prototype to commercial product, though database research continues to achieve a strong innovative influence on product development.

How the High Performance Analytics Work with SAP HANA

Author: Mahajan Narayan
Publication venue: Mohammad Nassar for Researches (MNFR)
Publication date: 06/08/2023

Informed decision-making, better communication and faster responses to business situations are the key differences between leaders and followers in this competitive global marketplace. A data-driven organization can analyze patterns and anomalies to make sense of the current situation and be ready for future opportunities. Organizations no longer have the problem of “lack of data”, but the problem of having “actionable data” at the right time to act on, direct and influence their business decisions. The data exists in different transactional systems and/or data warehouse systems, so retrieving and processing relevant information takes significant time and shrinks the time window for out-maneuvering the competition. To solve the problem of “actionable data”, enterprises can take advantage of the SAP HANA [1] in-memory platform, which enables rapid processing and analysis of huge volumes of data in real time. This paper discusses how SAP HANA virtual data models can be used for on-the-fly analysis of live transactional data to derive insight, perform what-if analysis and execute business transactions in real time without using persisted aggregates.

International Journal of Computer (IJC - Global Society of Scientific Research and Researchers, GSSRR)

Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID

Authors: Cheng Yu, Rusu Florin
Publication date: 01/01/2013

Evaluating the performance of scientific data processing systems is a difficult task considering the plethora of application-specific solutions available in this landscape and the lack of a generally-accepted benchmark. The dual structure of scientific data, coupled with the complex nature of processing, complicates the evaluation procedure further. SS-DB is the first attempt to define a general benchmark for complex scientific processing over raw and derived data. It has failed to draw sufficient attention, though, because of the ambiguous plain-language specification and the extraordinary SciDB results. In this paper, we remedy the shortcomings of the original SS-DB specification by providing a formal representation in terms of ArrayQL algebra operators and ArrayQL/SciQL constructs. These are the first formal representations of the SS-DB benchmark. Starting from the formal representation, we give a reference implementation and present benchmark results in EXTASCID, a novel system for scientific data processing. EXTASCID is complete in providing native support both for array and relational data and extensible in executing any user code inside the system by means of a configurable metaoperator. These features result in an order of magnitude improvement over SciDB at data loading, extracting derived data, and operations over derived data. Comment: 32 pages, 3 figures.

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Finding a Second Wind: Speeding Up Graph Traversal Queries in RDBMSs Using Column-Oriented Processing

Authors: Chernishev George, Firsov Mikhail, Polyntsov Michael, Smirnov Kirill
Publication date: 16/08/2023

Recursive queries and recursive derived tables constitute an important part of the SQL standard. Their efficient processing is important for many real-life applications that rely on graph or hierarchy traversal. Position-enabled column-stores offer a novel opportunity to improve run times for this type of query. Such systems allow the engine to explicitly use data positions (row ids) inside its core and thus enable novel efficient implementations of query plan operators. In this paper, we present an approach that significantly speeds up recursive query processing inside RDBMSes. Its core idea is to employ a particular aspect of column-store technology (late materialization) which enables the query engine to manipulate data positions during query execution. Based on it, we propose two sets of Volcano-style operators intended to process different query cases. To validate our ideas, we have implemented the proposed approach in PosDB, an RDBMS column-store with SQL support. We experimentally demonstrate the viability of our approach by providing a comparison with PostgreSQL. Experiments show that for breadth-first search: 1) our position-based approach yields up to 6x better results than PostgreSQL, 2) our tuple-based one results in only 3x improvement when using a special rewriting technique, but it can work in a larger number of cases, and 3) both approaches can't be emulated in row-stores efficiently.
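
The core idea in this abstract, traversing by row position and materializing values late, can be illustrated with a minimal Python sketch. It is not PosDB's implementation or its operator set: the column names (`SRC`, `DST`, `LABEL`), the helper functions, and the toy edge table are all hypothetical, chosen only to show a breadth-first traversal that carries row positions between levels and reads the payload column once at the end.

```python
from collections import defaultdict

# Hypothetical columnar edge table: parallel columns indexed by row position.
SRC = [1, 1, 2, 3, 4]
DST = [2, 3, 4, 4, 5]
LABEL = ["a", "b", "c", "d", "e"]  # payload column, read only at the end


def build_position_index(src_column):
    """Map each source value to the row positions where it occurs."""
    index = defaultdict(list)
    for pos, value in enumerate(src_column):
        index[value].append(pos)
    return index


def bfs_positions(start, src_column, dst_column):
    """Breadth-first traversal returning edge row positions, one list per level."""
    index = build_position_index(src_column)
    visited = {start}
    frontier = [start]
    levels = []
    while frontier:
        positions, next_frontier = [], []
        for node in frontier:
            for pos in index.get(node, []):
                target = dst_column[pos]  # only the join column is touched here
                if target not in visited:
                    visited.add(target)
                    positions.append(pos)
                    next_frontier.append(target)
        if positions:
            levels.append(positions)
        frontier = next_frontier
    return levels


def materialize(levels, src_column, dst_column, label_column):
    """Late materialization: turn positions into full tuples in one final pass."""
    return [
        (src_column[p], dst_column[p], label_column[p], depth)
        for depth, level in enumerate(levels, start=1)
        for p in level
    ]


levels = bfs_positions(1, SRC, DST)
rows = materialize(levels, SRC, DST, LABEL)
```

In this sketch the traversal loop handles only integers (positions and join keys); wider payload columns are fetched once, for the rows that survive, which is the general motivation for late materialization.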

arXiv.org e-Print Archive

Hillview: A trillion-cell spreadsheet for big data

Authors: Aguilera Marcos K., Budiu Mihai, Gopalan Parikshit, Kruiger Han, Suresh Lalith, Wieder Udi
Publication venue: VLDB Endowment
Publication date: 01/07/2019

Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple idea to produce compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing progressive visualizations, and offering precise accuracy guarantees. Using Hillview running on eight servers, we can navigate and visualize datasets of tens of billions of rows and trillions of cells, much beyond the published capabilities of competing systems.
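
One way to picture how a compact summary can parallelize computation and reduce communication is a mergeable histogram, sketched below in Python. This is an illustration under stated assumptions, not Hillview's vizketch code: the function names, the fixed bucket count, and the sample partitions are hypothetical, and the sketch leaves out sampling and the rendering-resolution reasoning that the abstract alludes to.

```python
def histogram_sketch(values, lo, hi, buckets):
    """Summarize one data partition as a fixed-size list of bucket counts."""
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        if lo <= v < hi:
            # Clamp to guard against floating-point rounding at the top edge.
            counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts


def merge(a, b):
    """Combine two partition summaries; element-wise addition is associative."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical partitions, as if held on two different servers.
partition_1 = [1, 2, 11]
partition_2 = [25, 39, 12]

summary_1 = histogram_sketch(partition_1, lo=0, hi=40, buckets=4)
summary_2 = histogram_sketch(partition_2, lo=0, hi=40, buckets=4)
merged = merge(summary_1, summary_2)
```

Each server ships only `buckets` integers regardless of partition size, and because the merge is associative, partial results can be combined in any order and displayed as they arrive.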

arXiv.org e-Print Archive

Proceedings - University of Groningen

Dissertations of the University of Groningen