2 research outputs found

    AsterixDB: A Scalable, Open Source BDMS

    Full text link
    AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements

    Result Distribution in Big Data Systems

    No full text
    We are building a Big Data Management System (BDMS) called AsterixDB at UCI. Since AsterixDB is designed to operate on large volumes of data, the results for its queries can be potentially very large, and AsterixDB is also designed to operate under high concurency workloads. As a result, we need a specialized mechanism to manage these large volumes of query results and deliver them to the clients. In this thesis, we present an architecture and an implementation of a new result distribution framework that is capable of handling large volumes of results under high concurency workloads. We present the various components of this result distribution framework and show how they interact with each other to manage large volumes of query results and deliver them to clients. We also discuss various result distribution policies that are possible with our framework and compare their performance through experiments. We have implemented a REST-like HTTP client interface on top of the result distribution framework to allow clients to submit queries and obtain their results. This client interface provides two modes for clients to choose from to read their query results: synchronous mode and asynchronous mode. In synchronous mode, query results are delivered to a client as a direct response to its query within the same request-response cycle. In asynchronous mode, a query handle is returned instead to the client as a response to its query. The client can store the handle and send another request later, including the query handle, to read the result for the query whenever it wants. The architectural support for these two modes is also described in this thesis. We believe that the result distribution framework, combined with this client interface, successfully meets the result management demands of AsterixDB
    corecore