15 research outputs found

    UpStream: storage-centric load management for streaming applications with update semantics

    This paper addresses the problem of minimizing the staleness of query results for streaming applications with update semantics under overload conditions. Staleness is a measure of how out-of-date the results are compared with the latest data arriving on the input. Real-time streaming applications are subject to overload due to unpredictably increasing data rates, while in many of them, we observe that data streams and queries in fact exhibit "update semantics" (i.e., the latest input data are all that really matters when producing a query result). Under such semantics, overload will cause staleness to build up. The key to avoiding this is to exploit the update semantics of applications as early as possible in the processing pipeline. In this paper, we propose UpStream, a storage-centric framework for load management over streaming applications with update semantics. We first describe how we model streams and queries that possess update semantics, providing definitions of correctness and staleness for the query results. Then, we show how staleness can be minimized through intelligent update key scheduling techniques applied at the queue level, while preserving the correctness of the results, even for complex queries that involve sliding windows. UpStream is based on the simple idea of applying updates in place, yet with great returns in terms of lowering staleness and memory consumption, as we also experimentally verify on the Borealis system.
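
    The core mechanism lends itself to a compact illustration. Below is a minimal Python sketch of an update-in-place queue keyed by update key, assuming one pending tuple per key: a newly arriving tuple overwrites the stale pending tuple for its key instead of queueing behind it, so the consumer always sees the freshest value. The class and names are illustrative, not UpStream's actual implementation.

        class UpdateQueue:
            def __init__(self):
                self._latest = {}   # update key -> most recent unprocessed tuple
                self._order = []    # FIFO of keys awaiting processing

            def push(self, key, tup):
                if key not in self._latest:
                    self._order.append(key)   # first pending tuple for this key
                self._latest[key] = tup       # update in place: stale tuple dropped

            def pop(self):
                key = self._order.pop(0)
                return key, self._latest.pop(key)

        q = UpdateQueue()
        q.push("AAPL", {"price": 101.0})
        q.push("AAPL", {"price": 101.5})   # overwrites; 101.0 is never processed
        q.push("MSFT", {"price": 300.2})
        print(q.pop())                      # ('AAPL', {'price': 101.5})

    Because stale tuples are overwritten rather than enqueued, the queue length is bounded by the number of distinct update keys, which is why memory consumption drops along with staleness.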

    Virtual data mart for measuring organizational achievement using data virtualization technique (KPIVDM)

    In today's dynamic environment, organizations are confronted with new and increasingly vital decisions that can affect their very survival. These demands are increasing the pressure on Information Technology to ensure that data is delivered properly, at the right time, and at a faster rate. In this paper, we propose to build a virtual data mart, specifically for organizational KPIs, using data virtualization technology, which can help KPI developers build and update performance management systems quickly and make these systems work in real time. We present a way of identifying and building virtual data marts for organizational KPIs. The basic principle underlying the proposed approach is that the design of virtual data marts should be driven by the business needs and organizational requirements that each virtual data mart is expected to address. As a consequence, the virtual data mart design process must be based on a deep understanding of top management's needs and users' expectations. A prototype is recommended to validate the use of the proposed method.

    Building Scalable and Consistent Distributed Databases Under Conflicts

    Distributed databases, which rely on redundant and distributed storage across multiple servers, are able to provide mission-critical data management services at large scale. Parallelism is the key to the scalability of distributed databases, but concurrent queries with conflicts may block or abort each other when strong consistency is enforced using rigorous concurrency control protocols. This thesis studies techniques for building scalable distributed databases under strong consistency guarantees, even in the face of high-contention workloads. The techniques proposed in this thesis share a common idea, conflict mitigation: mitigating conflicts by rescheduling operations in the concurrency control in the first place, instead of resolving contending conflicts. Using this idea, concurrent queries under conflicts can be executed with high parallelism. This thesis explores the idea both on databases that support serializable ACID (atomicity, consistency, isolation, durability) transactions and on eventually consistent NoSQL systems. First, the epoch-based concurrency control (ECC) technique is proposed in ALOHA-KV, a new distributed key-value store that supports high-performance read-only and write-only distributed transactions. ECC demonstrates that concurrent serializable distributed transactions can be processed in parallel with low overhead, even under high contention. With ECC, a new atomic commitment protocol is developed that requires only one amortized round trip for a distributed write-only transaction to commit in the absence of failures. Second, a novel paradigm of serializable distributed transaction processing is developed to extend ECC with read-write transaction support. This paradigm uses a newly proposed database operator, the functor: a placeholder for the value of a key that can be computed asynchronously, in parallel with other functor computations of the same or other transactions. Functor-enabled ECC achieves finer-grained concurrency control than transaction-level concurrency control, and it never aborts transactions due to read-write or write-write conflicts, although it allows transactions to fail due to logic errors or constraint violations while guaranteeing serializability. Lastly, this thesis explores consistency in the eventually consistent system Apache Cassandra, investigating consistency violations referred to as "consistency spikes". This investigation shows that the consistency spikes exhibited by Cassandra are strongly correlated with garbage collection, particularly the "stop-the-world" phase in the Java virtual machine. Thus, artificially delaying read operations at servers immediately after a garbage collection pause can virtually eliminate these spikes. Altogether, these techniques allow distributed databases to provide scalable and consistent storage services.
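
    The functor idea maps naturally onto futures. The following is a rough Python sketch, assuming a future per key acts as the value placeholder: a transaction installs the placeholder immediately, and the actual value is computed asynchronously, possibly reading other placeholders. The names are hypothetical, not ALOHA-KV's API.

        from concurrent.futures import ThreadPoolExecutor

        pool = ThreadPoolExecutor()
        store = {}   # key -> Future acting as the functor (value placeholder)

        def write_functor(key, compute, *deps):
            """Install a placeholder computed from other keys' placeholders."""
            store[key] = pool.submit(
                lambda: compute(*[store[d].result() for d in deps]))

        def read(key):
            return store[key].result()   # blocks only while the functor runs

        write_functor("x", lambda: 10)
        write_functor("y", lambda x: x + 5, "x")   # computed asynchronously
        print(read("y"))                            # 15

    The point of the placeholder is that a transaction can install its writes without waiting for the values to materialize, which is broadly how read-write conflicts stop being a reason to abort.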

    Maintaining Internal Consistency of Report for Real-time OLAP with Layer-based View

    Maintaining the internal consistency of reports is an important aspect of real-time data warehousing. OLAP and query tools were initially designed to operate on top of unchanging, static historical data. In a real-time environment, however, the results they produce are usually negatively influenced by data changes concurrent with query execution, which may result in internal report inconsistency. In this paper, we propose a new method, called the layer-based view approach, to appropriately and effectively maintain report data consistency. The core idea is to prevent the data involved in an OLAP query from being changed by using a lock mechanism, and to avoid conflicts between read and write operations with the help of a layer mechanism. Our approach can effectively deal with the report consistency issue while avoiding query contention between read and write operations in a real-time OLAP environment.
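
    A minimal Python sketch of the layer mechanism, assuming two layers per view: queries read a frozen stable layer while concurrent updates accumulate in a delta layer, so reads and writes never touch the same data. The class is illustrative, not the paper's implementation.

        import threading

        class LayeredView:
            def __init__(self, base):
                self._stable = dict(base)   # layer visible to running queries
                self._delta = {}            # layer absorbing concurrent updates
                self._lock = threading.Lock()

            def update(self, key, value):
                with self._lock:
                    self._delta[key] = value   # writers never block readers

            def query(self, keys):
                # every key is read from the same frozen layer, so the report
                # stays internally consistent even while updates keep arriving
                return {k: self._stable.get(k) for k in keys}

            def advance_layer(self):
                """Fold the delta in once no query uses the stable layer."""
                with self._lock:
                    self._stable.update(self._delta)
                    self._delta.clear()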

    A Framework for Real-time Analysis in OLAP Systems

    OLAP systems are designed to quickly answer multi-dimensional queries against large data warehouse systems. Constructing data cubes and their associated indexes is time consuming and computationally expensive, and for this reason, data cubes are only refreshed periodically. Increasingly, organizations are demanding both historical and predictive analysis based on the most current data. This trend has also placed a requirement on OLAP systems to merge updates at a much faster rate than before. In this thesis, we propose a framework for OLAP systems that enables updates to be merged with data cubes in soft real-time. We apply a strategy of local partitioning of the data cube, maintaining a "hot" partition for each materialized view into which update data are merged. We augment this strategy with multi-core processing using the OpenMP library to accelerate data cube construction and query resolution. Experiments using a data cube with 10,000,000 tuples and an update set of 100,000 tuples show that our framework achieves a 99% performance improvement when updating the data cube, a 76% performance increase when constructing a new data cube, and a 72% performance increase when resolving a range query against a data cube with 1,000,000 tuples.
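
    The hot-partition strategy can be sketched briefly. The Python below assumes one aggregate measure per cube cell: each materialized view keeps a small hot partition that absorbs updates in soft real-time, queries combine it with the large cold partition, and a merge folds the hot data in off the query path. Structure and names are illustrative, not the thesis framework's API.

        from collections import defaultdict

        class PartitionedView:
            def __init__(self):
                self.cold = defaultdict(float)   # large, periodically rebuilt
                self.hot = defaultdict(float)    # small, absorbs live updates

            def insert(self, cell, measure):
                self.hot[cell] += measure        # touches only the hot partition

            def range_query(self, cells):
                return sum(self.cold[c] + self.hot[c] for c in cells)

            def merge(self):
                """Fold the hot partition into the cold cube off the query path."""
                for cell, m in self.hot.items():
                    self.cold[cell] += m
                self.hot.clear()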

    A Memory-Optimal Many-To-Many Semi-Stream Join

    Scheduling to Minimize Staleness and Stretch in Real-Time Data Warehouses

    We study scheduling algorithms for loading data feeds into real-time data warehouses, which are used in applications such as IP network monitoring, online financial trading, and credit card fraud detection. In these applications, the warehouse collects a large number of streaming data feeds that are generated by external sources and arrive asynchronously. Data for each table in the warehouse are generated at a constant rate, with different tables possibly at different rates. For each data feed, the arrival of new data triggers an update that seeks to append the new data to the corresponding table; if multiple updates are pending for the same table, they are batched together before being loaded. At time τ, if a table has been updated with information up to time r ≤ τ, its staleness is defined as τ − r. Our first objective is to schedule the updates on one or more processors in a way that minimizes the total staleness. To ensure fairness, our second objective is to limit the maximum "stretch", which we define (roughly) as the ratio between the time an update waits until it finishes being processed and the length of the update. In contrast to earlier work proving the nonexistence of constant-competitive algorithms for related scheduling problems, we prove that any online nonpreemptive algorithm, no processor of which is ever voluntarily idle, is constant-competitive.
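
    The staleness metric is easy to make concrete. The toy Python simulator below, an illustration rather than anything from the paper, loads updates nonpreemptively in arrival order on one processor and numerically integrates the staleness τ − r over time, where r is the arrival time covered by the latest finished update.

        def total_staleness(arrivals, durations, horizon, step=0.01):
            """arrivals[i]: when update i's data arrives; durations[i]: load time."""
            finish, finishes = 0.0, []
            for a, d in zip(arrivals, durations):
                finish = max(finish, a) + d      # nonpreemptive, arrival order
                finishes.append(finish)          # table is current to time a here
            total, t = 0.0, 0.0
            while t < horizon:
                r = max((a for a, f in zip(arrivals, finishes) if f <= t),
                        default=0.0)
                total += (t - r) * step          # integrate staleness tau - r
                t += step
            return total

        print(total_staleness([1.0, 2.0], [0.5, 0.5], horizon=4.0))  # ~4.0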

    Toward a new architecture for industry

    Thesis (M.Arch.), Massachusetts Institute of Technology, Dept. of Architecture, 1955. Bibliography: leaves 253-256. By N. Keith Scott. M.Arch.

    Bowdoin Orient v.108, no.1-23 (1978-1979)

    https://digitalcommons.bowdoin.edu/bowdoinorient-1970s/1009/thumbnail.jp