
    Cooperative scans

    Data mining, information retrieval, and other application areas exhibit a query load with multiple concurrent queries touching a large fraction of a relation. This leads to individual query plans based on a table scan or a large index scan. The implementation of this access path in most database systems is straightforward: the Scan operator issues next-page requests to the buffer manager without concern for the system state. Conversely, the buffer manager is not aware of the work ahead and focuses on keeping the most-recently-used pages in the buffer pool. This paper introduces cooperative scans, a new algorithm based on better sharing of knowledge and responsibility between the Scan operator and the buffer manager, which significantly improves the performance of concurrent scan queries. In this approach, queries share the buffer content, and the buffer manager optimizes the progress of the scans by minimizing the number of disk transfers in light of the total workload ahead. The experimental results are based on a simulation of the various disk-access scheduling policies and on implementations of cooperative scans within PostgreSQL and MonetDB/X100. These real-life experiments show that, with little effort, the performance of existing database systems on concurrent scan queries can be strongly improved.
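
    To make the coordination concrete, here is a minimal Python sketch of a cooperative scheduling loop in the spirit of the abstract: the buffer manager serves whichever cached chunk benefits the most active scans, and only goes to disk for the most-demanded uncached chunk. The classes and policy details (chunk granularity, the eviction rule) are illustrative assumptions, not the paper's actual algorithm.

    class Scan:
        def __init__(self, needed_chunks):
            self.needed = set(needed_chunks)   # chunk ids this query still has to read

    class BufferPool:
        def __init__(self, capacity):
            self.capacity = capacity
            self.cached = set()                # chunk ids currently in memory

    def pick_next_chunk(scans, pool):
        """Serve a cached chunk wanted by the most scans; otherwise load the
        uncached chunk with the highest demand, so a single disk transfer
        helps as many concurrent queries as possible."""
        demand = {}
        for s in scans:
            for c in s.needed:
                demand[c] = demand.get(c, 0) + 1
        if not demand:
            return None                                    # all scans finished
        cached = {c: n for c, n in demand.items() if c in pool.cached}
        if cached:
            return max(cached, key=cached.get)             # served without any I/O
        chunk = max(demand, key=demand.get)                # one load, many readers
        if len(pool.cached) >= pool.capacity:
            # evict the cached chunk with the least remaining demand
            victim = min(pool.cached, key=lambda c: demand.get(c, 0))
            pool.cached.discard(victim)
        pool.cached.add(chunk)                             # simulate the disk read
        return chunk

    After a chunk is served, each scan that wanted it would mark it consumed and continue; this decoupling of delivery order from table order is what lets faster and slower scans share one pass over the data.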

    Cache craftiness for fast multicore key-value storage

    We present Masstree, a fast key-value database designed for SMP machines. Masstree keeps all data in memory. Its main data structure is a trie-like concatenation of B+-trees, each of which handles a fixed-length slice of a variable-length key. This structure effectively handles arbitrary-length, possibly binary keys, including keys with long shared prefixes. The B+-tree fanout was chosen to minimize total DRAM delay when descending the tree and prefetching each tree node. Lookups use optimistic concurrency control, a read-copy-update-like technique, and do not write shared data structures; updates lock only the affected nodes. Logging and checkpointing provide consistency and durability. Though some of these ideas appear elsewhere, Masstree is the first to combine them. We discuss design variants and their consequences. On a 16-core machine, with logging enabled and queries arriving over a network, Masstree executes more than six million simple queries per second. This performance is comparable to that of memcached, a non-persistent hash table server, and higher (often much higher) than that of VoltDB, MongoDB, and Redis. (Supported by National Science Foundation (U.S.) Awards 0834415 and 0915164, and by Quanta Computer.)
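
    As a rough illustration of the layered structure, the sketch below splits keys into fixed 8-byte slices, with a dictionary standing in for each per-layer B+-tree. The slice width matches the abstract; everything else (the Layer class, the simplistic collision handling) is a simplifying assumption, and the real system's optimistic concurrency control is omitted entirely.

    SLICE = 8                          # fixed-length key slice, one per tree layer

    def slices(key: bytes):
        """Split a variable-length, possibly binary key into 8-byte slices."""
        for i in range(0, max(len(key), 1), SLICE):
            yield key[i:i + SLICE]

    class Layer:
        def __init__(self):
            # slice -> value, or -> next Layer for keys sharing this prefix;
            # a real B+-tree with prefetch-friendly fanout would live here
            self.entries = {}

    def insert(root, key: bytes, value):
        node, parts = root, list(slices(key))
        for part in parts[:-1]:
            nxt = node.entries.get(part)
            if not isinstance(nxt, Layer):   # descend, creating lower layers on
                nxt = Layer()                # shared prefixes (this sketch clobbers
                node.entries[part] = nxt     # any value stored here; Masstree does not)
            node = nxt
        node.entries[parts[-1]] = value

    def lookup(root, key: bytes):
        node = root
        for part in slices(key):
            node = node.entries.get(part) if isinstance(node, Layer) else None
            if node is None:
                return None
        return None if isinstance(node, Layer) else node

    Because each layer only ever compares fixed 8-byte slices, long shared prefixes cost one layer descent rather than repeated full-key comparisons, which is the point of the trie-of-trees layout.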

    Cache Hierarchy-Aware Query Mapping on Emerging Multicore Architectures

    One of the important characteristics of emerging multicores/manycores is the existence of shared on-chip caches, through which different threads/processes can share data (help each other) or displace each other's data (hurt each other). Most current commercial multicore systems have on-chip cache hierarchies with multiple layers (typically L1, L2, and L3, the last two being either fully or partially shared). In the context of database workloads, exploiting the full potential of these caches can be critical. Motivated by this observation, our main contribution in this work is to present and experimentally evaluate a cache hierarchy-aware query mapping scheme targeting workloads that consist of batch queries to be executed on emerging multicores. Our proposed scheme distributes a given batch of queries across the cores of a target multicore architecture based on the affinity relations among the queries. The primary goal behind this scheme is to maximize the utilization of the underlying on-chip cache hierarchy while keeping the load nearly balanced across affinity domains. Each affinity domain in this context corresponds to a cache structure bounded by a particular level of the cache hierarchy. A graph partitioning-based method is employed to distribute queries across cores, and an integer linear programming (ILP) formulation is used to address locality and load balancing concerns. We evaluate our scheme using the TPC-H benchmarks on an Intel Xeon-based multicore. Our solution achieves up to 25 percent improvement in individual query execution times and 15-19 percent improvement in throughput over the default Linux process scheduler.
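
    The sketch below illustrates the mapping objective with a simple greedy heuristic over a pairwise affinity matrix. The actual scheme uses graph partitioning and an ILP formulation, so the heuristic, the matrix encoding, and the balance rule here are stand-in assumptions that only show the objective: co-locate affine queries, keep domains balanced.

    def map_queries(affinity, num_domains):
        """affinity[i][j]: nonnegative benefit (e.g., shared-table overlap) of
        placing queries i and j in the same cache/affinity domain.
        Returns a domain id per query."""
        n = len(affinity)
        domains = [[] for _ in range(num_domains)]
        assignment = [None] * n
        # Place the most "connected" queries first so they anchor domains.
        order = sorted(range(n), key=lambda q: -sum(affinity[q]))
        cap = n / num_domains + 1                  # nearly balanced load
        for q in order:
            best, best_score = None, -1.0
            for d in range(num_domains):
                if len(domains[d]) >= cap:         # domain full, keep balance
                    continue
                score = sum(affinity[q][p] for p in domains[d])
                if score > best_score:
                    best, best_score = d, score
            domains[best].append(q)
            assignment[q] = best
        return assignment

    A production version would partition hierarchically (one partition per shared L3, then per shared L2), mirroring the cache structure bounding each affinity domain.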

    Cost-Based Optimization of Integration Flows

    Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many different application areas, such as real-time ETL and data synchronization between operational systems. Because of increasing data volumes, highly distributed IT infrastructures, and strong requirements for data consistency and up-to-date query results, many instances of integration flows are executed over time. Due to this high load and the blocking of synchronous source systems, the performance of the central integration platform is crucial for an IT infrastructure. To meet these high performance requirements, we introduce the concept of cost-based optimization of imperative integration flows, which relies on incremental statistics maintenance and inter-instance plan re-optimization. As a foundation, we introduce the concept of periodical re-optimization, including novel cost-based optimization techniques that are tailor-made for integration flows. Furthermore, we refine periodical re-optimization to on-demand re-optimization in order to overcome the problems of unnecessary re-optimization steps and adaptation delays, during which optimization opportunities are missed. This approach ensures low optimization overhead and fast workload adaptation.
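
    A minimal sketch of the on-demand idea: statistics are folded in incrementally after every flow instance, and re-optimization fires only when the cost estimate drifts past a threshold. The smoothing factor, the drift trigger, and the reoptimize stand-in are illustrative assumptions rather than the paper's tailor-made techniques.

    class CostModel:
        """Incrementally maintained per-operator statistics and a toy cost estimate."""
        def __init__(self):
            self.rows = {}                       # operator name -> smoothed cardinality

        def update(self, runtime_stats):
            for op, observed in runtime_stats.items():
                old = self.rows.get(op, observed)
                self.rows[op] = 0.9 * old + 0.1 * observed   # incremental maintenance

        def estimate(self, plan):
            return sum(self.rows.get(op, 1.0) for op in plan)

    def reoptimize(plan, model):
        # Stand-in rewrite: order commutative operators cheapest-first.
        return sorted(plan, key=lambda op: model.rows.get(op, 1.0))

    class OnDemandOptimizer:
        def __init__(self, plan, model, threshold=0.2):
            self.plan, self.model, self.threshold = plan, model, threshold
            self.baseline = model.estimate(plan)   # cost at the last optimization

        def after_instance(self, runtime_stats):
            """Called once per executed flow instance."""
            self.model.update(runtime_stats)
            current = self.model.estimate(self.plan)
            drift = abs(current - self.baseline) / max(self.baseline, 1e-9)
            if drift > self.threshold:             # re-optimize only on demand
                self.plan = reoptimize(self.plan, self.model)
                self.baseline = self.model.estimate(self.plan)

    Compared with a fixed re-optimization period, the drift trigger avoids wasted optimizer runs when the workload is stable and reacts within one instance when it shifts.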

    The Viability of the Air Mobility Command Pure Pallet Program for US Army Reparable Retrograde Shipments

    Last year, Congress approved $17.1 billion, an increase of $4 billion over what the Bush Administration originally requested, for US Army vehicles to be repaired or replaced (commonly referred to as reset) as a result of military operations in Iraq and Afghanistan. A large portion of the repair workload falls upon the Army depots at Anniston, Alabama, and Red River in Texarkana, Texas, which must rely on the DOD transportation system for air and surface movement of serviceable and unserviceable retrograde cargo to fill requisitions and backorders for entry into the national supply inventory. Headquarters Air Mobility Command developed an initiative, distributed to US Central Command, that allows supply requisition shipments to accumulate, based on customer-defined delivery timelines, for a single unit destination. This eliminates the need for mixed destinations on a single pallet, thereby avoiding intermediate handling and increasing in-transit visibility. This research viewed the depots and the item managers as the customers, due to the value they collectively add in equipment repairs and in directing retrograde to meet the needs of the end user. Subject matter experts from Army Materiel Command provided input through a series of focused interviews to quantify the value they place on the transportation system, combined with a cost comparison of the accumulation principles of the AMC pure pallet program. The results indicated that the AMC pure pallet program was not a viable option, despite the savings from consolidated shipments, because of conflicts with customer requirements and high variability in the volume of retrograde generated.

    Army-NASA aircrew/aircraft integration program (A3I) software detailed design document, phase 3

    The capabilities and design approach of the MIDAS (Man-machine Integration Design and Analysis System) computer-aided engineering (CAE) workstation under development by the Army-NASA Aircrew/Aircraft Integration Program are detailed. This workstation uses graphic, symbolic, and numeric prototyping tools and human performance models as part of an integrated design/analysis environment for crewstation human engineering. The workstation is being developed incrementally; the requirements and design for Phase 3 (Dec. 1987 to Jun. 1989) are described here. Software tools and models developed or significantly modified during this phase included: an interactive 3-D graphic cockpit design editor; multiple-perspective graphic views for observing simulation scenarios; symbolic methods to model the mission decomposition, equipment functions, and pilot tasking and loading, as well as to control the simulation; a 3-D dynamic anthropometric model; an intermachine communications package; and a training assessment component. These components were successfully used during Phase 3 to demonstrate the complex interactions and human engineering findings involved in a proposed cockpit communications design change in a simulated AH-64A Apache helicopter mission, with results that map to empirical data from a similar study and from AH-1 Cobra flight tests.

    An improved method for text summarization using lexical chains

    This work is directed toward the creation of a system for automatically summarizing documents by extracting selected sentences. Several heuristics, including position, cue words, and title words, are used in conjunction with lexical chain information to create a salience function that is used to rank sentences for extraction. Compiler technology, including the Flex and Bison tools, is used to create the AutoExtract summarizer that extracts and combines this information from the raw text. The WordNet database is used for the creation of the lexical chains. The AutoExtract summarizer performed better than the Microsoft Word 97 AutoSummarize tool and the Sinope commercial summarizer in tests against ideal extracts and in tests judged by humans.
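
    A compact sketch of such a salience function is shown below. The weights, the cue-word list, and the normalizations are invented for illustration and are not AutoExtract's actual values; the lexical-chain score is assumed to be precomputed per sentence (e.g., from WordNet-based chains).

    CUE_WORDS = {"significant", "conclude", "results", "propose"}   # hypothetical list

    def salience(sentence, index, total, title_words, chain_score,
                 weights=(0.3, 0.2, 0.2, 0.3)):
        """Combine position, cue-word, title-word, and lexical-chain evidence
        into one score; title_words is assumed to be a lowercase set."""
        words = set(sentence.lower().split())
        position = 1.0 - index / max(total - 1, 1)     # earlier sentences score higher
        cue = len(words & CUE_WORDS) / max(len(words), 1)
        title = len(words & title_words) / max(len(title_words), 1)
        w_pos, w_cue, w_title, w_chain = weights
        return w_pos * position + w_cue * cue + w_title * title + w_chain * chain_score

    def extract(sentences, title_words, chain_scores, k=3):
        """Return the k most salient sentences, in document order."""
        ranked = sorted(range(len(sentences)),
                        key=lambda i: -salience(sentences[i], i, len(sentences),
                                                title_words, chain_scores[i]))
        return [sentences[i] for i in sorted(ranked[:k])]

    Re-sorting the chosen indices before output keeps the extract readable, since sentences appear in their original order rather than by rank.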

    Model consistency management for systems engineering

    The development of complex mechatronic systems requires the close collaboration of different disciplines, such as mechanical engineering, electrical engineering, control engineering, and software engineering. To tackle the complexity of such systems, development is heavily based on models. Engineers use several models on different abstraction levels, for different purposes, and with different viewpoints. Usually, a discipline-spanning system model is developed during the first, interdisciplinary system design phase. For the implementation phase, the disciplines use different models and tools to develop the discipline-specific aspects of the system. During such a model-based development, inconsistencies between the different discipline-specific models and the discipline-spanning system model are likely to occur, because changes to discipline-specific models may affect the discipline-spanning system model and the models of other disciplines. These inconsistencies lead to increased development time and costs if they remain unresolved. Model transformation and synchronization are promising techniques to detect and resolve such inconsistencies. However, existing model synchronization solutions are not powerful enough to support the complex consistency relations of such an application scenario. In this thesis, we present a novel model synchronization technique that can synchronize models across multiple views and abstraction levels. To minimize information loss and improve automation during synchronization, it employs metrics that encode expert knowledge. The approach can be customized to allow different amounts of user interaction, from full automation to fine-grained manual decisions. (Date of defense: 24 Oct. 2014; Paderborn, Univ., Diss., 2014.)
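
    As a conceptual sketch, metric-guided synchronization might rank the candidate resolutions for each detected inconsistency and fall back to the user only below a confidence threshold. The metric interface and the threshold logic below are assumptions for illustration, not the thesis's actual design.

    def score(candidate, metrics, weights):
        """Weighted sum of metrics (e.g., name similarity, structural distance)
        that encode expert knowledge about likely correspondences."""
        return sum(w * m(candidate) for m, w in zip(metrics, weights))

    def resolve(candidates, metrics, weights, threshold, ask_user):
        """Apply the best-scoring resolution automatically when confident,
        otherwise hand the ranked list to the engineer for a manual decision."""
        ranked = sorted(candidates,
                        key=lambda c: score(c, metrics, weights), reverse=True)
        if score(ranked[0], metrics, weights) >= threshold:
            return ranked[0]          # fully automatic decision
        return ask_user(ranked)       # fine-grained manual choice

    Raising the threshold trades automation for safety, which matches the customizable range of user interaction described above.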

    Efficient spatial keyword query processing on geo-textual data
