3,837 research outputs found

    Comprehensive characterization of an open source document search engine

    Get PDF
    This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects the response time and throughput, explore the potential use of low power servers for document search, and examine the sources of performance degradation ands the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies but with diminishing benefits as incoming query traffic increases, (b) low power servers given enough partitioning can provide same average and tail response times as conventional high performance servers, (c) index search is a CPU-intensive cache-friendly application, and (d) C-states are the main culprits for performance degradation in document search.Web of Science162art. no. 1

    A load-sharing architecture for high performance optimistic simulations on multi-core machines

    Get PDF
    In Parallel Discrete Event Simulation (PDES), the simulation model is partitioned into a set of distinct Logical Processes (LPs) which are allowed to concurrently execute simulation events. In this work we present an innovative approach to load-sharing on multi-core/multiprocessor machines, targeted at the optimistic PDES paradigm, where LPs are speculatively allowed to process simulation events with no preventive verification of causal consistency, and actual consistency violations (if any) are recovered via rollback techniques. In our approach, each simulation kernel instance, in charge of hosting and executing a specific set of LPs, runs a set of worker threads, which can be dynamically activated/deactivated on the basis of a distributed algorithm. The latter relies in turn on an analytical model that provides indications on how to reassign processor/core usage across the kernels in order to handle the simulation workload as efficiently as possible. We also present a real implementation of our load-sharing architecture within the ROme OpTimistic Simulator (ROOT-Sim), namely an open-source C-based simulation platform implemented according to the PDES paradigm and the optimistic synchronization approach. Experimental results for an assessment of the validity of our proposal are presented as well

    Load sharing for optimistic parallel simulations on multicore machines

    Get PDF
    Parallel Discrete Event Simulation (PDES) is based on the partitioning of the simulation model into distinct Logical Processes (LPs), each one modeling a portion of the entire system, which are allowed to execute simulation events concurrently. This allows exploiting parallel computing architectures to speedup model execution, and to make very large models tractable. In this article we cope with the optimistic approach to PDES, where LPs are allowed to concurrently process their events in a speculative fashion, and rollback/ recovery techniques are used to guarantee state consistency in case of causality violations along the speculative execution path. Particularly, we present an innovative load sharing approach targeted at optimizing resource usage for fruitful simulation work when running an optimistic PDES environment on top of multi-processor/multi-core machines. Beyond providing the load sharing model, we also define a load sharing oriented architectural scheme, based on a symmetric multi-threaded organization of the simulation platform. Finally, we present a real implementation of the load sharing architecture within the open source ROme OpTimistic Simulator (ROOT-Sim) package. Experimental data for an assessment of both viability and effectiveness of our proposal are presented as well. Copyright is held by author/owner(s)

    Adaptive CPU Resource Allocation for Emulator in Kernel-based Virtual Machine

    Full text link
    The technologies of heterogeneous multi-core architectures, co-location, and virtualization can be used to reduce server power consumption and improve system utilization, which are three important technologies for data centers. This article explores the scheduling strategy of Emulator threads within virtual machine processes in a scenario of co-location of multiple virtual machines on heterogeneous multi-core architectures. In this co-location scenario, the scheduling strategy for Emulator threads significantly affects the performance of virtual machines. This article focuses on this thread for the first time in the relevant field. This article found that the scheduling latency metric can well indicate the running status of the vCPU threads and Emulator threads in the virtualization environment, and applied this metric to the design of the scheduling strategy. This article designed an Emulator thread scheduler based on heuristic rules, which, in coordination with the host operating system's scheduler, dynamically adjusts the scheduling scope of Emulator threads to improve the overall performance of virtual machines. The article found that in real application scenarios, the scheduler effectively improved the performance of applications within virtual machines, with a maximum performance improvement of 40.7%
    corecore