3,074 research outputs found

    Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures

    Full text link
    In this paper, we perform an empirical evaluation of the Parallel External Memory (PEM) model in the context of geometric problems. In particular, we implement the parallel distribution sweeping framework of Ajwani, Sitchinava and Zeh to solve batched 1-dimensional stabbing max problem. While modern processors consist of sophisticated memory systems (multiple levels of caches, set associativity, TLB, prefetching), we empirically show that algorithms designed in simple models, that focus on minimizing the I/O transfers between shared memory and single level cache, can lead to efficient software on current multicore architectures. Our implementation exhibits significantly fewer accesses to slow DRAM and, therefore, outperforms traditional approaches based on plane sweep and two-way divide and conquer.Comment: Longer version of ESA'13 pape

    The potential of programmable logic in the middle: cache bleaching

    Full text link
    Consolidating hard real-time systems onto modern multi-core Systems-on-Chip (SoC) is an open challenge. The extensive sharing of hardware resources at the memory hierarchy raises important unpredictability concerns. The problem is exacerbated as more computationally demanding workload is expected to be handled with real-time guarantees in next-generation Cyber-Physical Systems (CPS). A large body of works has approached the problem by proposing novel hardware re-designs, and by proposing software-only solutions to mitigate performance interference. Strong from the observation that unpredictability arises from a lack of fine-grained control over the behavior of shared hardware components, we outline a promising new resource management approach. We demonstrate that it is possible to introduce Programmable Logic In-the-Middle (PLIM) between a traditional multi-core processor and main memory. This provides the unique capability of manipulating individual memory transactions. We propose a proof-of-concept system implementation of PLIM modules on a commercial multi-core SoC. The PLIM approach is then leveraged to solve long-standing issues with cache coloring. Thanks to PLIM, colored sparse addresses can be re-compacted in main memory. This is the base principle behind the technique we call Cache Bleaching. We evaluate our design on real applications and propose hypervisor-level adaptations to showcase the potential of the PLIM approach.Accepted manuscrip

    DReAM: An approach to estimate per-Task DRAM energy in multicore systems

    Get PDF
    Accurate per-task energy estimation in multicore systems would allow performing per-task energy-aware task scheduling and energy-aware billing in data centers, among other applications. Per-task energy estimation is challenged by the interaction between tasks in shared resources, which impacts tasks’ energy consumption in uncontrolled ways. Some accurate mechanisms have been devised recently to estimate per-task energy consumed on-chip in multicores, but there is a lack of such mechanisms for DRAM memories. This article makes the case for accurate per-task DRAM energy metering in multicores, which opens new paths to energy/performance optimizations. In particular, the contributions of this article are (i) an ideal per-task energy metering model for DRAM memories; (ii) DReAM, an accurate yet low cost implementation of the ideal model (less than 5% accuracy error when 16 tasks share memory); and (iii) a comparison with standard methods (even distribution and access-count based) proving that DReAM is much more accurate than these other methods.Peer ReviewedPostprint (author's final draft

    Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System

    Get PDF
    Heracles is an open-source complete multicore system written in Verilog. It is fully parameterized and can be reconfigured and synthesized into different topologies and sizes. Each processing node has a 7-stage pipeline, fully bypassed, microprocessor running the MIPS-III ISA, a 4-stage input-buffer, virtual-channel router, and a local variable-size shared memory. Our design is highly modular with clear interfaces between the core, the memory hierarchy, and the on-chip network. In the baseline design, the microprocessor is attached to two caches, one instruction cache and one data cache, which are oblivious to the global memory organization. The memory system in Heracles can be configured as one single global shared memory (SM), or distributed shared memory (DSM), or any combination thereof. Each core is connected to the rest of the network of processors by a parameterized, realistic, wormhole router. We show different topology configurations of the system, and their synthesis results on the Xilinx Virtex-5 LX330T FPGA board. We also provide a small MIPS cross-compiler toolchain to assist in developing software for Heracles

    Design of a High Capacity, Scalable, and Green Wireless Communication System Leveraging the Unlicensed Spectrum

    Get PDF
    The stunning demand for mobile wireless data that has been recently growing at an exponential rate requires a several fold increase in spectrum. The use of unlicensed spectrum is thus critically needed to aid the existing licensed spectrum to meet such a huge mobile wireless data traffic growth demand in a cost effective manner. The deployment of Long Term Evolution (LTE) in the unlicensed spectrum (LTE-U) has recently been gaining significant industry momentum. The lower transmit power regulation of the unlicensed spectrum makes LTE deployment in the unlicensed spectrum suitable only for a small cell. A small cell utilizing LTE-L (LTE in licensed spectrum), and LTE-U (LTE in unlicensed spectrum) will therefore significantly reduce the total cost of ownership (TCO) of a small cell, while providing the additional mobile wireless data offload capacity from Macro Cell to small cell in LTE Heterogeneous Networks (HetNet), to meet such an increase in wireless data demand. The U.S. 5 GHz Unlicensed National Information Infrastructure (U-NII) bands that are currently under consideration for LTE deployment in the unlicensed spectrum contain only a limited number of 20 MHZ channels. Thus in a dense multi-operator deployment scenario, one or more LTE-U small cells have to co-exist and share the same 20 MHz unlicensed channel with each other and with the incumbent Wi-Fi. This dissertation presents a proactive small cell interference mitigation strategy for improving the spectral efficiency of LTE networks in the unlicensed spectrum. It describes the scenario and demonstrate via simulation results, that in the absence of an explicit interference mitigation mechanism, there will be a significant degradation in the overall LTE-U system performance for LTE-U co-channel co-existence in countries such as U.S. that do not mandate Listen-Before-Talk (LBT) regulations. An unlicensed spectrum Inter Cell Interference Coordination (usICIC) mechanism is then presented as a time-domain multiplexing technique for interference mitigation for the sharing of an unlicensed channel by multi-operator LTE-U small cells. Through extensive simulation results, it is demonstrated that the proposed usICIC mechanism will result in 40% or more improvement in the overall LTE-U system performance (throughput) leading to increased wireless communication system capacity. The ever increasing demand for mobile wireless data is also resulting in a dramatic expansion of wireless network infrastructure by all service providers resulting in significant escalation in energy consumption by the wireless networks. This not only has an impact on the recurring operational expanse (OPEX) for the service providers, but importantly the resulting increase in greenhouse gas emission is not good for the environment. Energy efficiency has thus become one of the critical tenets in the design and deployment of Green wireless communication systems. Consequently the market trend for next-generation communication systems has been towards miniaturization to meet this stunning ever increasing demand for mobile wireless data, leading towards the need for scalable distributed and parallel processing system architecture that is energy efficient, and high capacity. Reducing cost and size while increasing capacity, ensuring scalability, and achieving energy efficiency requires several design paradigm shifts. This dissertation presents the design for a next generation wireless communication system that employs new energy efficient distributed and parallel processing system architecture to achieve these goals while leveraging the unlicensed spectrum to significantly increase (by a factor of two) the capacity of the wireless communication system. This design not only significantly reduces the upfront CAPEX, but also the recurring OPEX for the service providers to maintain their next generation wireless communication networks
    • …
    corecore