Search CORE

137 research outputs found

Scheduling Data-Intensive Tasks on Heterogeneous Many Cores

Author: Kotthaus Helena
Tözün Pinar
Publication venue
Publication date: 01/01/2019
Field of study

The IT University of Copenhagen's Repository

MorphStream: Scalable Processing of Transactions over Streams on Multicores

Author: Liu Haikun
Mao Yancan
Markl Volker
Yang Zhonghao
Zhang Shuhao
Zhao Jianjun
Publication venue
Publication date: 24/07/2023
Field of study

Transactional Stream Processing Engines (TSPEs) form the backbone of modern stream applications handling shared mutable states. Yet, the full potential of these systems, specifically in exploiting parallelism and implementing dynamic scheduling strategies, is largely unexplored. We present MorphStream, a TSPE designed to optimize parallelism and performance for transactional stream processing on multicores. Through a unique three-stage execution paradigm (i.e., planning, scheduling, and execution), MorphStream enables dynamic scheduling and parallel processing in TSPEs. Our experiment showcased MorphStream outperforms current TSPEs across various scenarios and offers support for windowed state transactions and non-deterministic state access, demonstrating its potential for broad applicability

arXiv.org e-Print Archive

Towards Power-Aware Data Pipelining on Multicores

Author: Aldinucci Marco
Daniele De Sensi
Gabriele Mencagli
Marco Danelutto
Massimo Torquati
Publication venue: 'Universidad de Valladolid'
Publication date: 01/01/2017
Field of study

Institutional Research Information System University of Turin

How to Stop Under-Utilization and Love Multicores

Author: Ailamaki Anastasia
Liarou Erietta
Porobic Danica
Psaroudakis Iraklis
Tözün Pinar
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/02/2014
Field of study

Designing scalable transaction processing systems on modern hardware has been a challenge for almost a decade. Hardware trends oblige software to overcome three major challenges against systems scalability: (1) Exploiting the abundant thread-level parallelism provided by multicores, (2) Achieving predictively efficient execution despite the variability in communication latencies among cores on multisocket multicores, and (3) Taking advantage of the aggressive micro-architectural features. In this tutorial, we shed light on the above three challenges and survey recent proposals to alleviate them. First, we present a systematic way of eliminating scalability bottlenecks based on minimizing unbounded communication and show several techniques that apply the presented methodology to minimize bottlenecks in major components of transaction processing systems. Then, we analyze the problems that arise from the non-uniform nature of communication latencies on modern multisockets and ways to address them for systems that already scale well on multicores. Finally, we examine the sources of under-utilization within a modern processor and present insights and techniques to better exploit the micro-architectural resources of a processor by improving cache locality at the right level

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Power-Aware Pipelining with Automatic Concurrency Control

Author: Aldinucci Marco
Danelutto Marco
Daniele De Sensi
Gabriele Mencagli
Massimo Torquati
Publication venue: 'Wiley'
Publication date: 01/01/2019
Field of study

Institutional Research Information System University of Turin

07361 Abstracts Collection -- Programming Models for Ubiquitous Parallelism

Author: Cohen Albert
Lengauer Christian
Midkiff Samuel P.
Wong David Chi-Leung
Publication venue: Dagstuhl Seminar Proceedings. 07361 - Programming Models for Ubiquitous Parallelism
Publication date: 01/01/2008
Field of study

From 02.09. to 07.09.2007, the Dagstuhl Seminar 07361 ``Programming Models for Ubiquitous Parallelism\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

Dagstuhl Research Online Publication Server

A Survey on Transactional Stream Processing

Author: Markl Volker
Soto Juan
Zhang Shuhao
Publication venue
Publication date: 31/05/2023
Field of study

Transactional stream processing (TSP) strives to create a cohesive model that merges the advantages of both transactional and stream-oriented guarantees. Over the past decade, numerous endeavors have contributed to the evolution of TSP solutions, uncovering similarities and distinctions among them. Despite these advances, a universally accepted standard approach for integrating transactional functionality with stream processing remains to be established. Existing TSP solutions predominantly concentrate on specific application characteristics and involve complex design trade-offs. This survey intends to introduce TSP and present our perspective on its future progression. Our primary goals are twofold: to provide insights into the diverse TSP requirements and methodologies, and to inspire the design and development of groundbreaking TSP systems

arXiv.org e-Print Archive

Tailbench: a benchmark suite and evaluation methodology for latency-critical applications

Author: Kasture Harshad
Sanchez Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2016
Field of study

Latency-critical applications, common in datacenters, must achieve small and predictable tail (e.g., 95th or 99th percentile) latencies. Their strict performance requirements limit utilization and efficiency in current datacenters. These problems have sparked research in hardware and software techniques that target tail latency. However, research in this area is hampered by the lack of a comprehensive suite of latency-critical benchmarks. We present TailBench, a benchmark suite and evaluation methodology that makes latency-critical workloads as easy to run and characterize as conventional, throughput-oriented ones. TailBench includes eight applications that span a wide range of latency requirements and domains, and a harness that implements a robust and statistically sound load-testing methodology. The modular design of the TailBench harness facilitates multiple load-testing scenarios, ranging from multi-node configurations that capture network overheads, to simplified single-node configurations that allow measuring tail latency in simulation. Validation results show that the simplified configurations are accurate for most applications. This flexibility enables rapid prototyping of hardware and software techniques for latency-critical workloads.National Science Foundation (U.S.) (CCF-1318384)Qatar Computing Research InstituteGoogle (Firm) (Google Research Award

DSpace@MIT

Crossref

Effective data parallel computing on multicore processors

Author: Byun Jong-Ho
NC DOCKS at The University of North Carolina at Charlotte
Publication venue
Publication date: 01/01/2010
Field of study

The rise of chip multiprocessing or the integration of multiple general purpose processing cores on a single chip (multicores), has impacted all computing platforms including high performance, servers, desktops, mobile, and embedded processors. Programmers can no longer expect continued increases in software performance without developing parallel, memory hierarchy friendly software that can effectively exploit the chip level multiprocessing paradigm of multicores. The goal of this dissertation is to demonstrate a design process for data parallel problems that starts with a sequential algorithm and ends with a high performance implementation on a multicore platform. Our design process combines theoretical algorithm analysis with practical optimization techniques. Our target multicores are quad-core processors from Intel and the eight-SPE IBM Cell B.E. Target applications include Matrix Multiplications (MM), Finite Difference Time Domain (FDTD), LU Decomposition (LUD), and Power Flow Solver based on Gauss-Seidel (PFS-GS) algorithms. These applications are popular computation methods in science and engineering problems and are characterized by unit-stride (MM, LUD, and PFS-GS) or 2-point stencil (FDTD) memory access pattern. The main contributions of this dissertation include a cache- and space-efficient algorithm model, integrated data pre-fetching and caching strategies, and in-core optimization techniques. Our multicore efficient implementations of the above described applications outperform nai¨ve parallel implementations by at least 2x and scales well with problem size and with the number of processing cores

The University of North Carolina at Greensboro