Search CORE

22,553 research outputs found

Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems

Author: Yun Heechul
Publication venue
Publication date: 25/07/2014
Field of study

In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can generate many parallel memory requests at a time. The processing of these parallel requests in the DRAM controller greatly affects the memory interference delay experienced by running tasks on the platform. In this paper, we model a modern COTS multicore system which has a nonblocking last-level cache (LLC) and a DRAM controller that prioritizes reads over writes. To minimize interference, we focus on LLC and DRAM bank partitioned systems. Based on the model, we propose an analysis that computes a safe upper bound for the worst-case memory interference delay. We validated our analysis on a real COTS multicore platform with a set of carefully designed synthetic benchmarks as well as SPEC2006 benchmarks. Evaluation results show that our analysis is more accurately capture the worst-case memory interference delay and provides safer upper bounds compared to a recently proposed analysis which significantly under-estimate the delay.Comment: Technical Repor

arXiv.org e-Print Archive

CiteSeerX

Generalized Methodology for Array Processor Design of Real-time Systems

Author: El hadidy F.
Herrmann O.E.
Publication venue: IEEE
Publication date: 01/01/1994
Field of study

Many techniques and design tools have been developed for mapping algorithms to array processors. Linear mapping is usually used for regular algorithms. Large and complex problems are not regular by nature and regularization may cause a computational overhead which prevents the ability to meet real-time deadlines. In this paper, a systematic design methodology for mapping partially-regular as well as regular Dependence Graphs is presented. In this approach the set of all optimal solutions is generated under the given constraints. Due to nature of the problem and the tight timing constraints of real-time systems the set of alternative solutions is limited. An image processing example is discusse

University of Twente Research Information

Recommended from our members

Silicon compilation

Author: Dutt Nikil D.
Gajski Daniel D.
Pangrle Barry M.
Publication venue: eScholarship, University of California
Publication date: 01/01/1987
Field of study

Silicon compilation is a term used for many different purposes. In this paper we define silicon compilation as a mapping from some higher level description into layout. We define the basic issues in structural and behavioral silicon compilation and some possible solutions to those issues. Finally, we define the concept of an intelligent silicon compiler in which the compiler evaluates the quality of the generated design and attempts to improve it if it is not satisfactory

eScholarship - University of California

Design of multimedia processor based on metric computation

Author: Balasa
Berekovic
Jean Luc Philippe
Jean Philippe Diguet
Mohamed Abid
Nader Ben Amor
Suzuki
Wuytack
Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

Minimizing Energy Consumption of MPI Programs in Realistic Environment

Author: Guermouche Amina
Jalby William
Pradelle Benoit
Triquenaux Nicolas
Publication venue
Publication date: 23/02/2015
Field of study

Dynamic voltage and frequency scaling proves to be an efficient way of reducing energy consumption of servers. Energy savings are typically achieved by setting a well-chosen frequency during some program phases. However, determining suitable program phases and their associated optimal frequencies is a complex problem. Moreover, hardware is constrained by non negligible frequency transition latencies. Thus, various heuristics were proposed to determine and apply frequencies, but evaluating their efficiency remains an issue. In this paper, we translate the energy minimization problem into a mixed integer program that specifically models most current hardware limitations. The problem solution then estimates the minimal energy consumption and the associated frequency schedule. The paper provides two different formulations and a discussion on the feasibility of each of them on realistic applications

arXiv.org e-Print Archive

Hal-Diderot

HAL UVSQ

Performance analysis of a Master/Slave switched Ethernet for military embedded applications

Author: Fraboul Christian
Frances Fabrice
Mifdaoui Ahlem
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/11/2010
Field of study

Current military communication network is a generation old and is no longer effective in meeting the emerging requirements imposed by the next generation military embedded applications. A new communication network based upon Full Duplex Switched Ethernet is proposed in this paper to overcome these limitations. To allow existing military subsystems to be easily supported by a Switched Ethernet network, our proposal consists in keeping their current centralized communication scheme by using an optimized master/slave transmission control on Switched Ethernet thanks to the Flexible Time Triggered (FTT) paradigm. Our main objective is to assess the performance of such a proposal and estimate the quality of service we can expect in terms of latency. Using the Network Calculus formalism, schedulability analysis are determined. These analysis are illustrated in the case of a realistic military embedded application extracted from a real military aircraft network, to highlight the proposal's ability to support the required time constrained communications

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

SELFISHMIGRATE: A Scalable Algorithm for Non-clairvoyantly Scheduling Heterogeneous Processors

Author: Im Sungjin
Kulkarni Janardhan
Munagala Kamesh
Pruhs Kirk
Publication venue
Publication date: 01/01/2014
Field of study

We consider the classical problem of minimizing the total weighted flow-time for unrelated machines in the online \emph{non-clairvoyant} setting. In this problem, a set of jobs

J

arrive over time to be scheduled on a set of

M

machines. Each job

j

has processing length

p_j

, weight

w_j

, and is processed at a rate of

\ell_{ij}

when scheduled on machine

i

. The online scheduler knows the values of

w_j

and

\ell_{ij}

upon arrival of the job, but is not aware of the quantity

p_j

. We present the {\em first} online algorithm that is {\em scalable} ((1+\eps)-speed

O(\frac{1}{\epsilon^2})

-competitive for any constant \eps > 0) for the total weighted flow-time objective. No non-trivial results were known for this setting, except for the most basic case of identical machines. Our result resolves a major open problem in online scheduling theory. Moreover, we also show that no job needs more than a logarithmic number of migrations. We further extend our result and give a scalable algorithm for the objective of minimizing total weighted flow-time plus energy cost for the case of unrelated machines and obtain a scalable algorithm. The key algorithmic idea is to let jobs migrate selfishly until they converge to an equilibrium. Towards this end, we define a game where each job's utility which is closely tied to the instantaneous increase in the objective the job is responsible for, and each machine declares a policy that assigns priorities to jobs based on when they migrate to it, and the execution speeds. This has a spirit similar to coordination mechanisms that attempt to achieve near optimum welfare in the presence of selfish agents (jobs). To the best our knowledge, this is the first work that demonstrates the usefulness of ideas from coordination mechanisms and Nash equilibria for designing and analyzing online algorithms

arXiv.org e-Print Archive

CiteSeerX

Crossref