22,553 research outputs found
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can
generate many parallel memory requests at a time. The processing of these
parallel requests in the DRAM controller greatly affects the memory
interference delay experienced by running tasks on the platform. In this paper,
we model a modern COTS multicore system which has a nonblocking last-level
cache (LLC) and a DRAM controller that prioritizes reads over writes. To
minimize interference, we focus on LLC and DRAM bank partitioned systems. Based
on the model, we propose an analysis that computes a safe upper bound for the
worst-case memory interference delay. We validated our analysis on a real COTS
multicore platform with a set of carefully designed synthetic benchmarks as
well as SPEC2006 benchmarks. Evaluation results show that our analysis is more
accurately capture the worst-case memory interference delay and provides safer
upper bounds compared to a recently proposed analysis which significantly
under-estimate the delay.Comment: Technical Repor
Generalized Methodology for Array Processor Design of Real-time Systems
Many techniques and design tools have been developed for mapping algorithms to array processors. Linear mapping is usually used for regular algorithms. Large and complex problems are not regular by nature and regularization may cause a computational overhead which prevents the ability to meet real-time deadlines. In this paper, a systematic design methodology for mapping partially-regular as well as regular Dependence Graphs is presented. In this approach the set of all optimal solutions is generated under the given constraints. Due to nature of the problem and the tight timing constraints of real-time systems the set of alternative solutions is limited. An image processing example is discusse
Recommended from our members
Silicon compilation
Silicon compilation is a term used for many different purposes. In this paper we define silicon compilation as a mapping from some higher level description into layout. We define the basic issues in structural and behavioral silicon compilation and some possible solutions to those issues. Finally, we define the concept of an intelligent silicon compiler in which the compiler evaluates the quality of the generated design and attempts to improve it if it is not satisfactory
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
Minimizing Energy Consumption of MPI Programs in Realistic Environment
Dynamic voltage and frequency scaling proves to be an efficient way of
reducing energy consumption of servers. Energy savings are typically achieved
by setting a well-chosen frequency during some program phases. However,
determining suitable program phases and their associated optimal frequencies is
a complex problem. Moreover, hardware is constrained by non negligible
frequency transition latencies. Thus, various heuristics were proposed to
determine and apply frequencies, but evaluating their efficiency remains an
issue. In this paper, we translate the energy minimization problem into a mixed
integer program that specifically models most current hardware limitations. The
problem solution then estimates the minimal energy consumption and the
associated frequency schedule. The paper provides two different formulations
and a discussion on the feasibility of each of them on realistic applications
Performance analysis of a Master/Slave switched Ethernet for military embedded applications
Current military communication network is a generation
old and is no longer effective in meeting the emerging
requirements imposed by the next generation military embedded applications. A new communication network based upon Full Duplex Switched Ethernet is proposed in this paper to overcome these limitations. To allow existing military subsystems to be easily supported by a Switched Ethernet network, our proposal consists in keeping their current centralized communication scheme by using an optimized master/slave transmission control on Switched Ethernet thanks to the Flexible Time Triggered (FTT) paradigm. Our main objective is to assess the performance
of such a proposal and estimate the quality of service we
can expect in terms of latency. Using the Network Calculus formalism, schedulability analysis are determined. These analysis are illustrated in the case of a realistic military embedded application extracted from a real military aircraft network, to highlight the proposal's ability to support the required time constrained communications
SELFISHMIGRATE: A Scalable Algorithm for Non-clairvoyantly Scheduling Heterogeneous Processors
We consider the classical problem of minimizing the total weighted flow-time
for unrelated machines in the online \emph{non-clairvoyant} setting. In this
problem, a set of jobs arrive over time to be scheduled on a set of
machines. Each job has processing length , weight , and is
processed at a rate of when scheduled on machine . The online
scheduler knows the values of and upon arrival of the job,
but is not aware of the quantity . We present the {\em first} online
algorithm that is {\em scalable} ((1+\eps)-speed
-competitive for any constant \eps > 0) for the
total weighted flow-time objective. No non-trivial results were known for this
setting, except for the most basic case of identical machines. Our result
resolves a major open problem in online scheduling theory. Moreover, we also
show that no job needs more than a logarithmic number of migrations. We further
extend our result and give a scalable algorithm for the objective of minimizing
total weighted flow-time plus energy cost for the case of unrelated machines
and obtain a scalable algorithm. The key algorithmic idea is to let jobs
migrate selfishly until they converge to an equilibrium. Towards this end, we
define a game where each job's utility which is closely tied to the
instantaneous increase in the objective the job is responsible for, and each
machine declares a policy that assigns priorities to jobs based on when they
migrate to it, and the execution speeds. This has a spirit similar to
coordination mechanisms that attempt to achieve near optimum welfare in the
presence of selfish agents (jobs). To the best our knowledge, this is the first
work that demonstrates the usefulness of ideas from coordination mechanisms and
Nash equilibria for designing and analyzing online algorithms
- âŠ