449 research outputs found
System-Level Design and Virtual Prototyping of a Telecommunication Application on a NUMA Platform
The use of model-driven approaches for embedded system design has become a common practice. Among these model-driven approaches, only a few include the generation of a full-system simulation comprising operating system, code generation for tasks and hardware simulation models. Even less common is the extension to massively parallel, NoC-based designs, such as required for high-performance streaming applications where dozens of tasks are replicated onto identical general-purpose processor cores of a Multi-processor System-on-chip (MP-SoC). We present the extension of a system-level tool to handle clustered Network-on-Chip (NoC) with virtual prototyping platforms. On the one hand, the automatic generation of the virtual prototype becomes more complex as topcell, address mapping and linker script have to be adapted. On the other hand, the exploration of the design space is particularly important for this class of applications, as performance may strongly be impacted by Non Uniform Memory Access (NUMA)
Implications of memory mappings on cache misses
This paper proposes an optimization by an alternative
approach to memory mapping. Low set associativity allows
representing cache lines by corresponding memory areas.
With the help of the notion of temporal reuse in the innermost
loop, the behaviour of values in the cache is modelled.
Combining these values into cache lines so that spatial reuse is
considered demands an alternative memory mapping.
Memory mappings with a low expectation of conflicts
are achieved by the random placement of arrays in memory.
Significant increase of cache misses for a worst case placement is
shown by experiments, as well as cache miss reduction achieved by
improving reuse
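The effect of array placement on conflict misses can be illustrated with a minimal sketch. It is not the paper's model: it assumes a direct-mapped cache, addresses counted in array elements, and a loop that reads `a[i]` and `b[i]` in turn. The function name `misses_for_placement` and all parameter values are hypothetical.

```python
def misses_for_placement(base_a, base_b, n, num_lines, line_size):
    """Count misses in a direct-mapped cache for a loop touching a[i], b[i].

    Addresses are in units of array elements (an assumption of this sketch).
    """
    tags = [None] * num_lines          # one resident memory block per cache line
    misses = 0
    for i in range(n):
        for addr in (base_a + i, base_b + i):
            block = addr // line_size  # memory block holding this element
            slot = block % num_lines   # direct-mapped: block -> exactly one line
            if tags[slot] != block:
                tags[slot] = block     # evict whatever was there
                misses += 1
    return misses


# Cache of 8 lines x 4 elements = 32 elements.
worst = misses_for_placement(0, 32, 32, 8, 4)    # b starts one cache size after a
shifted = misses_for_placement(0, 36, 32, 8, 4)  # b shifted by one extra line
print(worst, shifted)
```

With bases exactly one cache size apart, every access evicts the line the other array needs, so each access misses. Shifting one array by a single cache line leaves only the first-touch miss per memory block in this particular loop.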
The Combinatorics of Cache Misses during Matrix Multiplication
In this paper we construct an analytic model of cache misses during matrix multiplication. The analysis applies to square matrices of size $2^m$ where the array layout function is given in terms of a function Θ that interleaves the bits in the binary expansions of the row and column indices. We first analyze the number of cache misses for direct-mapped caches and then indicate how to extend this analysis to set-associative caches. The work in this paper accomplishes two things. First, we construct fast algorithms to estimate the number of cache misses. Second, we develop a theoretical understanding of cache misses that will allow us, in subsequent work, to approach the problem of minimizing cache misses by appropriately choosing the bit interleaving function that goes into the array layout function
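As a rough illustration of a bit-interleaving layout function, the sketch below uses one particular interleaving (Morton, or Z-order) and counts direct-mapped cache misses by brute-force simulation. This is not the paper's analytic model or its fast estimation algorithms. The names `interleave_bits`, `row_major`, and `count_misses`, the contiguous placement of the three matrices, and the naive loop order are all assumptions made for the example.

```python
def interleave_bits(row, col, m):
    """One possible layout function: interleave the m-bit expansions of row
    and col, with col bits in even positions and row bits in odd positions."""
    index = 0
    for bit in range(m):
        index |= ((col >> bit) & 1) << (2 * bit)
        index |= ((row >> bit) & 1) << (2 * bit + 1)
    return index


def row_major(row, col, m):
    """Conventional row-major layout, for comparison."""
    return (row << m) | col


def count_misses(layout, m, num_lines, line_size):
    """Simulate a direct-mapped cache during naive C = A * B on 2^m x 2^m
    matrices. Sizes are in matrix elements (an assumption of this sketch)."""
    n = 1 << m
    size = n * n
    bases = {"A": 0, "B": size, "C": 2 * size}  # assumed contiguous placement
    tags = [None] * num_lines
    misses = 0

    def access(name, i, j):
        nonlocal misses
        block = (bases[name] + layout(i, j, m)) // line_size
        slot = block % num_lines
        if tags[slot] != block:
            tags[slot] = block
            misses += 1

    for i in range(n):
        for j in range(n):
            for k in range(n):
                access("A", i, k)
                access("B", k, j)
            access("C", i, j)
    return misses


# Compare the two layouts for 8 x 8 matrices and a small cache.
print(count_misses(row_major, 3, 8, 4), count_misses(interleave_bits, 3, 8, 4))
```

Swapping in a different bit permutation for `interleave_bits` gives a crude way to compare layout functions empirically, though the simulation cost grows cubically with the matrix size.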
Seminar-Beiträge Cache-Optimierung (Seminar Contributions on Cache Optimization)
This report contains the write-ups of talks from a seminar of the
same name, held on 29 January 1998 at the Institut für
Programmstrukturen und Datenorganisation under the direction of Holger
Hopp, Daniela Genius and Michael Philippsen.
The write-ups give an overview of techniques and models for making
effective use of the caches found in virtually all of today's
computers. The focus is on techniques that compilers can carry out,
such as various loop transformations, memory mappings, and
data prefetching. Consistency models for parallel computers
are also examined
Application Domain-Driven System Design for Pervasive Video Processing
Pervasive video processing in future Ambient Intelligence environments sets new challenges in embedded system design. In particular, very high performance requirements have to be combined with the constraints of deeply embedded systems, frequently changing operating modes, and low-cost, high-volume production. By leveraging the key properties of the application domain, we devised a computation model, a hardware template, and a programming approach which provide a natural mapping from application requirements to a complete system solution. Our approach enables the direct exploitation of concurrency and regularity in achieving the combined challenge of adaptability, performance, and efficiency
Multi-Periodic Process Networks: Technical Report
This paper aims at modeling video stream applications with structured data and multiple clocks. Multi-Periodic Process Networks (MPPN) are real-time process networks with an adaptable degree of synchronous behavior and a hierarchical structure. MPPN help to describe stream-processing applications and deduce resource requirements such as parallel functional units, throughput and buffer sizes
Multi-Periodic Process Networks: Prototyping and Verifying Stream-Processing Systems
Modeling video and graphic streams with different clocks is largely an open problem. This article proposes a new kind of process network for application modeling, called Hierarchical Process Network. With properties such as abstraction, composition, synchronization and sequencing, hierarchy helps to describe stream-processing applications and deduce parameters such as throughput and buffer sizes more precisely. Real-time is explicit, as well as adaptable degrees of synchronous behavior
The SANDRA project: cooperative architecture/compiler technology for embedded real-time streaming applications
The convergence of digital television, Internet access, gaming, and digital media capture and playback stresses the importance of high-quality and high-performance video and graphics processing. The SANDRA project, a collaboration between Philips Research and INRIA, develops a consistent and efficient system design approach for regular, real-time constrained stream processing. The project aims at providing a system template with its associated compiler chain and application development framework, enabling an early validation of both the functional and the non-functional requirements of the application at every system design stage
Search for heavy resonances decaying to two Higgs bosons in final states containing four b quarks
A search is presented for narrow heavy resonances X decaying into pairs of Higgs bosons (H) in proton-proton collisions collected by the CMS experiment at the LHC at $\sqrt{s} = 8$ TeV. The data correspond to an integrated luminosity of 19.7 fb$^{-1}$. The search considers HH resonances with masses between 1 and 3 TeV, having final states of two b quark pairs. Each Higgs boson is produced with large momentum, and the hadronization products of the pair of b quarks can usually be reconstructed as single large jets. The background from multijet and $t\bar{t}$ events is significantly reduced by applying requirements related to the flavor of the jet, its mass, and its substructure. The signal would be identified as a peak on top of the dijet invariant mass spectrum of the remaining background events. No evidence is observed for such a signal. Upper limits obtained at 95% confidence level for the product of the production cross section and branching fraction $\sigma(gg \to X)\,\mathcal{B}(X \to HH \to b\bar{b}b\bar{b})$ range from 10 to 1.5 fb for the mass of X from 1.15 to 2.0 TeV, significantly extending previous searches. For a warped extra dimension theory with a mass scale $\Lambda_R = 1$ TeV, the data exclude radion scalar masses between 1.15 and 1.55 TeV