214 research outputs found

1. Introduction

Author: Torroni Paolo
Publication date: 06/03/2013

For personal use only. Please do not repost or distribute

Almae Matris Studiorum Campus

Caching in real-time and embedded systems and Benchmarking the ARM Cortex-M3 and Quark x1000 processors

Author: Pueyo Ramón Pablo
Vaughan John
Publication venue: 'Universidad de Zaragoza'
Publication date: 01/01/2015

The general goal is to compare the performance of two processors for the low-end embedded market, the Intel Quark x1000 and the ARM Cortex-M3, with special emphasis on the memory hierarchy. To do that, we will first assess cache potential by varying sizes, associativities, and line sizes using CACTI, a cache modeling tool. We will then review the relevant research literature to draw conclusions about the importance and possibilities of the memory hierarchy in real-time embedded systems. Finally, we will write a specific benchmark suite and use it to test the two processors.
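As a rough illustration of what varying size, associativity, and line size means for a set-associative cache, the following sketch computes the number of sets and splits an address into tag, index, and offset. It is not CACTI and does not model access time, energy, or area; the class and function names (`CacheGeometry`, `sweep`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper: the three parameters the abstract says are varied."""
    size_bytes: int
    associativity: int
    line_bytes: int

    @property
    def num_sets(self) -> int:
        # sets = capacity / (ways * line size)
        return self.size_bytes // (self.associativity * self.line_bytes)

    def split(self, address: int) -> tuple[int, int, int]:
        """Return (tag, set index, byte offset) for a physical address."""
        offset = address % self.line_bytes
        index = (address // self.line_bytes) % self.num_sets
        tag = address // (self.line_bytes * self.num_sets)
        return tag, index, offset


def sweep(sizes, ways, lines):
    """Yield every configuration whose capacity divides evenly into sets."""
    for size, way, line in product(sizes, ways, lines):
        if size >= way * line and size % (way * line) == 0:
            yield CacheGeometry(size, way, line)


if __name__ == "__main__":
    for geometry in sweep([8 * 1024, 16 * 1024], [1, 2, 4], [16, 32]):
        print(geometry, "sets:", geometry.num_sets)
```

Each configuration produced by such a sweep would then be handed to a modeling tool to estimate its cost, which this sketch deliberately leaves out.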

Repositorio Universidad de Zaragoza

CROSS-LAYER CUSTOMIZATION FOR LOW POWER AND HIGH PERFORMANCE EMBEDDED MULTI-CORE PROCESSORS

Author: Yu Chenjie
Publication date: 01/01/2010

Due to physical limitations and design difficulties, computer processor architecture has shifted to multi-core and even many-core based approaches in recent years. Such architectures offer the potential for sustainable performance scaling into future peta-scale and exa-scale computing platforms at an affordable power budget, design complexity, and verification effort. To date, multi-core processor products have been replacing uni-core processors in almost every market segment, including embedded systems, general-purpose desktops and laptops, and supercomputers. However, many issues with multi-core processor architectures must still be addressed before their potential can be fully realized. Researchers in both academia and industry are still seeking ways to use these processors efficiently and effectively. The issues involve hardware architecture trade-offs, system software services, run-time management, and user application design, all of which demand more research effort. Given the architectural particularities of multi-core based computers, this work proposes a Cross-Layer Customization framework that combines application-specific information and system platform features, along with the necessary operating system service support, to achieve exceptional power and performance efficiency for targeted multi-core platforms. Several topics are covered with specific optimization goals, including snoop cache coherence protocols, inter-core communication for producer-consumer applications, synchronization mechanisms, and off-chip memory bandwidth limitations. Benchmark program execution with conventional mechanisms is analyzed to reveal the overheads in power and performance. Specific customizations are proposed to eliminate such overheads with support from hardware, system software, compiler, and user applications. Experiments show significant improvement in system performance and power efficiency.

Digital Repository at the University of Maryland

A Survey on Cache Management Mechanisms for Real-Time Embedded Systems

Author: Alhammad Ahmed
Fröhlich Antônio
Gracioli Giovani
Mancuso Renato
Pellizzoni Rodolfo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2015

© ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, 48, 2 (November 2015), http://doi.acm.org/10.1145/2830555

Multicore processors are being extensively used by real-time systems, mainly because of their demand for increased computing power. However, multicore processors have shared resources that affect the predictability of real-time systems, which is key to correctly estimating the worst-case execution time of tasks. One of the main factors for unpredictability in a multicore processor is the cache memory hierarchy. Recently, many research works have proposed different techniques to deal with caches in multicore processors in the context of real-time systems. Nevertheless, a review and categorization of these techniques is still an open topic and would be very useful for the real-time community. In this article, we present a survey of cache management techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research published in 2014. We categorize the main research works and provide a detailed comparison in terms of similarities and differences. We also identify key challenges and discuss future research directions.

King Saud University NSER

University of Waterloo's Institutional Repository

Crossref

A Survey of Techniques for Architecting TLBs

Author: Mittal Sparsh
Publication venue: 'Wiley'
Publication date: 01/01/2016

A translation lookaside buffer (TLB) caches virtual-to-physical address translation information and is used in systems ranging from embedded devices to high-end servers. Since the TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of the TLB is important for improving the performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects, and system engineers.
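To make the hit and miss behavior concrete, here is a minimal sketch of a small fully associative TLB with least-recently-used replacement. It is not taken from the surveyed paper: the 4 KiB page size, the `TinyTLB` class, and the dictionary standing in for a page-table walk are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch


class TinyTLB:
    """Hypothetical fully associative TLB with LRU replacement."""

    def __init__(self, capacity: int, page_table: dict[int, int]):
        self.capacity = capacity
        self.page_table = page_table  # virtual page number -> physical frame number
        self.entries: OrderedDict[int, int] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            # Hit: reuse the cached translation and mark it most recently used.
            self.hits += 1
            self.entries.move_to_end(vpn)
        else:
            # Miss: the dictionary lookup stands in for a costly page-table walk.
            self.misses += 1
            pfn = self.page_table[vpn]  # raises KeyError for an unmapped page
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
            self.entries[vpn] = pfn
        return self.entries[vpn] * PAGE_SIZE + offset


if __name__ == "__main__":
    tlb = TinyTLB(capacity=2, page_table={0: 7, 1: 3, 2: 9})
    for addr in (0x0010, 0x1004, 0x0020, 0x2000):
        print(hex(addr), "->", hex(tlb.translate(addr)))
    print("hits:", tlb.hits, "misses:", tlb.misses)
```

Real TLBs differ in associativity, replacement policy, and multi-level organization; the sketch only shows why repeated accesses to the same page avoid the expensive walk.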

Research Archive of Indian Institute of Technology Hyderabad

Analysis of the reconfiguration latency and energy overheads for a Xilinx Virtex-5 FPGA

Author: Clemente Barreira Juan Antonio
Mecha Hortensia
Olivito Javier
Resano Javier
Serrano Felipe
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2018

In this paper we evaluate the overheads and trade-offs of a set of components usually included in a system with run-time partial reconfiguration implemented on a Xilinx Virtex-5. Our analysis shows the benefits of including a scratchpad memory inside the reconfiguration controller in order to improve the efficiency of the reconfiguration process. We have designed a simple controller for this scratchpad that includes support for prefetching and caching in order to further reduce both the energy and latency overheads.

Docta Complutense

Crossref

Repositorio Universidad de Zaragoza

Memory Optimizations for Time-Predictable Embedded Software

Author: Vivy Suhendra
Publication date: 12/08/2009

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

Architecture design of video processing systems on a chip

Author: Jaspers Egbert
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2003

Pure OAI Repository

A Co-Processor Approach for Efficient Java Execution in Embedded Systems

Author: Säntti Tero
Publication venue: Turku Centre for Computer Science
Publication date: 10/11/2008

This thesis deals with a hardware accelerated Java virtual machine, named REALJava. The REALJava virtual machine is targeted for resource constrained embedded systems. The goal is to attain increased computational performance with reduced power consumption. While these objectives are often seen as trade-offs, in this context both of them can be attained simultaneously by using dedicated hardware. The target level of the computational performance of the REALJava virtual machine is initially set to be as fast as the currently available full custom ASIC Java processors. As a secondary goal all of the components of the virtual machine are designed so that the resulting system can be scaled to support multiple co-processor cores. The virtual machine is designed using the hardware/software co-design paradigm. The partitioning between the two domains is flexible, allowing customizations to the resulting system, for instance the floating point support can be omitted from the hardware in order to decrease the size of the co-processor core. The communication between the hardware and the software domains is encapsulated into modules. This allows the REALJava virtual machine to be easily integrated into any system, simply by redesigning the communication modules. Besides the virtual machine and the related co-processor architecture, several performance enhancing techniques are presented. These include techniques related to instruction folding, stack handling, method invocation, constant loading and control in time domain. The REALJava virtual machine is prototyped using three different FPGA platforms. The original pipeline structure is modified to suit the FPGA environment. The performance of the resulting Java virtual machine is evaluated against existing Java solutions in the embedded systems field. The results show that the goals are attained, both in terms of computational performance and power consumption. 
The computational performance in particular is evaluated thoroughly, and the results show that REALJava is more than twice as fast as the fastest full custom ASIC Java processor. In addition to standard Java virtual machine benchmarks, several new Java applications are designed to both verify the results and broaden the spectrum of the tests.

Transferred from Doria

UTUPub