
9 research outputs found

A DVS system based on the trade-off between energy savings and execution time

Author: Alou Cervera Pedro
Cobos Márquez José Antonio
García Suárez Oscar
Oliver Ramírez Jesús Angel
Vasic Miroslav
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/09/2008

DVS (Dynamic Voltage Scaling) is a technique for reducing the power consumption of digital circuits. The main component of the power consumed by these circuits, dynamic power, is proportional to the square of the supply voltage. Additionally, each supply voltage has a maximum clock frequency. The advantage of DVS is that the supply voltage (and hence the clock frequency) can be adjusted to the specific needs during execution. The DVS concept has been used in commercial products such as Transmeta’s Crusoe [1], Intel SpeedStep [2], AMD K6 [3], and Hitachi SH4 [4]. This paper presents results obtained with a DVS algorithm based on workload estimation and on the trade-off between execution time and power savings. It discusses the influence of the power supply’s slew rate, the algorithm’s influence on system performance, and the problems of estimating the processor’s workload. The DVS system is implemented on Intel’s PXA255 platform, and energy savings were calculated by directly measuring voltages and currents on the platform.
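
The trade-off named in this abstract can be illustrated numerically. What follows is a minimal sketch, assuming the standard CMOS dynamic power model P = C_eff * V^2 * f. The effective capacitance, the cycle count, and both operating points are hypothetical values chosen for illustration; they are not measurements from the PXA255 platform or from the paper.

```python
def dynamic_power(c_eff, v_dd, f_clk):
    """Dynamic power in watts under the model P = C_eff * V^2 * f."""
    return c_eff * v_dd ** 2 * f_clk


def run_task(cycles, c_eff, v_dd, f_clk):
    """Return (execution_time_s, energy_J) for a task of `cycles` clock cycles."""
    exec_time = cycles / f_clk
    energy = dynamic_power(c_eff, v_dd, f_clk) * exec_time
    return exec_time, energy


# Hypothetical values for illustration only (not PXA255 data).
C_EFF = 1e-9          # effective switched capacitance, farads
CYCLES = 400_000_000  # clock cycles the task needs

fast_time, fast_energy = run_task(CYCLES, C_EFF, v_dd=1.3, f_clk=400e6)
slow_time, slow_energy = run_task(CYCLES, C_EFF, v_dd=1.0, f_clk=200e6)

# Fraction of dynamic energy saved, and the price paid in execution time.
energy_saving = 1 - slow_energy / fast_energy
slowdown = slow_time / fast_time

print(f"energy saving: {energy_saving:.1%}, slowdown: {slowdown:.1f}x")
```

Under this model the energy of a fixed-cycle task depends only on the voltage, since the frequency cancels out, so the saving comes from the lower supply voltage while the lower frequency determines the longer execution time.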

Crossref

Archivo Digital UPM

Smart vision in system-on-chip applications

Author: Wells Cade Cenric
Publication venue: ProQuest Dissertations & Theses
Publication date: 01/01/2005

In the last decade, the ability to design and manufacture integrated circuits with higher transistor densities has led to the integration of complete systems on a single silicon die. These are commonly referred to as Systems-on-Chip (SoCs). As SoC processes can incorporate multiple technologies, it is now feasible to produce single-chip camera systems with embedded image processing, known as Imager-on-Chips (IoCs). The development of IoCs is complicated by the mixture of digital and analog components and by the high cost of prototyping these designs using silicon processes. There are currently no re-usable prototyping platforms that specifically address the needs of IoC development. This thesis details a new prototyping platform specifically for use in the development of low-cost mass-market IoC applications. FPGA technology was utilised to implement a frame-based processing architecture suitable for supporting a range of real-time imaging and machine vision applications. To demonstrate the effectiveness of the prototyping platform, an example object counting and highlighting application was developed and functionally verified in real time. A high-level IoC cost model was formulated to calculate the cost of manufacturing prototyped applications as a single IoC. This highlighted the need for careful analysis of optical issues, embedded imager array size and the silicon process used, to ensure that the desired IoC unit cost is achieved. A modified version of the FPGA architecture, which would improve DSP performance, is also proposed.

Glasgow Theses Service

OpenGrey Repository

Diseño e implementación de un módulo para la realización de prácticas en la asignatura “Sistemas de Tiempo Real” basado en el sistema operativo RTEMS

Author: Rodríguez Cerro César
Publication date: 09/10/2017

The aim of this project is to create a working environment that brings together the tools needed to develop real-time applications and that then makes it possible to build a specific application forming one of the practical assignments of the course “Sistemas de Tiempo Real” (Real-Time Systems) in the Bachelor's Degree in Computer Engineering at Universidad Carlos III de Madrid. Specifically, the project uses the real-time operating system RTEMS to implement a module corresponding to a remote control server, and the Arduino platform to develop the module corresponding to a control system embedded in a specific device. The two modules communicate with each other to achieve a final goal, and together they make up the assignment that future students must complete. In addition, the documentation needed to set up the working environment is provided, and an assignment statement and grading criteria are proposed for a possible assignment. Ingeniería Informátic

Universidad Carlos III de Madrid e-Archivo

Energy Efficient Designs for Collaborative Signal and Information Processing in Wireless Sensor Networks

Author: Xu Yingyue
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2005

Collaborative signal and information processing (CSIP) plays an important role in the deployment of wireless sensor networks. Since each sensor has limited computing capability, constrained power usage, and limited sensing range, collaboration among sensor nodes is important in order to compensate for each other’s limitations as well as to improve the degree of fault tolerance. To support the execution of CSIP algorithms, distributed computing paradigms and clustering protocols are needed; these are the main focus of this dissertation. To facilitate collaboration among sensor nodes, we present a mobile-agent computing paradigm in which, instead of each sensor node sending local information to a processing center, as is typical in client/server-based computing, the processing code is moved to the sensor nodes through mobile agents. We further conduct an extensive performance evaluation against traditional client/server-based computing. Experimental results show that the mobile agent paradigm performs much better when the number of nodes is large, while the client/server paradigm is advantageous when the number of nodes is small. Based on this result, we propose a hybrid computing paradigm that adopts different computing models within different clusters of sensor nodes. Either the client/server or the mobile agent paradigm can be employed within clusters or between clusters, according to the cluster configuration. This new computing paradigm can take full advantage of both the client/server and the mobile agent computing paradigms. Simulations show that the hybrid computing paradigm performs better than either client/server or mobile agent computing alone. The mobile agent itinerary has a significant impact on the overall performance of the sensor network. We thus formulate both static and dynamic mobile agent planning as optimization problems.
Based on these models, we present three itinerary planning algorithms. We show, through simulation, that the predictive dynamic itinerary performs best under a wide range of conditions, making it particularly suitable for CSIP in wireless sensor networks. To facilitate the deployment of the hybrid computing paradigm, we propose a decentralized reactive clustering (DRC) protocol to cluster the sensor network in an energy-efficient way. The clustering process is invoked only by events that occur in the sensor network. Nodes that do not detect the events are put into the sleep state to save energy. In addition, a power control technique is used to minimize the transmission power needed. The advantages of the DRC protocol are demonstrated through simulations.
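
The reported crossover between the two paradigms can be made concrete with a toy traffic model. This is only a sketch under stated assumptions and is not the dissertation's evaluation model: it counts bytes on the wire and ignores latency, energy per hop, and agent overhead. All function names and byte sizes are hypothetical.

```python
def client_server_bytes(n_nodes, data_bytes):
    """Assumed model: every node sends its raw local data to the processing center."""
    return n_nodes * data_bytes


def mobile_agent_bytes(n_nodes, code_bytes, result_bytes):
    """Assumed model: one agent (code plus a fixed-size partial result) visits
    each node in turn and then returns, i.e. n_nodes + 1 hops in total."""
    return (n_nodes + 1) * (code_bytes + result_bytes)


def cheaper_paradigm(n_nodes, data_bytes, code_bytes, result_bytes):
    """Pick the paradigm that moves fewer bytes under the toy model above."""
    cs = client_server_bytes(n_nodes, data_bytes)
    ma = mobile_agent_bytes(n_nodes, code_bytes, result_bytes)
    return "client/server" if cs <= ma else "mobile agent"


# Hypothetical sizes: 1000 B of raw data per node, 600 B of agent code,
# 100 B of accumulated result carried by the agent.
for n in (2, 3, 10):
    print(n, cheaper_paradigm(n, data_bytes=1000, code_bytes=600, result_bytes=100))
```

With these assumed sizes the client/server model moves fewer bytes for two nodes and the mobile agent model moves fewer from three nodes upward. This mirrors the qualitative trend stated in the abstract, but the numbers themselves carry no evidential weight.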

University of Tennessee, Knoxville: Trace

Characterization and Avoidance of Critical Pipeline Structures in Aggressive Superscalar Processors

Author: Sassone Peter G.
Publication venue: Georgia Institute of Technology
Publication date: 20/07/2005

In recent years, with only small fractions of modern processors now accessible in a single cycle, computer architects have constantly fought against propagation issues across the die. Unfortunately, this trend continues to shift inward, and now even the most internal features of the pipeline are designed around communication, not computation. To address the inward creep of this constraint, this work focuses on the characterization of communication within the pipeline itself, architectural techniques to avoid it when possible, and layout co-design for early detection of problems. I present work in creating a novel detection tool for common-case operand movement which can rapidly characterize an application's dataflow patterns. The results produced are suitable for exploitation, as a small number of patterns can describe a significant portion of modern applications. Work on dynamic dependence collapsing takes the observations from the pattern results and shows how certain groups of operations can be dynamically grouped, avoiding unnecessary communication between individual instructions. This technique also amplifies the efficiency of pipeline data structures such as the reorder buffer, increasing both IPC and frequency. I also identify the same sets of collapsible instructions at compile time, producing the same benefits with minimal hardware complexity. This technique is also backward compatible, as the groups are exposed by simple reordering of the binary's instructions. I present aggressive pipelining approaches for these resources which avoid the critical timing often presumed necessary in aggressive superscalar processors. As these structures are designed for the worst case, pipelining them can produce greater frequency benefit than IPC loss. I also use the observation that the dynamic issue order for instructions in aggressive superscalar processors is predictable.
Thus, a hardware mechanism is introduced for efficiently caching the wakeup order for groups of instructions. These wakeup vectors are then used to speculatively schedule instructions, avoiding dynamic scheduling when it is not necessary. Finally, I present a novel approach to fast and high-quality chip layout. By allowing architects to quickly evaluate what-if scenarios during early high-level design, chip designs are less likely to encounter implementation problems later in the process. Ph.D. Committee Chair: Scott Wills; Committee Member: David Schimmel; Committee Member: Gabriel Loh; Committee Member: Hsien-Hsin Lee; Committee Member: Yorai Ward

Scholarly Materials And Research @ Georgia Tech

Improving cache Behavior in CMP architectures through cache partitioning techniques

Author: Moretó Planas Miquel
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2010

Microprocessor design has changed significantly in the last few decades, moving from simple in-order single-core architectures to superscalar and vector architectures in order to extract the maximum available instruction-level parallelism. Executing several instructions from the same thread in parallel can significantly improve the performance of an application. However, there is only a limited amount of parallelism available in each thread, because of data and control dependences. Furthermore, designing a high-performance, single, monolithic processor has become very complex due to power and chip latency constraints. These limitations have motivated the use of thread-level parallelism (TLP) as a common strategy for improving processor performance. Multithreaded processors allow executing different threads at the same time, sharing some hardware resources. There are several flavors of multithreaded processors that exploit TLP, such as chip multiprocessors (CMP), coarse-grain multithreading, fine-grain multithreading, simultaneous multithreading (SMT), and combinations of them.

To improve cost and power efficiency, the computer industry has adopted multicore chips. In particular, CMP architectures have become the most common design decision (combined sometimes with multithreaded cores). Firstly, CMPs reduce design costs and average power consumption by promoting design re-use and simpler processor cores. For example, it is less complex to design a chip with many small, simple cores than a chip with fewer, larger, monolithic cores. Furthermore, simpler cores have fewer power-hungry centralized hardware structures. Secondly, CMPs reduce costs by improving hardware resource utilization. On a multicore chip, co-scheduled threads can share costly microarchitecture resources that would otherwise be underutilized. Higher resource utilization improves aggregate performance and enables lower-cost design alternatives.

One of the resources that most affects the final performance of an application is the cache hierarchy. Caches store data recently used by the applications in order to take advantage of their temporal and spatial locality. Caches provide fast access to data, improving the performance of applications. Caches with low latencies have to be small, which prompts the design of a cache hierarchy organized into several levels of cache.

In CMPs, the cache hierarchy is normally organized in a first level (L1) of instruction and data caches private to each core. A last level of cache (LLC) is normally shared among different cores in the processor (L2, L3 or both). Shared caches increase resource utilization and system performance. Large caches improve performance and efficiency by increasing the probability that each application can access data from a closer level of the cache hierarchy. Sharing also allows an application to make use of the entire cache if needed.

A second advantage of having a shared cache in a CMP design has to do with cache coherency. In parallel applications, different threads share the same data and keep a local copy of this data in their cache. With multiple processors, it is possible for one processor to change the data, leaving another processor's cache with outdated data. A cache coherency protocol monitors changes to data and ensures that all processor caches have the most recent data. When the parallel application executes on the same physical chip, the cache coherency circuitry can operate at the speed of on-chip communications, rather than having to use the much slower between-chip communication, as is required with discrete processors on separate chips. These coherence protocols are simpler to design with a unified and shared level of cache on-chip.

Due to the advantages that multicore architectures offer, chip vendors use CMP architectures in current high-performance, network, real-time and embedded systems. Several of these commercial processors have a level of the cache hierarchy shared by different cores. For example, the Sun UltraSPARC T2 has a 16-way 4MB L2 cache shared by 8 cores, each one up to 8-way SMT. Other processors like the Intel Core 2 family also share up to a 12MB 24-way L2 cache. In contrast, the AMD K10 family has a private L2 cache per core and a shared L3 cache, with up to a 6MB 64-way L3 cache.

As the long-term trend of increasing integration continues, the number of cores per chip is also projected to increase with each successive technology generation. Some significant studies have shown that processors with hundreds of cores per chip will appear in the market in the following years. The manycore era has already begun. Although this era provides many opportunities, it also presents many challenges. In particular, higher hardware resource sharing among concurrently executing threads can cause individual threads' performance to become unpredictable and might lead to violations of the individual applications' performance requirements. Current resource management mechanisms and policies are no longer adequate for future multicore systems.

Some applications, such as multimedia, communications or streaming applications, exhibit low re-use of their data and pollute caches with data streams, or have many compulsory misses that cannot be resolved by assigning more cache space to the application. Traditional eviction policies such as Least Recently Used (LRU), pseudo-LRU or random are demand-driven, that is, they tend to give more space to the application that has more accesses to the cache hierarchy.

When no direct control over shared resources is exercised (the last-level cache in this case), it is possible that a particular thread allocates most of the shared resources, degrading other threads' performance. As a consequence, high resource sharing and resource utilization can cause systems to become unstable and violate individual applications' requirements. If we want to provide Quality of Service (QoS) to applications, we need to enhance the control over shared resources and enrich the collaboration between the OS and the architecture.

In this thesis, we propose software and hardware mechanisms to improve cache sharing in CMP architectures. We make use of a holistic approach, coordinating targets of software and hardware to improve system aggregate performance and provide QoS to applications. We make use of explicit resource allocation techniques to control the shared cache in a CMP architecture, with resource allocation targets driven by hardware and software mechanisms.

The main contributions of this thesis are the following:
- We have characterized different single- and multithreaded applications and classified workloads with a systematic method to better understand and explain the cache sharing effects on a CMP architecture. We have made a special effort in studying previous cache partitioning techniques for CMP architectures, in order to acquire the insight to propose improved mechanisms.
- In CMP architectures with out-of-order processors, cache misses can be served in parallel and share the miss penalty to access main memory. We take this fact into account to propose new cache partitioning algorithms guided by the memory-level parallelism (MLP) of each application. With these algorithms, the system performance is improved (in terms of throughput and fairness) without significantly increasing the hardware required by previous proposals.
- Driving cache partition decisions with indirect indicators of performance such as misses, MLP or data re-use may lead to suboptimal cache partitions. Ideally, the appropriate metric to drive cache partitions should be the target metric to optimize, which is normally related to IPC. Thus, we have developed a hardware mechanism, OPACU, which is able to obtain at run-time accurate predictions of the performance of an application when running with different cache assignments.
- Using performance predictions, we have introduced a new framework to manage shared caches in CMP architectures, FlexDCP, which allows the OS to optimize different IPC-related target metrics like throughput or fairness and provide QoS to applications. FlexDCP allows an enhanced coordination between the hardware and the software layers, which leads to improved system performance and flexibility.
- Next, we have made use of performance estimations to reduce the load imbalance problem in parallel applications. We have built a run-time mechanism that detects parallel applications sensitive to cache allocation and, in these situations, reduces the load imbalance by assigning more cache space to the slowest threads. This mechanism helps reduce the long optimization time, measured in man-years of effort, devoted to large-scale parallel applications.
- Finally, we have stated the main characteristics that future multicore processors with thousands of cores should have. An enhanced coordination between the software and hardware layers has been proposed to better manage the shared resources in these architectures.
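
The idea of explicit cache partitioning can be sketched with a simple greedy allocator. This is an illustration only, assuming per-application miss curves are available and using misses avoided as the utility metric. It is a generic marginal-utility scheme, not the MLP-aware algorithms, OPACU, or FlexDCP described in the thesis, and the function name and miss-curve numbers are hypothetical.

```python
def greedy_partition(miss_curves, total_ways):
    """Split `total_ways` cache ways among applications.

    miss_curves[i][w] is the (assumed known) number of misses of application i
    when it owns w ways, for w = 0..total_ways. Each application starts with
    one way; every remaining way goes to the application whose misses drop
    the most from receiving it.
    """
    n_apps = len(miss_curves)
    alloc = [1] * n_apps
    for _ in range(total_ways - n_apps):
        # Marginal utility of one more way for each application.
        gains = [curve[ways] - curve[ways + 1]
                 for curve, ways in zip(miss_curves, alloc)]
        winner = max(range(n_apps), key=lambda i: gains[i])
        alloc[winner] += 1
    return alloc


# Hypothetical miss curves for a 4-way shared cache.
streaming = [100, 98, 97, 96, 95]  # low data re-use: extra ways barely help
reuse_heavy = [100, 60, 30, 10, 5]  # high re-use: extra ways remove many misses

print(greedy_partition([streaming, reuse_heavy], total_ways=4))
```

In this example the streaming application is held to a single way, which is the behavior a demand-driven policy such as LRU does not guarantee. The greedy choice is only optimal when the miss curves are convex; non-convex curves need a lookahead or exhaustive search.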

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Virtualización y green IT

Author: Pérez Juan Marcos
Publication date: 07/08/2015

As humans, we are passionate about the advances in and wide adoption of information technologies. They have brought enormous benefits and improved the quality of life of almost all of humanity, but they have also been harming the environment without most people noticing. Computers and other IT infrastructure consume significant amounts of electricity, placing a heavy load on our power grids and contributing to greenhouse gas emissions. The hardware behind these technologies poses serious environmental problems during both its production and its disposal. We are obliged to reduce or eliminate, as far as possible, the environmental impact of information technologies in order to help create a more sustainable environment. Green IT refers to the efficient use of computing resources, minimizing environmental impact, maximizing economic viability, and fulfilling social duties. The general objective of this thesis is to investigate Green IT and its relationship with the green technologies that exist today, such as cloud computing, grid computing, and virtualization, with the aim of identifying the savings in the various types of resources and the positive impact on the environment. Facultad de Informátic

Virtualización y green IT

Author: Pérez Juan Marcos
Publication date: 01/12/2011

Virtualización y green IT

Author: Pérez Juan Marcos
Publication date: 07/08/2015

Servicio de Difusión de la Creación Intelectual