
394 research outputs found

On the efficiency of reductions in µ-SIMD media extensions

Author: Corbal San Adrián Jesús
Espasa Sans Roger
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001

Many important multimedia applications contain a significant fraction of reduction operations. Although multimedia applications are, in general, characterized by high amounts of Data Level Parallelism, reductions and accumulations are difficult to parallelize and show poor tolerance to increases in instruction latency. This is especially significant for µ-SIMD extensions such as MMX or AltiVec. To overcome the problem of reductions in µ-SIMD ISAs, designers tend to include increasingly complex instructions able to deal with the most common forms of reductions in multimedia. As the number of processor pipeline stages grows, the number of cycles needed to execute these multimedia instructions increases with every processor generation, severely compromising performance. The paper presents an in-depth discussion of how reductions/accumulations are performed in current µ-SIMD architectures and evaluates the performance trade-offs for near-future highly aggressive superscalar processors with three different styles of µ-SIMD extensions. We compare an MMX-like alternative to an MDMX-like extension that has packed accumulators to attack the reduction problem, and we also compare both to MOM, a matrix register ISA. We show that while packed accumulators present several advantages, they introduce artificial recurrences that severely degrade performance for processors with a high number of registers and long-latency operations. On the other hand, the paper demonstrates that longer SIMD media extensions such as MOM can take great advantage of accumulators by exploiting the associative parallelism implicit in reductions. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC
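
The recurrence problem described in "On the efficiency of reductions in µ-SIMD media extensions" can be illustrated with a small latency model. The Python sketch below is our own simplification, not the paper's simulator: it assumes every input is ready at cycle 0, each add takes a fixed `lat` cycles, enough adders exist to run independent chains in parallel, and partial sums are combined with a pairwise tree. The function names are hypothetical. A single accumulator forms one long dependence chain, while k independent accumulators exploit the associativity of the sum.

```python
def dot_single(a, b):
    """One accumulator: every add depends on the previous one (a serial recurrence)."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dot_partial(a, b, k=4):
    """k independent accumulators (associative parallelism), combined at the end."""
    partial = [0] * k
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % k] += x * y
    return sum(partial)


def reduction_latency(n, k, lat):
    """Cycles to reduce n values with k accumulators when each add takes `lat` cycles.

    Assumes all inputs are ready at cycle 0 and enough adders exist to run the
    k accumulator chains in parallel (a hypothetical, idealized machine).
    """
    # Accumulator j receives ceil((n - j) / k) elements, added back to back.
    ready = [((n - j + k - 1) // k) * lat for j in range(k)]
    # Combine the partial sums with a pairwise tree (about log2(k) levels).
    while len(ready) > 1:
        nxt = [max(ready[i], ready[i + 1]) + lat for i in range(0, len(ready) - 1, 2)]
        if len(ready) % 2:
            nxt.append(ready[-1])
        ready = nxt
    return ready[0]


if __name__ == "__main__":
    a, b = list(range(64)), list(range(64, 128))
    print(dot_single(a, b) == dot_partial(a, b))  # True
    print(reduction_latency(64, 1, 4))  # 256: one 64-long dependence chain
    print(reduction_latency(64, 4, 4))  # 72: four 16-long chains plus a 2-level tree
```

Integers are used so that reassociating the sum cannot change the result; with floating-point data, reassociation can change rounding.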

Fuzzy memoization for floating-point multimedia applications

Author: Corbal San Adrián Jesús
Valero Cortés Mateo
Álvarez Martínez Carlos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

Instruction memoization is a promising technique to reduce the power consumption and increase the performance of future low-end/mobile multimedia systems. Power and performance efficiency can be improved by reusing instances of an already executed operation. Unfortunately, this technique may not always be worth the effort due to the power consumption and area impact of the tables required to achieve an adequate level of reuse. In this paper, we introduce and evaluate a novel way of understanding multimedia floating-point operations based on the fuzzy computation paradigm: performance and power consumption can be improved at the cost of small precision losses in computation. By exploiting this implicit characteristic of multimedia applications, we propose a new technique called tolerant memoization. This technique expands the capabilities of classic memoization by associating entries with similar inputs to the same output. We evaluate this new technique by measuring the effect of tolerant memoization for floating-point operations in a low-power multimedia processor and discuss the trade-offs between performance and quality of the media outputs. We report energy improvements of 12 percent for a set of key multimedia applications with a small 6-Kbyte LUT, compared to 3 percent obtained using previously proposed techniques. Peer Reviewed. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC
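
A minimal sketch of the tolerant memoization idea from "Fuzzy memoization for floating-point multimedia applications": the lookup key is built from single-precision operands with their low-order mantissa bits cleared, so inputs that differ only in those bits share one table entry and one stored output. The similarity criterion, the `drop_bits` parameter, and the class name are assumptions for illustration; an unbounded Python dict stands in for the paper's small fixed-size LUT, and no replacement policy or hardware indexing is modeled.

```python
import struct


def _truncated_bits(x: float, drop_bits: int) -> int:
    """IEEE-754 single-precision bit pattern of x with its low `drop_bits` mantissa bits cleared."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits & ~((1 << drop_bits) - 1) & 0xFFFFFFFF


class TolerantMemo:
    """Memoizes a binary FP operation; operands differing only in dropped bits share one entry."""

    def __init__(self, op, drop_bits=12):
        assert 0 <= drop_bits <= 23  # only touch the 23-bit mantissa field
        self.op = op
        self.drop_bits = drop_bits
        self.table = {}  # unbounded dict standing in for a small hardware LUT (assumption)
        self.hits = 0
        self.misses = 0

    def __call__(self, x: float, y: float) -> float:
        key = (_truncated_bits(x, self.drop_bits), _truncated_bits(y, self.drop_bits))
        if key in self.table:
            self.hits += 1  # reuse the stored result of a "similar" operation
            return self.table[key]
        self.misses += 1
        result = self.op(x, y)
        self.table[key] = result
        return result


if __name__ == "__main__":
    fmul = TolerantMemo(lambda p, q: p * q, drop_bits=12)
    print(fmul(1.0, 3.0))          # 3.0, computed (miss)
    print(fmul(1.0000001, 3.0))    # 3.0, reused: operands differ only in low mantissa bits
    print(fmul.hits, fmul.misses)  # 1 1
```

Larger `drop_bits` values raise the hit rate at the cost of larger output errors, which is the performance/quality trade-off the abstract discusses.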

Three-dimensional memory vectorization for high bandwidth media memory systems

Author: Corbal San Adrián Jesús
Espasa Sans Roger
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

Vector processors offer good performance, cost, and adaptability when targeting multimedia applications. However, for a significant number of media programs, conventional memory configurations fail to deliver enough memory references per cycle to feed the SIMD functional units. This paper addresses the problem of memory bandwidth. We propose a novel mechanism suitable for 2-dimensional vector architectures and aimed at providing high effective bandwidth for SIMD memory instructions. The basis of this mechanism is extending the scope of vectorization at the memory level, so that 3-dimensional memory patterns can be fetched into a second-level register file. By fetching long blocks of data and reusing 2-dimensional memory streams in this second-level register file, we obtain a significant increase in effective memory bandwidth. As side benefits, the new 3-dimensional load instructions provide high robustness to memory latency and a significant reduction in cache activity, thus reducing power and energy requirements. At the cost of 50% more area than a regular SIMD register file, we measured an average speed-up of 13% and a potential 30% power saving in the L2 cache. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC
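
The reuse argument in "Three-dimensional memory vectorization for high bandwidth media memory systems" can be approximated by counting memory traffic. In the Python sketch below, which is our own simplification with hypothetical function names and parameters, a 3-D pattern is a sequence of overlapping 2-D blocks; issuing each 2-D load separately fetches every byte of every block, while a single 3-D fetch into a second-level register file brings each distinct byte in once. Timing, cache behavior, and the register file's organization are not modeled.

```python
def block_2d(base, row_stride, rows, cols):
    """Byte addresses read by one 2-D vector load: `rows` rows of `cols` consecutive bytes."""
    return [base + r * row_stride + c for r in range(rows) for c in range(cols)]


def pattern_3d(base, plane_stride, planes, row_stride, rows, cols):
    """A 3-D pattern: `planes` 2-D blocks, each shifted by `plane_stride` bytes."""
    return [block_2d(base + p * plane_stride, row_stride, rows, cols) for p in range(planes)]


def memory_traffic(blocks):
    """(bytes if every 2-D load goes to memory, bytes if one 3-D fetch brings each byte once)."""
    separate = sum(len(b) for b in blocks)
    fetched_once = len({addr for b in blocks for addr in b})
    return separate, fetched_once


if __name__ == "__main__":
    # Hypothetical pattern: 8x8-byte blocks in a 64-byte-wide image,
    # each block starting one row below the previous one.
    blocks = pattern_3d(base=0, plane_stride=64, planes=8, row_stride=64, rows=8, cols=8)
    print(memory_traffic(blocks))  # (512, 120)
```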

Command vector memory systems: high performance at low cost

Author: Corbal San Adrián Jesús
Espasa Sans Roger
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

The focus of this paper is on designing a vector memory system that is both low cost and high performance (high bandwidth) and that takes advantage of modern commodity SDRAM memory chips. To successfully extract the full bandwidth from SDRAM parts, we propose a new memory system organization based on sending commands to the memory system as opposed to sending individual addresses. A command specifies, in a few bytes, a request for multiple independent memory words. A command is similar to a burst found in DRAM memories, but does not require the memory words to be consecutive. The command is sent to all sections of the memory array simultaneously, so no crossbar in the proper sense is required. Our simulations show that this command-based memory system can improve performance over a traditional SDRAM-based memory system by factors ranging from 1.15 to 1.54. Moreover, in many cases, the command memory system outperforms even the best SRAM memory system under consideration. Overall, the command-based memory system achieves similar or better results than a 10 ns SRAM memory system (a) using fewer banks and (b) using memory devices that are 15 to 60 times cheaper. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC
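
The core idea of "Command vector memory systems: high performance at low cost", broadcasting one compact command rather than individual addresses, can be sketched as follows. This is a minimal Python model under assumptions of our own: a command is a (base, stride, length) triple measured in words, words are interleaved across banks by address modulo the number of banks, and each bank simply filters the broadcast command for its own words. The class and function names are hypothetical, and neither the paper's command encoding nor SDRAM timing is reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """A compact request for `length` words starting at `base`, `stride` words apart."""
    base: int
    stride: int
    length: int

    def addresses(self):
        return [self.base + i * self.stride for i in range(self.length)]


def bank_share(cmd: Command, bank: int, num_banks: int):
    """Every bank sees the same broadcast command and serves only the words mapped to it."""
    return [a for a in cmd.addresses() if a % num_banks == bank]


if __name__ == "__main__":
    cmd = Command(base=3, stride=5, length=16)  # non-consecutive words, unlike a DRAM burst
    shares = [bank_share(cmd, b, num_banks=8) for b in range(8)]
    print([len(s) for s in shares])  # [2, 2, 2, 2, 2, 2, 2, 2]
```

Because every bank decodes the same command locally, no per-word address routing between the processor and the banks is needed in this model.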

Exploiting a new level of DLP in multimedia applications

Author: Corbal San Adrián Jesús
Espasa Sans Roger
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999

This paper proposes and evaluates MOM: a novel ISA paradigm targeted at multimedia applications. By fusing conventional vector ISA approaches with more recent SIMD-like (Single Instruction Multiple Data) ISAs (such as MMX), we have developed a new matrix-oriented ISA that efficiently deals with the small matrix structures typically found in multimedia applications. MOM exploits a level of DLP reachable by neither conventional vector ISAs nor SIMD-like media ISA extensions. Our results show that MOM provides a 1.3x to 4x performance improvement over two different multimedia extensions (MMX and MDMX) on several kernels, which translates into up to a 50% performance gain on full applications (20% on average). Furthermore, the streaming nature of MOM provides additional advantages for executing multimedia applications, such as very low fetch pressure and high tolerance to memory latency, making MOM an ideal candidate for the embedded domain. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC
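
To illustrate the difference between an MMX-style instruction and a matrix-oriented one in "Exploiting a new level of DLP in multimedia applications", the sketch below applies a packed unsigned-saturating byte add (the operation performed by MMX's PADDUSB) to every row of a small matrix in one call. The matrix function, the vector-length parameter `vl`, and the 16-row limit are hypothetical stand-ins; they are not MOM's actual opcodes or register sizes.

```python
LANES = 8    # 8 x 8-bit lanes in one 64-bit MMX-style register
MAX_VL = 16  # hypothetical maximum number of rows in a matrix register


def paddusb(a, b):
    """MMX-style packed add of unsigned bytes with saturation (one 64-bit word)."""
    assert len(a) == len(b) == LANES
    return [min(x + y, 255) for x, y in zip(a, b)]


def matrix_paddusb(A, B, vl):
    """Matrix-style instruction (hypothetical): the same packed op on the first `vl` rows."""
    assert 0 < vl <= MAX_VL
    return [paddusb(A[i], B[i]) for i in range(vl)]


if __name__ == "__main__":
    # An 8x8 block of pixel values, as found in many image and video kernels.
    block = [[250 - 10 * r] * LANES for r in range(8)]
    delta = [[10] * LANES for _ in range(8)]
    out = matrix_paddusb(block, delta, vl=8)  # one matrix op vs. eight packed ops
    print(out[0][0], out[7][0])  # 255 190
```

One matrix-style call does the work of `vl` separate packed instructions, which is where the low fetch pressure mentioned in the abstract comes from.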

Precise localization for aerial inspection using augmented reality markers

Author: Amor Martínez Adrián
Moreno-Noguer Francesc
Ruiz García Alberto
Sanfeliu Cortés Alberto
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

The final publication is available at link.springer.com. This chapter is devoted to explaining a method for precise localization using augmented reality markers. This method can achieve a precision of less than 5 mm in position at a distance of 0.7 m, using a visual marker of 17 mm × 17 mm, and it can be used by the controller when the aerial robot is performing a manipulation task. The localization method is based on optimizing the alignment of deformable contours from textureless images, working from the raw vertices of the observed contour. The algorithm optimizes the alignment based on the XOR area, computed by means of computer graphics clipping techniques. The method can run at 25 frames per second. Peer Reviewed. Postprint (author's final draft)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC
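
"Precise localization for aerial inspection using augmented reality markers" scores an alignment by an XOR area computed with computer graphics clipping. A minimal Python sketch of that quantity, assuming convex polygons: the intersection is obtained with Sutherland-Hodgman clipping (a standard clipping algorithm we chose; the chapter does not say which one it uses), and the XOR area follows from |A| + |B| - 2|A ∩ B|. The function names are hypothetical, and the chapter's optimization over deformable contours is not reproduced; an optimizer would search over pose parameters to drive this value toward zero.

```python
def polygon_area(poly):
    """Shoelace formula; vertices as (x, y) tuples in order."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def _inside(p, a, b):
    """True if p lies on the left of (or on) the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0


def _intersect(p, q, a, b):
    """Intersection of segment p-q with the infinite line through a and b."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def clip(subject, clipper):
    """Sutherland-Hodgman: clip `subject` against a convex, counter-clockwise `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]  # edge from the previous vertex to the current one
            if _inside(q, a, b):
                if not _inside(p, a, b):
                    output.append(_intersect(p, q, a, b))
                output.append(q)
            elif _inside(p, a, b):
                output.append(_intersect(p, q, a, b))
    return output


def xor_area(poly_a, poly_b):
    """Area covered by exactly one of the two polygons: |A| + |B| - 2|A ∩ B|."""
    inter = clip(poly_a, poly_b)
    common = polygon_area(inter) if len(inter) >= 3 else 0.0
    return polygon_area(poly_a) + polygon_area(poly_b) - 2.0 * common


if __name__ == "__main__":
    observed = [(0, 0), (1, 0), (1, 1), (0, 1)]
    projected = [(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)]  # same square, shifted by 0.5
    print(xor_area(observed, projected))  # 1.0
    print(xor_area(observed, observed))   # 0.0 when the contours coincide
```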

A case for resource-conscious out-of-order processors

Author: Cristal Kestelman Adrián
Llosa Espuny José Francisco
Martínez José F
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003

Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is achieved in part through proper sizing of critical resources, such as register files or instruction queues. In light of the increasing gap between processor speed and memory latency, tolerating upcoming latencies in this way would require impractically large critical resources. To tackle this scalability problem, we make a case for resource-conscious out-of-order processors. We present quantitative evidence that critical resources are increasingly underutilized in these processors. We argue that better use of such resources should be a priority in future research in processor architectures. Peer Reviewed. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Planar PØP: feature-less pose estimation with applications in UAV localization

Author: Amor Martínez Adrián
Herrero Cotarelo Fernando
Ruiz Alberto
Sanfeliu Cortés Alberto
Santamaria Navarro Àngel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

We present a featureless pose estimation method that, in contrast to current Perspective-n-Point (PnP) approaches, does not require n point correspondences to obtain the camera pose, allowing for pose estimation from natural shapes that do not necessarily have distinguished features such as corners or intersecting edges. Instead of using n correspondences (e.g., extracted with a feature detector), we use the raw polygonal representation of the observed shape and directly estimate the pose in the pose-space of the camera. Compared with a general PnP method, this method requires neither n point correspondences nor a priori knowledge of the object model (except the scale), which is registered with a picture taken from a known robot pose. Moreover, we achieve higher precision because all the information of the shape contour is used to minimize the area between the projected and the observed shape contours. To emphasize that no n point correspondences between the projected template and the observed contour shape are used, we call the method Planar PØP. The method is demonstrated both in simulation and in a real application consisting of UAV localization, where comparisons with a precise ground truth are provided. Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

Initial results on fuzzy floating point computation for multimedia processors

Author: Corbal San Adrián Jesús
Salamí San Juan Esther
Valero Cortés Mateo
Álvarez Martínez Carlos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

In recent years, the market for mid/low-end portable systems such as PDAs or mobile digital phones has undergone a revolution in both sales volume and features, as handheld devices incorporate multimedia applications. This increases the computational demands on these devices, while power (and energy) consumption remains a limitation. Instruction memoization is a promising technique to help alleviate the power consumption of expensive functional units such as the floating-point unit. Unfortunately, this technique could be energy-inefficient for low-end systems due to the additional power consumption of the relatively large tables required. In this paper we present a novel way of understanding multimedia floating-point operations based on the fuzzy computation paradigm: losses in computation precision can be traded for performance at the cost of negligible errors in the output. Exploiting the implicit characteristics of media FP computation, we propose a new technique called fuzzy memoization. Fuzzy memoization expands the capabilities of classic memoization by attaching entries with similar inputs to the same output. We present a case study for an SH4-like processor and report good performance and power-delay improvements with feasible hardware requirements. Peer Reviewed. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Comparison of simulation models for the interaction of the right and left heart under the influence of cardiac pathologies

Author: Cortés Ruiz Adrián
Murillo Castarlenas Javier Antonio
Publication venue: 'Universidad de Zaragoza'
Publication date: 01/01/2018

In this work, a simulation model of the cardiopulmonary system is developed to serve as a tool for studying physiological behavior, for performing a comparative analysis of three time-varying elastance models (exponential, cosine, and exponential-cosine), and for simulating several pathologies. The objective of this work is, first, to select which of the three models is most suitable for implementation in more complex models, bearing in mind that the optimal elastance function must strike a balance between the number of parameters used and the realism with which it simulates the physiological behavior of the cardiopulmonary system; and second, to understand the consequences that each of the diseases studied has on the organism. The comparative analysis of elastances leads to the conclusion that, to choose an elastance that can be used in other, more complex hemodynamic models, there must be a balance between the number of parameters in the elastance model and the realism with which it simulates the physiological behavior of the cardiopulmonary circulatory system. The time-varying elastance model that best meets this criterion is the exponential-cosine cardiac elastance model. The results obtained by simulating the healthy and abnormal cardiopulmonary circulatory system indicate that our model simulates the physiology realistically and in a relatively simple way, but has limitations arising from the hypotheses made to build the lumped model.

Repositorio Universidad de Zaragoza
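
The abstract above compares time-varying elastance models but does not state their equations. As an illustration only, the Python sketch below uses two forms that are common in lumped cardiovascular models, a piecewise half-cosine activation and a Gaussian-shaped exponential activation, together with the relation P(t) = E(t)(V(t) - V0). These forms, the function names, and the parameter values are assumptions; the thesis's exact exponential, cosine, and exponential-cosine models may differ.

```python
import math


def activation_cosine(t, period, t_peak, t_end):
    """Piecewise half-cosine activation in [0, 1]: rises until t_peak, relaxes until t_end, then 0."""
    t = t % period  # one heartbeat per period (s)
    if t < t_peak:
        return 0.5 * (1.0 - math.cos(math.pi * t / t_peak))
    if t < t_end:
        return 0.5 * (1.0 + math.cos(math.pi * (t - t_peak) / (t_end - t_peak)))
    return 0.0


def activation_exponential(t, period, b, c):
    """Gaussian-shaped exponential activation exp(-b (t - c)^2), peaking at t = c."""
    t = t % period
    return math.exp(-b * (t - c) ** 2)


def elastance(e, e_min, e_max):
    """Time-varying elastance (mmHg/ml) between diastolic e_min and systolic e_max."""
    return e_min + (e_max - e_min) * e


def chamber_pressure(elast, volume, v0):
    """P = E(t) * (V - V0), with volumes in ml and pressure in mmHg."""
    return elast * (volume - v0)


if __name__ == "__main__":
    # Hypothetical left-ventricle parameters, for illustration only.
    T, t_peak, t_end = 0.8, 0.3, 0.45
    for t in (0.0, 0.15, 0.3, 0.6):
        e = activation_cosine(t, T, t_peak, t_end)
        E = elastance(e, e_min=0.08, e_max=2.5)
        print(t, round(chamber_pressure(E, volume=120.0, v0=10.0), 1))
```

Each activation form adds its own timing parameters, which is the parameter-count versus realism balance the abstract uses to choose among models.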