Search CORE

59 research outputs found

Evaluación formativa con feedback rápido usando mandos interactivos

Author: Llosa Espuny José Francisco
Álvarez Martínez Carlos
Publication venue: Universidade de Santiago de Compostela. Escola Técnica Superior d'Enxeñaría
Publication date: 07/07/2010
Field of study

Numerosos estudios han demostrado que el uso del feedback como herramienta docente resulta beneficioso para el aprendizaje. Para que este feedback sea útil debe ser rápido, es decir, llegar al alumno poco después de haber realizado su tarea.En este artículo presentamos resultados de una prueba piloto en la que se ha usado una herramienta para proporcionar feedback basada en transparencias y mandos interactivos. Con esta herramienta los alumnos contestan todos a la vez a preguntas realizadas en clase. La herramienta recopila los datos y ofrece estadísticas inmediatas sobre las respuestas de los alumnos. Esto permite al profesor detectar los errores comunes y enfatizar aquellos aspectos más deficitarios para los alumnos, proporcionando feedback inmediato. En este trabajo explicamos las características de la herramienta, el entorno de trabajo, los beneficios aportados, las deficiencias observadas y las incidencias producidas. Finalmente proporcionamos una evaluación estadística de la prueba, tanto a través de encuestas a los estudiantes como analizando sus resultados académicos.Peer Reviewe

UPCommons. Portal del coneixement obert de la UPC

A case for resource-conscious out-of-order processors

Author: Cristal Kestelman Adrián
Llosa Espuny José Francisco
Martínez José F
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003
Field of study

Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is achieved in part through proper sizing of critical resources, such as register files or instruction queues. In light of the increasing gap between processor speed and memory latency, tolerating upcoming latencies in this way would require impractical sizes of such critical resources.To tackle this scalability problem, we make a case for resource-conscious out-of-order processors. We present quantitative evidence that critical resources are increasingly underutilized in these processors. We advocate that better use of such resources should be a priority in future research in processor architectures.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Hierarchical clustered register file organization for VLIW processors

Author: Ayguadé Parra Eduard
Llosa Espuny José Francisco
Valero Cortés Mateo
Zalamea León Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003
Field of study

Technology projections indicate that wire delays will become one of the biggest constraints in future microprocessor designs. To avoid long wire delays and therefore long cycle times, processor cores must be partitioned into components so that most of the communication is done locally. In this paper, we propose a novel register file organization for VLIW cores that combines clustering with a hierarchical register file organization. Functional units are organized in clusters, each one with a local first level register file. The local register files are connected to a global second level register file, which provides access to memory. All intercluster communications are done through the second level register file. This paper also proposes MIRS-HC, a novel modulo scheduling technique that simultaneously performs instruction scheduling, cluster selection, inserts communication operations, performs register allocation and spill insertion for the proposed organization. The results show that although more cycles are required to execute applications, the execution time is reduced due to a shorter cycle time. In addition, the combination of clustering and hierarchy provides a larger design exploration space that trades-off performance and technology requirements.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Modulo scheduling with integrated register spilling for clustered VLIW architectures

Author: Ayguadé Parra Eduard
Llosa Espuny José Francisco
Valero Cortés Mateo
Zalamea León Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001
Field of study

Clustering is a technique to decentralize the design of future wide issue VLIW cores and enable them to meet the technology constraints in terms of cycle time, area and power dissipation. In a clustered design, registers and functional units are grouped in clusters so that new instructions are needed to move data between them. New aggressive instruction scheduling techniques are required to minimize the negative effect of resource clustering and delays in moving data around. In this paper we present a novel software pipelining technique that performs instruction scheduling with reduced register requirements, register allocation, register spilling and inter-cluster communication in a single step. The algorithm uses limited backtracking to reconsider previously taken decisions. This backtracking provides the algorithm with additional possibilities for obtaining high throughput schedules with low spill code requirements for clustered architectures. We show that the proposed approach outperforms previously proposed techniques and that it is very scalable independently of the number of clusters, the number of communication buses and communication latency. The paper also includes an exploration of some parameters in the design of future clustered VLIW cores.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Performance evaluation of CSMT for VLIW processors

Author: Gupta Manoj
Llosa Espuny José Francisco
Sánchez Carracedo Fermín
Publication venue
Publication date: 01/07/2007
Field of study

Clustered VLIW embedded processors have become widespread due to benefits of simple hardware and low power. However, while some applications exhibit large amounts of instruction level parallelism (ILP) and benefit from very wide machines, others have little ILP, which wastes precious resources in wide processors. Simultaneous MultiThreading (SMT) is a well known technique that improves resource utilization by exploiting thread level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures. CSMT (Clusterlevel Simultaneous MultiThreading) allows some degree of SMT in clustered VLIW processors. CSMT considers the set of operations that execute simultaneously in a given cluster (named bundle)as the assignment unit. All bundles belonging to a VLIW instruction from a given thread are issued simultaneously. To minimize cluster conflicts between threads, a very simple hardwarebased cluster renaming mechanism is proposed. The experimental results show that CSMT significantly improves ILP when compared with other multithreading approaches suited for VLIW. For instance, with 4 threads CSMT shows an average speedup of 113% over a single-thread VLIW architecture and 36% over Interleaved MultiThreading (IMT). In some cases, speedup can be as high as 228% over single thread architecture and 97% over IMT. Also CSMT for a 2-thread processor, achieves almost the same performance as IMT for a 4-thread processor and also outperforms it in some cases.Peer ReviewedPostprint (author’s final draft

UPCommons. Portal del coneixement obert de la UPC

Near-optimal loop tiling by means of cache miss equations and genetic algorithms

Author: Abella Ferrer Jaume
González Colás Antonio María
Llosa Espuny José Francisco
Vera Rivera Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002
Field of study

The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by means of program transformations such as loop tiling, which is a code transformation targeted to reduce capacity misses. This paper presents a novel systematic approach to perform near-optimal loop tiling based on an accurate data locality analysis (cache miss equations) and a powerful technique to search the solution space that is based on a genetic algorithm. The results show that this approach can remove practically all capacity misses for all considered benchmarks. The reduction of replacement misses results in a decrease of the miss ratio that can be as significant as a factor of 7 for the matrix multiply kernel.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Hybrid multithreading for VLIW processors

Author: Gupta Manoj
Llosa Espuny José Francisco
Sánchez Carracedo Fermín
Publication venue
Publication date: 01/10/2009
Field of study

© ACM, 2009. This is the author's version of the work: http://doi.acm.org/10.1145/1629395.1629403Several multithreading techniques have been proposed to reduce resource underutilization in Very Long Instruction Word (VLIW) processors. Simultaneous MultiThreading (SMT) is a popular technique that improves processor performance by issuing multiple instructions from di erent threads. In VLIW processors, SMT requires extra hardware to merge instructions from di erent threads. The complexity of this hardware increases substantially with the number of threads. On the other hand, techniques like Interleaved MultiThreading (IMT) do not need any merging hardware, and support a larger number of threads at reasonable cost. In this paper, we propose Hybrid MultiThreading (HMT), a technique that at each cycle merges instructions from only a subset of threads. HMT supports a reasonable number of threads with a low merging hardware cost. For instance, it is possible to support 8 hardware threads with a merging hardware for only 2 threads. The experimental results show that using HMT improves the multithreading performance significantly. Further, supporting 8 hardware threads with HMT but using a 4-thread merging hardware achieves a performance similar to merging 8 threads simultaneously with a significantly lower merging hardware cost.Peer ReviewedPostprint (author’s final draft

UPCommons. Portal del coneixement obert de la UPC

Hypernode reduction modulo scheduling

Author: Ayguadé Parra Eduard
González Colás Antonio María
Llosa Espuny José Francisco
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995
Field of study

Software pipelining is a loop scheduling technique that extracts parallelism from loops by overlapping the execution of several consecutive iterations. Most prior scheduling research has focused on achieving minimum execution time, without regarding register requirements. Most strategies tend to stretch operand lifetimes because they schedule some operations too early or too late. The paper presents a novel strategy that simultaneously schedules some operations late and other operations early, minimizing all the stretchable dependencies and therefore reducing the registers required by the loop. The key of this strategy is a pre-ordering that selects the order in which the operations will be scheduled. The results show that the method described in this paper performs better than other heuristic methods and almost as well as a linear programming method but requiring much less time to produce the schedules.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Non-consistent dual register files to reduce register pressure

Author: Ayguadé Parra Eduard
Llosa Espuny José Francisco
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995
Field of study

The continuous grow on instruction level parallelism offered by microprocessors requires a large register file and a large number of ports to access it. This paper presents the non-consistent dual register file, an alternative implementation and management of the register file. Non-consistent dual register files support the bandwidth demands and the high register requirements, penalizing neither access time nor implementation cost. The proposal is evaluated for software pipelined loops and compared against a unified register file. Empirical results show improvements on performance and a noticeable reduction of the density of memory traffic due to a reduction of the spill code. The spill code can in general increase the minimum initiation interval and decrease loop performance. Additional improvements can be obtained when the operations are scheduled having in mind the register file organization proposed.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Swing modulo scheduling: a lifetime-sensitive approach

Author: Ayguadé Parra Eduard
González Colás Antonio María
Llosa Espuny José Francisco
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996
Field of study

This paper presents a novel software pipelining approach, which is called Swing Modulo Scheduling (SMS). It generates schedules that are near optimal in terms of initiation interval, register requirements and stage count. Swing Modulo Scheduling is an heuristic approach that has a low computational cost. The paper describes the technique and evaluates it for the Perfect Club benchmark suite. SMS is compared with other heuristic methods showing that it outperforms them in terms of the quality of the obtained schedules and compilation time. SMS is also compared with an integer linear programming approach that generates optimum schedules but with a huge computational cost, which makes it feasible only for very small loops. For a set of small loops, SMS obtained the optimum initiation interval in all the cases and its schedules required only 5% more registers and a 1% higher stage count than the optimumPeer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC