10 research outputs found

    Efficient resources assignment schemes for clustered multithreaded processors

    New feature sizes provide a larger number of transistors per chip that architects can use to further exploit instruction level parallelism. However, these technologies also bring new challenges that complicate conventional monolithic processor designs. On the one hand, exploiting instruction level parallelism is yielding diminishing returns, so other sources of parallelism, such as thread level parallelism, must be exploited to keep raising performance at a reasonable hardware complexity. On the other hand, clustered architectures have been widely studied as a way to reduce the inherent complexity of current monolithic processors. This paper studies the synergies and trade-offs between two concepts, clustering and simultaneous multithreading (SMT), in order to understand why conventional SMT resource assignment schemes are not as effective in clustered processors. These trade-offs are used to propose a novel resource assignment scheme that achieves an average speedup of 17.6% over Icount while improving fairness by 24%.
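
    Since the proposed scheme is measured against Icount, a minimal sketch of that baseline fetch policy may help; this is a simplification under assumed names (in_flight, stalled), not the paper's clustered scheme.

    # Icount-style SMT fetch heuristic: give fetch priority to the thread
    # with the fewest instructions in flight in the front end.
    def icount_pick(in_flight, stalled):
        candidates = {t: n for t, n in in_flight.items() if t not in stalled}
        if not candidates:
            return None  # every thread is stalled this cycle
        return min(candidates, key=candidates.get)

    # Thread 1 holds the fewest in-flight instructions, so it fetches next.
    print(icount_pick({0: 12, 1: 3, 2: 7}, stalled={2}))  # -> 1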

    Why area might reduce power in nanoscale CMOS

    In this paper we explore the relationship between power and area. By exploiting parallelism (and thus using more area) one can reduce the switching frequency, allowing a reduction in VDD, which in turn reduces power. Under a scaling regime that allows the threshold voltage to increase as VDD decreases, we find that dynamic and subthreshold power loss in CMOS exhibit a dependence on area proportional to A^((s-3)/s), while gate leakage power ∝ A^((s-6)/s) and short-circuit power ∝ A^((s-8)/s). Thus, with the large number of devices at our disposal we can exploit techniques such as spatial computing, tailoring the program directly to the hardware, to overcome the negative effects of scaling. The value of s describes the effectiveness of the technique for a particular circuit and/or algorithm: for circuits that exhibit a value of s ≤ 3, power will be a constant or decreasing function of area. We briefly speculate on how s might be influenced by a move to nanoscale technology.
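
    To make the scaling relations above concrete, here is a small numeric sketch (our own illustration; the normalization to A = 1 is an assumption):

    # Relative power components versus area A for a given scaling value s,
    # following the abstract's relations A**((s-3)/s), A**((s-6)/s), A**((s-8)/s).
    def power_vs_area(A, s):
        return {
            "dynamic+subthreshold": A ** ((s - 3) / s),
            "gate_leakage": A ** ((s - 6) / s),
            "short_circuit": A ** ((s - 8) / s),
        }

    # At s = 3 the dynamic term is flat in area while both leakage terms fall,
    # illustrating why more area can mean less power.
    for A in (1, 4, 16):
        print(A, power_vs_area(A, s=3))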

    Towards adaptive balanced computing (ABC) using reconfigurable functional caches (RFCs)

    The general-purpose computing processor performs a wide range of functions. Although the performance of general-purpose processors has been steadily increasing, software domains such as multimedia and digital signal processing demand ever more computing power. Reconfigurable computing has emerged to combine the versatility of general-purpose processors with the customization ability of ASICs. The basic premise of reconfigurability is to provide better performance and higher computing density than fixed-configuration processors. Most research in reconfigurable computing is dedicated to on-chip functional logic. If computing resources are adaptable to the computing requirements, maximum performance can be achieved. To overcome the gap between processor and memory technology, the size of on-chip cache memory has been consistently increasing. A larger cache capacity, though beneficial in general, does not guarantee higher performance for all applications, as they may not utilize all of the cache efficiently. To utilize on-chip resources effectively, and specifically to accelerate multimedia applications, we propose a new architecture: Adaptive Balanced Computing (ABC). ABC dynamically configures on-chip cache memory by integrating Reconfigurable Functional Caches (RFC). An RFC can work as a conventional cache or as a specialized computing unit when necessary. To convert a cache memory into a computing unit, we include additional logic that embeds multi-bit-output LUTs into the cache structure. We add this reconfigurability of cache memory to a conventional processor with minimal modification to the load/store microarchitecture and minimal compiler assistance. The ABC architecture utilizes resources more efficiently by reconfiguring cache memory into computing units dynamically. The area penalty for this reconfiguration is about 50-60% of the area of a memory-cell-array-only cache, with a faster cache access time. For a base array cache (a parallel-decoding cache), the penalty is 10-20% of the data array area with a 1-2% increase in cache access time. However, compared with implementing these units separately (as in ASICs), we save 27% of area for FIR and 44% for DCT/IDCT relative to the memory-cell-array cache, and about 80% for both applications relative to the base array cache. Simulations with multimedia and DSP applications (DCT/IDCT and FIR/IIR) show that resource configuration with the RFC achieves speedups ranging from 1.04x to 3.94x for whole applications and from 2.61x to 27.4x for the core computations. Simulations with various parameters indicate that the impact of reconfiguration can be minimized if an appropriate cache organization is selected.
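
    The core RFC idea, computing in a cache array by treating its lines as lookup-table contents, can be illustrated with a toy example (the table size and FIR-tap framing are our own, not the paper's microarchitecture):

    # A 256-entry table standing in for a cache array reconfigured as a
    # multi-bit-output LUT: one FIR tap becomes a lookup instead of a multiply.
    COEFF = 7
    lut = [COEFF * x for x in range(256)]  # results preloaded into the "cache"

    def fir_tap(sample):
        return lut[sample & 0xFF]  # 8-bit operand indexes the LUT

    print([fir_tap(s) for s in (3, 200, 17)])  # -> [21, 1400, 119]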

    Instruction scheduling in micronet-based asynchronous ILP processors


    Exact and approximation algorithms for scheduling and placement problems

    In this thesis we address several combinatorial optimization problems, treated in two parts. First, we study optimization problems arising from scheduling a set of tasks on computing machines, where we seek to minimize the total energy consumed by these machines while maintaining an acceptable quality of service. Second, we address two classical optimization problems: a scheduling problem on parallel machines with communication delays, and a data placement problem in graphs modeling peer-to-peer networks, where the goal is to minimize the total cost of data access.
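
    One classical model behind such energy-aware scheduling (our assumption; the thesis's exact formulation may differ) is dynamic speed scaling, where power grows as speed**alpha, so spreading work across machines lets each run slower and saves energy:

    ALPHA = 3  # conventional CMOS dynamic-power exponent

    # Energy to finish W units of work on m machines by deadline T when each
    # machine runs at the minimum speed that meets the deadline.
    def total_energy(W, m, T, alpha=ALPHA):
        speed = W / (m * T)
        return m * (speed ** alpha) * T

    # Doubling the machine count divides energy by four at alpha = 3.
    print(total_energy(100, 1, 10), total_energy(100, 2, 10))  # 10000.0 2500.0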

    Design of a robust architecture for acquiring physical quantities in a critical aeronautical system: application to measuring the temperature, pressure, torque, and speed of a turbomachine

    The acquisition of physical parameters such as temperature, pressure, torque, and speed is necessary for flight-critical systems to reach and maintain the required levels of safety and availability. It therefore calls for highly proven technologies and techniques able to operate in harsh environments. The aim of our work is to design a new sensor-acquisition architecture to be integrated into a flight-critical system. The goal of the architecture is to ensure data integrity while maintaining the availability and safety levels expected of airborne critical systems. The solution adds fault tolerance to the signal-conditioning chain. To do so, we implement additional functionalities, such as a mathematical model of the signal-conditioning chain, making the acquisition system more intelligent. Our research is partially based on the technical specifications of the SYRENA project, a typical example of the flight-critical systems that are the focus of this work.
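
    A minimal sketch of the model-based check the abstract alludes to: compare each measurement against a mathematical model of the conditioning chain and flag it when the residual exceeds a tolerance (the linear model and all names here are hypothetical, not the SYRENA design):

    # Hypothetical linear model of the signal-conditioning chain.
    def conditioned(raw, gain=2.0, offset=0.5):
        return gain * raw + offset

    # A measurement is accepted only if it agrees with the model prediction.
    def plausible(raw, measured, tol=0.1):
        return abs(measured - conditioned(raw)) <= tol

    print(plausible(1.0, 2.52))  # True: close to the predicted 2.5
    print(plausible(1.0, 3.40))  # False: likely a sensor or conditioning fault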

    A hardware-software codesign framework for cellular computing

    Until recently, the ever-increasing demand for computing power has been met on one hand by increasing the operating frequency of processors and on the other by designing architectures capable of exploiting instruction-level parallelism through hardware mechanisms such as superscalar execution. However, both approaches seem to have reached a plateau, mainly due to issues of design complexity and cost-effectiveness. Faced with the stagnating performance of single-threaded processors, the current trend in processor design favors a switch to coarser-grain parallelization, typically at the thread level. In other words, high computational power is achieved not by a single, very fast and very complex processor, but through the parallel operation of several processors, each executing a different thread. Extrapolating this trend to the vast amount of on-chip hardware resources that will be available in the next few decades (either through further shrinkage of silicon fabrication processes or through the introduction of molecular-scale devices), together with the predicted features of such devices (e.g., the impossibility of global synchronization and higher failure rates), it seems reasonable to foresee that current design techniques will not cope with the requirements of next-generation electronic devices and that novel design tools and programming methods will have to be devised. A tempting source of inspiration for solving the problems implied by a massively parallel organization and inherently error-prone substrates is biology. Living beings possess characteristics, such as robustness to damage and self-organization, that previous research has shown to be worth implementing in hardware; it has been possible, for instance, to realize relatively simple systems such as a self-repairing watch. Overall, these bio-inspired approaches seem very promising, but their appeal to a wider audience is limited because their heavily hardware-oriented designs lack some of the flexibility achievable with a general-purpose processor. In this thesis, we introduce a processor-grade processing element at the heart of a bio-inspired hardware system. This processor, based on a single instruction, features key properties that allow it to retain the versatility required to implement bio-inspired mechanisms while supporting general computation. We also demonstrate that the flexibility of such a processor enables it to be evolved so it can be tailored to different types of applications. In the second half of the thesis, we analyze how a large number of these processors can be implemented on a hardware platform to explore various bio-inspired mechanisms. Built on an extensible platform of many FPGAs configured as a networked structure of processors, the hardware part of this computing framework is backed by an open library of software components that provides primitives for efficient inter-processor communication and distributed computation. We show that this dual software-hardware approach allows very quick exploration of different ways to solve computational problems using bio-inspired techniques, and that its flexibility allows it to exploit replication as a solution to issues that concern standard embedded applications.
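
    The communication primitives such a library might provide can be sketched as queue-based message passing between processing elements (the Cell class and its API are invented for illustration, not the thesis's library):

    import queue

    # One processing element in a networked grid of simple processors.
    class Cell:
        def __init__(self):
            self.inbox = queue.Queue()

        def send(self, other, msg):
            other.inbox.put(msg)  # deliver into the neighbor's inbox

        def recv(self):
            return self.inbox.get()  # block until a message arrives

    a, b = Cell(), Cell()
    a.send(b, "state-update")
    print(b.recv())  # -> state-update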