9 research outputs found

    Toward fast and accurate architecture exploration in a hardware/software codesign flow

    Get PDF

    Reconfigurable interconnects in DSM systems: a focus on context switch behavior

    Get PDF
    Recent advances in the development of reconfigurable optical interconnect technologies allow for the fabrication of low cost and run-time adaptable interconnects in large distributed shared-memory (DSM) multiprocessor machines. This can allow the use of adaptable interconnection networks that alleviate the huge bottleneck present due to the gap between the processing speed and the memory access time over the network. In this paper we have studied the scheduling of tasks by the kernel of the operating system (OS) and its influence on communication between the processing nodes of the system, focusing on the traffic generated just after a context switch. We aim to use these results as a basis to propose a potential reconfiguration of the network that could provide a significant speedup

    Copper wafer bonding in three-dimensional integration

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (p. 165-176).Three-dimensional (3D) integration, in which multiple layers of devices are stacked with high density of interconnects between the layers, offers solutions for problems when the critical dimensions in integrated circuits keep shrinking. Copper wafer bonding has been considered as a strong candidate for fabrication of three-dimensional integrated circuits (3-D IC). This thesis work involves fundamental studies of copper wafer bonding and bonding performance of bonded interconnects. Copper bonded wafers exhibit good bonding qualities and present no original bonding interfaces when the bonding process occurs at 400⁰C/4000 mbar for 30 min, followed by nitrogen anneal at 400⁰C for 30 min. Oxide distribution in the bonded layer is uniform and sparse. Evolution of microstructure morphologies and grain orientations of copper bonded wafers during bonding and annealing were studied. The bonded layer reaches steady state after post-bonding anneal. The microstructure morphologies and bond strengths of copper bonded wafers under different bonding conditions were investigated.A map summarizing these results provides a useful reference on process conditions suitable for three-dimensional integration based on copper wafer bonding. Similar microstructure morphology of copper bonded interconnects was observed to that of copper bonded wafers. Specific contact resistances of bonded interconnects of approximately 10⁻⁸ [ohms]-cm² were measured by using a novel test structure which can eliminate the errors from misalignment during bonding. The bonding qualities of different interconnect sizes and densities have been investigated. In addition to increasing the bonding temperature and duration, options such as larger interconnect sizes, total bonding area, or use of dummy pads for bonding in the unused area improve the quality of bonded interconnects. Process development of silicon layer stacking based on Cu wafer bonding was successfully applied to demonstrate a strong four-layer-stack structure.Bonded Cu layers in this structure become homogeneous layers and do not show original bonding interfaces. This process can be reliably applied in three-dimensional integration applications.by Kuan-Neng Chen.Ph.D

    Untersuchungen zur Kostenoptimierung für Hardware-Emulatoren durch Anwendung von Methoden der partiellen Laufzeitrekonfiguration

    Get PDF
    Der vorliegende Band der wissenschaftlichen Schriftenreihe Eingebettete Selbstorganisierende Systeme widmet sich der Optimierung von Hardware Emulatoren durch die Anwendung von Methoden der partiellen Laufzeitrekonfiguration. An aktuelle Schaltkreis- und Systementwürfe werden zunehmend divergente Anforderungen gestellt. Einer sehr kurzen Entwicklungszeit für eine schnelle Markteinführung steht, um teure und aufwändige Re-Desings zu verhindern, eine möglichst umfangreiche Testabdeckung des Entwurfs gegenüber. Um die Zeit für die Tests zu reduzieren, kommen überwiegend FPGA-basierte HW-Emulatoren zum Einsatz. Durch den Einfluss der steigenden Komplexität aktueller Entwürfe auf die Emulator-Plattform reduziert sich jedoch signifikant die Performance der Emulatoren. Die in Emulatoren eingesetzten FPGAs sind aber zunehmend partiell zur Laufzeit rekonfigurierbar. Der in der vorliegenden Arbeit umgesetzte Ansatz behandelt die Anwendung von Methoden der Laufzeitrekonfiguration auf dem Gebiet der Hardware-Emulation. Dafür ist zunächst eine Partitionierung des zu testenden Entwurfs in möglichst funktional unabhängige Systemteile notwendig. Für eine optimierte und ressourceneffiziente Platzierung der einzelnen HW-Module während der Emulation, ist ein ebenfalls auf dem FPGA platziertes Kommunikationsnetzwerk implementiert. Der vorgestellte Ansatz wird an verschiedenen Beispielen anschaulich illustriert. So kann der Leser die Mächtigkeit der entwickelten Methodik nachvollziehen und wird motiviert, das Verfahren auch auf weitere Anwendungsfälle zu übertragen.Current circuit and system designs consist a lot of gate numbers and divergent requirements. In contrast to a short development and time to market schedule, the needs for perfect test coverage and quality are rising. One approach to cover this problem is the FPGA based functional test of electronic circuits. State of the art FPGA platforms doesn't consist enough gates to support fully custom designs. The thesis catches this problem and gives some approaches to use partial dynamic reconfiguration to solve the size problem. A fully automated design flow demonstrates partial partitioning of designs, modifications to use dynamic reconfiguration and its schedule. At the end of the work, some examples demonstrates the power of the approach

    Dynamically reconfigurable asynchronous processor

    Get PDF
    The main design requirements for today's mobile applications are: · high throughput performance. · high energy efficiency. · high programmability. Until now, the choice of platform has often been limited to Application-Specific Integrated Circuits (ASICs), due to their best-of-breed performance and power consumption. The economies of scale possible with these high-volume markets have traditionally been able to hide the high Non-Recurring Engineering (NRE) costs required for designing and fabricating new ASICs. However, with the NREs and design time escalating with each generation of mobile applications, this practice may be reaching its limit. Designers today are looking at programmable solutions, so that they can respond more rapidly to changes in the market and spread costs over several generations of mobile applications. However, there have been few feasible alternatives to ASICs: Digital Signals Processors (DSPs) and microprocessors cannot meet the throughput requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much area and power. Coarse-grained dynamically reconfigurable architectures offer better solutions for high throughput applications, when power and area considerations are taken into account. One promising example is the Reconfigurable Instruction Cell Array (RICA). RICA consists of an array of cells with an interconnect that can be dynamically reconfigured on every cycle. This allows quite complex datapaths to be rendered onto the fabric and executed in a single configuration - making these architectures particularly suitable to stream processing. Furthermore, RICA can be programmed from C, making it a good fit with existing design methodologies. However the RICA architecture has a drawback: poor scalability in terms of area and power. As the core gets bigger, the number of sequential elements in the array must be increased significantly to maintain the ability to achieve high throughputs through pipelining. As a result, a larger clock tree is required to synchronise the increased number of sequential elements. The clock tree therefore takes up a larger percentage of the area and power consumption of the core. This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor (DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA architecture, but uses asynchronous design techniques - methods of designing digital systems without clocks. The absence of a global clock signal makes DRAP more scalable in terms of power and area overhead than its synchronous counterpart. The DRAP architecture maintains most of the benefits of custom asynchronous design, whilst also providing programmability via conventional high-level languages. Results show that the DRAP processor delivers considerably lower power consumption when compared to a market-leading Very Long Instruction Word (VLIW) processor and a low-power ARM processor. For example, DRAP resulted in a reduction in power consumption of 20 times compared to the ARM7 processor, and 29 times compared to the TIC64x VLIW, when running the same benchmark capped to the same throughput and for the same process technology (0.13μm). When compared to an equivalent RICA design, DRAP was up to 22% larger than RICA but resulted in a power reduction of up to 1.9 times. It was also capable of achieving up to 2.8 times higher throughputs than RICA for the same benchmarks
    corecore