
    New Logic-In-Memory Paradigms: An Architectural and Technological Perspective

    Processing systems are in continuous evolution thanks to constant technological advancement and architectural progress. Over the years, computing systems have become more and more powerful, providing support for applications, such as Machine Learning, that require high computational power. However, the growing complexity of modern computing units and applications has had a strong impact on power consumption. In addition, the memory plays a key role in the overall power consumption of the system, especially for data-intensive applications, which require a large amount of data movement between the memory and the computing unit. The consequence is twofold: memory accesses are expensive in terms of energy, and much time is wasted accessing the memory rather than processing, because of the performance gap that exists between memories and processing units. This gap is known as the memory wall or the von Neumann bottleneck and is due to the different rates of progress of complementary metal-oxide semiconductor (CMOS) technology and memories. Moreover, CMOS scaling is approaching a limit beyond which further progress will not be possible. This work addresses these problems from an architectural and technological point of view by: (1) proposing a novel Configurable Logic-in-Memory Architecture that exploits the in-memory computing paradigm to mitigate the memory wall problem while also providing high performance thanks to its flexibility and parallelism; and (2) exploring a non-CMOS technology as a possible candidate technology for the Logic-in-Memory paradigm.
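    As a concrete flavour of the Logic-in-Memory paradigm described above, the following minimal C sketch models a memory array whose rows can be combined with bitwise logic inside the array itself, so operands never cross the memory-processor boundary. The array size, the lim_ names, and the single AND operation are illustrative assumptions, not details of the proposed architecture.

    #include <stdint.h>
    #include <stdio.h>

    #define ROWS 4

    /* Toy model of a memory array with per-row logic: bitwise operations
     * happen "inside" the array, so no operand travels to a CPU.
     * Everything here is illustrative, not the thesis's architecture. */
    typedef struct {
        uint8_t rows[ROWS];
    } lim_array_t;

    /* In-memory bitwise AND: row dst = row a AND row b, in one array step. */
    static void lim_and(lim_array_t *m, int dst, int a, int b) {
        m->rows[dst] = (uint8_t)(m->rows[a] & m->rows[b]);
    }

    int main(void) {
        lim_array_t m = { .rows = { 0xF0, 0x3C, 0x00, 0x00 } };
        lim_and(&m, 2, 0, 1);               /* no CPU round-trip in the model */
        printf("row2 = 0x%02X\n", (unsigned)m.rows[2]);   /* prints 0x30 */
        return 0;
    }

    In a von Neumann machine the same AND would cost two memory reads, an ALU operation, and a write-back over the memory bus; in the in-memory model only the command crosses the boundary, which is the saving the memory wall argument is about.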

    Exploring New Computing Paradigms for Data-Intensive Applications

    The abstract is provided in the attachment.

    Integrating simultaneous bi-directional signalling in the test fabric of 3D stacked integrated circuits

    Jennions, Ian K. - Associate Supervisor

    The world has seen significant advancements in electronic devices' capabilities, most notably the ability to embed ultra-large-scale functionalities in lightweight, area- and power-efficient devices. There has been an enormous push towards quality and reliability in consumer electronics, which have become an indispensable part of human life. Consequently, the tests conducted on these devices at the final stages, before they are shipped to customers, are of very high significance to the research community. However, researchers have always struggled to find a balance between the test time (and hence the test cost) and the test overheads; unfortunately, the two are inversely proportional. On the other hand, the ever-increasing demand for more powerful and compact devices is now facing a new challenge. Historically, with advancements in manufacturing technology, electronic devices witnessed miniaturization at an exponential pace, as predicted by Moore's law. However, further geometric or effective 2D scaling seems complicated due to performance and power concerns at smaller technology nodes. One promising way forward is forming 3D Stacked Integrated Circuits (SICs), in which individual dies are stacked vertically and interconnected using Through Silicon Vias (TSVs) before being packaged as a single chip. This allows more functionality to be embedded within a reduced footprint and addresses another critical problem observed in 2D designs: increasingly long interconnects and latency issues. However, as more and more functionality is embedded into a small area, it becomes increasingly challenging to access the internal states (to observe or control them) after the device is fabricated, which is essential for testing. This access is restricted by the limited number of Chip Terminals (IC pins and the vertical Through Silicon Vias) that a chip can be fitted with, by power consumption concerns, and by the chip area overheads that can be allocated for testing. This research investigates Simultaneous Bi-Directional Signalling (SBS) for use in Test Access Mechanism (TAM) designs in 3D SICs. SBS enables a chip to simultaneously send and receive test vectors on a single Chip Terminal (CT), effectively doubling the per-pin efficiency, which can be translated into additional test channels for test time reduction or into Chip Terminal reduction for resource efficiency. The research shows that SBS-based test access methods have significant potential to reduce test times and/or test resources compared to traditional approaches, thereby opening new avenues towards cost-effectiveness and reliability of future electronics.

    PhD in Manufacturing
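    To make the SBS principle concrete, the following minimal C sketch models voltage-mode simultaneous bi-directional signalling under simplifying assumptions: with matched driver impedances, the shared wire settles at the average of the two driven voltages, and each end recovers the remote bit by subtracting its own known contribution from the observed line voltage. The voltage levels, the threshold, and the function names are illustrative, not the circuit design used in the thesis.

    #include <stdio.h>

    /* Toy voltage-mode SBS model: the wire carries the superposition of
     * both drivers; each end cancels its own contribution to decode the
     * other side. Levels and names are illustrative assumptions. */

    static double drive(int bit) { return bit ? 1.0 : 0.0; }

    /* Recover the remote bit from the line voltage and the local bit. */
    static int recover_remote(double line_v, int local_bit) {
        double remote_v = 2.0 * line_v - drive(local_bit);
        return remote_v > 0.5;
    }

    int main(void) {
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                double line = (drive(a) + drive(b)) / 2.0;  /* superposition */
                printf("A sends %d, B sends %d -> A sees %d, B sees %d\n",
                       a, b, recover_remote(line, a), recover_remote(line, b));
            }
        }
        return 0;
    }

    Running it prints all four send/receive combinations and shows each side decoding the other's bit correctly, which is the property that doubles per-terminal test bandwidth.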

    Performance Analysis of Complex Shared Memory Systems

    Systems for high performance computing are getting increasingly complex. On the one hand, the number of processors is increasing. On the other hand, the individual processors are getting more and more powerful. In recent years, the latter has to a large extent been achieved by increasing the number of cores per processor. Unfortunately, scientific applications often fail to fully utilize the available computational performance. Therefore, performance analysis tools that help to localize and fix performance problems are indispensable. Large scale systems for high performance computing typically consist of multiple compute nodes that are connected via a network. Performance analysis tools that analyze problems arising from the use of multiple nodes are readily available. However, the increasing number of cores per processor observed within the last decade represents a major change in the node architecture. Therefore, this work concentrates on the analysis of node performance. The goal of this thesis is to improve the understanding of the achieved application performance on existing hardware. It can be observed that the scaling of parallel applications on multi-core processors differs significantly from the scaling on multiple processors. Therefore, the properties of shared resources in contemporary multi-core processors as well as remote accesses in multi-processor systems are investigated, and their respective impact on application performance is analyzed. As a first step, a comprehensive suite of highly optimized micro-benchmarks is developed. These benchmarks are able to determine the performance of memory accesses depending on the location and coherence state of the data. They are used to perform an in-depth analysis of the characteristics of memory accesses in contemporary multi-processor systems, which identifies potential bottlenecks. However, in order to localize performance problems, it also has to be determined to what extent the application performance is limited by certain resources. Therefore, in a second step, a methodology is developed to derive metrics for the utilization of individual components in the memory hierarchy as well as for waiting times caused by memory accesses. The approach is based on hardware performance counters, which record the number of certain hardware events. The developed micro-benchmarks are used to selectively stress individual components, which makes it possible to identify the events that provide a reasonable assessment of the utilization of the respective component and of the amount of time spent waiting for memory accesses to complete. Finally, the knowledge gained from this process is used to implement a visualization of memory-related performance issues in existing performance analysis tools. The results of the micro-benchmarks reveal that the increasing number of cores per processor and the usage of multiple processors per node lead to complex systems in which the performance characteristics of memory accesses vary greatly depending on the location of the accessed data. Furthermore, it can be observed that the aggregated throughput of shared resources in multi-core processors does not necessarily scale linearly with the number of cores that access them concurrently, which limits the scalability of parallel applications. It is shown that the proposed methodology for the identification of meaningful hardware performance counters yields useful metrics for the localization of memory-related performance limitations.
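    As an illustration of the micro-benchmark technique mentioned above, the following self-contained C sketch estimates average memory access latency by pointer chasing: Sattolo's algorithm builds a random single-cycle permutation so hardware prefetchers cannot predict the next address, and every load depends on the previous one, so the elapsed time divided by the number of loads approximates the access latency of whichever level of the memory hierarchy holds the working set. It is a generic sketch, not the thesis's optimized benchmark suite, which additionally controls data location and coherence state.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1u << 24)  /* ~16M pointers (128 MiB): larger than typical caches */

    int main(void) {
        size_t *next = malloc(N * sizeof *next);
        if (!next) return 1;

        /* Sattolo's algorithm: a random permutation with a single cycle,
         * so the chase visits every element and defeats prefetchers. */
        for (size_t i = 0; i < N; i++) next[i] = i;
        srand(42);                            /* fixed seed: repeatable runs */
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;    /* j < i is what makes it cyclic */
            size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t p = 0;
        for (size_t i = 0; i < N; i++)
            p = next[p];                      /* each load depends on the last */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                  + (double)(t1.tv_nsec - t0.tv_nsec);
        printf("avg load-to-use latency: %.1f ns (sink=%zu)\n", ns / N, p);
        free(next);
        return 0;
    }

    Compile with optimizations (e.g. gcc -O2); printing p prevents the compiler from removing the chase loop. Varying N moves the working set between cache levels and main memory, which exposes exactly the location-dependent latency differences the abstract describes.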