
    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems

    The planar anodic Al2O3-ZrO2 nanocomposite capacitor dielectrics for advanced passive device integration

    The need for integrated passive devices (IPDs) emerges from the increasing consumer demand for electronic product miniaturization. Metal-insulator-metal (MIM) capacitors are vital components of IPD systems. Developing new materials and technologies is essential for advancing capacitor characteristics and co-integrating with other electronic passives. Here we present an innovative electrochemical technology joined with the sputter-deposition of Al and Zr layers to synthesize novel planar nanocomposite metal-oxide dielectrics consisting of ZrO2 nanorods self-embedded into the nanoporous Al2O3 matrix such that its pores are entirely filled with zirconium oxide. The technology is utilized in MIM capacitors characterized by modern surface and interface analysis techniques and electrical measurements. In the 95-480 nm thickness range, the best-achieved MIM device characteristics are the one-layer capacitance density of 112 nF·cm⁻², the loss tangent of 4·10⁻³ at frequencies up to 1 MHz, the leakage current density of 40 pA·cm⁻², the breakdown field strength of up to 10 MV·cm⁻¹, the energy density of 100 J·cm⁻³, the quadratic voltage coefficient of capacitance of 4 ppm·V⁻², and the temperature coefficient of capacitance of 480 ppm·K⁻¹ at 293-423 K at 1 MHz. The outstanding performance, stability, and tunable capacitors' characteristics allow for their application in low-pass filters, coupling/decoupling/bypass circuits, RC oscillators, energy-storage devices, ultrafast charge/discharge units, or high-precision analog-to-digital converters. The capacitor technology based on the non-porous planar anodic-oxide dielectrics complements the electrochemical conception of IPDs that combined, until now, the anodized aluminum interconnection, microresistors, and microinductors, all co-related in one system for use in portable electronic devices

    Mathematical optimization and machine learning to support PCB topology identification

    In this paper, we study an identification problem for schematics with different concurring topologies. A framework is proposed that is supported by both mathematical optimization and machine learning algorithms. Through the use of Python libraries, such as scikit-rf, which allows for the emulation of network analyzer measurements, and a physical microstrip line simulation on PCBs, data for training and testing the framework are provided. In addition to an individual treatment of the concurring topologies and subsequent comparison, a method is introduced to tackle the identification of the optimum topology directly via a standard optimization or machine learning setup: An encoder-decoder sequence is trained with schematics of different topologies to generate a flattened representation of the rated graph representation of the considered schematics. Because they still contain the relevant topology information in encoded (i.e., flattened) form, the latent space representations of schematics obtained in this way can be used for standard optimization of machine learning processes. By using the encoder to map schematics onto latent variables, or the decoder to reconstruct schematics from their latent space representation, various machine learning and optimization setups can be applied to the given identification task. The proposed framework is presented and validated for a small model problem comprising different circuit topologies.

    LIVE MERCHANT ENVIRONMENT

    The present disclosure provides a system and a method for operating a live merchant platform. The disclosure proposes using a live merchant platform, associated with a merchant and an acquirer, to test a range of payment experiences across issuers. The live merchant platform is utilized for authenticating cardholder details and authorizing payment details. The live merchant platform sends an authentication request message to an issuer via a payment network. Thereafter, the request message is forwarded to the core processing environment from a security protocol environment via a gateway environment. Further, the live merchant platform receives an authenticated response from an issuer environment via the payment network

    Towards trustworthy computing on untrustworthy hardware

    Historically, hardware was thought to be inherently secure and trusted due to its obscurity and the isolated nature of its design and manufacturing. In the last two decades, however, hardware trust and security have emerged as pressing issues. Modern day hardware is surrounded by threats manifested mainly in undesired modifications by untrusted parties in its supply chain, unauthorized and pirated selling, injected faults, and system and microarchitectural level attacks. These threats, if realized, are expected to push hardware to abnormal and unexpected behaviour causing real-life damage and significantly undermining our trust in the electronic and computing systems we use in our daily lives and in safety critical applications. A large number of detective and preventive countermeasures have been proposed in literature. It is a fact, however, that our knowledge of potential consequences to real-life threats to hardware trust is lacking given the limited number of real-life reports and the plethora of ways in which hardware trust could be undermined. With this in mind, run-time monitoring of hardware combined with active mitigation of attacks, referred to as trustworthy computing on untrustworthy hardware, is proposed as the last line of defence. This last line of defence allows us to face the issue of live hardware mistrust rather than turning a blind eye to it or being helpless once it occurs. This thesis proposes three different frameworks towards trustworthy computing on untrustworthy hardware. The presented frameworks are adaptable to different applications, independent of the design of the monitored elements, based on autonomous security elements, and are computationally lightweight. The first framework is concerned with explicit violations and breaches of trust at run-time, with an untrustworthy on-chip communication interconnect presented as a potential offender. 
The framework is based on the guiding principles of component guarding, data tagging, and event verification. The second framework targets hardware elements with inherently variable and unpredictable operational latency and proposes a machine-learning based characterization of these latencies to infer undesired latency extensions or denial of service attacks. The framework is implemented on a DDR3 DRAM after showing its vulnerability to obscured latency extension attacks. The third framework studies the possibility of the deployment of untrustworthy hardware elements in the analog front end, and the consequent integrity issues that might arise at the analog-digital boundary of system on chips. The framework uses machine learning methods and the unique temporal and arithmetic features of signals at this boundary to monitor their integrity and assess their trust level

    Stormwater management using play areas: potential, limitations and design considerations

    The study investigates how urban play areas can help address stormwater issues when they are designed as multifunctional spaces. 
The study begins by examining current practices in a range of projects and outlining common design trends and functionalities, along with useful classifications for different types of play areas. Further, the study delves into relevant regulations and research concerning the recreational use of runoff and proposes key principles for the practical application of stormwater functions to play areas, with a focus on health and safety concerns. The study also provides more detailed checklists for each type of play area to reflect their specific needs and potential design solutions. Finally, the proposed principles are applied to a case study of Lillestrøm city and a detailed analysis of Volla school and park. It is found that schools in Lillestrøm have the greatest potential for contributing to stormwater solutions due to their total area and proximity to major runoff lines. The analysis of Volla school and park demonstrates how the multifunctional principle can be applied to enhance playability and reduce surface runoff generated by the site. Although it was challenging to quantify the benefits of the multifunctional design approach, the study suggests that joining the functionality of climate adaptation methods and play spaces in Norwegian cities can bring additional values. It recommends further research on the cost-effectiveness of multipurpose spaces, gaining more information on how such spaces function in changing weather and the universal design of stormwater facilities. Overall, the research provides useful insights for city planners and landscape architects on how to allocate sufficient space for stormwater management in cities while also creating enjoyable and sustainable play spaces

    EnforceSNN: Enabling Resilient and Energy-Efficient Spiking Neural Network Inference considering Approximate DRAMs for Embedded Systems

    Spiking Neural Networks (SNNs) have shown capabilities of achieving high accuracy under unsupervised settings and low operational power/energy due to their bio-plausible computations. Previous studies identified that DRAM-based off-chip memory accesses dominate the energy consumption of SNN processing. However, state-of-the-art works do not optimize the DRAM energy-per-access, thereby hindering the SNN-based systems from achieving further energy efficiency gains. To substantially reduce the DRAM energy-per-access, an effective solution is to decrease the DRAM supply voltage, but it may lead to errors in DRAM cells (i.e., so-called approximate DRAM). Towards this, we propose EnforceSNN, a novel design framework that provides a solution for resilient and energy-efficient SNN inference using reduced-voltage DRAM for embedded systems. The key mechanisms of our EnforceSNN are: (1) employing quantized weights to reduce the DRAM access energy; (2) devising an efficient DRAM mapping policy to minimize the DRAM energy-per-access; (3) analyzing the SNN error tolerance to understand its accuracy profile considering different bit error rate (BER) values; (4) leveraging the information for developing an efficient fault-aware training (FAT) that considers different BER values and bit error locations in DRAM to improve the SNN error tolerance; and (5) developing an algorithm to select the SNN model that offers good trade-offs among accuracy, memory, and energy consumption. The experimental results show that our EnforceSNN maintains the accuracy (i.e., no accuracy loss for BER ≤ 10⁻³) as compared to the baseline SNN with accurate DRAM, while achieving up to 84.9% of DRAM energy saving and up to 4.1x speed-up of DRAM data throughput across different network sizes. Comment: Accepted for publication at Frontiers in Neuroscience - Section Neuromorphic Engineering
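The bit-error behaviour of approximate DRAM described in this abstract can be illustrated with a small sketch. It is not the EnforceSNN implementation: it assumes 8-bit quantized weights and independent, uniformly distributed bit flips at a given BER, and the names `inject_bit_errors` and `count_flipped_bits` are hypothetical helpers introduced here for illustration only.

```python
import random


def inject_bit_errors(weights, ber, bits=8, rng=None):
    """Flip each stored bit independently with probability `ber`.

    Assumption: errors are uniform and independent across bits, which is a
    simplification of how reduced-voltage DRAM actually fails.
    """
    if rng is None:
        rng = random.Random()
    corrupted = []
    for w in weights:
        for bit in range(bits):
            if rng.random() < ber:
                w ^= 1 << bit  # flip this bit of the quantized weight
        corrupted.append(w)
    return corrupted


def count_flipped_bits(original, corrupted):
    """Count how many bits differ between two lists of stored words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(original, corrupted))


rng = random.Random(1)
weights = [rng.randrange(256) for _ in range(10_000)]  # 8-bit weights
noisy = inject_bit_errors(weights, ber=1e-3, rng=rng)
flipped = count_flipped_bits(weights, noisy)
observed_ber = flipped / (len(weights) * 8)
print(f"flipped bits: {flipped}, observed BER: {observed_ber:.2e}")
```

A corrupted weight set of this kind could be fed to a model during training or evaluation to study error tolerance at different BER values; how the paper selects bit error locations is not specified in the abstract.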

    Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

    The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the computing systems community to explore new design approaches. Approximate Computing appears as an emerging solution, allowing the quality of results to be tuned in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators and systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions. Comment: Under Review at ACM Computing Surveys

    A Phase Change Memory and DRAM Based Framework For Energy-Efficient and High-Speed In-Memory Stochastic Computing

    Convolutional Neural Networks (CNNs) have proven to be highly effective in various fields related to Artificial Intelligence (AI) and Machine Learning (ML). However, the significant computational and memory requirements of CNNs make their processing highly compute and memory-intensive. In particular, the multiply-accumulate (MAC) operation, which is a fundamental building block of CNNs, requires an enormous number of arithmetic operations. As the input dataset size increases, the traditional processor-centric von-Neumann computing architecture becomes ill-suited for CNN-based applications. This results in exponentially higher latency and energy costs, making the processing of CNNs highly challenging. To overcome these challenges, researchers have explored the Processing-In-Memory (PIM) technique, which involves placing the processing unit inside or near the memory unit. This approach reduces data migration length and utilizes the internal memory bandwidth at the memory chip level. However, developing a reliable PIM-based system with minimal hardware modifications and design complexity remains a significant challenge. The proposed solution in the report suggests utilizing different memory technologies, such as Dynamic RAM (DRAM) and phase change memory (PCM), with Stochastic arithmetic and minimal add-on logic. Stochastic computing is a technique that uses random numbers to perform arithmetic operations instead of traditional binary representation. This technique reduces hardware requirements for CNN's arithmetic operations, making it possible to implement them with minimal add-on logic. The report details the workflow for performing arithmetical operations used by CNNs, including MAC, activation, and floating-point functions. 
The proposed solution includes designs for scalable Stochastic Number Generator (SNG), DRAM CNN accelerator, non-volatile memory (NVM) class PCRAM-based CNN accelerator, and DRAM-based stochastic to binary conversion (StoB) for in-situ deep learning. These designs utilize stochastic computing to reduce the hardware requirements for CNN's arithmetic operations and enable energy and time-efficient processing of CNNs. The report also identifies future research directions for the proposed designs, including in-situ PCRAM-based SNG, ODIN (A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-Situ Neural Network Processing in Phase Change RAM), ATRIA (Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing), and AGNI (In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning), and presents initial findings for these ideas. In summary, the proposed solution in the report offers a comprehensive approach to address the challenges of processing CNNs, and the proposed designs have the potential to improve the energy and time efficiency of CNNs significantly. Using Stochastic Computing and different memory technologies enables the development of reliable PIM-based systems with minimal hardware modifications and design complexity, providing a promising path for the future of CNN-based applications
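To make the stochastic-computing idea in this abstract concrete, the following is a minimal software sketch of the textbook unipolar encoding, in which a value in [0, 1] is the probability that a bit in a random stream is 1. Under that assumption a bitwise AND of two independent streams approximates multiplication, and a multiplexer with a 0.5-probability select stream approximates a scaled addition. This is an illustration only, not the report's hardware design; all function names are hypothetical.

```python
import random


def to_stream(p, length, rng):
    """Encode p in [0, 1] as a unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_stream(bits):
    """Stochastic-to-binary conversion: the fraction of ones in the stream."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Bitwise AND of independent streams approximates a * b."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def sc_scaled_add(a_bits, b_bits, select_bits):
    """Multiplexer: with a 0.5 select stream this approximates (a + b) / 2."""
    return [a if s else b for a, b, s in zip(a_bits, b_bits, select_bits)]


rng = random.Random(0)
n = 4096  # longer streams reduce the random error of the result
a = to_stream(0.5, n, rng)
b = to_stream(0.4, n, rng)
sel = to_stream(0.5, n, rng)

product = from_stream(sc_multiply(a, b))
scaled_sum = from_stream(sc_scaled_add(a, b, sel))
print(f"0.5 * 0.4 is approximately {product:.3f}")
print(f"(0.5 + 0.4) / 2 is approximately {scaled_sum:.3f}")
```

The trade-off visible here is the one the abstract alludes to: each arithmetic operation reduces to a single logic gate per bit, at the cost of long bitstreams and a result that is only statistically accurate.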

    Adaptive Intelligent Systems for Extreme Environments

    As embedded processors become more powerful, a growing number of embedded systems equipped with artificial intelligence (AI) algorithms have been used in radiation environments to perform routine tasks to reduce radiation risk for human workers. Because of their low price, commercial-off-the-shelf devices and components are becoming increasingly popular for making such tasks more affordable. However, this also presents new challenges in improving radiation tolerance, the capability to conduct multiple AI tasks, and the power efficiency of embedded systems in harsh environments. Three pieces of research work are completed in this thesis: 1) a fast simulation method for analysis of single event effect (SEE) in integrated circuits, 2) a self-refresh scheme to detect and correct bit-flips in random access memory (RAM), and 3) a hardware AI system with dynamic hardware accelerators and AI models for increasing flexibility and efficiency. The variances of the physical parameters in practical implementation, such as the nature of the particle, linear energy transfer and circuit characteristics, may have a large impact on the final simulation accuracy, which significantly increases the complexity and cost of the transistor-level simulation workflow for large-scale circuits. This makes it difficult to conduct SEE simulations for large-scale circuits. Therefore, in the first research work, a new SEE simulation scheme is proposed to offer a fast and cost-efficient method to evaluate and compare the performance of large-scale circuits that are subject to the effects of radiation particles. The advantages of transistor and hardware description language (HDL) simulations are combined here to produce accurate SEE digital error models for rapid error analysis in large-scale circuits. Under the proposed scheme, time-consuming back-end steps are skipped. The SEE analysis for large-scale circuits can be completed in just a few hours. 
In high-radiation environments, bit-flips in RAMs not only occur but may also accumulate. However, the typical error mitigation methods cannot handle high error rates at low hardware cost. In the second work, an adaptive scheme that combines error-correcting codes with refreshing techniques is proposed to correct errors and mitigate error accumulation in extreme radiation environments. This scheme continuously refreshes the data in RAMs so that errors cannot accumulate. Furthermore, because the proposed design can share the same ports with the user module without changing the timing sequence, it can easily be applied to systems where the hardware modules are designed with fixed reading and writing latency. It is a challenge to implement intelligent systems with constrained hardware resources. In the third work, an adaptive hardware resource management system for multiple AI tasks in harsh environments was designed. Inspired by the “refreshing” concept in the second work, we utilise a key feature of FPGAs, partial reconfiguration, to improve the reliability and efficiency of the AI system. More importantly, this feature provides the capability to manage the hardware resources for deep learning acceleration. In the proposed design, the on-chip hardware resources are dynamically managed to improve the flexibility, performance and power efficiency of deep learning inference systems. The deep learning units provided by Xilinx are used to perform multiple AI tasks simultaneously, and the experiments show significant improvements in power efficiency for a wide range of scenarios with different workloads. To further improve the performance of the system, the concept of reconfiguration was further extended. As a result, an adaptive DL software framework was designed. This framework can provide a significant level of adaptability support for various deep learning algorithms on an FPGA-based edge computing platform. 
To meet the specific accuracy and latency requirements derived from the running applications and operating environments, the platform may dynamically update hardware and software (e.g., processing pipelines) to achieve better cost, power, and processing efficiency compared to the static system
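The second work in this abstract combines correcting codes with continuous refreshing so that bit-flips are repaired before they accumulate. The sketch below illustrates that general principle in software, assuming a Hamming(7,4) single-error-correcting code and a simple `scrub` pass over a list that stands in for RAM. The thesis does not state which code or refresh policy it uses, so both choices and all names here are assumptions made for illustration.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Bit i of the result holds code position i + 1; positions 1, 2 and 4 are
    parity bits, positions 3, 5, 6 and 7 carry data bits d0..d3.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p4 = d[1] ^ d[2] ^ d[3]
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_correct(word):
    """Return the codeword with a single-bit error (if any) repaired."""
    syndrome = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        word ^= 1 << (syndrome - 1)  # syndrome names the flipped position
    return word


def hamming74_decode(word):
    """Correct the word, then extract the 4 data bits."""
    word = hamming74_correct(word)
    return (((word >> 2) & 1) | (((word >> 4) & 1) << 1)
            | (((word >> 5) & 1) << 2) | (((word >> 6) & 1) << 3))


def scrub(memory):
    """Refresh pass: rewrite every word with its corrected value."""
    for addr, word in enumerate(memory):
        memory[addr] = hamming74_correct(word)


data = [0x3, 0xA, 0xF, 0x0]
memory = [hamming74_encode(n) for n in data]

memory[1] ^= 1 << 2  # first upset in word 1
scrub(memory)        # repaired before a second upset arrives
memory[1] ^= 1 << 5  # second upset in the same word
scrub(memory)

recovered = [hamming74_decode(w) for w in memory]
print(recovered == data)
```

Without the intermediate scrub, the two upsets would sit in the same word at once, which exceeds what a single-error-correcting code can repair; refreshing often enough keeps each word within the code's correction capability.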