618 research outputs found
A Phase Change Memory and DRAM Based Framework For Energy-Efficient and High-Speed In-Memory Stochastic Computing
Convolutional Neural Networks (CNNs) have proven to be highly effective in various fields related to Artificial Intelligence (AI) and Machine Learning (ML). However, the significant computational and memory requirements of CNNs make their processing highly compute and memory-intensive. In particular, the multiply-accumulate (MAC) operation, which is a fundamental building block of CNNs, requires enormous arithmetic operations. As the input dataset size increases, the traditional processor-centric von-Neumann computing architecture becomes ill-suited for CNN-based applications. This results in exponentially higher latency and energy costs, making the processing of CNNs highly challenging.
To overcome these challenges, researchers have explored the Processing-In Memory (PIM) technique, which involves placing the processing unit inside or near the memory unit. This approach reduces data migration length and utilizes the internal memory bandwidth at the memory chip level. However, developing a reliable PIM-based system with minimal hardware modifications and design complexity remains a significant challenge.
The proposed solution in the report suggests utilizing different memory technologies, such as Dynamic RAM (DRAM) and phase change memory (PCM), with Stochastic arithmetic and minimal add-on logic. Stochastic computing is a technique that uses random numbers to perform arithmetic operations instead of traditional binary representation. This technique reduces hardware requirements for CNN\u27s arithmetic operations, making it possible to implement them with minimal add-on logic.
The report details the workflow for performing arithmetical operations used by CNNs, including MAC, activation, and floating-point functions. The proposed solution includes designs for scalable Stochastic Number Generator (SNG), DRAM CNN accelerator, non-volatile memory (NVM) class PCRAM-based CNN accelerator, and DRAM-based stochastic to binary conversion (StoB) for in-situ deep learning. These designs utilize stochastic computing to reduce the hardware requirements for CNN\u27s arithmetic operations and enable energy and time-efficient processing of CNNs.
The report also identifies future research directions for the proposed designs, including in-situ PCRAM-based SNG, ODIN (A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-Situ Neural Network Processing in Phase Change RAM), ATRIA (Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing), and AGNI (In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning), and presents initial findings for these ideas.
In summary, the proposed solution in the report offers a comprehensive approach to address the challenges of processing CNNs, and the proposed designs have the potential to improve the energy and time efficiency of CNNs significantly. Using Stochastic Computing and different memory technologies enables the development of reliable PIM-based systems with minimal hardware modifications and design complexity, providing a promising path for the future of CNN-based applications
Radiation Tolerant Electronics, Volume II
Research on radiation tolerant electronics has increased rapidly over the last few years, resulting in many interesting approaches to model radiation effects and design radiation hardened integrated circuits and embedded systems. This research is strongly driven by the growing need for radiation hardened electronics for space applications, high-energy physics experiments such as those on the large hadron collider at CERN, and many terrestrial nuclear applications, including nuclear energy and safety management. With the progressive scaling of integrated circuit technologies and the growing complexity of electronic systems, their ionizing radiation susceptibility has raised many exciting challenges, which are expected to drive research in the coming decade.After the success of the first Special Issue on Radiation Tolerant Electronics, the current Special Issue features thirteen articles highlighting recent breakthroughs in radiation tolerant integrated circuit design, fault tolerance in FPGAs, radiation effects in semiconductor materials and advanced IC technologies and modelling of radiation effects
Study and design of an interface for remote audio processing
This project focused on the study and design of an interface for remote audio processing, with the objective of acquiring by filtering, biasing, and amplifying an analog
signal before digitizing it by means of two MCP3208 ADCs to achieve a 24-bit resolution signal. The resulting digital signal was then transmitted to a Raspberry Pi
using SPI protocol, where it was processed by a Flask server that could be accessed
from both local and remote networks.
The design of the PCB was a critical component of the project, as it had to accommodate various components and ensure accurate signal acquisition and transmission.
The PCB design was created using KiCad software, which allowed for the precise
placement and routing of all components. A major challenge in the design of the interface was to ensure that the analog signal was not distorted during acquisition and
amplification. This was achieved through careful selection of amplifier components
and using high-pass and low-pass filters to remove any unwanted noise.
Once the analog signal was acquired and digitized, the resulting digital signal was
transmitted to the Raspberry Pi using SPI protocol. The Raspberry Pi acted as
the host for a Flask server, which could be accessed from local and remote networks
using a web browser. The Flask server allowed for the processing of the digital signal
and provided a user interface for controlling the gain and filtering parameters of the
analog signal. This enabled the user to adjust the signal parameters to suit their
specific requirements, making the interface highly flexible and adaptable to a variety
of audio processing applications.
The final interface was capable of remote audio processing, making it highly useful
in scenarios where the audio signal needed to be acquired and processed in a location
separate from the user. For example, it could be used in a recording studio, where the
audio signal from the microphone could be remotely processed using the interface.
The gain and filtering parameters could be adjusted in real-time, allowing the sound
engineer to fine-tune the audio signal to produce the desired recording.
In conclusion, the project demonstrated the feasibility and potential benefits of
using a remote audio processing system for various applications. The design of the
PCB, selection of components, and use of the Flask server enabled the creation of
an interface that was highly flexible, accurate, and adaptable to a variety of audio
processing requirements. Overall, the project represents a significant step forward
in the field of remote audio processing, with the potential to benefit many different
applications in the future
Modelling, Dimensioning and Optimization of 5G Communication Networks, Resources and Services
This reprint aims to collect state-of-the-art research contributions that address challenges in the emerging 5G networks design, dimensioning and optimization. Designing, dimensioning and optimization of communication networks resources and services have been an inseparable part of telecom network development. The latter must convey a large volume of traffic, providing service to traffic streams with highly differentiated requirements in terms of bit-rate and service time, required quality of service and quality of experience parameters. Such a communication infrastructure presents many important challenges, such as the study of necessary multi-layer cooperation, new protocols, performance evaluation of different network parts, low layer network design, network management and security issues, and new technologies in general, which will be discussed in this book
Low Power Memory/Memristor Devices and Systems
This reprint focusses on achieving low-power computation using memristive devices. The topic was designed as a convenient reference point: it contains a mix of techniques starting from the fundamental manufacturing of memristive devices all the way to applications such as physically unclonable functions, and also covers perspectives on, e.g., in-memory computing, which is inextricably linked with emerging memory devices such as memristors. Finally, the reprint contains a few articles representing how other communities (from typical CMOS design to photonics) are fighting on their own fronts in the quest towards low-power computation, as a comparison with the memristor literature. We hope that readers will enjoy discovering the articles within
A multi-level functional IR with rewrites for higher-level synthesis of accelerators
Specialised accelerators deliver orders of magnitude higher energy-efficiency than
general-purpose processors. Field Programmable Gate Arrays (FPGAs) have become
the substrate of choice, because the ever-changing nature of modern workloads, such
as machine learning, demands reconfigurability. However, they are notoriously hard
to program directly using Hardware Description Languages (HDLs). Traditional High-Level Synthesis (HLS) tools improve productivity, but come with their own problems.
They often produce sub-optimal designs and programmers are still required to write
hardware-specific code, thus development cycles remain long.
This thesis proposes Shir, a higher-level synthesis approach for high-performance
accelerator design with a hardware-agnostic programming entry point, a multi-level
Intermediate Representation (IR), a compiler and rewrite rules for optimisation.
First, a novel, multi-level functional IR structure for accelerator design is described.
The IRs operate on different levels of abstraction, cleanly separating different hardware
concerns. They enable the expression of different forms of parallelism and standard
memory features, such as asynchronous off-chip memories or synchronous on-chip
buffers, as well as arbitration of such shared resources. Exposing these features at the
IR level is essential for achieving high performance.
Next, mechanical lowering procedures are introduced to automatically compile
a program specification through Shir’s functional IRs until low-level HDL code for
FPGA synthesis is emitted. Each lowering step gradually adds implementation details.
Finally, this thesis presents rewrite rules for automatic optimisations around parallelisation, buffering and data reshaping. Reshaping operations pose a challenge to
functional approaches in particular. They introduce overheads that compromise performance or even prevent the generation of synthesisable hardware designs altogether.
This fundamental issue is solved by the application of rewrite rules.
The viability of this approach is demonstrated by running matrix multiplication
and 2D convolution on an Intel Arria 10 FPGA. A limited design space exploration is
conducted, confirming the ability of the IR to exploit various hardware features. Using
rewrite rules for optimisation, it is possible to generate high-performance designs
that are competitive with highly tuned OpenCL implementations and that outperform
hardware-agnostic OpenCL code. The performance impact of the optimisations is
further evaluated showing that they are essential to achieving high performance, and
in many cases also necessary to produce hardware that fits the resource constraints
- …