A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization
A new generation of radio telescopes is achieving unprecedented levels of
sensitivity and resolution, as well as increased agility and field-of-view, by
employing high-performance digital signal processing hardware to phase and
correlate large numbers of antennas. The computational demands of these imaging
systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the
number of independent beams, and N is the number of antennas. The
specifications of many new arrays lead to demands in excess of tens of PetaOps
per second.
To meet this challenge, we have developed a general purpose correlator
architecture using standard 10-Gbit Ethernet switches to pass data between
flexible hardware modules containing Field Programmable Gate Array (FPGA)
chips. These chips are programmed using open-source signal processing libraries
we have developed to be flexible, scalable, and chip-independent. This work
reduces the time and cost of implementing a wide range of signal processing
systems, with correlators foremost among them, and facilitates upgrading to new
generations of processing technology. We present several correlator
deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes
parameter application deployed on the Precision Array for Probing the Epoch of
Reionization.
Comment: Accepted to Publications of the Astronomical Society of the Pacific. 31
pages. v2: corrected typo, v3: corrected Fig. 1
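As a rough illustration of the BMN^2 scaling stated in the abstract above, the following Python sketch evaluates the proportionality for the 16-antenna, 200-MHz deployment. The function name and the `ops_per_term` constant are hypothetical: the abstract gives only the proportionality, not the constant, and a single beam is assumed here.

```python
def correlator_load(bandwidth_hz, beams, antennas, ops_per_term=1.0):
    """Relative computational load, proportional to B * M * N^2.

    ops_per_term is an assumed constant of proportionality (not from the
    source); with the default of 1.0 the result is only a relative figure.
    """
    return ops_per_term * bandwidth_hz * beams * antennas ** 2


# 16 antennas, 200 MHz bandwidth, one beam assumed.
paper_load = correlator_load(200e6, beams=1, antennas=16)

# Doubling the antenna count quadruples the load at fixed B and M.
ratio = correlator_load(200e6, beams=1, antennas=32) / paper_load

print(paper_load, ratio)
```

The quadratic term is why antenna count, rather than bandwidth or beam count, tends to dominate the cost of large arrays.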
Izhikevich neural model and STDP learning algorithm mapping on spiking neural network hardware emulator
Since the 20th century, the biological mechanisms underlying brain behaviour have attracted growing interest from research communities in the information sciences, owing to the computational power of the systems they inspire. Despite the lack of consensus about the information processing actually involved in the brain, biological processes have served as a reference for recent computational models. The first Artificial Neural Networks (ANNs) were developed as simplified versions of biological neural networks in terms of structure and function. Today, the third generation of artificial networks is that of Spiking Neural Networks (SNNs), which achieve more realistic modelling by using true biological features, such as spikes, to transmit information between neurons. The aim of this thesis is to embed the Izhikevich neuron model and a fully custom spike-timing-dependent plasticity (STDP) learning algorithm in an architecture called HEENS (Hardware Emulator of Evolved Neural System). HEENS is a multi-chip structure developed at the Universitat Politecnica de Catalunya (UPC), based on a ring topology connecting several SIMD processors, each of which reproduces a group of neurons of a spiking neural network. The Izhikevich neuron model is a widely adopted mathematical model that reproduces the evolution of the neural membrane potential, as observed in some mammalian cortex, over time and in response to external stimuli. STDP is a biological learning algorithm that shapes the strength of a synaptic connection according to the timing with which that connection takes part in the overall spiking activity of the pre- or post-synaptic neurons. This master's thesis project works at the algorithm level, the instruction level, and the architectural level.
It proceeds by analysing the mathematical models to find the right data parallelism, writing the assembly program that describes the routine common to all the neurons of the implemented SNN, and modifying the instruction set and the existing hardware of the HEENS architecture, in order to fulfil the needs of the biological model from a computational and performance point of view. The HEENS architecture is described in VHDL; its set-up operations (assembler for code translation, generation of memories, network configuration) are performed by Python scripts, while the comparison between the actual behaviour of HEENS and that of the mathematical models is performed via MATLAB scripts. The latter make it possible to imitate the performance of special-purpose hardware; to generate source files in order to synchronize and align the model and the architecture even with the randomization of several neural parameters; to make some design choices; and to verify and show the results.
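The two models named in the abstract above can be sketched in a few lines of Python. This is a minimal reference-style sketch, not the HEENS assembly routine. The neuron equations are the standard published Izhikevich form, while the regular-spiking parameter values, the forward-Euler step of 1 ms, and all function names are assumptions made for illustration. The thesis uses a custom STDP rule that the abstract does not specify, so the textbook pair-based exponential rule is swapped in here as a stand-in.

```python
import math


def izhikevich_step(v, u, current, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """One forward-Euler step of the Izhikevich model.

    v is the membrane potential (mV), u the recovery variable.
    Returns (v, u, spiked). Parameter defaults are the commonly quoted
    regular-spiking values, assumed here for illustration.
    """
    v_next = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
    u_next = u + dt * a * (b * v - u)
    if v_next >= 30.0:  # spike: reset v and bump the recovery variable
        return c, u_next + d, True
    return v_next, u_next, False


def simulate(steps, current):
    """Return the step indices at which a single neuron spikes."""
    v, u = -65.0, -13.0  # start with u = b * v
    spike_times = []
    for t in range(steps):
        v, u, spiked = izhikevich_step(v, u, current)
        if spiked:
            spike_times.append(t)
    return spike_times


def stdp_weight_change(delta_t, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP stand-in; delta_t = t_post - t_pre in ms.

    Pre-before-post (delta_t > 0) strengthens the synapse,
    post-before-pre (delta_t < 0) weakens it.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau)
    return 0.0


spikes_driven = simulate(1000, current=10.0)
spikes_rest = simulate(1000, current=0.0)
print(len(spikes_driven), len(spikes_rest))
```

Because every neuron executes the same update with different state and parameters, a routine of this shape maps naturally onto SIMD processors of the kind the abstract describes.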
An Energy Efficient non-volatile FPGA Digital Processor for Brain Neuromodulation
PhD Thesis. Brain stimulation technologies have the potential to provide considerable clinical benefits for people with a range of neurological disorders. Recent neuroscience studies have shown that considerable information about brain states is contained in low-frequency local field potential (lf-LFP; below 5 Hz) recordings, with application in real-time closed-loop neurostimulation for treating neurological disorders. Given that these signals can be sampled at a low sampling rate and hence provide sparse data streams, there is an opportunity to design implantable neuroprostheses with long battery lifecycles that still provide enough processing power to implement long-term, real-time closed-loop control algorithms. In this thesis, a closed-loop embedded digital processor has been created for use in rodent neuroscience experiments. The first contribution of this work is a mathematical, analytical design approach for a feedback controller that suppresses high-amplitude epileptic activity in the neuron mass model, giving a better understanding of how to perform closed-loop stimulation to control seizures. The second and third contributions are combined to present an exploratory energy-efficient digital processor architecture built with commercial off-the-shelf non-volatile FPGAs and a microcontroller for sparse data processing in brain neuromodulation. A digital hardware design of an exemplar PID control algorithm has been implemented on the proposed architecture. A new power computing diagram of this time-driven approach significantly reduced the power consumption, which suggests that a combined digital control system of non-volatile FPGAs and a microcontroller outperforms a digital control system built from two microcontrollers in computing time and energy consumption, supposing one microcontroller is always required.
Taken together, this energy-efficient digital processor architecture gives important insights for the further advancement of neuroprostheses for brain neurostimulation, with the aim of achieving lower power consumption at sparse sampling data rates.
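The exemplar PID control algorithm mentioned above can be illustrated with a minimal discrete-time sketch in Python. This is a software model only, not the thesis's FPGA design. The gains, the time step, and the first-order plant used to exercise the controller are all hypothetical values chosen for illustration.

```python
class PID:
    """Textbook discrete PID controller with a fixed time step dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        """Return the control output for one sample."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


def run_first_order_plant(steps=500, setpoint=1.0, dt=0.1):
    """Drive a hypothetical plant x' = -x + u toward the setpoint."""
    pid = PID(kp=1.0, ki=1.0, kd=0.0, dt=dt)
    x = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, x)
        x += dt * (-x + u)  # forward-Euler plant update
    return x


final_state = run_first_order_plant()
print(final_state)
```

Each sample needs only a handful of multiply-accumulate operations, so at the low sampling rates the abstract describes, the controller sits idle for most of each sampling period.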
Generating Circuit Tests by Exploiting Designed Behavior
This thesis describes two programs for generating tests for digital circuits that exploit several kinds of expert knowledge not used by previous approaches. First, many test generation problems can be solved efficiently using operation relations, a novel representation of circuit behavior that connects internal component operations with directly executable circuit operations. Operation relations can be computed efficiently by searching traces of simulated circuit behavior. Second, experts write test programs rather than test vectors because programs are more readable and compact. Test programs can be constructed automatically by merging program fragments using expert-supplied goal-refinement rules and domain-independent planning techniques
The Journal of Microelectronic Research 2009
Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications
Image processing is increasingly used in several application fields, such as biomedical, aerospace, and automotive. Within these fields, it serves both non-critical and critical tasks. For example, in automotive applications, cameras are becoming key sensors for increasing car safety, driving assistance, and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, and Pedestrian Detection.
The complexity of these algorithms poses a challenge for real-time image processing systems, which require high computing capacity that is usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand for computational capability. These devices can assist embedded processors by significantly speeding up computationally intensive software algorithms.
Moreover, critical applications introduce strict requirements not only from the real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena, and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase the system reliability and robustness by leveraging Dynamic Partial Reconfiguration features.
The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been addressed and are discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performance, (ii) enhancing algorithm robustness, and (iii) increasing overall system dependability.
To ensure real-time performance, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. The functionalities offered by the target technology and the characteristics of each algorithm have been taken into account throughout the design of these accelerators, in order to tailor the algorithms' operations efficiently to the available hardware resources.
The key idea for increasing the robustness of image processing algorithms, on the other hand, is to introduce self-adaptivity at the algorithm level, in order to maintain or improve the quality of results over a wide range of input conditions that are not always fully predictable at design time (e.g., noise level variations). This has been accomplished by measuring some characteristics of the input images at run time and then tuning the algorithm parameters based on these estimates. The dynamic reconfiguration features of modern FPGAs have been extensively exploited to integrate run-time adaptivity into the designed hardware accelerators.
Tools and methodologies have also been developed to increase the overall system dependability during reconfiguration, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults.
Adaptive Receiver Design for High Speed Optical Communication
Conventional input/output (IO) links consume power independently of changes
in the bandwidth demand of the system in which they are deployed. As the system is
designed to satisfy the peak bandwidth demand, most of the time the IO links
are idle but still consuming power. In big data centers, the overall utilization
ratio of IO links is less than 10%, corresponding to a large amount of energy
wasted for idle operation.
This work demonstrates a 60 Gb/s high sensitivity non-return-to-zero (NRZ)
optical receiver in 14 nm FinFET technology with less than 7 ns power-on time.
The power-on time includes the data detection, analog bias settling, photo-diode
DC current cancellation, and phase locking by the clock and data recovery circuit
(CDR). The receiver autonomously detects the data demand on the link
via a proposed link protocol and does not require any external enable or disable
signals. The proposed link protocol is designed to minimize the off-state power
consumption and power-on time of the link.
In order to achieve a high data rate and high sensitivity while maintaining
the power budget, a 1-tap decision feedback equalization method is applied in
the digital domain. The sensitivity is measured to be -8 dBm, -11 dBm, and -13 dBm
OMA (optical modulation amplitude) at 60 Gb/s, 48 Gb/s, and 32 Gb/s data rates,
respectively. The energy efficiency in always-on mode is around 2.2 pJ/bit for all
data-rates with the help of supply and bias scaling.
The receiver incorporates a phase interpolator based clock-and-data recovery
circuit with approximately 80 MHz jitter-tolerance corner frequency, thanks to
the low-latency full custom CDR logic design.
This work demonstrates the fastest CMOS optical receiver reported at the time
of publication, running at almost twice the data rate of the previous
state-of-the-art CMOS optical receiver. The data rate is comparable to that of
BiCMOS optical receivers, but at a fraction of the power consumption.
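The 1-tap decision feedback equalization mentioned in the abstract above can be illustrated with a small behavioral model in Python. This is a sketch of the general technique for NRZ symbols (+1/-1), not the receiver's actual digital implementation. The channel model, the tap value, and the function names are hypothetical, and the tap is assumed to match the channel's post-cursor exactly.

```python
def one_tap_dfe(samples, tap):
    """Slice NRZ samples after removing the previous symbol's contribution.

    The previous decision, scaled by the tap weight, is subtracted from
    each incoming sample before the sign decision is taken.
    """
    decisions = []
    previous = 0  # assume no symbol precedes the first sample
    for sample in samples:
        corrected = sample - tap * previous
        previous = 1 if corrected >= 0 else -1
        decisions.append(previous)
    return decisions


def channel_with_post_cursor(symbols, post_cursor):
    """Hypothetical channel: each sample carries part of the prior symbol."""
    received, prior = [], 0
    for symbol in symbols:
        received.append(symbol + post_cursor * prior)
        prior = symbol
    return received


symbols = [1, -1, -1, 1, 1, 1, -1, 1]
received = channel_with_post_cursor(symbols, post_cursor=0.6)
recovered = one_tap_dfe(received, tap=0.6)
print(recovered == symbols)
```

In this noise-free model the feedback restores each sample to roughly its ideal +1 or -1 level, which is the margin a real receiver spends on noise and which therefore relates to the sensitivity figures quoted above.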