Pattern-Based FPGA Logic Block and Clustering Algorithm
In classical FPGAs, LUTs and DFFs are pre-packed into BLEs, and the BLEs are then grouped into logic blocks. We propose a novel logic block architecture with fast combinational paths between LUTs, called pattern-based logic blocks. A new clustering algorithm is developed to exploit the potential of pattern-based logic blocks. Experimental results show that the novel architecture and the associated clustering algorithm lead to a 14% performance gain and an 8% wirelength reduction with a 3% area overhead compared to a conventional architecture on large control-intensive benchmarks
Binary object recognition system on FPGA with bSOM
Tri-state Self Organizing Map (bSOM), which takes binary inputs and maintains tri-state weights, has been used for classification rather than clustering in this paper. The major contribution here is the demonstration of the potential use of the modified bSOM in security surveillance, as a recognition system on FPGA
A binary self-organizing map and its FPGA implementation
A binary Self Organizing Map (SOM) has been designed and implemented on a Field Programmable Gate Array (FPGA) chip. A novel learning algorithm which takes binary inputs and maintains tri-state weights is presented. The binary SOM has the capability of recognizing binary input sequences after training. A novel tri-state rule is used in updating the network weights during the training phase. The rule implementation is highly suited to the FPGA architecture, and allows extremely rapid training. This architecture may be used in real-time for fast pattern clustering and classification of the binary features
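The matching step of a SOM with tri-state weights can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each weight is 0, 1, or a don't-care symbol `#`, and that a don't-care weight never counts as a mismatch when the Hamming distance is computed. The names `tri_state_distance` and `best_matching_node` are hypothetical, and the tri-state update rule is left out because the abstract does not specify it.

```python
DONT_CARE = "#"  # assumed third weight state; matches either input bit


def tri_state_distance(weights, bits):
    """Hamming distance that ignores don't-care weights (an assumption)."""
    return sum(1 for w, b in zip(weights, bits) if w != DONT_CARE and w != b)


def best_matching_node(nodes, bits):
    """Return the index of the node whose weights are closest to the input."""
    distances = [tri_state_distance(weights, bits) for weights in nodes]
    return distances.index(min(distances))


nodes = [
    [0, 0, 1, 1],
    [1, DONT_CARE, 0, DONT_CARE],
    [1, 1, 1, 1],
]
winner = best_matching_node(nodes, [1, 0, 0, 1])
```

A distance of this kind reduces to comparisons and a population count, which is one plausible reason a rule like this would map well onto FPGA logic.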
Seven strategies for tolerating highly defective fabrication
In this article we present an architecture that supports fine-grained sparing and resource matching. The base logic structure is a set of interconnected PLAs. The PLAs and their interconnections consist of large arrays of interchangeable nanowires, which serve as programmable product and sum terms and as programmable interconnect links. Each nanowire can have several defective programmable junctions. We can test nanowires for functionality and use only the subset that provides appropriate conductivity and electrical characteristics. We then perform a matching between nanowire junction programmability and application logic needs to use almost all the nanowires even though most of them have defective junctions. We employ seven high-level strategies to achieve this level of defect tolerance
Accelerated hardware video object segmentation: From foreground detection to connected components labelling
This is the preprint version of the article. Copyright © 2010 Elsevier. This paper demonstrates the use of a single-chip FPGA for the segmentation of moving objects in a video sequence. The system maintains highly accurate background models, and integrates the detection of foreground pixels with the labelling of objects using a connected components algorithm. The background models are based on 24-bit RGB values and 8-bit gray scale intensity values. A multimodal background differencing algorithm is presented, using a single FPGA chip and four blocks of RAM. The real-time connected component labelling algorithm, also designed for FPGA implementation, run-length encodes the output of the background subtraction, and performs connected component analysis on this representation. The run-length encoding, together with other parts of the algorithm, is performed in parallel; sequential operations are minimized because the number of run-lengths is typically smaller than the number of pixels. The two algorithms are pipelined together for maximum efficiency
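The idea of labelling connected components on run-length encoded rows, rather than on individual pixels, can be sketched in software as below. This is a sequential illustration under stated assumptions, not the paper's parallel hardware design: it assumes a binary foreground mask given as a list of rows, 4-connectivity, and a union-find structure for merging labels. The function names `run_length_encode` and `label_runs` are hypothetical.

```python
def run_length_encode(row):
    """Return inclusive (start, end) column pairs for each run of foreground pixels."""
    runs = []
    start = None
    for col, pixel in enumerate(row):
        if pixel and start is None:
            start = col
        elif not pixel and start is not None:
            runs.append((start, col - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def label_runs(mask):
    """Label runs so that overlapping runs on adjacent rows share one label.

    Returns a list of (row, start, end, label) tuples.
    """
    parent = []  # union-find forest over provisional run ids

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    runs = []
    previous = []  # runs of the row above: (start, end, id)
    for r, row in enumerate(mask):
        current = []
        for start, end in run_length_encode(row):
            idx = len(parent)
            parent.append(idx)
            for p_start, p_end, p_idx in previous:
                if p_start <= end and start <= p_end:  # columns overlap
                    union(idx, p_idx)
            current.append((start, end, idx))
            runs.append((r, start, end, idx))
        previous = current
    return [(r, s, e, find(i)) for r, s, e, i in runs]


mask = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
labelled = label_runs(mask)
component_count = len({label for _, _, _, label in labelled})
```

The 3x5 mask above has 7 foreground pixels but only 5 runs, so the merge loop handles fewer items than a per-pixel pass would.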
FPGA-based Anomalous trajectory detection using SOFM
A system for automatically classifying the trajectory of a moving object in a scene as usual or suspicious is presented. The system uses an unsupervised neural network (Self Organising Feature Map, SOFM) fully implemented on a reconfigurable hardware architecture (Field Programmable Gate Array) to cluster trajectories acquired over a period, in order to detect novel ones. First order motion information, including first order moving average smoothing, is generated from the 2D image coordinates (trajectories). The classification is dynamic and achieved in real-time, using a SOFM and a probabilistic model. Experimental results show less than 15% classification error, demonstrating the robustness of our approach relative to others in the literature, as well as a speed-up when using an off-the-shelf FPGA prototyping board compared with a conventional microprocessor
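The feature-extraction step can be illustrated with a short sketch. The abstract does not define the smoothing filter, so this assumes a simple trailing-window average over the 2D coordinates and takes "first order motion information" to mean the displacement between consecutive smoothed points. The function names and the `window` parameter are hypothetical.

```python
def moving_average(points, window=3):
    """Smooth a trajectory with a trailing window (shorter at the start)."""
    smoothed = []
    for i in range(len(points)):
        chunk = points[max(0, i - window + 1):i + 1]
        n = len(chunk)
        smoothed.append((sum(x for x, _ in chunk) / n,
                         sum(y for _, y in chunk) / n))
    return smoothed


def first_order_motion(points):
    """Return per-step displacements (dx, dy) between consecutive points."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


trajectory = [(0, 0), (2, 2), (4, 4)]
smoothed = moving_average(trajectory, window=2)
features = first_order_motion(smoothed)
```

Vectors like `features` are the kind of input a clustering network could consume, although the paper's exact feature layout is not given in the abstract.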
A FPGA-based architecture for real-time cluster finding in the LHCb silicon pixel detector
The data acquisition system of the LHCb experiment has been substantially upgraded for LHC Run 3, with the unprecedented capability of reading out and fully reconstructing in real time all proton–proton collisions, which occur at an average rate of 30 MHz, for a total data flow of approximately 32 Tb/s. The high demand for computing power required by this task has motivated a transition to a hybrid heterogeneous computing architecture, where a farm of graphics cores (GPUs) is used in addition to general-purpose processors (CPUs) to speed up the execution of reconstruction algorithms. In a continuing effort to improve the real-time processing capabilities of this new DAQ system, also with a view to further luminosity increases in the future, low-level, highly parallelizable tasks are increasingly being addressed at the earliest stages of the data acquisition chain, using special-purpose computing accelerators. A promising solution is offered by custom-programmable FPGA devices, which are well suited to performing high-volume computations with high throughput and a high degree of parallelism, and with limited power consumption and latency. In this context, a two-dimensional FPGA-friendly cluster-finder algorithm has been developed to reconstruct hit positions in the new vertex pixel detector (VELO) of the LHCb Upgrade experiment. The associated firmware architecture, implemented in VHDL, has been integrated within the VELO readout, without the need for extra cards, as a further enhancement of the DAQ system. This pre-processing allows the first level of the software trigger to accept an 11% higher rate of events, as the ready-made hit coordinates accelerate the track reconstruction, while leading to a drop in electrical power consumption, as the FPGA implementation requires O(50x) less power than the GPU one. The tracking performance of this novel system is indistinguishable from that of a full-fledged software implementation, which allows the raw pixel data to be dropped immediately at the readout level, yielding the additional benefit of a 14% reduction in data flow. The clustering architecture was commissioned at the start of LHCb Run 3 and currently runs in real time during physics data taking, reconstructing VELO hit coordinates on the fly at the LHC collision rate
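What two-dimensional cluster finding produces can be shown with a generic software sketch. This is not the firmware algorithm described in the abstract: it assumes hits are given as (row, column) pixel indices, groups pixels that touch under 8-connectivity using a flood fill, and reports each cluster's unweighted centroid as the reconstructed hit position. The names `find_clusters` and `centroid` are hypothetical.

```python
def find_clusters(hits):
    """Group (row, col) pixel hits into 8-connected clusters."""
    remaining = set(hits)
    clusters = []
    while remaining:
        seed = remaining.pop()
        stack = [seed]
        cluster = [seed]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        stack.append(neighbour)
                        cluster.append(neighbour)
        clusters.append(cluster)
    return clusters


def centroid(cluster):
    """Return the unweighted mean (row, col) position of a cluster."""
    n = len(cluster)
    return (sum(r for r, _ in cluster) / n, sum(c for _, c in cluster) / n)


hits = [(0, 0), (0, 1), (1, 1), (5, 5)]
clusters = find_clusters(hits)
positions = sorted(centroid(cluster) for cluster in clusters)
```

A flood fill is inherently sequential. The firmware described above instead processes hits at the collision rate, so this sketch illustrates only the output, not the method.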