1,287 research outputs found

    Parameterized Implementation of K-means Clustering on Reconfigurable Systems

    Get PDF
    Processing power of pattern classification algorithms on conventional platforms has not been able to keep up with exponentially growing datasets. However, algorithms such as k-means clustering include significant potential parallelism that could be exploited to enhance processing speed on conventional platforms. A better and effective solution to speed-up the algorithm performance is the use of a hardware assist since parallel kernels can be partitioned and concurrently run on hardware as opposed to the sequential software flow. A parameterized hardware implementation of k-means clustering is presented as a proof of concept on the Pilchard Reconfigurable computing system. The hardware implementation is shown to have speedups of about 500 over conventional implementations on a general-purpose processor. A scalability analysis is done to provide a future direction to take the current implementation of 3 classes and scale it to over N classes

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Mapping Framework for Heterogeneous Reconfigurable Architectures:Combining Temporal Partitioning and Multiprocessor Scheduling

    Get PDF

    Development of FPGA based Standalone Tunable Fuzzy Logic Controllers

    Get PDF
    Soft computing techniques differ from conventional (hard) computing, in that unlike hard computing, it is tolerant of imprecision, uncertainty, partial truth, and approximation. In effect, the role model for soft computing is the human mind and its ability to address day-to-day problems. The principal constituents of Soft Computing (SC) are Fuzzy Logic (FL), Evolutionary Computation (EC), Machine Learning (ML) and Artificial Neural Networks (ANNs). This thesis presents a generic hardware architecture for type-I and type-II standalone tunable Fuzzy Logic Controllers (FLCs) in Field Programmable Gate Array (FPGA). The designed FLC system can be remotely configured or tuned according to expert operated knowledge and deployed in different applications to replace traditional Proportional Integral Derivative (PID) controllers. This re-configurability is added as a feature to existing FLCs in literature. The FLC parameters which are needed for tuning purpose are mainly input range, output range, number of inputs, number of outputs, the parameters of the membership functions like slope and center points, and an If-Else rule base for the fuzzy inference process. Online tuning enables users to change these FLC parameters in real-time and eliminate repeated hardware programming whenever there is a need to change. Realization of these systems in real-time is difficult as the computational complexity increases exponentially with an increase in the number of inputs. Hence, the challenge lies in reducing the rule base significantly such that the inference time and the throughput time is perceivable for real-time applications. To achieve these objectives, Modified Rule Active 2 Overlap Membership Function (MRA2-OMF), Modified Rule Active 3 Overlap Membership Function (MRA3-OMF), Modified Rule Active 4 Overlap Membership Function (MRA4-OMF), and Genetic Algorithm (GA) base rule optimization methods are proposed and implemented. These methods reduce the effective rules without compromising system accuracy and improve the cycle time in terms of Fuzzy Logic Inferences Per Second (FLIPS). In the proposed system architecture, the FLC is segmented into three independent modules, fuzzifier, inference engine with rule base, and defuzzifier. Fuzzy systems employ fuzzifier to convert the real world crisp input into the fuzzy output. In type 2 fuzzy systems there are two fuzzifications happen simultaneously from upper and lower membership functions (UMF and LMF) with subtractions and divisions. Non-restoring, very high radix, and newton raphson approximation are most widely used division algorithms in hardware implementations. However, these prevalent methods have a cost of more latency. In order to overcome this problem, a successive approximation division algorithm based type 2 fuzzifier is introduced. It has been observed that successive approximation based fuzzifier computation is faster than the other type 2 fuzzifier. A hardware-software co-design is established on Virtex 5 LX110T FPGA board. The MATLAB Graphical User Interface (GUI) acquires the fuzzy (type 1 or type 2) parameters from users and a Universal Asynchronous Receiver/Transmitter (UART) is dedicated to data communication between the hardware and the fuzzy toolbox. This GUI is provided to initiate control, input, rule transfer, and then to observe the crisp output on the computer. A proposed method which can support canonical fuzzy IF-THEN rules, which includes special cases of the fuzzy rule base is included in Digital Fuzzy Logic Controller (DFLC) architecture. For this purpose, a mealy state machine is incorporated into the design. The proposed FLCs are implemented on Xilinx Virtex-5 LX110T. DFLC peripheral integration with Micro-Blaze (MB) processor through Processor Logic Bus (PLB) is established for Intellectual Property (IP) core validation. The performance of the proposed systems are compared to Fuzzy Toolbox of MATLAB. Analysis of these designs is carried out by using Hardware-In-Loop (HIL) test to control various plant models in MATLAB/Simulink environments

    Hybrid FPGA: Architecture and Interface

    No full text
    Hybrid FPGAs (Field Programmable Gate Arrays) are composed of general-purpose logic resources with different granularities, together with domain-specific coarse-grained units. This thesis proposes a novel hybrid FPGA architecture with embedded coarse-grained Floating Point Units (FPUs) to improve the floating point capability of FPGAs. Based on the proposed hybrid FPGA architecture, we examine three aspects to optimise the speed and area for domain-specific applications. First, we examine the interface between large coarse-grained embedded blocks (EBs) and fine-grained elements in hybrid FPGAs. The interface includes parameters for varying: (1) aspect ratio of EBs, (2) position of the EBs in the FPGA, (3) I/O pins arrangement of EBs, (4) interconnect flexibility of EBs, and (5) location of additional embedded elements such as memory. Second, we examine the interconnect structure for hybrid FPGAs. We investigate how large and highdensity EBs affect the routing demand for hybrid FPGAs over a set of domain-specific applications. We then propose three routing optimisation methods to meet the additional routing demand introduced by large EBs: (1) identifying the best separation distance between EBs, (2) adding routing switches on EBs to increase routing flexibility, and (3) introducing wider channel width near the edge of EBs. We study and compare the trade-offs in delay, area and routability of these three optimisation methods. Finally, we employ common subgraph extraction to determine the number of floating point adders/subtractors, multipliers and wordblocks in the FPUs. The wordblocks include registers and can implement fixed point operations. We study the area, speed and utilisation trade-offs of the selected FPU subgraphs in a set of floating point benchmark circuits. We develop an optimised coarse-grained FPU, taking into account both architectural and system-level issues. Furthermore, we investigate the trade-offs between granularities and performance by composing small FPUs into a large FPU. The results of this thesis would help design a domain-specific hybrid FPGA to meet user requirements, by optimising for speed, area or a combination of speed and area

    Accelerated hardware video object segmentation: From foreground detection to connected components labelling

    Get PDF
    This is the preprint version of the Article - Copyright @ 2010 ElsevierThis paper demonstrates the use of a single-chip FPGA for the segmentation of moving objects in a video sequence. The system maintains highly accurate background models, and integrates the detection of foreground pixels with the labelling of objects using a connected components algorithm. The background models are based on 24-bit RGB values and 8-bit gray scale intensity values. A multimodal background differencing algorithm is presented, using a single FPGA chip and four blocks of RAM. The real-time connected component labelling algorithm, also designed for FPGA implementation, run-length encodes the output of the background subtraction, and performs connected component analysis on this representation. The run-length encoding, together with other parts of the algorithm, is performed in parallel; sequential operations are minimized as the number of run-lengths are typically less than the number of pixels. The two algorithms are pipelined together for maximum efficiency
    corecore