978 research outputs found

    Parameterized Implementation of K-means Clustering on Reconfigurable Systems

    Get PDF
    Processing power of pattern classification algorithms on conventional platforms has not been able to keep up with exponentially growing datasets. However, algorithms such as k-means clustering include significant potential parallelism that could be exploited to enhance processing speed on conventional platforms. A better and effective solution to speed-up the algorithm performance is the use of a hardware assist since parallel kernels can be partitioned and concurrently run on hardware as opposed to the sequential software flow. A parameterized hardware implementation of k-means clustering is presented as a proof of concept on the Pilchard Reconfigurable computing system. The hardware implementation is shown to have speedups of about 500 over conventional implementations on a general-purpose processor. A scalability analysis is done to provide a future direction to take the current implementation of 3 classes and scale it to over N classes

    Dynamically and partially reconfigurable hardware architectures for high performance microarray bioinformatics data analysis

    Get PDF
    The field of Bioinformatics and Computational Biology (BCB) is a multidisciplinary field that has emerged due to the computational demands of current state-of-the-art biotechnology. BCB deals with the storage, organization, retrieval, and analysis of biological datasets, which have grown in size and complexity in recent years especially after the completion of the human genome project. The advent of Microarray technology in the 1990s has resulted in the new concept of high throughput experiment, which is a biotechnology that measures the gene expression profiles of thousands of genes simultaneously. As such, Microarray requires high computational power to extract the biological relevance from its high dimensional data. Current general purpose processors (GPPs) has been unable to keep-up with the increasing computational demands of Microarrays and reached a limit in terms of clock speed. Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a low power viable solution to overcome the computational limitations of GPPs and other methods. The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to accelerate some of the most widely used data mining methods used for the analysis of Microarray data in an effort to investigate the viability of the technology as an efficient, low power, and economic solution for the analysis of Microarray data. Three widely used methods have been selected for the FPGA implementations: one is the un-supervised Kmeans clustering algorithm, while the other two are supervised classification methods, namely, the K-Nearest Neighbour (K-NN) and Support Vector Machines (SVM). These methods are thought to benefit from parallel implementation. This thesis presents detailed designs and implementations of these three BCB applications on FPGA captured in Verilog HDL, whose performance are compared with equivalent implementations running on GPPs. In addition to acceleration, the benefits of current dynamic partial reconfiguration (DPR) capability of modern Xilinx’ FPGAs are investigated with reference to the aforementioned data mining methods. Implementing K-means clustering on FPGA using non-DPR design flow has outperformed equivalent implementations in GPP and GPU in terms of speed-up by two orders and one order of magnitude, respectively; while being eight times more power efficient than GPP and four times more than a GPU implementation. As for the energy efficiency, the FPGA implementation was 615 times more energy efficient than GPPs, and 31 times more than GPUs. Over and above, the FPGA implementation outperformed the GPP and GPU implementations in terms of speed-up as the dimensionality of the Microarray data increases. Additionally, the DPR implementations of the K-means clustering have shown speed-up in partial reconfiguration time of ~5x and 17x over full chip reconfiguration for single-core and eight-core implementations, respectively. Two architectures of the K-NN classifier have been implemented on FPGA, namely, A1 and A2. The K-NN implementation based on A1 architecture achieved a speed-up of ~76x over an equivalent GPP implementation whereas the A2 architecture achieved ~68x speedup. Furthermore, the FPGA implementation outperformed the equivalent GPP implementation when the dimensionality of data was increased. In addition, The DPR implementations of the K-NN classifier have achieved speed-ups in reconfiguration time between ~4x to 10x over full chip reconfiguration when reconfiguring portion of the classifier or the complete classifier. Similar to K-NN, two architectures of the SVM classifier were implemented on FPGA whereby the former outperformed an equivalent GPP implementation by ~61x and the latter by ~49x. As for the DPR implementation of the SVM classifier, it has shown a speed-up of ~8x in reconfiguration time when reconfiguring the complete core or when exchanging it with a K-NN core forming a multi-classifier. The aforementioned implementations clearly show FPGAs to be an efficacious, efficient and economic solution for bioinformatics Microarrays data analysis

    Implementation of genetic algorithm based fuzzy logic controller with automatic rule extraction in FPGA

    Get PDF
    A number of fuzzy logic controllers are being designed till now to replace complex, non-linear and huge controlling equipment in numerous industrial sectors. But the designing of these controllers requires thorough knowledge about the controlled process. For this purpose a highly experienced experts are required, which is not feasible all the time. Most of these processes are non-linear and depend on large number of parameters. Thus mathematical representation of these systems is an arduous line of work. This project addresses these problems by proposing using of genetic algorithm based Fuzzy Logic systems as controllers. The system includes algorithms which are run on a capable computing platform, to read an experimental data sheet obtained from experimental observations of the system and generate a fine tuned rule base that is to be used in the fuzzy logic controller hardware. The hardware is implemented in an FPGA. Transfer of synthesized rule base from the computer to the FPGA implementation and crisp output value back to the computer is done by UART. A graphical user interface is provided that runs on the computer

    A Model-based Design Framework for Application-specific Heterogeneous Systems

    Get PDF
    The increasing heterogeneity of computing systems enables higher performance and power efficiency. However, these improvements come at the cost of increasing the overall complexity of designing such systems. These complexities include constructing implementations for various types of processors, setting up and configuring communication protocols, and efficiently scheduling the computational work. The process for developing such systems is iterative and time consuming, with no well-defined performance goal. Current performance estimation approaches use source code implementations that require experienced developers and time to produce. We present a framework to aid in the design of heterogeneous systems and the performance tuning of applications. Our framework supports system construction: integrating custom hardware accelerators with existing cores into processors, integrating processors into cohesive systems, and mapping computations to processors to achieve overall application performance and efficient hardware usage. It also facilitates effective design space exploration using processor models (for both existing and future processors) that do not require source code implementations to estimate performance. We evaluate our framework using a variety of applications and implement them in systems ranging from low power embedded systems-on-chip (SoC) to high performance systems consisting of commercial-off-the-shelf (COTS) components. We show how the design process is improved, reducing the number of design iterations and unnecessary source code development ultimately leading to higher performing efficient systems

    A Study on Efficient Designs of Approximate Arithmetic Circuits

    Get PDF
    Approximate computing is a popular field where accuracy is traded with energy. It can benefit applications such as multimedia, mobile computing and machine learning which are inherently error resilient. Error introduced in these applications to a certain degree is beyond human perception. This flexibility can be exploited to design area, delay and power efficient architectures. However, care must be taken on how approximation compromises the correctness of results. This research work aims to provide approximate hardware architectures with error metrics and design metrics analyzed and their effects in image processing applications. Firstly, we study and propose unsigned array multipliers based on probability statistics and with approximate 4-2 compressors, full adders and half adders. This work deals with a new design approach for approximation of multipliers. The partial products of the multiplier are altered to introduce varying probability terms. Logic complexity of approximation is varied for the accumulation of altered partial products based on their probability. The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that two proposed multipliers achieve power savings of 72% and 38% respectively compared to an exact multiplier. They have better precision when compared to existing approximate multipliers. Mean relative error distance (MRED) figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which are better than the previous state-of-the-art works. Performance of the proposed multipliers is evaluated with geometric mean filtering application, where one of the proposed models achieves the highest peak signal to noise ratio (PSNR). Second, approximation is proposed for signed Booth multiplication. Approximation is introduced in partial product generation and partial product accumulation circuits. In this work, three multipliers (ABM-M1, ABM-M2, and ABM-M3) are proposed in which the modified Booth algorithm is approximated. In all three designs, approximate Booth partial product generators are designed with different variations of approximation. The approximations are performed by reducing the logic complexity of the Booth partial product generator, and the accumulation of partial products is slightly modified to improve circuit performance. Compared to the exact Booth multiplier, ABM-M1 achieves up to 15% reduction in power consumption with an MRED value of 7.9 × 10-4. ABM-M2 has power savings of up to 60% with an MRED of 1.1 × 10-1. ABM-M3 has power savings of up to 50% with an MRED of 3.4 × 10-3. Compared to existing approximate Booth multipliers, the proposed multipliers ABM-M1 and ABM-M3 achieve up to a 41% reduction in power consumption while exhibiting very similar error metrics. Image multiplication and matrix multiplication are used as case studies to illustrate the high performance of the proposed approximate multipliers. Third, distributed arithmetic based sum of products units approximation is analyzed. Sum of products units are key elements in many digital signal processing applications. Three approximate sum of products models which are based on distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of approximate sum of products achieves an improvement up to 64% on area and 70% on power, when compared to conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared to the first model. Third model achieves MRED and normalized mean error distance (NMED) as low as 0.05% and 0.009%. Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher PSNR than existing state of the art techniques. Fourth, approximation is applied in division architecture. Two approximation models are proposed for restoring divider. In the first design, approximation is performed at circuit level, where approximate divider cells are utilized in place of exact ones by simplifying the logic equations. In the second model, restoring divider is analyzed strategically and number of restoring divider cells are reduced by finding the portions of divisor and dividend with significant information. An approximation factor pp is used in both designs. In model 1, the design with p=8 has a 58% reduction in both area and power consumption compared to exact design, with a Q-MRED of 1.909 × 10-2 and Q-NMED of 0.449 × 10-2. The second model with an approximation factor p=4 has 54% area savings and 62% power savings compared to exact design. The proposed models are found to have better error metrics compared to existing designs, with better performance at similar error values. A change detection image processing application is used for real time assessment of proposed and existing approximate dividers and one of the models achieves a PSNR of 54.27 dB

    Development of FPGA based Standalone Tunable Fuzzy Logic Controllers

    Get PDF
    Soft computing techniques differ from conventional (hard) computing, in that unlike hard computing, it is tolerant of imprecision, uncertainty, partial truth, and approximation. In effect, the role model for soft computing is the human mind and its ability to address day-to-day problems. The principal constituents of Soft Computing (SC) are Fuzzy Logic (FL), Evolutionary Computation (EC), Machine Learning (ML) and Artificial Neural Networks (ANNs). This thesis presents a generic hardware architecture for type-I and type-II standalone tunable Fuzzy Logic Controllers (FLCs) in Field Programmable Gate Array (FPGA). The designed FLC system can be remotely configured or tuned according to expert operated knowledge and deployed in different applications to replace traditional Proportional Integral Derivative (PID) controllers. This re-configurability is added as a feature to existing FLCs in literature. The FLC parameters which are needed for tuning purpose are mainly input range, output range, number of inputs, number of outputs, the parameters of the membership functions like slope and center points, and an If-Else rule base for the fuzzy inference process. Online tuning enables users to change these FLC parameters in real-time and eliminate repeated hardware programming whenever there is a need to change. Realization of these systems in real-time is difficult as the computational complexity increases exponentially with an increase in the number of inputs. Hence, the challenge lies in reducing the rule base significantly such that the inference time and the throughput time is perceivable for real-time applications. To achieve these objectives, Modified Rule Active 2 Overlap Membership Function (MRA2-OMF), Modified Rule Active 3 Overlap Membership Function (MRA3-OMF), Modified Rule Active 4 Overlap Membership Function (MRA4-OMF), and Genetic Algorithm (GA) base rule optimization methods are proposed and implemented. These methods reduce the effective rules without compromising system accuracy and improve the cycle time in terms of Fuzzy Logic Inferences Per Second (FLIPS). In the proposed system architecture, the FLC is segmented into three independent modules, fuzzifier, inference engine with rule base, and defuzzifier. Fuzzy systems employ fuzzifier to convert the real world crisp input into the fuzzy output. In type 2 fuzzy systems there are two fuzzifications happen simultaneously from upper and lower membership functions (UMF and LMF) with subtractions and divisions. Non-restoring, very high radix, and newton raphson approximation are most widely used division algorithms in hardware implementations. However, these prevalent methods have a cost of more latency. In order to overcome this problem, a successive approximation division algorithm based type 2 fuzzifier is introduced. It has been observed that successive approximation based fuzzifier computation is faster than the other type 2 fuzzifier. A hardware-software co-design is established on Virtex 5 LX110T FPGA board. The MATLAB Graphical User Interface (GUI) acquires the fuzzy (type 1 or type 2) parameters from users and a Universal Asynchronous Receiver/Transmitter (UART) is dedicated to data communication between the hardware and the fuzzy toolbox. This GUI is provided to initiate control, input, rule transfer, and then to observe the crisp output on the computer. A proposed method which can support canonical fuzzy IF-THEN rules, which includes special cases of the fuzzy rule base is included in Digital Fuzzy Logic Controller (DFLC) architecture. For this purpose, a mealy state machine is incorporated into the design. The proposed FLCs are implemented on Xilinx Virtex-5 LX110T. DFLC peripheral integration with Micro-Blaze (MB) processor through Processor Logic Bus (PLB) is established for Intellectual Property (IP) core validation. The performance of the proposed systems are compared to Fuzzy Toolbox of MATLAB. Analysis of these designs is carried out by using Hardware-In-Loop (HIL) test to control various plant models in MATLAB/Simulink environments
    corecore