
    Dynamically and partially reconfigurable hardware architectures for high performance microarray bioinformatics data analysis

    Bioinformatics and Computational Biology (BCB) is a multidisciplinary field that has emerged from the computational demands of current state-of-the-art biotechnology. BCB deals with the storage, organization, retrieval, and analysis of biological datasets, which have grown in size and complexity in recent years, especially since the completion of the human genome project. The advent of Microarray technology in the 1990s introduced the concept of the high-throughput experiment, a biotechnology that measures the gene expression profiles of thousands of genes simultaneously. Microarray analysis therefore requires high computational power to extract the biological relevance from its high-dimensional data. Current general purpose processors (GPPs) have been unable to keep up with the increasing computational demands of Microarrays and have reached a limit in terms of clock speed. Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a viable, low-power solution to overcome the computational limitations of GPPs and other methods. The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to accelerate some of the most widely used data mining methods for the analysis of Microarray data, in an effort to investigate the viability of the technology as an efficient, low-power, and economical solution. Three widely used methods were selected for the FPGA implementations: the unsupervised K-means clustering algorithm, and two supervised classification methods, namely the K-Nearest Neighbour (K-NN) and Support Vector Machine (SVM) classifiers. These methods are expected to benefit from parallel implementation. This thesis presents detailed designs and implementations of these three BCB applications on FPGA, captured in Verilog HDL, whose performance is compared with equivalent implementations running on GPPs. In addition to acceleration, the benefits of the dynamic partial reconfiguration (DPR) capability of modern Xilinx FPGAs are investigated with reference to the aforementioned data mining methods. The K-means clustering implementation on FPGA using a non-DPR design flow outperformed equivalent GPP and GPU implementations in terms of speed-up by two orders and one order of magnitude, respectively, while being eight times more power efficient than the GPP implementation and four times more than the GPU implementation. In terms of energy efficiency, the FPGA implementation was 615 times more energy efficient than the GPP and 31 times more than the GPU. Moreover, the FPGA implementation's advantage over the GPP and GPU implementations in terms of speed-up grew as the dimensionality of the Microarray data increased. Additionally, the DPR implementations of K-means clustering showed speed-ups in partial reconfiguration time of ~5x and ~17x over full-chip reconfiguration for the single-core and eight-core implementations, respectively. Two architectures of the K-NN classifier, A1 and A2, were implemented on FPGA. The A1-based K-NN implementation achieved a speed-up of ~76x over an equivalent GPP implementation, whereas the A2 architecture achieved ~68x. Furthermore, the FPGA implementation outperformed the equivalent GPP implementation as the dimensionality of the data increased. In addition, the DPR implementations of the K-NN classifier achieved speed-ups in reconfiguration time of ~4x to ~10x over full-chip reconfiguration when reconfiguring either a portion of the classifier or the complete classifier. As with K-NN, two architectures of the SVM classifier were implemented on FPGA; the first outperformed an equivalent GPP implementation by ~61x and the second by ~49x. The DPR implementation of the SVM classifier showed a speed-up of ~8x in reconfiguration time when reconfiguring the complete core or exchanging it with a K-NN core to form a multi-classifier. These implementations clearly show FPGAs to be an effective, efficient, and economical solution for bioinformatics Microarray data analysis.
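
    For reference, a minimal software sketch of the K-means algorithm that the thesis accelerates is given below. This is an illustrative NumPy version of the standard algorithm, not the thesis's Verilog HDL design; the synthetic "microarray-like" data shape in the example is an assumption.

        # Minimal software reference for K-means (Lloyd's algorithm); the FPGA
        # design itself is in Verilog HDL and is not reproduced here.
        import numpy as np

        def kmeans(X, k, n_iter=100, seed=0):
            """K-means on an (n_samples, n_features) matrix X."""
            rng = np.random.default_rng(seed)
            centroids = X[rng.choice(len(X), size=k, replace=False)]
            for _ in range(n_iter):
                # Assign each sample to its nearest centroid (Euclidean distance).
                dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
                labels = dists.argmin(axis=1)
                # Recompute each centroid as the mean of its assigned samples.
                new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centroids[j] for j in range(k)])
                if np.allclose(new, centroids):
                    break
                centroids = new
            return labels, centroids

        # Example: cluster a synthetic matrix of 1000 genes x 60 samples.
        labels, centroids = kmeans(np.random.rand(1000, 60), k=3)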

    Towards the development of a reliable reconfigurable real-time operating system on FPGAs

    In the last two decades, Field Programmable Gate Arrays (FPGAs) have rapidly developed from simple "glue logic" into a powerful platform capable of implementing a System on Chip (SoC). Modern FPGAs achieve not only high performance compared with General Purpose Processors (GPPs), thanks to hardware parallelism and dedicated logic, but also better programming flexibility than Application Specific Integrated Circuits (ASICs). Moreover, the hardware programming flexibility of FPGAs is further harnessed for both performance and manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR allows a part or parts of a circuit to be reconfigured at run-time without interrupting the operation of the rest of the chip. As a result, hardware resources can be exploited more efficiently, since chip resources can be reused by swapping hardware tasks in and out of the chip in a time-multiplexed fashion. In addition, DPR improves fault tolerance: transient errors such as Single Event Upsets (SEUs), as well as permanent damage, can be mitigated by reconfiguring the FPGA to avoid error accumulation. Furthermore, power and heat can be reduced by removing finished or idle tasks from the chip. For all these reasons, DPR has significantly promoted Reconfigurable Computing (RC) and has become a highly active research topic. However, since hardware integration is increasing at an exponential rate and applications are becoming more complex with growing user demands, high-level application design and low-level hardware implementation are increasingly separated and layered. As a consequence, users gain little advantage from DPR without the support of system-level middleware. To bridge the gap between high-level applications and low-level hardware implementation, this thesis presents important contributions towards a Reliable, Reconfigurable and Real-Time Operating System (R3TOS), which facilitates user exploitation of DPR from the application level by managing the complex hardware in the background. In R3TOS, hardware tasks behave just like software tasks: they can be created, scheduled, and mapped to different computing resources on the fly. The novel contributions of this work are: 1) an efficient implementation of a task scheduler and allocator; 2) a novel real-time scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and EVC), which schedule tasks in real time and circumvent emerging faults while maintaining more compact empty areas; 3) the design and implementation of a fault-tolerant microprocessor that harnesses existing FPGA resources, such as Error Correction Code (ECC) and configuration primitives; 4) a novel symmetric multiprocessing (SMP)-based architecture that supports a shared-memory programming interface; and 5) two demonstrations of the integrated system: a) the K-Nearest Neighbour classifier, a non-parametric classification algorithm widely used in various fields of data mining, and b) pairwise sequence alignment using the Smith-Waterman algorithm, which identifies similarities between two biological sequences. R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking applications, whereby resources can be dynamically managed with respect to user requirements and hardware availability. As a result, not only can hardware resources be used more efficiently, but system performance can also be significantly increased. Results show that scheduling and allocation efficiencies are improved by up to 2x, and overall system performance by a further ~2.5x. Future work includes the development of a Network on Chip (NoC), which is expected to further increase communication throughput, as well as the standardization and automation of the system design, to be carried out alongside the enablement of other high-level synthesis tools, allowing application developers to benefit from the system more efficiently.
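
    As a concrete illustration of the second demonstrator, a minimal software sketch of Smith-Waterman local alignment follows. The scoring parameters are illustrative assumptions; the thesis's hardware implementation is not reproduced here.

        # Software sketch of Smith-Waterman local alignment (score only);
        # match/mismatch/gap values are assumed for demonstration.
        def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
            """Return the best local alignment score between sequences a and b."""
            rows, cols = len(a) + 1, len(b) + 1
            H = [[0] * cols for _ in range(rows)]
            best = 0
            for i in range(1, rows):
                for j in range(1, cols):
                    diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                    # Local alignment: scores never drop below zero.
                    H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                    best = max(best, H[i][j])
            return best

        print(smith_waterman("GATTACA", "GCATGCU"))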

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Machine Learning (ML), a subset of Artificial Intelligence (AI), is driving the industrial and technological revolution of the present and future. We envision a world of smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that we once thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains, such as the Internet of Things (IoT), prostheses, robotics, and wearable devices, is an ongoing challenge, one governed by the computational complexity of ML algorithms, the performance and availability of hardware platforms, and the application's budget (power constraints, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle these challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom hardware accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted to validate the proposed exact and approximate embedded machine learning accelerators. The dissertation starts by introducing the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the hardware platforms available for implementation. Afterward, a survey of existing approximate computing techniques and hardware accelerator design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs to machine learning algorithms is provided. Three novel hardware accelerators are then proposed: (1) a k-Nearest Neighbor (kNN) accelerator based on a selection-based sorter, (2) a Tensorial Support Vector Machine (TSVM) accelerator based on Shallow Neural Networks, and (3) a Hybrid-Precision Binary Convolutional Neural Network (BCNN) accelerator. The three accelerators offer real-time classification with substantial reductions in hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain high classification accuracy, with a loss of at most 5%.
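
    To illustrate the idea behind the first accelerator, the sketch below performs kNN classification by selecting the k smallest distances without a full sort, which is broadly analogous to a selection-based sorter. The data shapes, feature count, and class count are assumptions, not properties of the tactile dataset used in the dissertation.

        # Hedged software sketch: kNN via selection (numpy.argpartition)
        # rather than a full sort, mirroring the selection-based idea at a
        # high level. All shapes and labels here are illustrative.
        import numpy as np
        from collections import Counter

        def knn_predict(X_train, y_train, x, k=3):
            d = np.linalg.norm(X_train - x, axis=1)       # distances to all samples
            idx = np.argpartition(d, k)[:k]               # selection, not full sort
            return Counter(y_train[idx]).most_common(1)[0][0]  # majority vote

        X = np.random.rand(100, 16)                       # e.g. 16 assumed features
        y = np.random.randint(0, 3, size=100)             # 3 assumed classes
        print(knn_predict(X, y, np.random.rand(16)))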

    Modelling and characterisation of distributed hardware acceleration

    Hardware acceleration has become more commonly utilised in networked computing systems. The growing complexity of applications means that traditional CPU architectures can no longer meet stringent latency constraints. Alternative computing architectures such as GPUs and FPGAs are increasingly available, along with simpler, more software-like development flows. The work presented in this thesis characterises the overheads associated with these accelerator architectures; a holistic view encompassing both computation and communication latency must be considered. Experimental results obtained through this work show that network-attached accelerators scale better than server-hosted deployments, and that host ingestion overheads are comparable to network traversal times in some cases. Along with the choice of processing platform, it is becoming more important to consider how workloads are partitioned and where in the network tasks are performed. Manually allocating tasks to network nodes and evaluating the result does not scale with network and workload complexity. A mathematical formulation of this problem is presented in this thesis that takes into account all relevant performance metrics. Unlike other works, this model accounts for growing hardware heterogeneity and workload complexity, and is generalisable to a range of scenarios. The model can be used in an optimisation that generates lower-cost results with latency performance close to the theoretical maximum, compared with naive placement approaches. Together, the mathematical formulation and the experimental characterisation of hardware accelerator overheads support informed design decisions about where to allocate tasks and deploy accelerators in the network, and about the associated costs.
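
    As a toy illustration of the kind of allocation problem the formulation addresses, the sketch below exhaustively places a two-task workload on heterogeneous nodes to minimise end-to-end latency under a deployment cost budget. All latencies, costs, the hop penalty, and the budget are invented for demonstration and do not come from the thesis model.

        # Toy task-to-node placement: minimise latency subject to a cost
        # budget, with an assumed per-boundary network traversal penalty.
        from itertools import product

        tasks = ["decode", "infer"]
        nodes = {"cpu":  {"lat": 5.0, "cost": 1},
                 "gpu":  {"lat": 1.5, "cost": 3},
                 "fpga": {"lat": 0.8, "cost": 4}}
        net_hop = 0.4          # assumed latency per placement boundary crossed
        budget = 6             # assumed deployment cost budget

        best = None
        for placement in product(nodes, repeat=len(tasks)):
            cost = sum(nodes[n]["cost"] for n in placement)
            if cost > budget:
                continue                       # infeasible under the budget
            lat = sum(nodes[n]["lat"] for n in placement)
            lat += net_hop * sum(a != b for a, b in zip(placement, placement[1:]))
            if best is None or lat < best[0]:
                best = (lat, placement, cost)

        print(best)            # -> (3.0, ('gpu', 'gpu'), 6)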

    Massively-parallel and concurrent SVM architectures

    This work presents several Support Vector Machine (SVM) architectures developed by the Author with the intent of exploiting the inherent parallel structures and potential concurrency underpinning the SVM's mathematical operation. Two SVM training subsystem prototypes are presented: a brute-force search classification training architecture, and Artificial Neural Network (ANN)-mapped optimisation architectures for both SVM classification training and SVM regression training. This work also proposes and prototypes a set of parallelised SVM Digital Signal Processor (DSP) pipeline architectures, which have been modelled in C and implemented in VHDL for synthesis and fitting on an Altera Stratix V FPGA. Each system presented in this work has been applied to a problem domain appropriate to its architectural limitations, including the novel application of the SVM as a chaotic and non-linear system parameter-identification tool. The SVM brute-force search classification training architecture has been modelled for 2-dimensional datasets composed of linear and non-linear problems requiring only 4 support vectors, utilising the linear kernel and the polynomial kernel respectively. The system has been implemented in Matlab and non-exhaustively verified using the holdout method with a trivial linearly separable classification dataset and a trivial non-linear XOR classification dataset. While the architecture was a feasible design for software-based implementations targeting 2-dimensional datasets, the architectural complexity and unmanageable number of parallelisable operations introduced by increasing data dimensionality and support vector counts led the Author to pursue different parallelised-architecture strategies. Two distinct ANN-mapped optimisation strategies, developed and proposed for SVM classification training and SVM regression training, have been modelled in Matlab; the architectures are designed such that a dataset of any dimensionality can be applied by configuring the appropriate dimensionality and support vector parameters. Through Monte Carlo testing using the datasets examined in this work, the gain parameters inherent in the architectural design were found to be difficult to tune, and system convergence to acceptable sets of training support vectors was not achieved. The ANN-mapped optimisation strategies were thus deemed inappropriate for SVM training on the applied datasets without further design effort and architectural modification. The dataset dimensionalities, support vector counts, and latency ranges of the parallelised SVM DSP pipeline prototypes follow; in each case the Field Programmable Gate Array (FPGA) pipeline prototype latency unsurprisingly outclassed the corresponding C-software model execution times by at least 3 orders of magnitude. The SVM classification training DSP pipeline FPGA prototypes are compatible with datasets spanning 2 to 8 dimensions and support vector sets of up to 16 support vectors, with a pipeline latency ranging from 0.18 to 0.28 microseconds. The SVM classification function evaluation prototypes are compatible with datasets spanning 2 to 8 dimensions and support vector sets of up to 32 support vectors, with a pipeline latency ranging from 0.16 to 0.24 microseconds. The SVM regression training prototypes and the SVM regression function evaluation prototypes are each compatible with datasets spanning 2 to 8 dimensions and support vector sets of up to 16 support vectors, with a pipeline latency ranging from 0.20 to 0.30 microseconds. Finally, utilising LIBSVM training and the parallelised SVM DSP pipeline function evaluation architecture prototypes, SVM classification and SVM regression were successfully applied to Rajkumar's oil and gas pipeline fault detection and failure system legacy dataset, yielding excellent results. Also utilising LIBSVM training and the parallelised SVM DSP pipeline function evaluation prototypes, both SVM classification and SVM regression were applied to several chaotic systems as a feasibility study into the application of the SVM machine learning paradigm to chaotic and non-linear dynamical system parameter-identification. SVM classification was applied to the Lorenz attractor and an ANN-based chaotic oscillator with a reasonably acceptable degree of success, but to the Mackey-Glass attractor with poor results. SVM regression applied to the Lorenz attractor and the ANN-based chaotic oscillator yielded average but encouraging results, and to the Mackey-Glass attractor poor results.
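
    For orientation, a minimal software reference of the SVM decision-function evaluation that such pipelines implement in hardware follows. The polynomial kernel, its parameters, and the data shapes are illustrative assumptions, not the prototypes' fixed-point datapath.

        # Software reference for SVM decision-function evaluation:
        # f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b, classified by sign.
        import numpy as np

        def svm_decision(x, sv, alpha_y, b, kernel):
            return float(alpha_y @ kernel(sv, x) + b)

        poly = lambda sv, x, d=2: (sv @ x + 1.0) ** d     # assumed polynomial kernel
        sv = np.random.rand(16, 8)        # up to 16 support vectors, 8-D data
        alpha_y = np.random.randn(16)     # alpha_i * y_i from training (e.g. LIBSVM)
        print(np.sign(svm_decision(np.random.rand(8), sv, alpha_y, b=0.1, kernel=poly)))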

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, and applications, among others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

    Acoustic Condition Monitoring & Fault Diagnostics for Industrial Systems

    Condition monitoring and fault diagnostics for industrial systems are required for cost reduction, maintenance scheduling, and reducing system failures. Catastrophic failure usually causes significant damage and may cause injury or fatality, making early and accurate fault diagnostics of paramount importance. Existing diagnostics can be improved by augmenting or replacing them with acoustic measurements, which have proven advantages over more traditional vibration measurements, including earlier detection of emerging faults, increased diagnostic accuracy, remote sensing, and easier setup and operation. However, industry adoption of acoustics remains in relative infancy due to vested confidence in and reliance on existing measurements, and perceived difficulties with noise contamination and diagnostic accuracy. Researched acoustic monitoring examples typically employ specialist surface-mount transducers, signal amplification, and complex feature extraction and machine learning algorithms, focusing on noise rejection and fault classification. Usually, techniques are fine-tuned to maximise diagnostic performance for the given problem. The majority investigate mechanical fault modes, particularly Roller Element Bearings (REBs), owing to their mechanical impacts producing detectable acoustic waves. The first contribution of this project is a suitability study into the use of low-cost consumer-grade acoustic sensors for fault diagnostics of six different REB health conditions, compared against vibration measurements. Experimental results demonstrate superior acoustic performance throughout, but particularly at lower rotational speed and axial load. Additionally, inaccuracies caused by dynamic operational parameters (speed in this case) are minimised by novel multi-Support Vector Machine training. The project then expands on existing work to encompass diagnostics for a previously unreported electrical fault mode present on a Brush-Less Direct Current motor drive system. Commonly studied electrical faults, such as a broken rotor bar or squirrel cage, result from artificially seeded mechanical component damage rather than spontaneous failure. Here, electrical fault modes are instead defined as faults caused by issues with the power supply, control system, or software (requiring no mechanical damage or triggering intervention). The example studied here is a transient current instability, generated by non-linear interaction of the motor electrical parameters, parasitic components, and the digital controller realisation. Experimental trials successfully demonstrate real-time feature extraction and further validate consumer-grade sensors for industrial system diagnostics. Moreover, this marks the first known diagnosis of an electrically-seeded fault mode as defined in this work. Finally, approaching an industry-ready diagnostic system, the newly released PYNQ-Z2 Field Programmable Gate Array is used to implement the first known instance of multiple feature extraction algorithms operating concurrently in continuous real time. A proposed deep-learning algorithm can analyse the features to determine the optimum feature extraction combination for ongoing continuous monitoring. The proposed black-box, all-in-one solution is capable of accurate unsupervised diagnostics on almost any application while maintaining excellent diagnostic performance. This marks a major leap forward from fine-tuned feature extraction performed offline for artificially seeded mechanical defects to multiple real-time feature extraction demonstrated on a spontaneous electrical fault mode, using a versatile and adaptable system that is low-cost, readily available, and simple to set up and operate. The presented concept represents an industry-ready, all-in-one acoustic diagnostic solution that is hoped to increase adoption of acoustic methods, greatly improving diagnostics and minimising catastrophic failures.
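
    As a functional illustration of per-frame acoustic feature extraction of the kind performed concurrently on the FPGA, a simple NumPy sketch follows. The specific features, frame size, and sample rate are assumptions rather than the thesis's implemented set.

        # Software sketch of per-frame acoustic features commonly used in
        # condition monitoring; the thesis computes such features in FPGA
        # fabric, which this version only approximates functionally.
        import numpy as np

        def frame_features(frame, fs=48_000):
            rms = np.sqrt(np.mean(frame ** 2))                   # overall energy
            kurt = np.mean((frame - frame.mean()) ** 4) / (frame.var() ** 2 + 1e-12)
            spec = np.abs(np.fft.rfft(frame))
            freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
            centroid = (freqs @ spec) / (spec.sum() + 1e-12)     # spectral centroid (Hz)
            return {"rms": rms, "kurtosis": kurt, "centroid_hz": centroid}

        print(frame_features(np.random.randn(1024)))             # assumed 1024-sample frame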

    19th SC@RUG 2022 proceedings 2021-2022

