Dynamically and partially reconfigurable hardware architectures for high performance microarray bioinformatics data analysis
The field of Bioinformatics and Computational Biology (BCB) is a multidisciplinary field
that has emerged due to the computational demands of current state-of-the-art biotechnology.
BCB deals with the storage, organization, retrieval, and analysis of biological datasets,
which have grown in size and complexity in recent years especially after the completion of
the human genome project. The advent in the 1990s of Microarray technology, a biotechnology
that measures the gene expression profiles of thousands of genes simultaneously, has resulted
in the new concept of the high throughput experiment. As such, Microarray analysis requires
high computational power to extract the biological relevance from its high dimensional data.
Current general purpose processors (GPPs) have been unable to keep up with the increasing
computational demands of Microarrays and have reached a limit in terms of clock speed.
Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a low
power viable solution to overcome the computational limitations of GPPs and other methods.
The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to
accelerate some of the most widely used data mining methods for the analysis of
Microarray data in an effort to investigate the viability of the technology as an efficient, low
power, and economic solution for the analysis of Microarray data. Three widely used
methods have been selected for the FPGA implementations: one is the unsupervised K-means
clustering algorithm, while the other two are supervised classification methods,
namely, the K-Nearest Neighbour (K-NN) and Support Vector Machines (SVM). These
methods are thought to benefit from parallel implementation. This thesis presents detailed
designs and implementations of these three BCB applications on FPGA captured in Verilog
HDL, whose performance is compared with equivalent implementations running on GPPs.
In addition to acceleration, the benefits of the dynamic partial reconfiguration (DPR)
capability of modern Xilinx FPGAs are investigated with reference to the aforementioned
data mining methods.
The K-means clustering implementation on FPGA, using a non-DPR design flow,
outperformed equivalent GPP and GPU implementations in speed by two
orders and one order of magnitude, respectively, while being eight times more power
efficient than the GPP and four times more than the GPU implementation. In terms of energy
efficiency, the FPGA implementation was 615 times more energy efficient than GPPs, and 31 times more than GPUs. Moreover, the FPGA implementation outperformed the
GPP and GPU implementations in terms of speed-up as the dimensionality of the Microarray
data increases. Additionally, the DPR implementations of the K-means clustering have
shown speed-up in partial reconfiguration time of ~5x and 17x over full chip reconfiguration
for single-core and eight-core implementations, respectively.
Two architectures of the K-NN classifier have been implemented on FPGA, namely, A1
and A2. The K-NN implementation based on A1 architecture achieved a speed-up of ~76x
over an equivalent GPP implementation, whereas the A2 architecture achieved a ~68x speed-up.
Furthermore, the FPGA implementation outperformed the equivalent GPP
implementation when the dimensionality of data was increased. In addition, the DPR
implementations of the K-NN classifier have achieved speed-ups in reconfiguration time
of between ~4x and 10x over full chip reconfiguration when reconfiguring a portion of the
classifier or the complete classifier.
Similar to K-NN, two architectures of the SVM classifier were implemented on FPGA,
of which the first outperformed an equivalent GPP implementation by ~61x and the second
by ~49x. As for the DPR implementation of the SVM classifier, it has shown a speed-up of
~8x in reconfiguration time when reconfiguring the complete core or when exchanging it
with a K-NN core forming a multi-classifier.
The aforementioned implementations clearly show FPGAs to be an efficacious, efficient
and economical solution for bioinformatics Microarray data analysis.
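As an illustration of why K-means is thought to benefit from parallel implementation, the following is a minimal software sketch of the algorithm in Python. It is not the thesis's Verilog design; the function names and the toy data are assumptions made here. The point it shows is that each point's nearest-centroid search is independent of every other point's, which is the kind of work that can be spread across parallel hardware units.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid.

    Every point is handled independently of the others, so this loop
    is the naturally parallel part of the algorithm.
    """
    return [
        min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its centroid
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members)
                  for d in range(len(old)))
        )
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assignment/update rounds (hypothetical helper)."""
    centroids = list(centroids)
    labels = assign_clusters(points, centroids)
    for _ in range(iterations):
        centroids = update_centroids(points, labels, centroids)
        labels = assign_clusters(points, centroids)
    return labels, centroids


# Toy data standing in for expression profiles (illustrative values only).
labels, centroids = kmeans(
    [(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)]
)
```

The cost of `assign_clusters` grows with the number of dimensions per point, which is consistent with the abstract's interest in how performance scales with data dimensionality.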
Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing
Machine Learning (ML), a subset of Artificial Intelligence (AI), is driving the industrial
and technological revolution of the present and future. We envision a world with smart
devices that are able to mimic human behavior (sense, process, and act) and perform
tasks that at one time we thought could only be carried out by humans. The vision
is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware
platforms. However, embedding machine learning algorithms in many application domains
such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing
challenge, governed by the computational complexity of ML algorithms,
the performance/availability of hardware platforms, and the application's budget (power
constraint, real-time operation, etc.). In this dissertation, we focus on the design and
implementation of efficient ML algorithms to handle the aforementioned challenges. First, we
apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of
ML algorithms. Then, we design custom Hardware Accelerators to improve the performance
of the implementation within a specified budget. Finally, a tactile data processing application
is adopted for the validation of the proposed exact and approximate embedded machine
learning accelerators.
The dissertation starts with the introduction of the various ML algorithms used for
tactile data processing. These algorithms are assessed in terms of their computational
complexity and the available hardware platforms which could be used for implementation.
Afterward, a survey of existing approximate computing techniques and hardware
accelerator design methodologies is presented. Based on the findings of the survey, an
approach for applying algorithmic-level ACTs on machine learning algorithms is provided.
Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based
on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow
Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN).
The three accelerators offer a real-time classification with monumental reductions in the
hardware resources and power consumption compared to existing implementations targeting
the same tactile data processing application on FPGA. Moreover, the approximate accelerators
maintain a high classification accuracy with a loss of at most 5%.
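To make the idea of a selection-based kNN concrete, here is a minimal software sketch in Python. It assumes that "selection-based" means picking the k smallest distances in k selection passes rather than sorting all distances; the abstract does not describe the accelerator's internal design, so the function names, the majority vote, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def select_k_smallest(distances, k):
    """Return the indices of the k smallest distances, nearest first.

    Uses k selection passes instead of a full sort: each pass picks the
    minimum of what remains (ties go to the lowest index).
    """
    remaining = list(range(len(distances)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        best = min(remaining, key=lambda i: distances[i])
        chosen.append(best)
        remaining.remove(best)
    return chosen


def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [dist(p, query) for p in train_points]
    neighbours = select_k_smallest(distances, k)
    votes = Counter(train_labels[i] for i in neighbours)
    return votes.most_common(1)[0][0]


# Toy data (illustrative values only, not tactile sensor readings).
train = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
labels = ["a", "a", "a", "b", "b"]
prediction = knn_classify(train, labels, (5.5, 5), k=3)
```

Because only k passes are needed, the work grows with k times the number of training points rather than with a full sort, which is one plausible reason a selection step suits small values of k.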
Recent Advances in Embedded Computing, Intelligence and Applications
The recent proliferation of Internet of Things deployments and edge computing, combined with artificial intelligence, has led to exciting new application scenarios in which embedded digital devices are essential enablers. Moreover, powerful and efficient new devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately help foster the offloading of processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems.
Pattern Recognition
Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one, two or three dimensional; the processing may be done in real time or take hours or days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.
Improved Subset Generation For The MU-Decoder
The MU-Decoder is a hardware subset generator that finds use in partial reconfiguration of FPGAs and in numerous other applications. It is capable of generating a set S of subsets of a large set Z_n with n elements. If the subsets in S satisfy the “isomorphic totally-ordered property”, then the MU-Decoder works very efficiently to produce a set of u subsets in O(log n) time and Θ(n √u log n) gate cost. In contrast, a naive approach requires Θ(un) gate cost. We show that this low cost for the MU-Decoder can be achieved without the isomorphism constraint, thereby allowing S to include a much wider range of subsets. We also show that if additional constraints can be placed on the relative sizes of the subsets in S, then u subsets can be generated with Θ(n √u) cost. This uses a new hardware enhancement proposed in this thesis. Finally, we show that by properly selecting S and by using some elements of traditional methods, a set of Θ (un^log( log (n/log n))) subsets can be produced with Θ(n √u) cost.
Build framework and runtime abstraction for partial reconfiguration on FPGA SoCs
Growth in edge computing has increased the requirement for edge systems to process larger volumes of real-time data, such as in image processing and machine learning, which are increasingly demanding of computing resources. Offloading tasks to the cloud provides some relief but is network dependent, high latency and expensive. Alternative architectures such as GPUs provide higher performance acceleration for this type of data processing, but at the cost of increased power consumption. Another option is the Field Programmable Gate Array (FPGA), a flexible matrix of logic that can be configured by a designer to provide a highly optimised computation path for incoming data. There are drawbacks: the FPGA design process is complex, the domain is dissimilar to software, and the tools require bespoke expertise. A designer must also manage the hardware-to-software paradigm introduced when the FPGA is tightly coupled with a general purpose processor. Advanced features, such as the ability to partially reconfigure (PR) specific regions of the FPGA, further increase this complexity. This thesis presents theory and demonstration of custom frameworks and tools for increasing abstraction and simplifying control over PR applications. We present networked PR, a mechanism for bypassing the traditional software networking stack to trigger PR with reduced latency and increased determinism. We developed a build framework for automating the end-to-end PR design process for Linux-based systems, as well as an abstracted runtime for managing the resulting applications. Finally, we expand on this work and present a high level abstraction for PR on cyber physical systems, with a demonstration using the Robot Operating System. This work is released as open source contributions, designed to enable future PR research.
Optics for AI and AI for Optics
Artificial intelligence is deeply involved in our daily lives, reinforcing the digital transformation of modern economies and infrastructure. It relies on powerful computing clusters, which face bottlenecks of power consumption for both data transmission and intensive computing. Meanwhile, optics (especially optical communications, which underpin today’s telecommunications) is penetrating short-reach connections down to the chip level, thus meeting with AI technology and creating numerous opportunities. This book is about the marriage of optics and AI and how each part can benefit from the other. Optics facilitates on-chip neural networks based on fast optical computing and energy-efficient interconnects and communications. On the other hand, AI enables efficient tools to address the challenges of today’s optical communication networks, which behave in an increasingly complex manner. The book collects contributions from pioneering researchers from both academia and industry to discuss the challenges and solutions in each of the respective fields.
Run-time reconfiguration for efficient tracking of implanted magnets with a myokinetic control interface applied to robotic hands
Thesis (doctorate), Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Mecânica, 2021. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). This work introduces the application of embedded machine learning solutions to the problem of magnetic sensor-based limb tracking. Namely, we employ a data-driven strategy to create mathematical models that can translate the measured magnetic information into usable inputs for prosthetic devices. These models are implemented in FPGAs using customized floating-point operations to optimize hardware and energy consumption, which are important in wearable devices. The hardware architecture is proposed to be implemented as a dynamically and partially reconfigurable system, potentially reducing resource utilization and power consumption of the FPGA. The proposed data-driven strategy and its hardware implementation can achieve a latency in the order of microseconds and low energy consumption, which encourages further research on improving the methods herein devised for other applications.
Addressing Complexity and Intelligence in Systems Dependability Evaluation
Engineering and computing systems are increasingly complex, intelligent, and open adaptive. When it comes to the dependability evaluation of such systems, there are certain challenges posed by the characteristics of “complexity” and “intelligence”. The first aspect of complexity is the dependability modelling of large systems with many interconnected components and dynamic behaviours such as Priority, Sequencing and Repairs. To address this, the thesis proposes a novel hierarchical solution to dynamic fault tree analysis using Semi-Markov Processes. A second aspect of complexity is the environmental conditions that may impact dependability and their modelling. For instance, weather and logistics can influence maintenance actions and hence dependability of an offshore wind farm. The thesis proposes a semi-Markov-based maintenance model called “Butterfly Maintenance Model (BMM)” to model this complexity and accommodate it in dependability evaluation. A third aspect of complexity is the open nature of system of systems like swarms of drones which makes complete design-time dependability analysis infeasible. To address this aspect, the thesis proposes a dynamic dependability evaluation method using Fault Trees and Markov-Models at runtime. The challenge of “intelligence” arises because Machine Learning (ML) components do not exhibit programmed behaviour; their behaviour is learned from data. However, in traditional dependability analysis, systems are assumed to be programmed or designed. When a system has learned from data, then a distributional shift of operational data from training data may cause ML to behave incorrectly, e.g., misclassify objects. To address this, a new approach called SafeML is developed that uses statistical distance measures for monitoring the performance of ML against such distributional shifts. 
The thesis develops the proposed models, and evaluates them on case studies, highlighting improvements to the state-of-the-art, limitations and future work.
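The idea of monitoring distributional shift with a statistical distance can be sketched as follows. This Python example assumes the Kolmogorov-Smirnov distance between empirical distribution functions as the measure, applied to a single feature; the abstract says only "statistical distance measures", so the choice of measure, the function names, and the threshold value are assumptions made here rather than the SafeML implementation.

```python
def ecdf(sample, x):
    """Empirical cumulative distribution: fraction of the sample <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def ks_distance(train, operational):
    """Kolmogorov-Smirnov distance: the largest gap between the two ECDFs.

    The gap can only change at observed values, so it is enough to
    evaluate both ECDFs at every distinct value in either sample.
    """
    points = sorted(set(train) | set(operational))
    return max(abs(ecdf(train, x) - ecdf(operational, x)) for x in points)


def shift_detected(train, operational, threshold=0.5):
    """Flag a shift when the distance exceeds a hypothetical threshold."""
    return ks_distance(train, operational) > threshold


# Toy feature values (illustrative only).
training_feature = [1, 2, 3, 4]
similar_batch = [1, 2, 3, 4]
shifted_batch = [11, 12, 13, 14]

no_shift = shift_detected(training_feature, similar_batch)
shift = shift_detected(training_feature, shifted_batch)
```

A distance of 0 means the two samples have identical empirical distributions, and a distance of 1 means they do not overlap at all; where to set the threshold between those extremes would depend on the application.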