Development of FPGA based Standalone Tunable Fuzzy Logic Controllers
Soft computing techniques differ from conventional (hard) computing in that they tolerate imprecision, uncertainty, partial truth, and approximation. In effect, the role model for soft computing is the human mind and its ability to address day-to-day problems. The principal constituents of Soft Computing (SC) are Fuzzy Logic (FL), Evolutionary Computation (EC), Machine Learning (ML), and Artificial Neural Networks (ANNs).
This thesis presents a generic hardware architecture for type-I and type-II standalone tunable Fuzzy Logic Controllers (FLCs) on a Field Programmable Gate Array (FPGA). The designed FLC system can be remotely configured or tuned according to expert knowledge and deployed in different applications to replace traditional Proportional Integral Derivative (PID) controllers. This reconfigurability is a feature added to the FLCs reported in the literature. The FLC parameters needed for tuning are mainly the input range, the output range, the numbers of inputs and outputs, the membership-function parameters such as slopes and center points, and the If-Else rule base for the fuzzy inference process. Online tuning enables users to change these FLC parameters in real time and eliminates repeated hardware reprogramming whenever a change is needed. Realizing these systems in real time is difficult because the computational complexity increases exponentially with the number of inputs. Hence, the challenge lies in reducing the rule base significantly, such that the inference time and throughput remain acceptable for real-time applications.
To achieve these objectives, Modified Rule Active 2 Overlap Membership Function (MRA2-OMF), Modified Rule Active 3 Overlap Membership Function (MRA3-OMF), Modified Rule Active 4 Overlap Membership Function (MRA4-OMF), and Genetic Algorithm (GA) based rule optimization methods are proposed and implemented. These methods reduce the number of effective rules without compromising system accuracy and improve the cycle time measured in Fuzzy Logic Inferences Per Second (FLIPS). In the proposed system architecture, the FLC is segmented into three independent modules: the fuzzifier, the inference engine with the rule base, and the defuzzifier.
Fuzzy systems employ a fuzzifier to convert real-world crisp inputs into fuzzy values. In type-2 fuzzy systems, two fuzzifications happen simultaneously, from the upper and lower membership functions (UMF and LMF), and involve subtractions and divisions. Non-restoring, very-high-radix, and Newton-Raphson approximation are the most widely used division algorithms in hardware implementations; however, these prevalent methods come at the cost of higher latency. To overcome this problem, a type-2 fuzzifier based on a successive-approximation division algorithm is introduced. The successive-approximation-based fuzzifier has been observed to compute faster than the other type-2 fuzzifiers.
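The idea behind such a divider can be sketched in software: like a SAR ADC, it proposes the quotient one bit at a time, most significant bit first, and keeps a trial bit only if the implied product still fits under the dividend. The following is a minimal Python sketch of the general technique, assuming unsigned integer operands; the function name and bit width are illustrative, not the thesis's actual datapath.

```python
def sar_divide(num, den, bits=16):
    """Successive-approximation division: build the quotient MSB-first,
    keeping a trial bit only while trial * den still fits under num.
    Latency is fixed at `bits` iterations, one comparison per bit."""
    assert den > 0
    q = 0
    for i in reversed(range(bits)):
        trial = q | (1 << i)
        # In hardware the product can be tracked incrementally with
        # shifts and adds, so no full multiplier is required per step.
        if trial * den <= num:
            q = trial
    return q

print(sar_divide(200, 7))  # 28, the integer quotient of 200 / 7
```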
A hardware-software co-design is established on a Virtex-5 LX110T FPGA board. A MATLAB Graphical User Interface (GUI) acquires the fuzzy (type-1 or type-2) parameters from users, and a Universal Asynchronous Receiver/Transmitter (UART) is dedicated to data communication between the hardware and the fuzzy toolbox. This GUI initiates control, input, and rule transfer, and then displays the crisp output on the computer. A proposed method that supports canonical fuzzy IF-THEN rules, including special cases of the fuzzy rule base, is incorporated into the Digital Fuzzy Logic Controller (DFLC) architecture; for this purpose, a Mealy state machine is included in the design. The proposed FLCs are implemented on a Xilinx Virtex-5 LX110T. DFLC peripheral integration with the MicroBlaze (MB) processor through the Processor Local Bus (PLB) is established for Intellectual Property (IP) core validation. The performance of the proposed systems is compared to the MATLAB Fuzzy Logic Toolbox. These designs are analyzed through Hardware-In-the-Loop (HIL) tests controlling various plant models in the MATLAB/Simulink environment.
Enabling high-performance, mixed-signal approximate computing
For decades, the semiconductor industry enjoyed exponential improvements in microprocessor power and performance with the device scaling of successive technology generations. Scaling limitations at sub-micron technologies, however, have ceased to provide these historical performance improvements within a limited power budget. While device scaling provides a larger number of transistors per chip, for the same chip area, a growing percentage of the chip will have to be powered off at any given time due to power constraints. As such, the architecture community has focused on energy-efficient designs and is looking to specialized hardware to provide gains in performance. A focus on energy efficiency, along with increasingly less reliable transistors due to device scaling, has led to research in the area of approximate computing, where accuracy is traded for energy efficiency when precise computation is not required. There is a growing body of approximation-tolerant applications that, for example, compute on noisy or incomplete data, such as real-world sensor inputs, or make approximations to decrease the computational load in the analysis of cumbersome data sets. These approximation-tolerant applications span domains such as machine learning, image processing, robotics, and financial analysis, among others.
Since the advent of the modern processor, computing models have largely presumed the attribute of accuracy. A willingness to relax accuracy requirements, however, with the goal of gaining energy efficiency, warrants the re-investigation of the potential of analog computing. Analog hardware offers the opportunity for fast and low-power computation; however, it presents challenges in the form of accuracy. Where analog compute blocks have been applied to solve fixed-function problems, general-purpose computing has relied on digital hardware implementations that provide generality and programmability. The work presented in this thesis aims to answer the following questions: Can analog circuits be successfully integrated into general-purpose computing to provide performance and energy savings? And what is required to address the historical analog challenges of inaccuracy, programmability, and a lack of generality to enable such an approach? This thesis investigates a neural approach as a means to address those historical analog challenges and to enable the use of analog circuits in general-purpose, high-performance computing.
The first piece of this thesis work investigates the use of analog circuits at the microarchitecture level in the form of an analog neural branch predictor. The task of branch prediction can tolerate imprecision, as roll-back mechanisms correct for branch mispredictions, and application-level accuracy remains unaffected. We show that analog circuits enable the implementation of a highly accurate neural-prediction algorithm that is infeasible to implement in the digital domain. The second piece of this thesis work presents a neural accelerator that targets approximation-tolerant code. Analog neural acceleration provides an application speedup of 3.3x and energy savings of 12.1x with a quality loss below 10% for all except one approximation-tolerant benchmark. These results show that, using a neural approach, analog circuits can be applied to provide performance and energy efficiency in high-performance, general-purpose computing.
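The neural prediction algorithm referenced here is, in its digital form, the perceptron branch predictor: a dot product between branch-history bits and learned weights decides taken versus not-taken. The sketch below is a simplified single-perceptron illustration in Python (real predictors index a table of weight vectors by branch address); the thesis's analog contribution lies in evaluating this dot product with fast, low-power analog circuits, which the sketch does not model.

```python
class PerceptronPredictor:
    """Minimal perceptron branch predictor (single global perceptron)."""

    def __init__(self, history_len=16):
        self.h = [1] * history_len        # global history, entries in {-1, +1}
        self.w = [0] * (history_len + 1)  # w[0] is the bias weight
        self.theta = int(1.93 * history_len + 14)  # standard training threshold

    def predict(self):
        """Dot product of weights and history; its sign is the prediction."""
        y = self.w[0] + sum(wi * hi for wi, hi in zip(self.w[1:], self.h))
        return y, y >= 0                  # (confidence, predicted taken?)

    def update(self, y, taken):
        """Train on a misprediction or a low-confidence correct prediction."""
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.theta:
            self.w[0] += t
            self.w[1:] = [wi + t * hi for wi, hi in zip(self.w[1:], self.h)]
        self.h = self.h[1:] + [t]         # shift the real outcome into history

bp = PerceptronPredictor()
y, taken_guess = bp.predict()
bp.update(y, taken=True)                  # feed back the actual branch outcome
```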
Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors, new opportunities are emerging for applying deep and Spiking Neural Network (SNN) algorithms to healthcare and biomedical applications at the edge. This can facilitate the advancement of medical Internet of Things (IoT) systems and Point of Care (PoC) devices. In this paper, we provide a tutorial describing how various technologies, ranging from emerging memristive devices to established Field Programmable Gate Arrays (FPGAs) and mature Complementary Metal Oxide Semiconductor (CMOS) technology, can be used to develop efficient DL accelerators that solve a wide variety of diagnostic, pattern recognition, and signal processing problems in healthcare. Furthermore, we explore how spiking neuromorphic processors can complement their DL counterparts for processing biomedical signals. After providing the required background, we unify the sparsely distributed research on neural network and neuromorphic hardware implementations as applied to the healthcare domain. In addition, we benchmark various hardware platforms by performing a biomedical electromyography (EMG) signal processing task and drawing comparisons among them in terms of inference delay and energy. Finally, we provide our analysis of the field and share a perspective on the advantages, disadvantages, challenges, and opportunities that different accelerators and neuromorphic processors introduce to the healthcare and biomedical domains. This paper can serve a large audience, from nanoelectronics researchers to biomedical and healthcare practitioners, in grasping the fundamental interplay between hardware, algorithms, and the clinical adoption of these tools, as we shed light on the future of deep networks and spiking neuromorphic processing systems as proponents for driving biomedical circuits and systems forward.
Comment: Submitted to IEEE Transactions on Biomedical Circuits and Systems (21 pages, 10 figures, 5 tables).
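As a small illustration of how such comparisons are typically reduced to scalar figures of merit, inference delay and power fold into energy per inference and energy-delay product. The platform names and numbers below are placeholders for the kind of table the paper reports, not its measured values.

```python
# Hypothetical numbers for illustration only; the paper reports its own
# measured latency and energy for each platform on the EMG task.
platforms = {
    "FPGA":         {"latency_ms": 1.2, "power_w": 4.0},
    "CMOS ASIC":    {"latency_ms": 0.4, "power_w": 0.8},
    "neuromorphic": {"latency_ms": 5.0, "power_w": 0.05},
}

for name, m in platforms.items():
    energy_mj = m["power_w"] * m["latency_ms"]   # E = P * t (W * ms = mJ)
    edp = energy_mj * m["latency_ms"]            # energy-delay product (mJ*ms)
    print(f"{name:12s} energy/inference = {energy_mj:5.2f} mJ, EDP = {edp:6.2f} mJ*ms")
```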
Digital and Analog Computing Paradigms in Printed Electronics
With the end of Moore's law already in sight, new ways must be found to supply the innovation-driven IT market with novel electronics. Low-cost hardware with a flexible form factor, based on novel materials and technologies, can open up application areas that go beyond conventional silicon-based electronics. Of particular interest are electronic systems that make it possible to monitor everyday consumer goods, for example as part of quality control, by being integrated into the product as part of smart packaging, and which therefore require only a limited product lifetime. Other foreseeable application areas are wearable electronics and products for the Internet of Things. These give rise to system requirements such as flexible, stretchable hardware built from non-toxic materials.
For this reason, additive technologies such as printed electronics are considered; they are regarded as complementary to silicon-based technologies because their simple fabrication process enables very low production costs, and they build on non-toxic, functional materials that can be deposited on flexible plastic or paper substrates. Among the various printing processes, inkjet printing is particularly interesting for future printed-electronics applications, since its maskless printing process enables on-site, on-demand fabrication. However, because inkjet-printable electronics is still at an early stage of development, it is questionable whether circuits for future application areas can be designed at all, or whether they can even be fabricated. Since the lateral resolution of printing processes is several orders of magnitude coarser than that of silicon-based fabrication technologies, and since only p-type or only n-type transistors are available, existing circuit designs cannot be transferred directly to printed electronics. This leads to the scientific question of which computing paradigms can sensibly be applied in printed electronics at all. Answering this question will help circuit designers to successfully design and produce printed circuits for the rapidly developing consumer-goods market.
Against this background, this thesis explores various computing paradigms and circuit designs that are regarded as essential for future printed systems. The analysis builds on the fairly young Electrolyte-Gated Transistor (EGT) technology, which is based on a low-cost inkjet printing process and enables very low operating voltages. Since only simple logic gates had so far been realized in EGT technology, this thesis explores the design space further by developing printed memory elements, lookup tables, artificial neurons, and decision trees. For the artificial neuron and the decision trees in particular, reference is made to hardware implementations of machine learning algorithms, and the scaling of the circuits to the application level is demonstrated.
The computing paradigms evaluated in this thesis range from digital, analog, and neuromorphic computation to stochastic methods. In addition, customizable circuit designs enabled by the inkjet printing process were investigated; by hard-wiring variable design parameters into the circuit, they yield substantial improvements in area, power consumption, and circuit latency. Since the explored circuits far exceed the complexity of printed hardware fabricated to date, their manufacturability is not automatically guaranteed, especially for the non-digital circuits. For this reason, EGT-based hardware prototypes were fabricated in this work and characterized with respect to area, power consumption, and latency. The measurement results can be used to extrapolate to more complex, application-oriented circuit designs. In this context, the developed hardware implementations of machine learning algorithms were validated to obtain a proof of concept.
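The stochastic computing paradigm evaluated here can be illustrated with a minimal Python sketch (software only; the thesis targets printed EGT circuits). A value in [0, 1] is encoded as the probability of a 1 in a random bitstream, after which a single AND gate multiplies two values, which is why stochastic neural networks need so little hardware.

```python
import random

def to_stream(p, n=4096):
    """Encode p in [0, 1] as a unipolar stochastic bitstream of length n."""
    return [random.random() < p for _ in range(n)]

def sc_multiply(a, b, n=4096):
    """Multiply two unipolar values with a bitwise AND of their streams."""
    product = [x and y for x, y in zip(to_stream(a, n), to_stream(b, n))]
    return sum(product) / n   # decode: fraction of 1s in the result stream

print(sc_multiply(0.5, 0.8))  # roughly 0.4; accuracy grows with stream length
```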
The results of this thesis lead to several conclusions. First, the sequential processing of algorithms in printed EGT-based hardware is possible in principle, since, as shown in this work, memory elements can be implemented alongside combinational circuits; the latter was validated experimentally. Furthermore, analog and neuromorphic computing paradigms can sensibly be employed to realize printed machine learning hardware, substantially reducing circuit design complexity compared to conventional methods, which ultimately leads to a higher production yield in the fabrication process. Likewise, neural network architectures based on stochastic computing can be used to reduce hardware cost compared to conventional implementations. Finally, the inkjet printing process allows circuit designs to be customized to customer requirements during fabrication, which increases the general applicability of printed hardware, since here too a lower hardware cost is achieved compared to conventional circuit designs.
It is anticipated that the research results presented in this thesis are relevant to computer scientists, electrical engineers, and materials scientists working actively in the field of printable electronics. The investigated computing paradigms and their influence on the behavior and key characteristics of printed hardware provide insights into how printed circuits can be implemented efficiently in the future to enable novel printing-based products in the electronics domain.
Technical Review of Residential Programmable Communicating Thermostat Implementation for Title 24-2008
X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs
Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased-precision analog CAM and a programmable network on chip allowing the inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19W peak power consumption.
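As a rough illustration of the tree-to-CAM mapping this abstract builds on (the row contents and the `cam_lookup` helper below are hypothetical), each root-to-leaf path of a decision tree becomes one CAM row of per-feature match intervals; a real analog CAM evaluates all rows in parallel in a single search.

```python
# Each CAM row stores one root-to-leaf path as per-feature intervals
# (lo, hi); features the path never tests get a "don't care" interval.
INF = float("inf")

cam_rows = [  # hypothetical two-feature tree with three leaves
    ({0: (-INF, 0.5), 1: (-INF, INF)}, "leaf A"),
    ({0: (0.5, INF),  1: (-INF, 0.3)}, "leaf B"),
    ({0: (0.5, INF),  1: (0.3, INF)},  "leaf C"),
]

def cam_lookup(x):
    """Sequential emulation of the one-shot parallel CAM search."""
    for intervals, leaf in cam_rows:
        if all(lo <= x[f] < hi for f, (lo, hi) in intervals.items()):
            return leaf

print(cam_lookup({0: 0.7, 1: 0.1}))  # -> "leaf B"
```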
A RISC-V Matrix Multiplier Using Systolic Arrays
Many modern-day applications can be addressed with machine learning, which involves training a computer to learn from large amounts of data without direct programmer guidance. Conventional computers typically use general-purpose central processing units, though more specialized tasks may take advantage of more parallel hardware such as graphics processing units. In the pursuit of the performance needed by increasingly complex machine learning models, researchers in both academia and industry look towards field-programmable gate arrays and application-specific integrated circuits. Various implementations, both theoretical and practical, exist across a wide variety of designs. A custom design using systolic arrays, built on the existing RISC-V Instruction Set Architecture, is used to accelerate matrix calculations, with its performance measured on the MNIST dataset.
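As a behavioral sketch of the systolic dataflow (Python under simplified assumptions, not the RISC-V RTL described here), an output-stationary array can be modeled by skewing the operands so that row i of A and column j of B reach processing element (i, j) only after i + j cycles:

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-by-cycle model of an output-stationary systolic array:
    PE (i, j) accumulates A[i, k] * B[k, j], with operands skewed so the
    k-th pair arrives at PE (i, j) on cycle t = i + j + k."""
    n, k, m = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((n, m))
    for t in range(n + m + k - 2):       # cycles until the array drains
        for i in range(n):
            for j in range(m):
                step = t - i - j         # which operand pair arrives now
                if 0 <= step < k:
                    C[i, j] += A[i, step] * B[step, j]
    return C

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
assert np.array_equal(systolic_matmul(A, B), A @ B)
```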
Analog Photonics Computing for Information Processing, Inference and Optimisation
This review presents an overview of the current state-of-the-art in photonics computing, which leverages photons, photons coupled with matter, and optics-related technologies for effective and efficient computational purposes. It covers the history and development of photonics computing and modern analogue computing platforms and architectures, focusing on optimization tasks and neural network implementations. The authors examine special-purpose optimizers, mathematical descriptions of photonics optimizers, and their various interconnections. Disparate applications are discussed, including direct encoding, logistics, finance, phase retrieval, machine learning, neural networks, probabilistic graphical models, and image processing, among many others. The main directions of technological advancement and associated challenges in photonics computing are explored, along with an assessment of its efficiency. Finally, the paper discusses prospects and the field of optical quantum computing, providing insights into the potential applications of this technology.
Comment: Invited submission by Journal of Advanced Quantum Technologies; accepted version 5/06/202
Implementation of Block-based Neural Networks on Reconfigurable Computing Platforms
Block-based Neural Networks (BbNNs) provide a flexible and modular architecture to support adaptive applications in dynamic environments. Reconfigurable computing (RC) platforms provide computational efficiency combined with flexibility; hence, RC is an ideal match for evolvable BbNN applications. BbNNs are very convenient to build once a library of neural network blocks is available. This library-based approach to BbNN design is extremely useful for automating BbNN implementations and evaluating their performance on RC platforms. This is important because, for a given application, there may be hundreds to thousands of candidate BbNN implementations, and evaluating each of them for accuracy and performance using software simulations would take a very long time, which is not acceptable in adaptive environments. This thesis focuses on the development and characterization of a library of parameterized VHDL models of neural network blocks, which may be used to build any BbNN. The use of these models is demonstrated on the XOR pattern classification problem and a mobile robot navigation problem. For a given application, one may be interested in fabricating an ASIC once the weights and architecture of the BbNN are decided. Pointers to an ASIC implementation of BbNNs, with initial results, are also included in this thesis.
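A rough software analogue of such a block library can make the idea concrete (illustrative Python only; the thesis builds parameterized VHDL models, and the weights below are hand-picked rather than evolved). Identical parameterized blocks, each a small weighted-sum-plus-activation unit, are wired into a grid to solve the XOR problem mentioned above:

```python
import math

def block(inputs, weights, bias):
    """One parameterized block: weighted sum of its inputs, then a sigmoid."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

def bbnn_xor(x1, x2):
    """Two blocks feeding a third, configured (by hand here) for XOR."""
    h1 = block([x1, x2], [6.0, 6.0], -3.0)     # fires for "at least one input"
    h2 = block([x1, x2], [6.0, 6.0], -9.0)     # fires for "both inputs"
    return block([h1, h2], [8.0, -8.0], -4.0)  # OR minus AND gives XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(bbnn_xor(a, b)))         # prints 0, 1, 1, 0
```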
In-memory computing with emerging memory devices: Status and outlook
Supporting data for "In-memory computing with emerging memory devices: status and outlook", submitted to APL Machine Learning