78 research outputs found
On FPGA implementations for bioinformatics, neural prosthetics and reinforcement learning problems.
Mak Sui Tung Terrence. Thesis (M.Phil.), Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 132-142). Abstracts in English and Chinese.
Contents:
Abstract
List of Tables
List of Figures
Acknowledgements
Chapter 1. Introduction
  1.1 Bioinformatics
  1.2 Neural Prosthetics
  1.3 Learning in Uncertainty
  1.4 The Field Programmable Gate Array (FPGAs)
  1.5 Scope of the Thesis
Chapter 2. A Hybrid GA-DP Approach for Searching Equivalence Sets
  2.1 Introduction
  2.2 Equivalence Set Criterion
  2.3 Genetic Algorithm and Dynamic Programming
    2.3.1 Genetic Algorithm Formulation
    2.3.2 Bounded Mutation
    2.3.3 Conditioned Crossover
    2.3.4 Implementation
  2.4 FPGAs Implementation of GA-DP
    2.4.1 System Overview
    2.4.2 Parallel Computation for Transitive Closure
    2.4.3 Genetic Operation Realization
  2.5 Discussion
  2.6 Limitation and Future Work
  2.7 Conclusion
Chapter 3. An FPGA-based Architecture for Maximum-Likelihood Phylogeny Evaluation
  3.1 Introduction
  3.2 Maximum-Likelihood Model
  3.3 Hardware Mapping for Pruning Algorithm
    3.3.1 Related Works
    3.3.2 Number Representation
    3.3.3 Binary Tree Representation
    3.3.4 Binary Tree Traversal
    3.3.5 Maximum-Likelihood Evaluation Algorithm
  3.4 System Architecture
    3.4.1 Transition Probability Unit
    3.4.2 State-Parallel Computation Unit
    3.4.3 Error Computation
  3.5 Discussion
    3.5.1 Hardware Resource Consumption
    3.5.2 Delay Evaluation
  3.6 Conclusion
Chapter 4. Field Programmable Gate Array Implementation of Neuronal Ion Channel Dynamics
  4.1 Introduction
  4.2 Background
    4.2.1 Analog VLSI Model for Hebbian Synapse
    4.2.2 A Unifying Model of Bi-directional Synaptic Plasticity
    4.2.3 Non-NMDA Receptor Channel Regulation
  4.3 FPGAs Implementation
    4.3.1 FPGA Design Flow
    4.3.2 Digital Model of NMDA and AMPA receptors
    4.3.3 Synapse Modification
  4.4 Results
    4.4.1 Simulation Results
  4.5 Discussion
  4.6 Conclusion
Chapter 5. Continuous-Time and Discrete-Time Inference Networks for Distributed Dynamic Programming
  5.1 Introduction
  5.2 Background
    5.2.1 Markov decision process (MDPs)
    5.2.2 Learning in the MDPs
    5.2.3 Bellman Optimal Criterion
    5.2.4 Value Iteration
  5.3 A Computational Framework for Continuous-Time Inference Network
    5.3.1 Binary Relation Inference Network
    5.3.2 Binary Relation Inference Network for MDPs
    5.3.3 Continuous-Time Inference Network for MDPs
  5.4 Convergence Consideration
  5.5 Numerical Simulation
    5.5.1 Example 1: Random Walk
    5.5.2 Example 2: Random Walk on a Grid
    5.5.3 Example 3: Stochastic Shortest Path Problem
    5.5.4 Relationships Between λ and γ
  5.6 Discrete-Time Inference Network
    5.6.1 Results
  5.7 Conclusion
Chapter 6. On Distributed Q-Learning Network
  6.1 Introduction
  6.2 Distributed Q-Learning Network
    6.2.1 Distributed Q-Learning Network
    6.2.2 Q-Learning Network Architecture
  6.3 Experimental Results
    6.3.1 Random Walk
    6.3.2 The Shortest Path Problem
  6.4 Discussion
    6.4.1 Related Work
  6.5 FPGAs Implementation
    6.5.1 Distributed Registering Approach
    6.5.2 Serial BRAM Storing Approach
    6.5.3 Comparison
    6.5.4 Discussion
  6.6 Conclusion
Chapter 7. Summary
Bibliography
Appendix
Chapter A. Simplified Floating-Point Arithmetic
Chapter B. Logarithm, Exponential and Division Implementation
  B.1 Introduction
  B.2 Approximation Scheme
    B.2.1 Logarithm
    B.2.2 Exponentiation
    B.2.3 Division
Chapter C. Analog VLSI Implementation
  C.1 Site Function
    C.1.1 Multiplication Cell
  C.2 The Unit Function
  C.3 The Inference Network Computation
  C.4 Layout
  C.5 Fabrication
    C.5.1 Testing and Characterization
A Review on ANFIS based Linearization of Non Linear Sensors
Industrial instrumentation and control applications require low-cost sensors with high sensitivity, good resolution, and linear characteristics. Unfortunately, the inherently nonlinear characteristic of the sensor itself, together with the dynamic nature of the environment, aging effects, inherent sensor noise, and data loss due to transients or intermittent faults, affects the sensor response nonlinearly. Because the transfer characteristic of most sensors is nonlinear, obtaining data from such a sensor with an optimized device has always been a design challenge. Linearizing the nonlinear sensor characteristic in the digital domain is a vital step in the instrument signal conditioning process. This paper briefly reviews how to overcome sensor nonlinearity using artificial intelligence, specifically Hybrid Neuro Fuzzy Logic (HNFL), with a digital linearization technique implemented in VLSI technology such as a Field Programmable Gate Array (FPGA).
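As a minimal illustration of what digital linearization means, the sketch below inverts a nonlinear sensor curve with a calibration lookup table and piecewise-linear interpolation. It is not the ANFIS or HNFL approach the review surveys; the sensor model `sensor_output`, the 0 to 100 range, and the 33-point table are assumptions chosen only for the example.

```python
import bisect
import math

def sensor_output(x):
    """Hypothetical nonlinear sensor: the output saturates as the measurand x grows."""
    return 1.0 - math.exp(-x / 50.0)

def build_table(lo, hi, points):
    """Calibration table: sensor readings (ascending) and the true values behind them."""
    step = (hi - lo) / (points - 1)
    xs = [lo + i * step for i in range(points)]
    return [sensor_output(x) for x in xs], xs

def linearize(reading, readings, values):
    """Map a raw reading back to the measurand by piecewise-linear interpolation."""
    if reading <= readings[0]:
        return values[0]
    if reading >= readings[-1]:
        return values[-1]
    i = bisect.bisect_right(readings, reading)  # first table entry above the reading
    r0, r1 = readings[i - 1], readings[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (reading - r0) / (r1 - r0)

readings, values = build_table(0.0, 100.0, 33)  # assumed range and table size
max_error = max(
    abs(linearize(sensor_output(x), readings, values) - x)
    for x in [i * 0.5 for i in range(201)]
)
print(f"worst-case linearization error: {max_error:.3f}")
```

A table like this maps naturally onto block RAM plus one multiplier in an FPGA, which is one reason lookup-based schemes are a common baseline; a neuro-fuzzy linearizer would replace the fixed table with a trained model.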
Analog Photonics Computing for Information Processing, Inference and Optimisation
This review presents an overview of the current state-of-the-art in photonics
computing, which leverages photons, photons coupled with matter, and
optics-related technologies for effective and efficient computational purposes.
It covers the history and development of photonics computing and modern
analogue computing platforms and architectures, focusing on optimization tasks
and neural network implementations. The authors examine special-purpose
optimizers, mathematical descriptions of photonics optimizers, and their
various interconnections. Disparate applications are discussed, including
direct encoding, logistics, finance, phase retrieval, machine learning, neural
networks, probabilistic graphical models, and image processing, among many
others. The main directions of technological advancement and associated
challenges in photonics computing are explored, along with an assessment of its
efficiency. Finally, the paper discusses prospects and the field of optical
quantum computing, providing insights into the potential applications of this
technology. Comment: Invited submission by Journal of Advanced Quantum Technologies;
accepted version 5/06/202
Exploiting All-Programmable System on Chips for Closed-Loop Real-Time Neural Interfaces
High-density microelectrode arrays (HDMEAs) feature thousands of recording electrodes
in a single chip with an area of a few square millimeters. The obtained electrode density is
comparable to, and even higher than, the typical density of neuronal cells in cortical cultures.
Commercially available HDMEA-based acquisition systems are able to record the neural
activity from the whole array at the same time with submillisecond resolution. These devices
are a very promising tool and are increasingly used in neuroscience to tackle fundamental
questions regarding the complex dynamics of neural networks. Although electrical or optical
stimulation is generally available in such systems, they lack the capability of
creating a closed loop between the biological neural activity and the artificial system. Stimuli
are usually sent in an open-loop manner, thus violating the inherent working basis of neural
circuits, which in nature are constantly reacting to the external environment. This makes it
impossible to unravel the real mechanisms behind the behavior of neural networks.
The primary objective of this PhD work is to overcome this limitation by creating a fully reconfigurable
processing system capable of providing real-time feedback to the ongoing
neural activity recorded with HDMEA platforms. The potential of modern heterogeneous
FPGAs has been exploited to realize the system. In particular, the Xilinx Zynq All Programmable
System on Chip (APSoC) has been used. The device features reconfigurable
logic, specialized hardwired blocks, and a dual-core ARM-based processor; the synergy of
these components makes it possible to achieve high processing performance while maintaining a high
level of flexibility and adaptivity. The developed system has been embedded in an acquisition
and stimulation setup featuring the following platforms:
• 3·Brain BioCam X, a state-of-the-art HDMEA-based acquisition platform capable of
recording in parallel from 4096 electrodes at 18 kHz per electrode.
• PlexStim™ Electrical Stimulator System, able to generate electrical stimuli with
custom waveforms on 16 different output channels.
• Texas Instruments DLP® LightCrafter™ Evaluation Module, capable of projecting
608x684 pixel images with a refresh rate of 60 Hz; it provides the optical
stimulation.
All the features of the system, such as band-pass filtering and spike detection of all the
recorded channels, have been validated by means of ex vivo experiments. Very low latency
has been achieved while processing the whole input data stream in real time. In the case
of electrical stimulation, the total latency is below 2 ms; when optical stimuli are needed,
the total latency is higher, reaching 21 ms in the worst case.
The final setup is ready to be used to infer cellular properties by means of closed-loop
experiments. As a proof of concept, it has been successfully used for the clustering
and classification of retinal ganglion cells (RGCs) in the mouse retina. For this experiment, the
light-evoked spikes from thousands of RGCs have been correctly recorded and analyzed in
real-time. Around 90% of the total clusters have been classified as ON- or OFF-type cells.
In addition to the closed-loop system, a denoising prototype has been developed. The main
idea is to exploit oversampling techniques to reduce the thermal noise recorded by HDMEA-based
acquisition systems. The prototype is capable of processing in real time all the input
signals from the BioCam X, and it is currently being tested to evaluate its performance in
terms of signal-to-noise ratio improvement.
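The oversampling idea mentioned above can be illustrated with a small simulation. The sketch below is not the thesis prototype: it assumes ideal white Gaussian noise, an oversampling ratio of 16, and plain block averaging as the decimation filter, and it only shows the textbook effect that averaging N uncorrelated samples reduces the noise standard deviation by roughly a factor of sqrt(N).

```python
import random
import statistics

def decimate_by_averaging(samples, factor):
    """Average each consecutive group of `factor` samples into one output sample."""
    usable = len(samples) - len(samples) % factor  # drop an incomplete final group
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]

rng = random.Random(0)   # fixed seed so the run is repeatable
noise_sigma = 1.0        # assumed white-noise standard deviation
factor = 16              # assumed oversampling ratio
raw = [rng.gauss(0.0, noise_sigma) for _ in range(160_000)]

averaged = decimate_by_averaging(raw, factor)
print(f"noise std before averaging: {statistics.pstdev(raw):.3f}")
print(f"noise std after averaging:  {statistics.pstdev(averaged):.3f}")
print(f"ideal sigma / sqrt(N):      {noise_sigma / factor ** 0.5:.3f}")
```

Real electrode noise is not perfectly white, so the gain measured on hardware would be expected to fall short of this ideal figure.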
Implementation and Evaluation of a Measurement-Feedback Coherent Ising Machine for Combinatorial Optimization Problems
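Degree type: course-based doctorate. Examination committee: (Chair) Professor Kazuyuki Aihara, University of Tokyo; Professor Satoru Iwata, University of Tokyo; Associate Professor Yoshito Hirata, University of Tokyo; Associate Professor Takaaki Ohnishi, University of Tokyo; Associate Professor Taiji Suzuki, University of Tokyo. University of Tokyo (東京大学)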
Intertextuality in scientific publications
The IEEE bibliographic database contains a number of proven duplications, with the copied originals indicated. This corpus is used to test an authorship attribution method. Combining the intertextual distance with a sliding window and various classification techniques identifies these duplications with a very low risk of error. The experiment also shows that several factors blur the identity of the scientific author, notably research groups of variable composition and a high degree of intertextuality that is accepted or even sought.
Who wrote this scientific text?
The IEEE bibliographic database contains a number of proven duplications with indication of the original paper(s) copied. This corpus is used to test a method for the detection of hidden intertextuality (commonly named "plagiarism"). The intertextual distance, combined with the sliding window and with various classification techniques, identifies these duplications with a very low risk of error. These experiments also show that several factors blur the identity of the scientific author, including variable group authorship and the high levels of intertextuality accepted, and sometimes desired, in scientific papers on the same topic
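To make the combination of a distance measure and a sliding window concrete, here is a small sketch. It uses a simplified distance (half the summed absolute difference of relative word frequencies), which is an assumption for illustration and not necessarily the exact intertextual distance formula used by the authors; the two example texts, the window length, and the step are likewise made up.

```python
from collections import Counter

def relative_frequencies(tokens):
    """Relative frequency of each word type in a token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}

def intertextual_distance(tokens_a, tokens_b):
    """Half the summed absolute difference of relative frequencies, in [0, 1]."""
    fa = relative_frequencies(tokens_a)
    fb = relative_frequencies(tokens_b)
    vocabulary = set(fa) | set(fb)
    return 0.5 * sum(abs(fa.get(w, 0.0) - fb.get(w, 0.0)) for w in vocabulary)

def sliding_window_distances(reference, suspect, window, step):
    """Distance between the reference text and each window of the suspect text."""
    return [
        (start, intertextual_distance(reference, suspect[start:start + window]))
        for start in range(0, len(suspect) - window + 1, step)
    ]

# Made-up texts: the middle sentence of `suspect` is copied from `reference`.
reference = "the sensor output is filtered and then sampled by the converter".split()
suspect = (
    "we describe a new antenna design for mobile handsets "
    "the sensor output is filtered and then sampled by the converter "
    "results show improved gain across the band"
).split()

profile = sliding_window_distances(reference, suspect, window=len(reference), step=1)
best_start, best_distance = min(profile, key=lambda item: item[1])
print(best_start, round(best_distance, 3))
```

The window that lines up with the copied sentence scores a distance of zero, while the windows around it score higher; in practice a classifier or threshold would then be applied to such a distance profile.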
Scalable Hardware Efficient Deep Spatio-Temporal Inference Networks
Deep machine learning (DML) is a promising field of research that has enjoyed much success in recent years. Two of the predominant deep learning architectures studied in the literature are Convolutional Neural Networks (CNNs) and Deep Belief Networks (DBNs). Both have been successfully applied to many standard benchmarks with a primary focus on machine vision and speech processing domains.
Many real-world applications involve time-varying signals and, consequently, necessitate models that efficiently represent both temporal and spatial attributes. However, neither DBNs nor CNNs are designed to naturally capture temporal dependencies in observed data, often resulting in the inadequate transformation of spatio-temporal signals into wide spatial structures. It is argued that deep machine learning without proper temporal representation mechanisms is unable to extract meaningful information from many time-varying natural signals.
Another clearly emerging need is to scale deep learning architectures with the size of the problem at hand, which suggests that such architectures should map well to custom hardware platforms. Such platforms offer much better performance than is achievable using CPUs or even GPUs. Analog computation is a unique potential solution to the scalability challenge, offering lower power consumption and smaller physical size than digital implementations. However, these benefits come at the cost of inaccurate computations and noise.
This work presents an enhanced formulation of the Deep Spatio-Temporal Inference Network (DeSTIN), which is inherently designed to capture both spatial and temporal dependencies in the data provided. The regular structure of DeSTIN, its computational requirements, and local connectivity render it hardware-efficient and highly scalable. Implementation of DeSTIN using analog computation is studied in detail, where the architectural robustness to various distortions in its signals is demonstrated. To the best of our knowledge, this is the first time custom analog hardware has been developed for deep machine learning. Key enhancements to previous formulations of DeSTIN are discussed in detail and results on standard benchmarks are presented. This work helps pave the way for advancing deep learning to address some of the long-standing challenges in machine learning.
Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack
Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, the use of adaptive techniques is seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits.
A significant reduction in power consumption is achieved, ranging from 83% for low motion-activity scenes to 12.5% for high motion-activity video scenes, in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault-recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria.
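Since the abstract reports quality in terms of PSNR, a short sketch of the standard definition may help: PSNR = 10 log10(peak^2 / MSE), in dB. The code below assumes 8-bit samples (peak value 255) and uses made-up pixel values and errors; it shows only how such a figure is computed, not how the dissertation derives its health metric.

```python
import math

def psnr(reference, degraded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(degraded):
        raise ValueError("sequences must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no error power
    return 10.0 * math.log10(peak ** 2 / mse)

# Made-up 8-bit samples and per-sample errors, for illustration only.
reference = [52, 55, 61, 66, 70, 61, 64, 73]
errors = [2, -1, 0, 3, -2, 1, 0, -3]
degraded = [r + e for r, e in zip(reference, errors)]
print(f"PSNR: {psnr(reference, degraded):.2f} dB")
```

Because the scale is logarithmic, the 4.02dB to 6.67dB gap quoted above corresponds to the mean squared error growing by a factor of roughly 2.5 to 4.6 relative to the fault-free baseline.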
A new combination of hardware and software for motorsport vehicles based on the direct processing of graphical functions
Doctorate in Electronic Engineering
The main motivation for the work presented here began with previously
conducted experiments with a programming concept at the time named
"Macro". These experiments led to the conviction that it would be possible to
build a system of engine control from scratch, which could eliminate many of
the current problems of engine management systems in a direct and intrinsic
way. It was also hoped that it would minimize the full range of software and
hardware needed to make a final and fully functional system.
Initially, this thesis undertakes a comprehensive survey of the state of
the art in the specific area of software and corresponding hardware of
automotive tools and automotive ECUs. Problems arising from such software
will be identified, and it will be clear that practically all of these problems stem
directly or indirectly from the fact that we continue to make comprehensive use
of extremely long and complex "tool chains". Similarly, in the hardware, it will
be argued that the problems stem from the extreme complexity and
inter-dependency inside processor architectures. The conclusions are
presented through an extensive list of "pitfalls" which will be thoroughly
enumerated, identified and characterized.
Solutions are also proposed for the various current issues, together with
implementations of those solutions. All this final work is part of a
"proof-of-concept" system called "ECU2010". The central element of this
system is the aforementioned "Macro" concept, which is a graphical block
representing one of the many operations required in an automotive system, such as
arithmetic, logic, filtering, integration, and multiplexing functions, among others. The
end result of the proposed work is a single tool, fully integrated, enabling the
development and management of the entire system in one simple visual
interface. Part of the presented result relies on a hardware platform fully
adapted to the software, as well as enabling high flexibility and scalability in
addition to using exactly the same technology for ECU, data logger and
peripherals alike.
Current systems rely on a mostly evolutionary path, only allowing online
calibration of parameters, but never the online alteration of their own
automotive functionality algorithms. By contrast, the system developed and
described in this thesis had the advantage of following a "clean-slate"
approach, whereby everything could be rethought globally. In the end, out of all
the system characteristics, "LIVE-Prototyping" is the most relevant feature,
allowing the adjustment of automotive algorithms (e.g., injection, ignition,
lambda control, etc.) 100% online, keeping the engine constantly working,
without ever having to stop or reboot to make such changes. This consequently
eliminates any "turnaround delay" typically present in current automotive
systems, thereby enhancing the efficiency and handling of such systems.A principal motivação para o trabalho que conduziu a esta tese residiu na
constatação de que os actuais métodos de modelação de centralinas
automóveis conduzem a significativos problemas de desenvolvimento e
manutenção. Como resultado dessa constatação, o objectivo deste trabalho
centrou-se no desenvolvimento de um conceito de arquitectura que rompe
radicalmente com os modelos state-of-the-art e que assenta num conjunto de
conceitos que vieram a ser designados de "Macro" e "Celular ECU". Com este
modelo pretendeu-se simultaneamente minimizar a panóplia de software e de
hardware necessários à obtenção de uma sistema funcional final.
Inicialmente, esta tese propõem-se fazer um levantamento exaustivo do
estado da arte na área específica do software e correspondente hardware das
ferramentas e centralinas automóveis. Os problemas decorrentes de tal
software serão identificados e, dessa identificação deverá ficar claro, que
praticamente todos esses problemas têm origem directa ou indirecta no facto
de se continuar a fazer um uso exaustivo de "tool chains" extremamente
compridas e complexas. De forma semelhante, no hardware, os problemas
têm origem na extrema complexidade e inter-dependência das arquitecturas
dos processadores. As consequências distribuem-se por uma extensa lista de
"pitfalls" que também serão exaustivamente enumeradas, identificadas e
caracterizadas.
São ainda propostas soluções para os diversos problemas actuais e
correspondentes implementações dessas mesmas soluções. Todo este
trabalho final faz parte de um sistema "proof-of-concept" designado
"ECU2010". O elemento central deste sistema é o já referido conceito de
“Macro”, que consiste num bloco gráfico que representa uma de muitas
operações necessárias num sistema automóvel, como sejam funções
aritméticas, lógicas, de filtragem, de integração, de multiplexagem, entre
outras. O resultado final do trabalho proposto assenta numa única ferramenta,
totalmente integrada que permite o desenvolvimento e gestão de todo o
sistema de forma simples numa única interface visual. Parte do resultado
apresentado assenta numa plataforma hardware totalmente adaptada ao
software, bem como na elevada flexibilidade e escalabilidade, para além de
permitir a utilização de exactamente a mesma tecnologia quer para a
centralina, como para o datalogger e para os periféricos.
Os sistemas actuais assentam num percurso maioritariamente evolutivo,
apenas permitindo a calibração online de parâmetros, mas nunca a alteração
online dos próprios algoritmos das funcionalidades automóveis. Pelo contrário,
o sistema desenvolvido e descrito nesta tese apresenta a vantagem de seguir
um "clean-slate approach", pelo que tudo pode ser globalmente repensado. No
final e para além de todas as restantes características, o
“LIVE-PROTOTYPING” é a funcionalidade mais relevante, ao permitir alterar
algoritmos automóveis (ex: injecção, ignição, controlo lambda, etc.) de forma
100% online, mantendo o motor constantemente a trabalhar e sem nunca ter
de o parar ou re-arrancar para efectuar tais alterações. Isto elimina
consequentemente qualquer "turnaround delay" tipicamente presente em
qualquer sistema automóvel actual, aumentando de forma significativa a
eficiência global do sistema e da sua utilização
- …