
    High performance and error resilient probabilistic inference system for machine learning

    Many real-world machine learning applications can be formulated as maximum a posteriori (MAP) problems, that is, as inferring the most probable label assignment. Since these MAP problems are NP-hard in general, they are often addressed with approximate inference algorithms on Markov random fields (MRFs), such as belief propagation (BP). However, even approximate inference remains computationally demanding, which makes custom hardware accelerators attractive for high performance and energy efficiency. Various custom hardware implementations employ BP and achieve reasonable performance on real-world applications such as stereo matching. Because BP lacks convergence guarantees, however, it often fails to provide the right answer, which degrades the performance of the hardware. We therefore consider sequential tree-reweighted message passing (TRW-S), which avoids many of BP's convergence problems by executing its computations sequentially, although this sequential execution makes a high-throughput parallel implementation challenging. In this work, we propose a novel streaming hardware architecture that parallelizes the sequential computations of TRW-S. Experimental results on stereo matching benchmarks show promising performance of our hardware implementation compared with the software implementation as well as with other BP-based custom hardware and GPU implementations. Building on this result, we further demonstrate video-rate, high-quality stereo matching on a hybrid CPU+FPGA platform. We propose three frame-level optimization techniques that fully exploit the computational resources of the hybrid CPU+FPGA platform and achieve a significant speed-up. First, we propose a message reuse scheme guided by simple scene change detection: it decides whether the current inference can reuse the previous frame's messages, based on whether the current result is expected to be similar to the inference result of the previous frame.
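To make the message-passing computations concrete, here is a minimal NumPy sketch of the min-sum message update that both BP and TRW-S are built on, applied to a 1-D chain with a truncated-linear smoothness cost (a common choice for stereo disparities, assumed here rather than taken from the thesis). On a chain, one forward and one backward sweep give the exact MAP labeling; TRW-S on loopy grid MRFs additionally reweights messages and follows a fixed sequential order, which this sketch does not model. The function names and cost parameters are hypothetical.

```python
import numpy as np


def truncated_linear(num_labels: int, weight: float, trunc: float) -> np.ndarray:
    """Pairwise smoothness cost V(a, b) = weight * min(|a - b|, trunc)."""
    labels = np.arange(num_labels)
    return weight * np.minimum(np.abs(labels[:, None] - labels[None, :]), trunc)


def min_sum_message(unary: np.ndarray, incoming: np.ndarray, pairwise: np.ndarray) -> np.ndarray:
    """Message from node p to its neighbor q:
    m_pq(b) = min_a [unary_p(a) + incoming_p(a) + V(a, b)],
    shifted so that its minimum is 0 to keep values bounded.
    `incoming` is the sum of messages into p from all neighbors except q.
    """
    h = unary + incoming
    msg = np.min(h[:, None] + pairwise, axis=0)  # minimize over the sender's label a
    return msg - msg.min()


def chain_map(unaries: np.ndarray, pairwise: np.ndarray) -> np.ndarray:
    """Exact MAP labeling of a 1-D chain MRF from one forward and one backward sweep."""
    n, num_labels = unaries.shape
    fwd = np.zeros((n, num_labels))  # fwd[i]: message from node i-1 into node i
    bwd = np.zeros((n, num_labels))  # bwd[i]: message from node i+1 into node i
    for i in range(1, n):
        fwd[i] = min_sum_message(unaries[i - 1], fwd[i - 1], pairwise)
    for i in range(n - 2, -1, -1):
        bwd[i] = min_sum_message(unaries[i + 1], bwd[i + 1], pairwise)
    beliefs = unaries + fwd + bwd  # min-marginals up to a per-node constant
    return beliefs.argmin(axis=1)
```

For example, with unaries [[0, 5, 5], [1, 0, 1], [5, 5, 0]] over three disparity labels and truncated_linear(3, 1.0, 2.0), chain_map returns the labels [0, 1, 2].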
We also consider frame-level parallelization, processing multiple frames in parallel on the multiple FPGAs available in the platform. This parallelized hardware procedure is further pipelined with data management on the CPU so that the execution times of the two overlap, reducing the total processing time of the stereo video sequence. Experimental results on real-world stereo video sequences show that our stereo matching system runs at video rate for QVGA stereo videos. Next, we consider the error resilience of the message passing hardware for an energy-efficient hardware implementation. Modern nanoscale CMOS process technologies suffer from reliability problems caused by process, temperature and voltage variations. Conventional approaches to such unreliability (e.g., designing for the worst-case scenario) are complex and inefficient in terms of hardware resources and energy consumption. Because machine learning applications are inherently probabilistic and robust to errors, statistical error compensation (SEC) techniques can play a significant role in achieving robust and energy-efficient implementations. SEC embraces the statistical nature of errors and uses statistical and probabilistic techniques to build robust systems; energy efficiency is obtained by trading the enhanced robustness for energy. In this work, we analyze the error resilience of our message passing inference hardware under hardware errors (e.g., errors caused by timing violations in circuits) and explore the application of a popular SEC technique, algorithmic noise tolerance (ANT), to this hardware. Analysis and simulations show that the TRW-S message passing hardware tolerates small-magnitude arithmetic errors, but large-magnitude errors cause significantly inaccurate inference results, which need to be corrected using SEC.
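The ANT technique mentioned here can be sketched as a per-output decision between a main block and a cheaper estimator. This is a minimal NumPy sketch of the generic ANT decision rule, not the thesis's hardware; the estimator design and the threshold value are assumptions left to the caller. It mirrors the observation above: small discrepancies are accepted, large ones are replaced by the estimate.

```python
import numpy as np


def ant_correct(y_main, y_est, threshold: float) -> np.ndarray:
    """Algorithmic noise tolerance (ANT) decision rule, applied element-wise.

    y_main: outputs of the main block, which may suffer occasional
            large-magnitude errors (e.g. from timing violations).
    y_est:  outputs of a cheaper, error-free but less precise estimator block.
    Where the two differ by more than `threshold`, the main output is assumed
    to be corrupted and the estimate is used instead.
    """
    y_main = np.asarray(y_main, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    return np.where(np.abs(y_main - y_est) <= threshold, y_main, y_est)
```

For instance, ant_correct([10.0, 11.0, 250.0], [9.0, 12.0, 13.0], threshold=4.0) keeps the first two main-block outputs and replaces the corrupted third one, returning [10.0, 11.0, 13.0].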
Experimental results show that the proposed ANT-based hardware can tolerate an error rate of 21.3% with a performance degradation of only 3.5% and an energy saving of 39.7%, compared with error-free hardware. Lastly, we extend our TRW-S hardware toward a general-purpose machine learning framework. We propose an advanced streaming architecture with a flexible choice of MRF settings that achieves a 10-40x speedup across a variety of computer vision applications. Furthermore, we provide a better theoretical understanding of the error resilience of TRW-S, and of the implications of ANT for TRW-S, under more general MRF settings, along with strong empirical support.

    Event-Driven Technologies for Reactive Motion Planning: Neuromorphic Stereo Vision and Robot Path Planning and Their Application on Parallel Hardware

    Robotics is increasingly becoming a key factor of technological progress. Despite impressive advances in recent decades, mammalian brains still outperform even the most powerful machines in vision and motion planning. Industrial robots are very fast and precise, but their planning algorithms are not capable enough for highly dynamic environments such as those required for human-robot collaboration (HRC). Without fast and adaptive motion planning, safe HRC cannot be guaranteed. Neuromorphic technologies, including visual sensors and hardware chips, operate asynchronously and therefore process spatio-temporal information very efficiently. In particular, event-based visual sensors already outperform conventional, synchronous cameras in many applications. Event-based methods therefore have great potential to enable faster and more energy-efficient motion control algorithms in HRC. This thesis presents an approach to flexible reactive motion control of a robot arm. Exteroception is achieved through event-based stereo vision, and path planning is implemented in a neural representation of the configuration space. The multiview 3D reconstruction is evaluated through a qualitative analysis in simulation and transferred to a stereo system of event-based cameras. A demonstrator with an industrial robot is used to evaluate reactive, collision-free online planning. It is also used for a comparative study against sampling-based planners. This is complemented by a benchmark of parallel hardware solutions, for which robot path planning was chosen as the test scenario. The results show that the proposed neural solutions are an effective way to realize robot control for dynamic scenarios.
This work lays a foundation for neural solutions in adaptive manufacturing processes, including in collaboration with humans, without compromising speed or safety. It thereby paves the way for integrating brain-inspired hardware and algorithms into industrial robotics and HRC.
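The abstract does not detail the neural planner, so the following is only a minimal sketch of a classical wavefront planner on a discretized 2-D configuration space, offered as a non-spiking stand-in for activity spreading through a grid of units that each represent one configuration. Reacting to a moving obstacle then amounts to updating the occupancy grid and propagating the wave again. The grid encoding, the 4-connected neighborhood and the function name are hypothetical.

```python
from collections import deque

# 4-connected moves in a 2-D discretized configuration space (e.g. two joint angles).
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def wavefront_plan(grid, start, goal):
    """Return a shortest collision-free path from start to goal, or None.

    grid[r][c] == 1 marks a configuration in collision, 0 a free one.
    A wave is propagated outward from the goal (breadth-first); the path then
    follows strictly decreasing wave values from the start back to the goal.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in dist):
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    if start not in dist:
        return None  # start is in collision or unreachable
    path = [start]
    while path[-1] != goal:
        r, c = path[-1]
        neighbors = [(r + dr, c + dc) for dr, dc in MOVES if (r + dr, c + dc) in dist]
        path.append(min(neighbors, key=dist.__getitem__))
    return path
```

With grid [[0, 0, 0], [1, 1, 0], [0, 0, 0]], start (2, 0) and goal (0, 0), the path detours around the blocked middle row: [(2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)].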

    Lock-free multithreaded semi-global matching with an arbitrary number of path directions

    This paper describes an efficient implementation of the semi-global matching (SGM) algorithm on multi-core processors that allows a nearly arbitrary number of path directions in the cost aggregation stage. The scanlines for each orientation are discretized iteratively only once, and the regular substructures of the resulting template are reused and shifted to sum up the path costs concurrently in at most two sweeps per direction over the disparity space image. Since paths never overlap, no expensive thread synchronization is needed. To further reduce the runtime for high numbers of path directions, pixel-wise disparity gating is applied, and both the cost function and the disparity loop of SGM are optimized using current single instruction, multiple data (SIMD) intrinsics for two major CPU architectures. A performance evaluation of the proposed implementation on synthetic ground truth shows a reduced height error when the number of aggregation directions is significantly increased or when the paths start with an angular offset. The overall runtime shows a speedup that is nearly linear in the number of available processors.
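For reference, the per-direction recurrence that SGM sums over path directions (as introduced by Hirschmüller) is L_r(p, d) = C(p, d) + min(L_r(p - r, d), L_r(p - r, d - 1) + P1, L_r(p - r, d + 1) + P1, min_i L_r(p - r, i) + P2) - min_k L_r(p - r, k). The minimal NumPy sketch below evaluates it for a single left-to-right direction on one scanline; it does not reproduce the paper's template-based scanline discretization, lock-free threading, disparity gating or SIMD code, and the function name is hypothetical. The full method sums L_r over all directions and picks, at each pixel, the disparity with the lowest total.

```python
import numpy as np


def aggregate_left_to_right(cost: np.ndarray, p1: float, p2: float) -> np.ndarray:
    """SGM path costs L_r(p, d) along one scanline for the direction r = (1, 0).

    cost: array of shape (width, num_disparities) holding matching costs C(p, d).
    p1 penalizes disparity changes of one level, p2 (>= p1) larger jumps.
    """
    width, ndisp = cost.shape
    agg = np.empty((width, ndisp), dtype=float)
    agg[0] = cost[0]
    for x in range(1, width):
        prev = agg[x - 1]
        prev_min = prev.min()
        from_lower = np.full(ndisp, np.inf)  # L(p - r, d - 1) + P1
        from_lower[1:] = prev[:-1] + p1
        from_upper = np.full(ndisp, np.inf)  # L(p - r, d + 1) + P1
        from_upper[:-1] = prev[1:] + p1
        best = np.minimum.reduce([prev, from_lower, from_upper, np.full(ndisp, prev_min + p2)])
        # Subtracting min_k L(p - r, k) keeps the accumulated values bounded.
        agg[x] = cost[x] + best - prev_min
    return agg
```

For instance, with cost [[0, 5], [5, 0]], p1 = 1 and p2 = 3, the second row of the result is [5.0, 1.0].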

    Discrete Optimization in Early Vision - Model Tractability Versus Fidelity

    Early vision is the process occurring before any semantic interpretation of an image takes place. Motion estimation, object segmentation and detection are all parts of early vision, but recognition is not. Some models in early vision are easy to perform inference with: they are tractable. Others describe reality well: they have high fidelity. This thesis improves on the tractability-fidelity trade-off of the current state of the art by introducing new discrete methods for image segmentation and other problems of early vision. The first part studies pseudo-boolean optimization from both a theoretical and a practical perspective, introducing new algorithms. The main result is the generalization of the roof duality concept to polynomials of degree higher than two. Another focus is parallelization: discrete optimization methods for multi-core processors, computer clusters, and graphics processing units are presented. Remaining in an image segmentation context, the second part studies parametric problems where a set of model parameters and a segmentation are estimated simultaneously. For a small number of parameters, these problems can still be solved optimally. One application is an optimal method for solving the two-phase Mumford-Shah functional. The third part shifts the focus to curvature regularization, where the commonly used length and area penalization is replaced by curvature in two and three dimensions. These problems can be discretized over a mesh, and special attention is given to the mesh geometry. Specifically, hexagonal meshes in the plane are compared to square ones, and a method for generating adaptive meshes is introduced and evaluated. The framework is then extended to curvature regularization of surfaces. Finally, the thesis concludes with three applications to early vision problems: cardiac MRI segmentation, image registration, and cell classification.
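Tractability in the pseudo-boolean setting is closely tied to submodularity. As a minimal sketch (not the thesis's algorithms), the code below checks the standard submodularity condition for a pairwise term of a quadratic pseudo-boolean function, theta(0, 0) + theta(1, 1) <= theta(0, 1) + theta(1, 0), and provides an exhaustive reference minimizer. Roof duality addresses the non-submodular case by returning a partial labeling, which is not implemented here. The function names and the dictionary-of-edges encoding are hypothetical.

```python
from itertools import product


def is_submodular(theta) -> bool:
    """theta[a][b] is the cost of assigning (x_i, x_j) = (a, b), with a, b in {0, 1}.

    A quadratic pseudo-boolean function whose pairwise terms all satisfy this
    inequality can be minimized exactly with an s-t minimum cut.
    """
    return theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0]


def brute_force_min(unary, pairwise):
    """Exhaustive minimizer of E(x) = sum_i unary[i][x_i] + sum_(i,j) theta_ij[x_i][x_j].

    Exponential in the number of variables: only useful as a reference on tiny
    instances, e.g. for checking the output of a roof-duality or graph-cut solver.
    """
    n = len(unary)

    def energy(x):
        return (sum(unary[i][x[i]] for i in range(n))
                + sum(theta[x[i]][x[j]] for (i, j), theta in pairwise.items()))

    return min(product((0, 1), repeat=n), key=energy)
```

For example, the Potts-like term [[0, 1], [1, 0]] is submodular while [[1, 0], [0, 1]] is not; with unary [[0, 2], [3, 0], [0, 2]] and Potts terms on edges (0, 1) and (1, 2), brute_force_min returns (0, 1, 0).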

    Internet of Underwater Things and Big Marine Data Analytics -- A Comprehensive Survey

    The Internet of Underwater Things (IoUT) is an emerging communication ecosystem developed for connecting underwater objects in maritime and underwater environments. The IoUT technology is intricately linked with intelligent boats and ships, smart shores and oceans, automatic marine transportation, positioning and navigation, underwater exploration, disaster prediction and prevention, as well as with intelligent monitoring and security. The IoUT has an influence at various scales, ranging from a small scientific observatory to a midsized harbor to global oceanic trade. The network architecture of IoUT is intrinsically heterogeneous and should be sufficiently resilient to operate in harsh environments. This creates major challenges in terms of underwater communications, whilst relying on limited energy resources. Additionally, the volume, velocity, and variety of data produced by sensors, hydrophones, and cameras in IoUT is enormous, giving rise to the concept of Big Marine Data (BMD), which has its own processing challenges. Hence, conventional data processing techniques will falter, and bespoke Machine Learning (ML) solutions have to be employed for automatically learning the specific BMD behavior and features, facilitating knowledge extraction and decision support. The motivation of this paper is to comprehensively survey the IoUT, BMD, and their synthesis. It also aims to explore the nexus of BMD with ML. We set out from underwater data collection and then discuss the family of IoUT data communication techniques, with an emphasis on the state-of-the-art research challenges. We then review the suite of ML solutions suitable for BMD handling and analytics. We treat the subject deductively from an educational perspective, critically appraising the material surveyed.
Comment: 54 pages, 11 figures, 19 tables, IEEE Communications Surveys & Tutorials, peer-reviewed academic journal

    Parallel computing for brain simulation

    Background: The human brain is the most complex system in the known universe and is therefore one of the greatest mysteries. It provides human beings with extraordinary abilities. However, it is still not understood how and why most of these abilities are produced. Aims: For decades, researchers have been trying to make computers reproduce these abilities, focusing both on understanding the nervous system and on processing data more efficiently than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have made it possible to create the first simulation with a number of neurons similar to that of a human brain. Conclusion: This paper presents an up-to-date review of the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the current applications of these works, as well as future trends. It focuses on works that seek advances in neuroscience and on others that seek new discoveries in computer science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized, and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing.
Funding: Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2014/049. Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; R2014/039. Instituto de Salud Carlos III; PI13/0028

    Object detection and recognition with event driven cameras

    This thesis presents the study, analysis and implementation of algorithms for object detection and recognition using an event-based camera. This sensor represents a novel paradigm that opens a wide range of possibilities for future developments in computer vision. In particular, it produces a fast, compressed, illumination-invariant output, which can be exploited for robotic tasks where fast dynamics and significant illumination changes are frequent. The experiments are carried out on the neuromorphic version of the iCub humanoid platform. The robot is equipped with a novel dual-camera setup mounted directly in the robot’s eyes, used to generate data with a moving camera. The motion causes background clutter in the event stream. In this scenario, the detection problem has been addressed with an attention mechanism specifically designed to respond to the presence of objects while discarding clutter. The proposed implementation takes advantage of the nature of the data to simplify the original proto-object saliency model that inspired this work. Subsequently, the recognition task was tackled first with a feasibility study demonstrating that the event stream carries sufficient information to classify objects, and then with the implementation of a spiking neural network. The feasibility study provides the proof of concept that events are informative enough for object classification, whereas the spiking implementation improves the results by employing an architecture specifically designed to process event data. The spiking network was trained with a three-factor local learning rule that overcomes the weight transport, update locking and non-locality problems. The presented results demonstrate that both detection and classification can be carried out in the target application using the event data.
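Three-factor learning rules extend Hebbian plasticity with a third, modulatory signal: each weight change depends on a presynaptic factor, a postsynaptic factor and a modulator. The NumPy sketch below shows only this generic form. The specific rule used in the thesis, and how it avoids weight transport, update locking and non-locality, is not described in the abstract, so the choice of traces and the locally available error used as the modulator are assumptions.

```python
import numpy as np


def three_factor_update(w, pre_trace, post_term, modulator, lr: float) -> np.ndarray:
    """Generic three-factor weight update for a layer with weights w of shape (n_out, n_in).

    delta w[i, j] = lr * modulator[i] * post_term[i] * pre_trace[j]
    pre_trace: (n_in,)  presynaptic factor, e.g. a low-pass-filtered spike train.
    post_term: (n_out,) local postsynaptic factor, e.g. a surrogate derivative.
    modulator: (n_out,) third factor, e.g. an error signal available locally at neuron i.
    """
    w = np.asarray(w, dtype=float)
    pre_trace = np.asarray(pre_trace, dtype=float)
    post_term = np.asarray(post_term, dtype=float)
    modulator = np.asarray(modulator, dtype=float)
    # Outer product: each output neuron's local factors times every input's trace.
    return w + lr * np.outer(modulator * post_term, pre_trace)
```

For example, starting from zero weights with pre_trace [1, 0, 1], post_term [1, 1], modulator [0.5, -1] and lr = 0.1, the update strengthens the active inputs of the first neuron by 0.05 and weakens those of the second by 0.1, leaving the silent input's weights unchanged.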

    Learning from minimally labeled data with accelerated convolutional neural networks

    The main objective of an artificial vision algorithm is to design a mapping function that takes an image as input and correctly classifies it into one of the user-determined categories. The mapping function must satisfy several important properties for visual understanding. First, it should produce good representations of the visual world that allow images to be recognized independently of pose, scale and illumination. Furthermore, the artificial vision system has to learn these representations by itself. Recent studies on Convolutional Neural Networks (ConvNets) have produced promising advances in visual understanding. These networks attain significant performance gains by relying on hierarchical structures inspired by biological vision systems. In my research, I work mainly in two areas: 1) how ConvNets can be programmed to learn the optimal mapping function using the minimum amount of labeled data, and 2) how these networks can be accelerated for practical purposes. In this work, algorithms that learn from unlabeled data are studied, and a new framework that exploits unlabeled data is proposed. The proposed framework obtains state-of-the-art results on different tasks. Furthermore, this study presents an optimized streaming method for a ConvNet hardware accelerator on an embedded platform. It is tested on object classification and detection applications using ConvNets. Experimental results indicate high computational efficiency and significant performance gains over all other existing platforms.

    GPU Computing for Cognitive Robotics

    This thesis presents the first investigation of the impact of GPU computing on cognitive robotics by providing a series of novel experiments in the areas of action and language acquisition in humanoid robots and of computer vision. Cognitive robotics is concerned with endowing robots with high-level cognitive capabilities to enable the achievement of complex goals in complex environments. Reaching the ultimate goal of developing cognitive robots will require tremendous amounts of computational power, which until recently was provided mostly by standard CPUs. CPU cores are optimised for serial code execution at the expense of parallel execution, which renders them relatively inefficient for high-performance computing applications. The ever-increasing market demand for high-performance, real-time 3D graphics has turned the GPU into a highly parallel, multithreaded, many-core processor with extraordinary computational power and very high memory bandwidth. These vast computational resources of modern GPUs can now be used by most cognitive robotics models, as these tend to be inherently parallel. Various interesting and insightful cognitive models have been developed and have addressed important scientific questions concerning action-language acquisition and computer vision. While they have provided us with important scientific insights, their complexity and application have not improved much in recent years. The experimental tasks as well as the scale of these models are often minimised to avoid excessive training times that grow exponentially with the number of neurons and the training data. This impedes further progress and the development of complex neurocontrollers that would take cognitive robotics research a step closer to the ultimate goal of creating intelligent machines.
This thesis presents several cases where the application of GPU computing to cognitive robotics algorithms resulted in the development of large-scale neurocontrollers of previously unseen complexity, enabling the novel experiments described herein.
Funding: European Commission Seventh Framework Programme
