
    High performance and error resilient probabilistic inference system for machine learning

    Many real-world machine learning applications can be considered as inferring the best label assignment of maximum a posteriori probability (MAP) problems. Since these MAP problems are NP-hard in general, they are often dealt with using approximate inference algorithms on Markov random fields (MRFs), such as belief propagation (BP). However, this approximate inference is still computationally demanding, and thus custom hardware accelerators have been attractive for high performance and energy efficiency. Various custom hardware implementations employ BP to achieve reasonable performance for real-world applications such as stereo matching. Due to its lack of convergence guarantees, however, BP often fails to provide the right answer, degrading the performance of the hardware. We therefore consider sequential tree-reweighted message passing (TRW-S), which avoids many of BP's convergence problems through sequential execution of its computations, but whose sequential nature makes parallel, high-throughput implementation challenging. In this work, we propose a novel streaming hardware architecture that parallelizes the sequential computations of TRW-S. Experimental results on stereo matching benchmarks show promising performance of our hardware implementation compared to the software implementation as well as other BP-based custom hardware and GPU implementations. Building on this result, we further demonstrate video-rate, high-quality stereo matching on a hybrid CPU+FPGA platform. We propose three frame-level optimization techniques to fully exploit the computational resources of the platform and achieve significant speed-up. We first propose a message reuse scheme guided by simple scene change detection, which decides whether the current frame's inference can reuse the previous frame's messages, based on whether the current result is expected to be similar to the previous frame's result. We also consider frame-level parallelization to process multiple frames in parallel using the multiple FPGAs available in the platform. This parallelized hardware procedure is further pipelined with data management on the CPU to overlap the execution of the two and thereby reduce the overall processing time of the stereo video sequence. Experimental results on real-world stereo video sequences show that our stereo matching system achieves video-rate speed for QVGA stereo videos. Next, we consider the error resilience of the message passing hardware for energy-efficient implementation. Modern nanoscale CMOS process technologies suffer reliability problems caused by process, temperature, and voltage variations. Conventional approaches to dealing with such unreliability (e.g., design for the worst-case scenario) are complex and inefficient in terms of hardware resources and energy consumption. As machine learning applications are inherently probabilistic and robust to errors, statistical error compensation (SEC) techniques can play a significant role in achieving robust and energy-efficient implementations. SEC embraces the statistical nature of errors and utilizes statistical and probabilistic techniques to build robust systems; energy efficiency is obtained by trading the enhanced robustness for energy. In this work, we analyze the error resilience of our message passing inference hardware subject to hardware errors (e.g., errors caused by timing violations in circuits) and explore the application of a popular SEC technique, algorithmic noise tolerance (ANT), to this hardware.
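
    A rough, self-contained illustration of the per-pixel message update that such an architecture has to stream (a plain min-sum belief-propagation update over hypothetical cost arrays, not the full TRW-S recursion with its edge-appearance weights) might be sketched as:

        import numpy as np

        def min_sum_message(data_cost_p, incoming, pairwise_cost):
            # data_cost_p:   (L,) unary cost D_p(d_p) of each label at the sending pixel p
            # incoming:      list of (L,) messages m_{r->p} from the neighbours r != q
            # pairwise_cost: (L, L) smoothness cost V(d_p, d_q)
            # returns:       (L,) outgoing message m_{p->q}(d_q)
            belief = data_cost_p + np.sum(incoming, axis=0)         # cost of each label at p
            msg = np.min(belief[:, None] + pairwise_cost, axis=0)   # minimise over d_p
            return msg - msg.min()                                  # normalise to avoid drift

        # toy usage: 16 disparity labels, truncated-linear smoothness
        L = 16
        labels = np.arange(L)
        V = np.minimum(np.abs(labels[:, None] - labels[None, :]), 2.0)
        m = min_sum_message(np.random.rand(L), [np.zeros(L)] * 3, V)
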
    Analysis and simulations show that the TRW-S message passing hardware is tolerant to small-magnitude arithmetic errors, but large-magnitude errors cause significantly inaccurate inference results which need to be corrected using SEC. Experimental results show that the proposed ANT-based hardware can tolerate an error rate of 21.3% with performance degradation of only 3.5% and energy savings of 39.7%, compared to error-free hardware. Lastly, we extend our TRW-S hardware toward a general-purpose machine learning framework. We propose an advanced streaming architecture with a flexible choice of MRF settings to achieve a 10-40x speedup across a variety of computer vision applications. Furthermore, we provide a better theoretical understanding of the error resiliency of TRW-S, and of the implications of ANT for TRW-S, under more general MRF settings, along with strong empirical support.
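
    A minimal sketch of the algorithmic noise tolerance (ANT) idea referred to above, with made-up numbers and a hypothetical coarse estimator standing in for the reduced-precision replica, is:

        import numpy as np

        def ant_correct(main_out, estimator_out, threshold):
            # main_out:      output of the fast, low-energy (error-prone) main block
            # estimator_out: output of a reduced-precision but reliable replica
            # If the two disagree by more than the largest error the estimator itself
            # can make, assume the main block suffered a large-magnitude error and
            # fall back to the estimator; otherwise keep the accurate main output.
            use_estimator = np.abs(main_out - estimator_out) > threshold
            return np.where(use_estimator, estimator_out, main_out)

        # toy usage: small errors pass through, a large (e.g. timing-violation) error is replaced
        exact = np.array([10.0, 42.0, 7.0])
        main = exact + np.array([0.3, 512.0, -0.1])   # one large-magnitude error
        estimator = np.round(exact / 4.0) * 4.0       # coarse, reduced-precision estimate
        print(ant_correct(main, estimator, threshold=8.0))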

    MRF Stereo Matching with Statistical Estimation of Parameters

    For about the last ten years, stereo matching in computer vision has been treated as a combinatorial optimization problem. Assuming that the points in stereo images form a Markov Random Field (MRF), a variety of combinatorial optimization algorithms have been developed to optimize the underlying cost functions. In many of these algorithms, the MRF parameters of the cost functions have often been manually tuned or heuristically determined to achieve good performance. Recently, several algorithms for statistical, and hence automatic, estimation of the parameters have been published. Overall, these algorithms perform well in labeling, but they handle discontinuities in labeling along surface borders poorly. In this dissertation, we develop an algorithm for optimization of the cost function with automatic estimation of the MRF parameters – the data and smoothness parameters. Both parameters are estimated statistically and applied in the cost function with the support of an adaptive neighborhood defined based on color similarity. With the proposed algorithm, discontinuities along surface borders are handled more consistently than by the existing algorithms. The data parameters are pre-estimated from one of the stereo images by applying a hypothesis, called the noise equivalence hypothesis, to eliminate the interdependency between the estimation of the data and smoothness parameters. The smoothness parameters are estimated by combining maximum likelihood with the disparity gradient constraint, to eliminate nested inference during the estimation. The parameters for handling discontinuities in data and smoothness are defined statistically as well. We model cost functions that match the images symmetrically for improved matching performance and also detect occlusions. Finally, we fill the occlusions in the disparity map by applying several existing and proposed algorithms and show that our best proposed segmentation-based least squares algorithm performs better than the existing ones. We conduct experiments with the proposed algorithm on the publicly available ground truth test datasets provided by Middlebury College. The experiments show that the proposed algorithm, with its MRF parameters estimated automatically, delivers better results than the existing algorithms. In addition, applying the parameter estimation technique to an existing stereo matching algorithm, we observe a significant improvement in computational time.
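
    For readers unfamiliar with the cost function being optimised, a generic pairwise-MRF stereo energy with hand-picked (rather than statistically estimated) parameters can be sketched as follows; the function name and the truncated-linear smoothness term are illustrative choices, not the dissertation's model:

        import numpy as np

        def mrf_stereo_energy(left, right, disparity, lam=1.0, trunc=2.0):
            # left, right: (H, W) rectified grey-level images
            # disparity:   (H, W) integer labelling d_p to be scored
            # E(d) = sum_p D_p(d_p) + lam * sum_{(p,q)} min(|d_p - d_q|, trunc)
            H, W = left.shape
            cols = np.tile(np.arange(W), (H, 1))
            matched = np.clip(cols - disparity, 0, W - 1)
            data = np.abs(left - right[np.arange(H)[:, None], matched]).sum()
            smooth = np.minimum(np.abs(np.diff(disparity, axis=0)), trunc).sum() \
                   + np.minimum(np.abs(np.diff(disparity, axis=1)), trunc).sum()
            return data + lam * smooth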

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
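
    A toy illustration of the event representation described above (each event carrying a timestamp, pixel location, and polarity) and of the simplest way to consume it, accumulating events into a frame over a time window, might look like:

        import numpy as np

        # each event: (t [s], x, y, polarity in {+1, -1}), a signed brightness change at one pixel
        events = [(0.000012, 5, 3, +1), (0.000015, 5, 3, +1), (0.000020, 7, 1, -1)]

        def accumulate(events, height, width, t0, t1):
            # sum the signed events per pixel over the window [t0, t1)
            frame = np.zeros((height, width))
            for t, x, y, p in events:
                if t0 <= t < t1:
                    frame[y, x] += p
            return frame

        print(accumulate(events, height=8, width=8, t0=0.0, t1=0.0001))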

    Gaussian Belief Propagation on a Field-Programmable Gate Array for Solving Linear Equation Systems

    Solving Linear Equation Systems (LESs) is a common problem in numerous fields of science. Even though the problem is well studied and powerful solvers are available nowadays, solving LESs is still a bottleneck in many numerical applications in terms of computation time. This issue especially pertains to applications in mobile robotics constrained by real-time requirements, where power consumption and weight additionally play an important role. This paper provides a general framework to approximately solve large LESs by Gaussian Belief Propagation (GaBP), which is extremely well suited to parallelization and implementation in hardware on a Field-Programmable Gate Array (FPGA). We derive the simple update rules of the Message Passing Algorithm for GaBP and show how to implement the approach efficiently on a System on a Programmable Chip (SoPC). In particular, multiple dedicated co-processors take care of recurring computations in GaBP. Exploiting multiple Direct Memory Access (DMA) controllers in scatter-gather mode and the available arithmetic logic slices for numerical calculations accelerates the algorithm. The presented evaluations demonstrate that the approach not only provides an accurate approximate solution of the LES but also outperforms traditional solvers with respect to computation time for certain LESs.
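
    The GaBP update rules referred to above are compact enough to sketch in plain software (a sequential-update version with none of the SoPC, co-processor, or DMA machinery, and assuming a symmetric, diagonally dominant matrix A so that the iteration converges):

        import numpy as np

        def gabp_solve(A, b, iters=50):
            # Approximately solve A x = b by Gaussian belief propagation.
            n = len(b)
            P = np.diag(A).astype(float)    # node precisions  P_ii = A_ii
            mu = b / P                      # node means       mu_ii = b_i / A_ii
            Pm = np.zeros((n, n))           # message precisions P_{i->j}
            Mm = np.zeros((n, n))           # message means      mu_{i->j}
            for _ in range(iters):
                for i in range(n):
                    for j in range(n):
                        if i != j and A[i, j] != 0.0:
                            # cavity: node i's prior plus all incoming messages except j's
                            P_cav = P[i] + Pm[:, i].sum() - Pm[j, i]
                            m_cav = (P[i] * mu[i] + (Pm[:, i] * Mm[:, i]).sum()
                                     - Pm[j, i] * Mm[j, i]) / P_cav
                            Pm[i, j] = -A[i, j] ** 2 / P_cav
                            Mm[i, j] = P_cav * m_cav / A[i, j]
            P_post = P + Pm.sum(axis=0)     # posterior precisions
            return (P * mu + (Pm * Mm).sum(axis=0)) / P_post

        A = np.array([[4.0, 1.0], [1.0, 3.0]])
        b = np.array([1.0, 2.0])
        print(gabp_solve(A, b), np.linalg.solve(A, b))   # the two should be close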

    Event-based neuromorphic stereo vision


    Specialised global methods for binocular and trinocular stereo matching

    The problem of estimating depth from two or more images is a fundamental problem in computer vision, commonly referred to as stereo matching. The applications of stereo matching range from 3D reconstruction to autonomous robot navigation. Stereo matching is particularly attractive for real-life applications because of its simplicity and low cost, especially compared to costly laser range finders/scanners, as in the case of 3D reconstruction. However, stereo matching has its own unique problems, such as convergence issues in the optimisation methods and the difficulty of finding accurate matches under changes in lighting conditions, occluded areas, noisy images, etc. It is precisely because of these challenges that stereo matching continues to be a very active field of research. In this thesis we develop a binocular stereo matching algorithm that works with rectified images (i.e. scan lines in the two images are aligned) to find the real-valued displacement (i.e. disparity) that best matches two pixels. To accomplish this, our research has developed techniques to efficiently explore a 3D space and compare potential matches, as well as an inference algorithm to assign the optimal disparity to each pixel in the image. The proposed approach is also extended to the trinocular case. In particular, the trinocular extension deals with a binocular pair of images captured at the same time and a third image displaced in time. This approach is referred to as t+1 trinocular stereo matching, and poses the challenge of recovering camera motion, which is addressed by a novel technique we call baseline recovery. We have extensively validated our binocular and trinocular algorithms using the well-known KITTI and Middlebury data sets. The performance of our algorithms is consistent across the different data sets, and they rank among the top performers on the KITTI and Middlebury benchmarks. The time-stamped results of our algorithms as reported in this thesis can be found at:
    • LCU on Middlebury V2 (https://web.archive.org/web/20150106200339/http://vision.middlebury.edu/stereo/eval/)
    • LCU on Middlebury V3 (https://web.archive.org/web/20150510133811/http://vision.middlebury.edu/stereo/eval3/)
    • LPU on Middlebury V3 (https://web.archive.org/web/20161210064827/http://vision.middlebury.edu/stereo/eval3/)
    • LPU on KITTI 2012 (https://web.archive.org/web/20161106202908/http://cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo)
    • LPU on KITTI 2015 (https://web.archive.org/web/20161010184245/http://cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)
    • TBR on KITTI 2012 (https://web.archive.org/web/20161230052942/http://cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo)
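
    As a toy illustration of the rectified-image matching formulation described in the abstract (the per-pixel disparity search along aligned scan lines, not the LCU/LPU algorithms or the baseline recovery technique themselves), a naive winner-take-all search might be sketched as:

        import numpy as np
        from scipy.ndimage import uniform_filter

        def wta_disparity(left, right, max_disp, radius=3):
            # left, right: (H, W) rectified grey-level images with aligned scan lines
            # For each pixel, pick the integer disparity whose window-averaged absolute
            # difference along the same scan line is smallest (winner-take-all).
            H, W = left.shape
            cost = np.full((max_disp + 1, H, W), np.inf)
            for d in range(max_disp + 1):
                diff = np.abs(left[:, d:] - right[:, :W - d])
                cost[d, :, d:] = uniform_filter(diff, size=2 * radius + 1)
            return cost.argmin(axis=0)   # (H, W) integer disparity map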

    A novel approach for the hardware implementation of a PPMC statistical data compressor

    This thesis aims to understand how to design high-performance compression algorithms suitable for hardware implementation and to provide hardware support for an efficient compression algorithm. Lossless data compression techniques have been developed to exploit the available bandwidth of applications in data communications and computer systems by reducing the amount of data they transmit or store. As the amount of data to handle is ever increasing, traditional methods for compressing data become insufficient. To overcome this problem, more powerful methods have been developed. Among these are the so-called statistical data compression methods, which compress data based on their statistics. However, their high complexity and space requirements have prevented their hardware implementation and the full exploitation of their potential benefits. This thesis looks into the feasibility of implementing one of these statistical data compression methods in hardware, by exploring the potential for reorganising and restructuring the method and by investigating ways of achieving efficient, effective, and cost-efficient designs. [Continues.]
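
    PPMC itself blends several context orders with escape probabilities; as a much smaller illustration of the underlying idea of compressing data based on their statistics, an adaptive order-1 context model, and the ideal code length an arithmetic coder driven by it would approach, can be sketched as:

        import math
        from collections import defaultdict

        def ideal_code_length(data, alphabet_size=256):
            # Adaptive order-1 model: predict each byte from the previous byte using
            # Laplace-smoothed counts and charge -log2(p) bits per symbol, the cost
            # an arithmetic coder driven by this model would approach.
            counts = defaultdict(lambda: defaultdict(int))
            totals = defaultdict(int)
            bits, prev = 0.0, None
            for sym in data:
                p = (counts[prev][sym] + 1) / (totals[prev] + alphabet_size)
                bits += -math.log2(p)
                counts[prev][sym] += 1
                totals[prev] += 1
                prev = sym
            return bits

        text = b"abracadabra abracadabra"
        print(ideal_code_length(text) / 8, "bytes (model) vs", len(text), "bytes raw")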

    Message passing algorithms - methods and applications

    Algorithms on graphs are used extensively in many applications and research areas. Such applications include machine learning, artificial intelligence, communications, image processing, state tracking, sensor networks, sensor fusion, distributed cooperative estimation, and distributed computation. Among the types of algorithms that employ some kind of message passing over the connections in a graph, this dissertation considers belief propagation and gossip consensus algorithms. We begin by considering the marginalization problem on factor graphs, which is often solved or approximated with Sum-Product belief propagation (BP) over the edges of the factor graph. For the case of sensor networks, where the conservation of energy is of critical importance and communication overhead can quickly drain this valuable resource, we present techniques for specifically addressing the needs of this low-power scenario. We create a number of alternatives to Sum-Product BP. The first of these is a generalization of Stochastic BP with reduced setup time. We then present Projected BP, where a subset of elements from each message is transmitted between nodes, and computational savings are realized in proportion to the reduction in size of the transmitted messages. Zoom BP is a derivative of Projected BP that focuses particularly on utilizing low-bandwidth discrete channels. We give the results of experiments that show the practical advantages of our alternatives to Sum-Product BP. We then proceed with an application of Sum-Product BP in sequential investment. We combine various insights from universal portfolios research in order to construct more sophisticated algorithms that take into account transaction costs. In particular, we use the insights of Blum and Kalai's transaction costs algorithm to take these costs into account in Cover and Ordentlich's side information portfolio and Kozat and Singer's switching portfolio. This involves carefully designing a set of causal portfolio strategies and computing a convex combination of these according to a carefully designed distribution. Universal (sublinear regret) performance bounds for each of these portfolios show that the algorithms asymptotically achieve the wealth of the best strategy from the corresponding portfolio strategy set, to first order in the exponent. The Sum-Product algorithm on factor graph representations of the universal investment algorithms provides computationally tractable approximations to the investment strategies. Finally, we present results of simulations of our algorithms and compare them to other portfolios. We then turn our attention to gossip consensus and distributed estimation algorithms. Specifically, we consider the problem of estimating the parameters in a model of an agent's observations when it is known that the population as a whole is partitioned into a number of subpopulations, each of which has model parameters that are common among the member agents. We develop a method for determining the beneficial communication links in the network, which maintains a non-cooperative parameter estimate at each agent and compares its distance to the neighbors' estimates to determine time-varying connectivity. We also study the expected squared estimation error of our algorithm, showing that the estimates are asymptotically as good as centralized estimation, and we study the short-term error convergence behavior.
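
    As a deliberately tiny instance of the Sum-Product marginalization problem the dissertation starts from, exact sum-product message passing on a short chain of discrete variables can be sketched as below (Projected BP and Zoom BP would then transmit only parts of such messages); the potentials are made up for illustration:

        import numpy as np

        def chain_marginals(unaries, pairwises):
            # unaries:   list of (K,) nonnegative node potentials phi_i(x_i)
            # pairwises: list of (K, K) edge potentials psi_i(x_i, x_{i+1})
            # Exact sum-product on a chain: forward/backward messages, then marginals.
            n = len(unaries)
            fwd = [np.ones_like(unaries[0]) for _ in range(n)]
            bwd = [np.ones_like(unaries[0]) for _ in range(n)]
            for i in range(1, n):
                m = (unaries[i - 1] * fwd[i - 1]) @ pairwises[i - 1]
                fwd[i] = m / m.sum()
            for i in range(n - 2, -1, -1):
                m = pairwises[i] @ (unaries[i + 1] * bwd[i + 1])
                bwd[i] = m / m.sum()
            marg = [unaries[i] * fwd[i] * bwd[i] for i in range(n)]
            return [p / p.sum() for p in marg]

        # toy usage: three binary variables that prefer to agree with their neighbours
        psi = np.array([[2.0, 1.0], [1.0, 2.0]])
        print(chain_marginals([np.array([0.9, 0.1])] * 3, [psi, psi]))
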
    Finally, we examine the metrics used to guide the design of data converters in the setting of digital communications. The usual analog-to-digital converter (ADC) performance metrics---effective number of bits (ENOB), total harmonic distortion (THD), signal to noise and distortion ratio (SNDR), and spurious free dynamic range (SFDR)---are all focused on the faithful reproduction of observed waveforms, which is not of fundamental concern if the data converter is to be used in a digital communications system. Therefore, we propose information-centric, rather than waveform-centric, metrics that are better aligned with the goal of communications. We provide computational methods for calculating the values of these metrics, some of which are derived from Sum-Product BP or related algorithms. We also propose Statistics Gathering Converters (SGCs), which represent a change in perspective on data conversion for communications applications, away from signal representation and towards the collection of relevant statistics for the purposes of decision making and detection. We show how to develop algorithms for the detection of transmitted data when the transmitted signal is received by an SGC. Finally, we provide evidence for the benefits of using system-level metrics and statistics gathering converters in communications applications.
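
    For reference, the waveform-centric ADC metrics being contrasted with the proposed information-centric ones can be computed roughly as follows, here for an ideal uniform quantiser driven by a full-scale sine (real ADC characterisation adds windowing and test-setup details omitted in this sketch):

        import numpy as np

        def sndr_and_enob(bits=8, n=4096):
            # Quantise a full-scale sine with an ideal uniform quantiser, measure the
            # signal-to-noise-and-distortion ratio, and convert it to effective bits:
            # ENOB = (SNDR_dB - 1.76) / 6.02.
            t = np.arange(n)
            x = np.sin(2 * np.pi * 127 / n * t)       # 127 cycles in n samples: coherent sampling
            step = 2.0 / (2 ** bits)
            q = np.round(x / step) * step             # ideal mid-tread quantiser
            err = q - x
            sndr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(err ** 2))
            return sndr_db, (sndr_db - 1.76) / 6.02

        print(sndr_and_enob(bits=8))   # roughly 49.9 dB and about 8 effective bits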