
    Blending Learning and Inference in Structured Prediction

    In this paper we derive an efficient algorithm to learn the parameters of structured predictors in general graphical models. This algorithm blends the learning and inference tasks, which results in a significant speedup over traditional approaches such as conditional random fields and structured support vector machines. For this purpose we utilize the structures of the predictors to describe a low-dimensional structured prediction task which encourages local consistencies within the different structures while learning the parameters of the model. Convexity of the learning task provides the means to enforce the consistencies between the different parts. The inference-learning blending algorithm that we propose is guaranteed to converge to the optimum of the low-dimensional primal and dual programs. Unlike many of the existing approaches, inference-learning blending allows us to efficiently learn high-order graphical models, over regions of any size, with very large numbers of parameters. We demonstrate the effectiveness of our approach while presenting state-of-the-art results in stereo estimation, semantic segmentation, shape reconstruction, and indoor scene understanding.
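
The general idea of running inference inside the learning loop can be illustrated, in a much simplified form, with a structured perceptron on a chain model. This is not the paper's blended primal-dual algorithm; the toy scoring model and all function names below are invented for illustration only:

```python
import numpy as np

def viterbi(unary, pairwise):
    """MAP inference on a chain: unary is (T, K) scores, pairwise is (K, K)."""
    T, K = unary.shape
    score = unary[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + pairwise + unary[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def perceptron_epoch(unaries, labels, pairwise, lr=1.0):
    """One pass of structured-perceptron updates on the transition scores.
    Inference (Viterbi) is invoked inside the learning loop itself."""
    for U, y in zip(unaries, labels):
        y_hat = viterbi(U, pairwise)
        for t in range(1, len(y)):
            pairwise[y[t - 1], y[t]] += lr          # boost gold transitions
            pairwise[y_hat[t - 1], y_hat[t]] -= lr  # demote predicted ones
    return pairwise
```

When the current prediction already matches the gold labeling, the two updates cancel and the parameters are unchanged, which is the fixed-point behavior one expects from this family of learners.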

    Image understanding and feature extraction for applications in industry and mapping

    The aim of digital photogrammetry is the automated extraction and classification of the three-dimensional information of a scene from a number of images. Existing photogrammetric systems are semi-automatic, requiring manual editing and control, and have very limited domains of application, so that image understanding capabilities are left to the user. Among the most important steps in a fully integrated system are the extraction of features suitable for matching, the establishment of the correspondence between matching points, and object classification. The following study explores the applicability of pattern recognition concepts in conjunction with existing area-based methods, feature-based techniques and other approaches used in computer vision, in order to increase the level of automation and as a general alternative and addition to existing methods. As an illustration of the pattern recognition approach, examples of industrial applications are given. The underlying method is then extended to the identification of objects in aerial images of urban scenes and to the location of targets in close-range photogrammetric applications. Various moment-based techniques are considered as pattern classifiers, including geometric invariant moments, Legendre moments, Zernike moments and pseudo-Zernike moments. Two-dimensional Fourier transforms are also considered as pattern classifiers, and the suitability of these techniques is assessed. They are then applied as object locators and as feature extractors or interest operators. Additionally, the use of the fractal dimension to segment natural scenes for regional classification, in order to limit the search space for particular objects, is considered. The pattern recognition techniques require considerable preprocessing of images; the various image processing techniques required are explained where needed.
Extracted feature points are matched using relaxation-based techniques in conjunction with area-based methods to obtain subpixel accuracy. A subpixel pattern-recognition-based method is also proposed, and an investigation into improved area-based subpixel matching methods is undertaken. An algorithm for determining relative orientation parameters incorporating the epipolar line constraint is investigated and compared with a standard relative orientation algorithm. In conclusion, a basic system that can be automated, based on some novel techniques in conjunction with existing methods, is described and implemented in a mapping application. This system could be largely automated with suitably powerful computers.
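
The geometric invariant moments mentioned above can be sketched with Hu's first two invariants, which are unchanged under translation (and, for the full set, rotation). A minimal sketch; the function names are mine, not from the thesis:

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a grayscale image, taken about the centroid."""
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xbar = (xs * img).sum() / m00
    ybar = (ys * img).sum() / m00
    return ((xs - xbar) ** p * (ys - ybar) ** q * img).sum()

def hu_first_two(img):
    """Hu's invariants phi1, phi2 from normalized central moments eta_pq."""
    m00 = img.sum()
    def eta(p, q):
        return central_moment(img, p, q) / m00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

Because the moments are taken about the centroid and normalized by m00, translating or uniformly scaling the pattern leaves phi1 and phi2 unchanged, which is what makes them usable as pattern classifiers.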

    Offshore stereo measurements of gravity waves

    Stereo video techniques are effective for estimating the space-time wave dynamics over an area of the ocean. Indeed, a stereo camera view allows retrieval of both spatial and temporal data whose statistical content is richer than that of time series retrieved from point wave probes. To prove this, we consider an application of the Wave Acquisition Stereo System (WASS) for the analysis of offshore video measurements of gravity waves in the Northern Adriatic Sea. In particular, we deployed WASS at the oceanographic platform Acqua Alta, off the Venice coast, Italy. Three experimental studies were performed, and the overlapping field of view of the acquired stereo images covered an area of approximately 1100 m². Analysis of the WASS measurements shows that the sea surface can be accurately estimated in space and time together, yielding associated directional spectra and wave statistics that agree well with theoretical models. From the observed wavenumber-frequency spectrum one can also predict the vertical profile of the current flow underneath the wave surface. Finally, future improvements of WASS and its applications are discussed.
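
The link between the observed wavenumber-frequency spectrum and the underlying current can be sketched with linear wave theory: gravity waves satisfy ω² = gk·tanh(kh), and a uniform current U along the wave direction Doppler-shifts the observed frequency to ω = √(gk) + kU in deep water. A minimal sketch, not the WASS processing chain itself; the function names are illustrative:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def linear_dispersion_freq(k, depth=None):
    """Intrinsic angular frequency omega(k) from the linear dispersion relation.
    depth=None selects the deep-water limit omega = sqrt(g k)."""
    if depth is None:
        return math.sqrt(G * k)
    return math.sqrt(G * k * math.tanh(k * depth))

def current_from_doppler(k, omega_obs, depth=None):
    """Infer the along-wave current component U from the Doppler shift
    omega_obs = omega(k) + k U at a single wavenumber k."""
    return (omega_obs - linear_dispersion_freq(k, depth)) / k
```

Repeating this fit over a range of wavenumbers, which probe different depths, is the basis for reconstructing a vertical current profile from a wavenumber-frequency spectrum.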

    Patch-based Denoising Algorithms for Single and Multi-view Images

    In general, all single and multi-view digital images are captured using sensors, where they are often contaminated with noise, an undesired random signal. Such noise can also be introduced during transmission or by lossy image compression. Reducing the noise and enhancing such images is among the fundamental digital image processing tasks, and improving the performance of image denoising methods would greatly benefit single and multi-view image processing techniques, e.g. segmentation and disparity map computation. Patch-based denoising methods have recently emerged as the state-of-the-art denoising approach for various additive noise levels. This thesis proposes two patch-based denoising methods, for single and multi-view images respectively. For single image denoising, a modification to the block matching 3D algorithm is proposed: an adaptive collaborative thresholding filter consisting of a classification map and a set of various thresholding levels and operators, which are exploited when the collaborative hard-thresholding step is applied. Moreover, the collaborative Wiener filtering is improved by assigning greater weight to more similar patches. For the denoising of multi-view images, this thesis proposes algorithms that take a pair of noisy images captured from two different directions at the same time (stereoscopic images). The structural, maximum difference, or singular value decomposition-based similarity metrics are utilized for identifying the locations of similar search windows in the input images, and the non-local means algorithm is adapted for filtering these noisy multi-view images. The performance of both methods has been evaluated quantitatively and qualitatively through a number of experiments using the peak signal-to-noise ratio and the mean structural similarity measure.
Experimental results show that the proposed algorithm for single image denoising outperforms the original block matching 3D algorithm at various noise levels. Moreover, the proposed algorithm for multi-view image denoising can effectively reduce noise and helps estimate more accurate disparity maps at various noise levels.
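
The non-local means filter adapted above weights candidate pixels by the similarity of the patches around them. A minimal single-image, single-pixel sketch (parameter values and names are illustrative, and the stereo search-window matching of the thesis is omitted):

```python
import numpy as np

def nlm_pixel(img, i, j, patch=1, search=3, h=0.3):
    """Denoise pixel (i, j) as a weighted average over a search window,
    with weights decaying in the mean squared patch difference."""
    H, W = img.shape
    def get_patch(y, x):
        return img[y - patch:y + patch + 1, x - patch:x + patch + 1]
    ref = get_patch(i, j)
    num = den = 0.0
    for y in range(max(patch, i - search), min(H - patch, i + search + 1)):
        for x in range(max(patch, j - search), min(W - patch, j + search + 1)):
            d2 = ((get_patch(y, x) - ref) ** 2).mean()
            w = np.exp(-d2 / h ** 2)  # similar patches get weight near 1
            num += w * img[y, x]
            den += w
    return num / den
```

On a constant region every patch matches perfectly, so all weights are equal and the filter returns the constant unchanged; on edges, dissimilar patches receive exponentially small weight, which is what preserves structure while averaging out noise.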

    FPGA-Based Multimodal Embedded Sensor System Integrating Low- and Mid-Level Vision

    Motion estimation is a low-level vision task that is especially relevant due to its wide range of real-world applications. Many of the best motion estimation algorithms include some of the features found in mammalian vision, which demand huge computational resources and are therefore not usually available in real time. In this paper we present a novel bioinspired sensor based on the synergy between optical flow and orthogonal variant moments. The bioinspired sensor has been designed for Very Large Scale Integration (VLSI) using properties of the mammalian cortical motion pathway. This sensor combines low-level primitives (optical flow and image moments) in order to produce a mid-level vision abstraction layer. The results are described through experiments showing the validity of the proposed system, together with an analysis of the computational resources and performance of the applied algorithms.
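
The optical flow primitive can be sketched with a single-window Lucas-Kanade solve, which estimates one translational motion for a patch from the brightness-constancy constraint Ix·u + Iy·v + It = 0. This is only a generic sketch, not the paper's bioinspired VLSI design; names are illustrative:

```python
import numpy as np

def lucas_kanade(I0, I1):
    """Estimate a single flow vector (u, v), in pixels from I0 to I1,
    by least squares on the brightness-constancy equations."""
    Iy, Ix = np.gradient(I0)        # spatial gradients (rows, cols)
    It = I1 - I0                    # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

For a pure horizontal intensity ramp shifted horizontally, the vertical gradient vanishes and the least-squares (minimum-norm) solution recovers the horizontal shift with zero vertical flow.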

    Stereo vision for facet type cameras

    In the last decade, scientists have developed many artificial compound eye systems inspired by the compound eyes of insects. These systems, employing multi-aperture rather than single-aperture optics, offer specific advantages such as small volume, light weight, a large field of view, and high sensitivity. The electronic cluster eye (eCley) is a state-of-the-art artificial superposition compound eye with super resolution, inspired by the visual system of the parasitic wasp Xenos Peckii. Thanks to these inherent characteristics, the eCley has successfully been applied to medical inspection, personal identification, bank safety, robot navigation, and missile guidance. However, all these applications involve only a two-dimensional image space; no three-dimensional (3D) information is provided. If the eCley could detect 3D space information, then 3D reconstruction, object positioning, and distance measurement could be performed with a single eCley rather than requiring extra depth-sensing devices. In practice, implementing 3D space information detection in the miniaturized eCley is a major challenge, even though structures similar to stereo vision exist in each pair of adjacent channels. With an imaging channel of short focal length and low resolution, determining the depth information is not only an ill-posed problem, but the depth-induced shift also varies within a range of less than one pixel beyond quite a near distance (≥86 mm), which restricts the applicability of popular stereo matching algorithms to the eCley. Addressing this limitation, and with the goal of satisfying the real demands of eCley applications, this thesis studies a novel method of subpixel stereo vision for the eCley. The method exploits the fact that object edges are well preserved in the eCley, i.e., the transitional areas of edges contain rich information, including the depths or distances of objects; it determines subpixel distances of corresponding pixel pairs in adjacent channels and then obtains the objects' depth information via the triangle relationship. In this thesis, I deduce the mathematical model of stereo vision in the eCley from its special structure, discuss the optical correction and geometric calibration that are essential for high-precision measurement, develop subpixel baseline methods for each pixel pair based on intensity and gradient information in the transitional areas, and eventually implement real-time subpixel distance measurement for objects through these edge features.
To verify the adopted methods and analyze their precision, I employ an artificial synthetic stereo channel image and a large number of real images captured in diverse scenes. The results, for both individual stages and the whole pipeline, show that the proposed methods efficiently implement stereo vision in the eCley and measure the subpixel distance of stereo pixel pairs. Through a sensitivity analysis with respect to illumination, object distances, and pixel positions, I verify that the proposed method also performs robustly in many scenes. This stereo vision method extends the eCley's ability to perceive 3D information and makes it applicable to broader fields such as 3D object positioning, distance measurement, and 3D reconstruction.
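
A common baseline for pushing stereo matching below one pixel, shown here instead of the thesis's edge-based method, is a parabola fit through the matching cost around the best integer disparity, followed by triangulation. A sketch with illustrative names:

```python
def subpixel_disparity(costs, d_best):
    """Refine the integer disparity d_best (must be an interior index)
    by fitting a parabola through the three surrounding cost samples."""
    c0, c1, c2 = costs[d_best - 1], costs[d_best], costs[d_best + 1]
    return d_best + 0.5 * (c0 - c2) / (c0 - 2 * c1 + c2)

def depth_from_disparity(d, focal_px, baseline_m):
    """Triangulate depth (m) from disparity (px), focal length (px),
    and stereo baseline (m): z = f * b / d."""
    return focal_px * baseline_m / d
```

The parabola fit is exact whenever the cost is locally quadratic, and the triangulation formula makes the core difficulty of very small cameras visible: a short baseline and focal length compress all usable depth variation into fractions of a pixel of disparity.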

    Centralized and distributed semi-parametric compression of piecewise smooth functions

    This thesis introduces novel wavelet-based semi-parametric centralized and distributed compression methods for a class of piecewise smooth functions. Our proposed compression schemes are based on a non-conventional transform coding structure with simple independent encoders and a complex joint decoder. Current centralized state-of-the-art compression schemes are based on the conventional structure, where the encoder is relatively complex and nonlinear; in addition, the setting usually allows the encoder to observe the entire source. Recently, there has been an increasing need for compression schemes where the encoder is lower in complexity and the decoder instead handles the more computationally intensive tasks. Furthermore, the setup may involve multiple encoders, each of which can only partially observe the source; such a scenario is often referred to as distributed source coding. In the first part, we focus on the dual situation of centralized compression, where the encoder is linear and the decoder is nonlinear. Our analysis is centered around a class of 1-D piecewise smooth functions. We show that, by incorporating parametric estimation into the decoding procedure, it is possible to achieve the same distortion-rate performance as that of a conventional wavelet-based compression scheme. We also present a new constructive approach to parametric estimation based on sampling results for signals with finite rate of innovation. The second part of the thesis focuses on the distributed compression scenario, where each independent encoder partially observes the 1-D piecewise smooth function. We propose a new wavelet-based distributed compression scheme that uses parametric estimation to perform joint decoding. Our distortion-rate analysis shows that the proposed scheme can achieve the same compression performance as a joint encoding scheme.
Lastly, we apply the proposed theoretical framework in the context of distributed image and video compression. We start by considering a simplified model of the video signal and show that we can achieve distortion-rate performance close to that of a joint encoding scheme. We then present practical compression schemes for real-world signals. Our simulations confirm the improvement in performance over classical schemes, both in terms of PSNR and of visual quality.
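
The wavelet side of such transform coding can be illustrated with a 1-D orthonormal Haar transform plus simple coefficient thresholding. This is a generic sketch of wavelet transform coding, not the thesis's semi-parametric codec; it assumes a power-of-two signal length:

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform: averages and details."""
    x = np.asarray(x, float)
    avg = (x[0::2] + x[1::2]) / np.sqrt(2)
    diff = (x[0::2] - x[1::2]) / np.sqrt(2)
    return avg, diff

def haar_forward(x):
    """Full decomposition: final approximation plus coarse-to-fine details."""
    x = np.asarray(x, float)
    details = []
    while len(x) > 1:
        x, d = haar_step(x)
        details.append(d)
    return x, details[::-1]

def haar_inverse(approx, details):
    """Reconstruct the signal from the approximation and detail bands."""
    x = np.asarray(approx, float)
    for d in details:
        out = np.empty(2 * len(x))
        out[0::2] = (x + d) / np.sqrt(2)
        out[1::2] = (x - d) / np.sqrt(2)
        x = out
    return x

def threshold_details(details, t):
    """Crude 'compression': zero out detail coefficients below magnitude t."""
    return [np.where(np.abs(d) >= t, d, 0.0) for d in details]
```

For piecewise smooth signals, most detail coefficients away from discontinuities are near zero, so thresholding discards many coefficients at little distortion cost; that sparsity is what wavelet-based coders, centralized or distributed, rely on.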