    Optic Flow from Unstable Sequences containing Unconstrained Scenes through Local Velocity Constancy Maximization

    Computational Modeling of Human Dorsal Pathway for Motion Processing

    Reliable motion estimation in videos is of crucial importance for background identification, object tracking, action recognition, event analysis, self-navigation, etc. Reconstructing the motion field in the 2D image plane is very challenging, due to variations in image quality, scene geometry, lighting conditions, and most importantly, camera jittering. Traditional optical flow models assume consistent image brightness and a smooth motion field, assumptions that are violated by the unstable illumination and motion discontinuities common in real-world videos. To recognize observer (or camera) motion robustly in complex, realistic scenarios, we propose a biologically inspired motion estimation system to overcome the issues posed by real-world videos. The bottom-up model is inspired by the infrastructure as well as the functionalities of the human dorsal pathway, and the hierarchical processing stream can be divided into three stages: 1) spatio-temporal processing for local motion, 2) recognition of global motion patterns (camera motion), and 3) preemptive estimation of object motion. To extract effective and meaningful motion features, we apply a series of steerable, spatio-temporal filters to detect local motion at different speeds and directions, in a way that is selective for motion velocity. The intermediate response maps are calibrated and combined to estimate dense motion fields in local regions, and then local motions along two orthogonal axes are aggregated to recognize planar, radial and circular patterns of global motion. We evaluate the model on an extensive, realistic video database that was collected by hand with a mobile device (iPad); the video content varies in scene geometry, lighting conditions, view perspective and depth. We achieved high-quality results and demonstrated that this bottom-up model is capable of extracting high-level semantic knowledge regarding self-motion in realistic scenes.
    Once the global motion is known, we segment objects from moving backgrounds by compensating for camera motion. For videos captured with non-stationary cameras, we consider global motion as a combination of camera motion (background) and object motion (foreground). To estimate foreground motion, we exploit the corollary discharge mechanism of biological systems and estimate motion preemptively. Since the background motion at each pixel is collectively introduced by camera movements, we apply spatio-temporal averaging to estimate the background motion at the pixel level, and the initial estimate of foreground motion is derived by comparing global motion and background motion at multiple spatial levels. The real frame signals are then compared with those derived by forward prediction, refining the estimates of object motion. This motion detection system is applied to detect objects against cluttered, moving backgrounds and is shown to be efficient in locating independently moving, non-rigid regions. The core contribution of this thesis is the invention of a robust motion estimation system for complicated real-world videos, which are challenged by real sensor noise, complex natural scenes, variations in illumination and depth, and motion discontinuities. The overall system demonstrates biological plausibility and holds great potential for other applications, such as camera motion removal, heading estimation, obstacle avoidance, route planning, and vision-based navigational assistance.
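
The camera-motion compensation step described in the abstract above can be illustrated with a toy example. This is only a minimal sketch under strong assumptions, not the thesis's system: the flow field is a small hand-made grid, the background motion is taken to be a single spatial mean (no temporal averaging, no multiple spatial levels, no forward prediction), and the names `mean_flow`, `foreground_mask` and the threshold value are hypothetical.

```python
def mean_flow(flow):
    """Return the average (vx, vy) over a dense flow field.

    `flow` is a list of rows; each row is a list of (vx, vy) tuples.
    The mean stands in for the camera-induced background motion.
    """
    vectors = [v for row in flow for v in row]
    n = len(vectors)
    return (sum(vx for vx, _ in vectors) / n,
            sum(vy for _, vy in vectors) / n)


def foreground_mask(flow, threshold=2.0):
    """Mark pixels whose flow differs from the background estimate.

    The residual is the per-pixel flow minus the mean flow; a pixel is
    flagged when the residual magnitude exceeds `threshold` (assumed value).
    """
    bx, by = mean_flow(flow)
    return [
        [((vx - bx) ** 2 + (vy - by) ** 2) ** 0.5 > threshold
         for vx, vy in row]
        for row in flow
    ]


# Hypothetical 4x4 field: the camera pans right by 1 px/frame,
# and a single pixel belongs to an independently moving object.
flow = [[(1.0, 0.0)] * 4 for _ in range(4)]
flow[2][1] = (17.0, 0.0)
mask = foreground_mask(flow)
```

A plain mean is biased by the foreground itself; a median or an iterative re-estimate over unflagged pixels would be a more robust choice in practice.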

    A Neurocomputational Model of Smooth Pursuit Control to Interact with the Real World

    Whether we want to drive a car, play a ball game, or even enjoy watching a flying bird, we need to track moving objects. This is possible via smooth pursuit eye movements (SPEMs), which maintain the image of the moving object on the fovea (i.e., a very small portion of the retina with high visual resolution). At first glance, performing an accurate SPEM may seem trivial for the brain. However, imperfect visual coding, processing and transmission delays, a wide variety of object sizes, and background textures make the task challenging. Furthermore, the existence of distractors in the environment makes it even more complicated, and it is no wonder that understanding SPEM has been a classic question of human motor control. The creation of models has played an influential role in understanding physiological systems, of which SPEM is an example. Models make quantitative predictions that can be tested in experiments. Therefore, modelling SPEM is not only valuable for learning the neurobiological mechanisms of smooth pursuit, or more generally gaze control, but also gives insight into other sensory-motor functions. In this thesis, I present a neurocomputational SPEM model based on the Neural Engineering Framework (NEF) to drive an eye-like robot. The model interacts with the real world in real time. It uses naturalistic images as input and controls the robot using spiking model neurons. This work can be the first step towards a more thorough validation of abstract SPEM control models. It is also a small step toward neural models that drive robots to accomplish more intricate sensory-motor tasks such as reaching and grasping.

    Humanistic Computing: WearComp as a New Framework and Application for Intelligent Signal Processing

    Humanistic computing is proposed as a new signal processing framework in which the processing apparatus is inextricably intertwined with the natural capabilities of our human body and mind. Rather than trying to emulate human intelligence, humanistic computing recognizes that the human brain is perhaps the best neural network of its kind, and that there are many new signal processing applications (within the domain of personal technologies) that can make use of this excellent but often overlooked processor. The emphasis of this paper is on personal imaging applications of humanistic computing, to take a first step toward an intelligent wearable camera system that can allow us to effortlessly capture our day-to-day experiences, help us remember and see better, provide us with personal safety through crime reduction, and facilitate new forms of communication through collective connected humanistic computing. The author’s wearable signal processing hardware, which began as a cumbersome backpack-based photographic apparatus in the 1970s and evolved into a clothing-based apparatus in the early 1980s, currently provides the computational power of a UNIX workstation concealed within ordinary-looking eyeglasses and clothing. Thus it may be worn continuously during all facets of ordinary day-to-day living, so that, through long-term adaptation, it begins to function as a true extension of the mind and body.

    Combinatorial Solutions for Shape Optimization in Computer Vision

    This thesis aims at solving so-called shape optimization problems, i.e. problems where the shape of some real-world entity is sought, by applying combinatorial algorithms. I present several advances in this field, all of them based on energy minimization. The addressed problems become more intricate in the course of the thesis, starting with problems that are solved globally and then turning to problems for which no global solutions are known so far. The first two chapters treat segmentation problems where the considered grouping criterion is directly derived from the image data. That is, the respective data terms do not involve any parameters to estimate. These problems are solved globally. The first of these chapters treats the problem of unsupervised image segmentation, where there is no user input apart from the image. Here I focus on a contour-based method and show how to integrate curvature regularity into a ratio-based optimization framework. The arising optimization problem is reduced to optimizing over the cycles in a product graph. This problem can be solved globally in polynomial, effectively linear time. As a consequence, the method does not depend on initialization and translational invariance is achieved. This is joint work with Daniel Cremers and Simon Masnou. I then proceed to the integration of shape knowledge into the framework, while keeping translational invariance. This problem is again reduced to cycle-finding in a product graph. Being based on the alignment of shape points, the method actually uses a more sophisticated shape measure than most local approaches and still provides global optima. It readily extends to tracking problems and allows some of them to be solved in real time. I present an extension to highly deformable shape models which can be included in the global optimization framework. This method simultaneously allows a shape to be decomposed into a set of deformable parts, based only on the input images.
This is joint work with Daniel Cremers. In the second part, segmentation is combined with so-called correspondence problems, i.e. the underlying grouping criterion is now based on correspondences that have to be inferred simultaneously. That is, in addition to inferring the shapes of objects, one now also tries to put the points in several images into correspondence. The arising problems become more intricate and are no longer optimized globally. This part is divided into two chapters. The first chapter treats the topic of real-time motion segmentation, where objects are identified based on the observation that the respective points in the video move coherently. Rather than pre-estimating motion, a single energy functional is minimized via alternating optimization. The main novelty lies in the real-time capability, which is achieved by exploiting a fast combinatorial segmentation algorithm. The results are further improved by employing a probabilistic data term. This is joint work with Daniel Cremers. The final chapter presents a method for high-resolution motion layer decomposition and was developed together with Daniel Cremers and Thomas Pock. Layer decomposition methods support the notion of a scene model, which makes it possible to model occlusion and enforce temporal consistency. The contributions are twofold: from a practical point of view, the proposed method recovers finely detailed layer images by minimizing a single energy. This is achieved by integrating a super-resolution method into the layer decomposition framework. From a theoretical viewpoint, the proposed method introduces layer-based regularity terms as well as a graph cut-based scheme to solve for the layer domains. The latter is combined with powerful continuous convex optimization techniques into an alternating minimization scheme.
Lastly, I want to mention that a significant part of this thesis is devoted to the recent trend of exploiting parallel architectures, in particular graphics cards: many combinatorial algorithms are easily parallelized. In Chapter 3 we will see a case where the standard algorithm is hard to parallelize in general, but easy to parallelize for the respective problem instances.

    Variational image fusion

    The main goal of this work is the fusion of multiple images into a single composite that offers more information than the individual input images. We approach those fusion tasks within a variational framework. First, we present iterative schemes that are well-suited for such variational problems and related tasks. They lead to efficient algorithms that are simple to implement and well-parallelisable. Next, we design a general fusion technique that aims for an image with optimal local contrast. This is the key to a versatile method that performs well in many application areas such as multispectral imaging, decolourisation, and exposure fusion. To handle motion within an exposure set, we present the following two-step approach: First, we introduce the complete rank transform to design an optic flow approach that is robust against severe illumination changes. Second, we eliminate remaining misalignments by means of brightness transfer functions that relate the brightness values between frames. Additional knowledge about the exposure set enables us to propose the first fully coupled method that jointly computes an aligned high dynamic range image and dense displacement fields. Finally, we present a technique that infers depth information from differently focused images. In this context, we additionally introduce a novel second order regulariser that adapts to the image structure in an anisotropic way.
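
The robustness to illumination changes mentioned in the abstract above can be illustrated with a rank-based patch descriptor. The following is a minimal sketch, not the thesis's implementation: `rank_signature` is a hypothetical helper that replaces each intensity in a flattened patch by the number of patch values strictly below it, which is how rank-type transforms are commonly described. The patch values and the brightness change are made up for illustration.

```python
def rank_signature(patch):
    """Replace each intensity by its rank within the patch.

    `patch` is a flattened neighbourhood (e.g. 3x3 -> 9 values).
    The rank of a value is the count of patch values strictly smaller.
    """
    return [sum(other < value for other in patch) for value in patch]


# Hypothetical 3x3 patch, flattened row by row.
patch = [10, 50, 30, 30, 90, 20, 70, 40, 60]

# A strictly increasing brightness change (gain and offset are assumptions).
brighter = [2 * value + 15 for value in patch]

signature = rank_signature(patch)
signature_brighter = rank_signature(brighter)
```

Because the signature depends only on the ordering of intensities, any strictly increasing brightness change leaves it unchanged, so the two signatures here are identical. A matching cost built on such signatures is therefore unaffected by that class of illumination change.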