
    The World of Fast Moving Objects

    The notion of a Fast Moving Object (FMO), i.e. an object that moves over a distance exceeding its size within the exposure time, is introduced. FMOs may, and typically do, rotate with high angular speed. FMOs are very common in sports videos, but are not rare elsewhere. In a single frame, such objects are often barely visible and appear as semi-transparent streaks. A method for the detection and tracking of FMOs is proposed. The method consists of three distinct algorithms, which form an efficient localization pipeline that operates successfully in a broad range of conditions. We show that it is possible to recover the appearance of the object and its axis of rotation despite its blurred appearance. The proposed method is evaluated on a new annotated dataset. The results show that existing trackers are inadequate for the problem of FMO localization and that a new approach is required. Two applications of localization, temporal super-resolution and highlighting, are presented.
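    To make the FMO definition concrete, the sketch below flags candidate pixels by the property that a semi-transparent streak differs from both the preceding and the following frame, while static background and slowly moving objects do not. This is a deliberately simplified illustration using OpenCV, not the paper's actual three-algorithm pipeline; the function name and thresholds are hypothetical.

```python
import cv2
import numpy as np

def fmo_candidates(prev_f, curr_f, next_f, thresh=25):
    """Flag pixels of `curr_f` that differ from BOTH neighbouring frames.

    A streak left by a fast moving object exists only in the middle frame;
    background and slow objects fail this test. Illustrative only -- not
    the paper's detector.
    """
    def gray(f):
        return cv2.cvtColor(f, cv2.COLOR_BGR2GRAY).astype(np.int16)

    d_prev = np.abs(gray(curr_f) - gray(prev_f)) > thresh
    d_next = np.abs(gray(curr_f) - gray(next_f)) > thresh
    mask = (d_prev & d_next).astype(np.uint8) * 255
    # A small morphological opening removes speckle; elongated streaks survive.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```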

    Transform recipes for efficient cloud photo enhancement

    Cloud image processing is often proposed as a solution to the limited computing power and battery life of mobile devices: it allows complex algorithms to run on powerful servers with virtually unlimited energy supply. Unfortunately, this overlooks the time and energy cost of uploading the input and downloading the output images. When transfer overhead is accounted for, processing images on a remote server becomes less attractive and many applications do not benefit from cloud offloading. We aim to change this in the case of image enhancements that preserve the overall content of an image. Our key insight is that, in this case, the server can compute and transmit a description of the transformation from input to output, which we call a transform recipe. At equivalent quality, our recipes are much more compact than JPEG images: this reduces the client's download. Furthermore, recipes can be computed from highly compressed inputs, which significantly reduces the data uploaded to the server. The client reconstructs a high-fidelity approximation of the output by applying the recipe to its local high-quality input. We demonstrate our results on 168 images and 10 image processing applications, showing that our recipes form a compact representation for a diverse set of image filters. With an equivalent transmission budget, they provide higher-quality results than JPEG-compressed input/output images, with a gain of the order of 10 dB in many cases. We demonstrate the utility of recipes on a mobile phone by profiling the energy consumption and latency for both local and cloud computation: a transform recipe-based pipeline runs 2--4x faster and uses 2--7x less energy than local or naive cloud computation. Funded by the Qatar Computing Research Institute; the United States Defense Advanced Research Projects Agency (Agreement FA8750-14-2-0009); the Stanford University Pervasive Parallelism Laboratory; and Adobe Systems.
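    The client/server split can be illustrated with a toy recipe: one affine (gain, bias) map per block and channel, fitted on the server from the compressed input and enhanced output, then applied by the client to its local high-quality original. The paper's actual recipes are multiscale and far more expressive; the function names and block size below are illustrative assumptions.

```python
import numpy as np

def fit_recipe(low_in, out, block=32):
    """Server side: fit a toy 'recipe' -- one affine map (gain, bias) per
    block and channel -- mapping the compressed input to the enhanced output.
    Illustrative only; not the paper's actual recipe representation.
    """
    H, W, C = low_in.shape
    by_n, bx_n = H // block, W // block
    gains = np.zeros((by_n, bx_n, C))
    biases = np.zeros((by_n, bx_n, C))
    for by in range(by_n):
        for bx in range(bx_n):
            sl = np.s_[by*block:(by+1)*block, bx*block:(bx+1)*block]
            for c in range(C):
                x = low_in[sl + (c,)].ravel().astype(np.float64)
                y = out[sl + (c,)].ravel().astype(np.float64)
                if np.ptp(x) == 0:   # flat block: a constant offset suffices
                    gains[by, bx, c], biases[by, bx, c] = 0.0, y.mean()
                else:
                    gains[by, bx, c], biases[by, bx, c] = np.polyfit(x, y, 1)
    return gains, biases

def apply_recipe(hi_in, gains, biases, block=32):
    """Client side: apply the transmitted maps to the local high-quality input."""
    out = hi_in.astype(np.float64).copy()
    for by in range(gains.shape[0]):
        for bx in range(gains.shape[1]):
            sl = np.s_[by*block:(by+1)*block, bx*block:(bx+1)*block]
            out[sl] = out[sl] * gains[by, bx] + biases[by, bx]
    return np.clip(out, 0, 255).astype(hi_in.dtype)
```

    The transmitted payload here is just the two coefficient arrays, which is why such a description can be far smaller than a JPEG of the output.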

    Foundations, Inference, and Deconvolution in Image Restoration

    Image restoration is a critical preprocessing step in computer vision, producing images with reduced noise, blur, and pixel defects. This enables precise higher-level reasoning as to the scene content in later stages of the vision pipeline (e.g., object segmentation, detection, recognition, and tracking). Restoration techniques have found extensive usage in a broad range of applications in industry, medicine, astronomy, biology, and photography. The recovery of high-quality results requires models of the image degradation process, giving rise to a class of often heavily underconstrained inverse problems. A further challenge specific to the problem of blur removal is noise amplification, which may cause strong distortion by ringing artifacts. This dissertation presents new insights and problem-solving procedures for three areas of image restoration, namely (1) model foundations, (2) Bayesian inference for high-order Markov random fields (MRFs), and (3) blind image deblurring (deconvolution). As basic research on model foundations, we contribute to reconciling the perceived differences between probabilistic MRFs on the one hand, and deterministic variational models on the other. To do so, we restrict the variational functional to locally supported finite elements (FE) and integrate over the domain. This yields a sum of terms depending locally on FE basis coefficients, and by identifying the latter with pixels, the terms resolve to MRF potential functions. In contrast with previous literature, we place special emphasis on robust regularizers used commonly in contemporary computer vision. Moreover, we draw samples from the derived models to further demonstrate the probabilistic connection. Another focal issue is a class of high-order Field of Experts MRFs which are learned generatively from natural image data and yield the best quantitative results under Bayesian estimation. This involves minimizing an integral expression, which has no closed-form solution in general. However, the MRF class under study has Gaussian mixture potentials, permitting expansion by indicator variables as a technical measure. As an approximate inference method, we study Gibbs sampling in the context of non-blind deblurring and obtain excellent results, yet at the cost of high computing effort. In response, we turn to the mean field algorithm, and show that it scales quadratically in the clique size for a standard restoration setting with a linear degradation model. An empirical study of mean field over several restoration scenarios confirms advantageous properties with regard to both image quality and computational runtime. This dissertation further examines the problem of blind deconvolution, beginning with localized blur from fast-moving objects in the scene, or from camera defocus. Forgoing dedicated hardware or user labels, we rely only on the image as input and introduce a latent variable model to explain the non-uniform blur. The inference procedure estimates freely varying kernels and we demonstrate its generality by extensive experiments. We further present a discriminative method for blind removal of camera shake. In particular, we interleave discriminative non-blind deconvolution steps with kernel estimation and leverage the error cancellation effects of the Regression Tree Field model to attain a deblurring process with tightly linked sequential stages.
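    As a rough illustration of the MAP estimation setting that such MRF priors give rise to, the sketch below minimizes a standard non-blind deconvolution energy, a quadratic data term plus a robust (Charbonnier) smoothness penalty, by plain gradient descent. It stands in for the far richer Field-of-Experts priors and Gibbs/mean-field inference studied in the dissertation; the function name, step size, and circular-convolution boundary handling are simplifying assumptions.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def deconv_map(y, k, lam=0.01, iters=300, step=0.5, eps=1e-3):
    """Toy MAP deconvolution: argmin_x ||k*x - y||^2 + lam * sum phi(grad x),
    with Charbonnier penalty phi(t) = sqrt(t^2 + eps^2), by gradient descent.
    Assumes circular convolution with a normalized kernel `k`.
    """
    K = fft2(k, s=y.shape)                      # kernel in the Fourier domain
    x = y.astype(np.float64).copy()
    for _ in range(iters):
        # Data term gradient: k^T (k x - y), computed via FFTs.
        resid = np.real(ifft2(K * fft2(x))) - y
        g = np.real(ifft2(np.conj(K) * fft2(resid)))
        # Prior gradient: adjoint of forward differences applied to phi'(d).
        for axis in (0, 1):
            d = np.diff(x, axis=axis)
            p = d / np.sqrt(d * d + eps * eps)  # phi'(d)
            gp = np.zeros_like(x)
            if axis == 0:
                gp[0, :] = -p[0, :]
                gp[1:-1, :] = p[:-1, :] - p[1:, :]
                gp[-1, :] = p[-1, :]
            else:
                gp[:, 0] = -p[:, 0]
                gp[:, 1:-1] = p[:, :-1] - p[:, 1:]
                gp[:, -1] = p[:, -1]
            g += lam * gp
        x -= step * g
    return x
```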

    Reconstruction and rendering of time-varying natural phenomena

    While computer performance increases and computer-generated images become ever more realistic, the need for modeling computer graphics content grows stronger. Achieving photo-realism requires highly detailed scenes, often modeled with a significant amount of manual labour. Interdisciplinary research combining the fields of Computer Graphics, Computer Vision and Scientific Computing has led to the development of (semi-)automatic modeling tools that free the user from labour-intensive modeling tasks. The modeling of animated content is especially challenging: realistic motion is necessary to convince the audience of computer games, movies with mixed-reality content and augmented-reality applications. The goal of this thesis is to investigate automated modeling techniques for time-varying natural phenomena. The results of the presented methods are animated, three-dimensional computer models of fire, smoke, fluid flow and the motion of heated air.

    Selectively De-animating and Stabilizing Videos

    This thesis presents three systems for editing the motion of videos. First, selective de-animation of videos seeks to remove the large-scale motions of one or more objects so that other motions are easier to see. The user draws strokes to indicate the regions that should be immobilized, and our algorithm warps the video to remove large-scale motion in those regions while leaving finer-scale, relative motions intact. We then use a graph-cut-based optimization to composite the warped video with still frames from the input video to remove unwanted background motion. Our technique enables applications such as clearer motion visualization, simpler creation of artistic cinemagraphs, and new ways to edit appearance and motion paths in video. Second, we design a fully automatic system to create portrait cinemagraphs by tracking facial features and de-animating the video with respect to the face and torso. We then generate compositing weights automatically to create the final cinemagraph portraits. Third, we present a user-assisted video stabilization algorithm that is able to stabilize challenging videos when state-of-the-art automatic algorithms fail to generate a satisfactory result. Our system introduces two new modes of interaction that allow the user to improve an unsatisfactory automatically stabilized video. First, we cluster tracks and visualize them on the warped video. The user ensures that appropriate tracks are selected by clicking on track clusters to include or exclude them to guide the stabilization. Second, the user can directly specify how regions in the output video should look by drawing quadrilaterals to select and deform parts of the frame. Our algorithm then computes a stabilized video using the user-selected tracks, while respecting the user-modified regions.
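    The core de-animation step, warping each frame so that tracked points in a user-selected region align with a reference frame, can be sketched with standard OpenCV calls. A single RANSAC homography per frame only removes the large-scale motion; the thesis layers a finer-grained warp and graph-cut compositing on top of this. The function below is an illustrative simplification, not the thesis's implementation.

```python
import cv2
import numpy as np

def deanimate(frames, roi_mask):
    """Warp every frame so that features inside `roi_mask` (uint8 mask over
    the first frame) align with the first frame, immobilizing that region.
    Minimal sketch: one homography per frame, no compositing.
    """
    ref = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    p_ref = cv2.goodFeaturesToTrack(ref, maxCorners=200, qualityLevel=0.01,
                                    minDistance=8, mask=roi_mask)
    out = [frames[0]]
    for f in frames[1:]:
        gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        # Track the reference features into the current frame.
        p_cur, st, _ = cv2.calcOpticalFlowPyrLK(ref, gray, p_ref, None)
        good = st.ravel() == 1
        # Map current positions back onto the reference positions.
        H, _ = cv2.findHomography(p_cur[good], p_ref[good], cv2.RANSAC, 3.0)
        out.append(cv2.warpPerspective(f, H, (f.shape[1], f.shape[0])))
    return out
```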

    Schätzung dichter Korrespondenzfelder unter Verwendung mehrerer Bilder (Estimation of Dense Correspondence Fields Using Multiple Images)

    Most optical flow algorithms assume pairs of images acquired with an ideal, short exposure time. We present two approaches that use additional images of a scene to estimate highly accurate, dense correspondence fields. In our first approach we consider video sequences acquired with alternating exposure times, so that a short-exposure image is followed by a long-exposure image that exhibits motion blur. With the help of the two flanking short-exposure images, we can not only decipher the motion information encoded in the long-exposure image, but also estimate occlusion timings, which are the basis for artifact-free frame interpolation. In our second approach we consider multi-view video sequences, as commonly occur in, e.g., stereoscopic video. Because several images capture nearly the same scene content, this redundancy can be used to establish more robust and consistent correspondence fields than the consideration of two images permits.
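    The constraint exploited by the first approach can be illustrated in the forward direction: a long exposure taken between two short exposures should resemble a temporal average of one short exposure warped fractionally along the flow toward the other. The sketch below synthesizes such a blurred frame with OpenCV's Farneback flow; the thesis inverts this relationship and additionally reasons about occlusion timings, and the function name and parameters here are illustrative assumptions.

```python
import cv2
import numpy as np

def synth_long_exposure(s0, s1, n_steps=16):
    """Approximate the long exposure spanning short exposures s0 -> s1 as
    the average of intermediate frames backward-warped from s0 along the
    estimated flow. Illustrative forward model only.
    """
    g0 = cv2.cvtColor(s0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(s1, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    acc = np.zeros_like(s0, dtype=np.float64)
    for t in np.linspace(0.0, 1.0, n_steps):
        # Backward warp: the pixel at x in the frame at time t originated
        # (approximately) at x - t * flow(x) in s0.
        map_x = (xs - t * flow[..., 0]).astype(np.float32)
        map_y = (ys - t * flow[..., 1]).astype(np.float32)
        acc += cv2.remap(s0, map_x, map_y, cv2.INTER_LINEAR)
    return (acc / n_steps).astype(s0.dtype)
```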

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion and applications. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research in the field of pattern recognition.

    Neural representations for object capture and rendering

    Photometric stereo is a classical computer vision problem with applications ranging from gaming and VR/AR avatars to movie visual effects. These applications require a faithful reconstruction of an object in a new space, and thus a thorough understanding of the object's visual properties. With the advent of Neural Radiance Fields (NeRFs) in the early 2020s, we witnessed the incredible photorealism the method provides and its potential beyond. However, the original NeRF does not provide any information about the material and lighting of the objects in focus. We therefore propose to tackle the multiview photometric stereo problem using an extension of NeRFs. We provide three novel contributions through this work. First, the Relightable NeRF model, an extension of the original NeRF in which appearance is conditioned on a point light source direction. It provides two use cases: it is able to learn from varying lighting, and to relight under arbitrary conditions. Second, the Neural BRDF Fields model extends the Relightable NeRF by introducing explicit models for surface reflectance and shadowing. The parameters of the BRDF are learnable as a neural field, enabling spatially varying reflectance; the local surface normal direction is learned as another neural field. We experiment with both a fixed (Lambertian) BRDF and a learnable (i.e. neural) reflectance model that guarantees a realistic BRDF by tying the neural network to the physical properties of BRDFs. In addition, the model learns local shadowing as a function of light source direction, enabling the reconstruction of cast shadows. Finally, the Neural Implicit Fields for Merging Monocular Photometric Stereo model switches from NeRF's volume density function to a signed distance function representation. This provides a straightforward means to compute the surface normal direction and thus ties normal-based losses directly to the geometry. We use this representation to address the problem of merging the output of monocular photometric stereo methods into a single unified model: a neural SDF and a neural field capturing diffuse albedo, from which we can extract a textured mesh.
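    The architectural idea behind the first contribution, conditioning emitted radiance on the light direction in addition to position and view direction, can be sketched as a small PyTorch module. Positional encodings and the volume rendering loop are omitted, and the class name and layer sizes are illustrative assumptions, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class RelightableField(nn.Module):
    """NeRF-style MLP whose radiance depends on the point-light direction.

    Density is a function of position only; color additionally sees the
    view and light directions, which is what enables relighting.
    """
    def __init__(self, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.sigma = nn.Linear(hidden, 1)           # density head
        self.color = nn.Sequential(                 # radiance head
            nn.Linear(hidden + 3 + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, x, view_dir, light_dir):
        h = self.trunk(x)
        sigma = torch.relu(self.sigma(h))
        rgb = self.color(torch.cat([h, view_dir, light_dir], dim=-1))
        return sigma, rgb
```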

    Advanced Sensing and Image Processing Techniques for Healthcare Applications

    This Special Issue aims to attract the latest research and findings in the design, development and experimentation of healthcare-related technologies. This includes, but is not limited to, using novel sensing, imaging, data processing, machine learning, and artificial-intelligence-based devices and algorithms to assist and monitor the elderly, patients, and people with disabilities.