The World of Fast Moving Objects
The notion of a Fast Moving Object (FMO), i.e. an object that moves over a
distance exceeding its size within the exposure time, is introduced. FMOs may,
and typically do, rotate with high angular speed. FMOs are very common in
sports videos, but are not rare elsewhere. In a single frame, such objects are
often barely visible and appear as semi-transparent streaks.
A method for the detection and tracking of FMOs is proposed. The method
consists of three distinct algorithms, which form an efficient localization
pipeline that operates successfully in a broad range of conditions. We show
that it is possible to recover the appearance of the object and its axis of
rotation, despite its blurred appearance. The proposed method is evaluated on a
new annotated dataset. The results show that existing trackers are inadequate
for the problem of FMO localization and a new approach is required. Two
applications of localization, temporal super-resolution and highlighting, are
presented.
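The defining inequality above (distance traveled during the exposure exceeds the object's own size) can be written as a one-line predicate. A minimal sketch; the function name and pixel units are illustrative, not from the paper:

```python
def is_fast_moving_object(speed_px_per_s, exposure_s, size_px):
    """An object is an FMO if it travels farther than its own size
    within a single exposure (the definition given in the abstract)."""
    displacement = speed_px_per_s * exposure_s
    return displacement > size_px

# A ball 20 px across moving at 3000 px/s during a 1/100 s exposure
# travels 30 px, which exceeds its 20 px size, so it is an FMO.
assert is_fast_moving_object(3000, 0.01, 20)
assert not is_fast_moving_object(500, 0.01, 20)
```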
Transform recipes for efficient cloud photo enhancement
Cloud image processing is often proposed as a solution to the limited computing power and battery life of mobile devices: it allows complex algorithms to run on powerful servers with virtually unlimited energy supply. Unfortunately, this overlooks the time and energy cost of uploading the input and downloading the output images. When transfer overhead is accounted for, processing images on a remote server becomes less attractive and many applications do not benefit from cloud offloading. We aim to change this in the case of image enhancements that preserve the overall content of an image. Our key insight is that, in this case, the server can compute and transmit a description of the transformation from input to output, which we call a transform recipe. At equivalent quality, our recipes are much more compact than JPEG images: this reduces the client's download. Furthermore, recipes can be computed from highly compressed inputs, which significantly reduces the data uploaded to the server. The client reconstructs a high-fidelity approximation of the output by applying the recipe to its local high-quality input. We demonstrate our results on 168 images and 10 image processing applications, showing that our recipes form a compact representation for a diverse set of image filters. With an equivalent transmission budget, they provide higher-quality results than JPEG-compressed input/output images, with a gain of the order of 10 dB in many cases. We demonstrate the utility of recipes on a mobile phone by profiling the energy consumption and latency for both local and cloud computation: a transform recipe-based pipeline runs 2-4x faster and uses 2-7x less energy than local or naive cloud computation. Supported by the Qatar Computing Research Institute; the United States Defense Advanced Research Projects Agency (Agreement FA8750-14-2-0009); the Stanford Pervasive Parallelism Laboratory; and Adobe Systems.
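The server-to-client protocol described above can be illustrated with a deliberately simplified recipe: a per-block affine map (gain and offset) fitted on the server's low-quality pair, then applied by the client to its high-quality input. The paper's actual recipe representation is far richer; this toy sketch only shows the fit/apply split, and all names are ours:

```python
import numpy as np

def fit_recipe(low_in, low_out, block=8):
    """Server side: fit one (gain, offset) pair per block from the
    low-quality input to the processed output. Toy stand-in for the
    paper's richer transform-recipe representation."""
    h, w = low_in.shape
    gains = np.zeros((h // block, w // block))
    offsets = np.zeros_like(gains)
    for i in range(h // block):
        for j in range(w // block):
            x = low_in[i*block:(i+1)*block, j*block:(j+1)*block].ravel()
            y = low_out[i*block:(i+1)*block, j*block:(j+1)*block].ravel()
            # Least-squares line y = g*x + o; guard against constant blocks.
            g, o = np.polyfit(x, y, 1) if x.std() > 1e-6 else (1.0, y.mean() - x.mean())
            gains[i, j], offsets[i, j] = g, o
    return gains, offsets

def apply_recipe(high_in, gains, offsets, block=8):
    """Client side: apply the compact recipe to the local high-quality input."""
    out = np.empty_like(high_in)
    for i in range(gains.shape[0]):
        for j in range(gains.shape[1]):
            sl = np.s_[i*block:(i+1)*block, j*block:(j+1)*block]
            out[sl] = gains[i, j] * high_in[sl] + offsets[i, j]
    return out
```

The two per-block scalars are what travels over the network, which is why a recipe can be far smaller than a JPEG of the output.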
Models of Visual Appearance for Analyzing and Editing Images and Videos
The visual appearance of an image is a complex function of factors such as scene geometry, material reflectances and textures, illumination, and the properties of the camera used to capture the image. Understanding how these factors interact to produce an image is a fundamental problem in computer vision and graphics. This dissertation examines two aspects of this problem: models of visual appearance that allow us to recover scene properties from images and videos, and tools that allow users to manipulate visual appearance in images and videos in intuitive ways. In particular, we look at these problems in three different applications. First, we propose techniques for compositing images that differ significantly in their appearance. Our framework transfers appearance between images by manipulating the different levels of a multi-scale decomposition of the image. This allows users to create realistic composites with minimal interaction in a number of different scenarios. We also discuss techniques for compositing and replacing facial performances in videos. Second, we look at the problem of creating high-quality still images from low-quality video clips. Traditional multi-image enhancement techniques accomplish this by inverting the camera's imaging process. Our system incorporates feature weights into these image models to create results that have better resolution, noise, and blur characteristics, and summarize the activity in the video. Finally, we analyze variations in scene appearance caused by changes in lighting. We develop a model for outdoor scene appearance that allows us to recover radiometric and geometric information about the scene from images. We apply this model to a variety of visual tasks, including color constancy, background subtraction, shadow detection, scene reconstruction, and camera geo-location.
We also show that the appearance of a Lambertian scene can be modeled as a combination of distinct three-dimensional illumination subspaces, a result that leads to novel bounds on scene appearance and a robust uncalibrated photometric stereo method.
Foundations, Inference, and Deconvolution in Image Restoration
Image restoration is a critical preprocessing step in computer vision,
producing images with reduced noise, blur, and pixel defects.
This enables precise higher-level reasoning as to the scene content in
later stages of the vision pipeline (e.g., object segmentation,
detection, recognition, and tracking).
Restoration techniques have found extensive usage in a broad range of
applications from industry, medicine, astronomy, biology, and
photography.
The recovery of high-grade results requires models of the image
degradation process, giving rise to a class of often heavily
underconstrained inverse problems.
A further challenge specific to the problem of blur removal is noise
amplification, which may cause strong distortion by ringing artifacts.
This dissertation presents new insights and problem solving procedures
for three areas of image restoration, namely (1) model
foundations, (2) Bayesian inference for high-order Markov
random fields (MRFs), and (3) blind image deblurring
(deconvolution).
As basic research on model foundations, we contribute to reconciling
the perceived differences between probabilistic MRFs on the one hand,
and deterministic variational models on the other.
To do so, we restrict the variational functional to locally supported finite
elements (FE) and integrate over the domain.
This yields a sum of terms depending locally on FE basis coefficients,
and by identifying the latter with pixels, the terms resolve to MRF
potential functions.
In contrast with previous literature, we place special emphasis on robust
regularizers used commonly in contemporary computer vision.
Moreover, we draw samples from the derived models to further
demonstrate the probabilistic connection.
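As a one-dimensional illustration of this construction (the dissertation treats the general two-dimensional case with robust regularizers; the notation below is ours):

```latex
% 1D variational functional with regularizer \rho:
E(u) \;=\; \int_\Omega \rho\!\big(u'(x)\big)\,dx .
% Restricting u to piecewise-linear finite elements on a unit grid,
% the derivative on the element [i, i+1] is u_{i+1} - u_i, so
E(u) \;=\; \sum_i \rho\big(u_{i+1} - u_i\big),
% a sum of pairwise MRF potentials; the associated Gibbs
% distribution is p(u) \propto \exp\{-E(u)\}.
```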
Another focal issue is a class of high-order Field of Experts MRFs
which are learned generatively from natural image data and yield the
best quantitative results under Bayesian estimation.
This involves minimizing an integral expression, which has no
closed-form solution in general.
However, the MRF class under study has Gaussian mixture potentials,
permitting expansion by indicator variables as a technical measure.
As an approximate inference method, we study Gibbs sampling in the
context of non-blind deblurring and obtain excellent results, albeit
at the cost of high computational effort.
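The flavor of Gibbs sampling for restoration can be conveyed with a much simpler model than the Field-of-Experts MRFs studied here: a 1D pairwise Gaussian MRF posterior for denoising, where each pixel's conditional given its neighbors is Gaussian and can be sampled in sweeps. A hedged sketch under that toy model; all parameter names are ours:

```python
import numpy as np

def gibbs_denoise_1d(y, lam=20.0, sigma=0.1, sweeps=400, seed=0):
    """Gibbs sampling for the 1D posterior
    p(u | y) ∝ exp( -lam/2 * sum_i (u[i+1]-u[i])^2
                    - 1/(2 sigma^2) * sum_i (u[i]-y[i])^2 ).
    Toy pairwise Gaussian MRF, not the high-order models in the text.
    Returns the posterior-mean estimate, averaged over the last half
    of the sweeps (the first half serves as burn-in)."""
    rng = np.random.default_rng(seed)
    u, n, samples = y.astype(float).copy(), len(y), []
    for s in range(sweeps):
        for i in range(n):
            nbrs = [u[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Conditional of u[i] given neighbors and data is Gaussian:
            prec = lam * len(nbrs) + 1.0 / sigma**2
            mean = (lam * sum(nbrs) + y[i] / sigma**2) / prec
            u[i] = rng.normal(mean, 1.0 / np.sqrt(prec))
        if s >= sweeps // 2:
            samples.append(u.copy())
    return np.mean(samples, axis=0)
```

Even in this tiny setting the cost of sweeping every pixel for hundreds of iterations is visible, which mirrors the "high computational effort" observation above.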
In response, we turn to the mean field algorithm and show
that it scales quadratically in the clique size for a standard
restoration setting with a linear degradation model.
An empirical study of mean field over several restoration scenarios
confirms advantageous properties with regard to both image quality and
computational runtime.
This dissertation further examines the problem of blind deconvolution,
beginning with localized blur from fast moving objects in the
scene, or from camera defocus.
Forgoing dedicated hardware or user labels, we rely only on the image
as input and introduce a latent variable model to explain the
non-uniform blur.
The inference procedure estimates freely varying kernels and we
demonstrate its generality by extensive experiments.
We further present a discriminative method for blind removal of camera
shake.
In particular, we interleave discriminative non-blind deconvolution
steps with kernel estimation and leverage the error cancellation
effects of the Regression Tree Field model to attain a deblurring
process with tightly linked sequential stages.
Reconstruction and rendering of time-varying natural phenomena
While computer performance increases and computer-generated images become ever more realistic, the need for modeling computer graphics content is growing. To achieve photo-realism, detailed scenes have to be modeled, often with a significant amount of manual labour. Interdisciplinary research combining the fields of Computer Graphics, Computer Vision and Scientific Computing has led to the development of (semi-)automatic modeling tools freeing the user from labour-intensive modeling tasks. The modeling of animated content is especially challenging. Realistic motion is necessary to convince the audience of computer games, movies with mixed reality content and augmented reality applications. The goal of this thesis is to investigate automated modeling techniques for time-varying natural phenomena. The results of the presented methods are animated, three-dimensional computer models of fire, smoke and fluid flows.
Selectively De-animating and Stabilizing Videos
This thesis presents three systems for editing the motion of videos. First, selective de-animation seeks to remove the large-scale motions of one or more objects so that other motions are easier to see. The user draws strokes to indicate the regions that should be immobilized, and our algorithm warps the video to remove large-scale motion in these regions while leaving finer-scale, relative motions intact. We then use a graph-cut-based optimization to composite the warped video with still frames from the input video to remove unwanted background motion. Our technique enables applications such as clearer motion visualization, simpler creation of artistic cinemagraphs, and new ways to edit appearance and motion paths in video. Second, we design a fully automatic system to create portrait cinemagraphs by tracking facial features and de-animating the video with respect to the face and torso. We then generate compositing weights automatically to create the final cinemagraph portraits. Third, we present a user-assisted video stabilization algorithm that is able to stabilize challenging videos when state-of-the-art automatic algorithms fail to generate a satisfactory result. Our system introduces two new modes of interaction that allow the user to improve an unsatisfactory automatically stabilized video. First, we cluster tracks and visualize them on the warped video. The user ensures that appropriate tracks are selected by clicking on track clusters to include or exclude them to guide the stabilization. Second, the user can directly specify how regions in the output video should look by drawing quadrilaterals to select and deform parts of the frame. Our algorithm then computes a stabilized video using the user-selected tracks, while respecting the user-modified regions.
Estimation of Dense Correspondence Fields Using Multiple Images
Most optical flow algorithms assume pairs of images that are acquired with an ideal, short exposure time. We present two approaches that use additional images of a scene to estimate highly accurate, dense correspondence fields. In our first approach we consider video sequences that are acquired with alternating exposure times, so that a short-exposure image is followed by a long-exposure image that exhibits motion blur. With the help of the two enclosing short-exposure images, we can decipher not only the motion information encoded in the long-exposure image, but also estimate occlusion timings, which are a basis for artifact-free frame interpolation. In our second approach we consider multi-view video sequences, as they commonly occur, e.g., in stereoscopic video. As several images capture nearly the same data of a scene, this redundancy can be used to establish more robust and consistent correspondence fields than the consideration of two images permits.
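The blur-formation model underlying the first approach, a long exposure as the temporal integral of the moving scene, can be sketched in one dimension. This is a toy forward model only; the function name and step count are ours, not from the thesis:

```python
import numpy as np

def synthesize_long_exposure(frame, velocity_px, n_steps=16):
    """Model a long-exposure image as the temporal average of the
    scene translated along its motion path (1D toy version of the
    motion-blur formation model assumed by the first approach)."""
    acc = np.zeros_like(frame, dtype=float)
    for t in np.linspace(0.0, velocity_px, n_steps):
        acc += np.roll(frame, int(round(t)))  # scene shifted at time t
    return acc / n_steps  # average over the exposure interval

# A bright point smeared over its motion path: the blur streak's
# extent encodes the displacement during the exposure.
frame = np.zeros(32)
frame[5] = 1.0
blur = synthesize_long_exposure(frame, velocity_px=6)
```

Inverting this model, with the two enclosing sharp frames as anchors, is what lets the method read motion (and occlusion timing) out of the long exposure.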
Pattern Recognition
Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.
Neural representations for object capture and rendering
Photometric stereo is a classical computer vision problem with applications ranging from gaming and VR/AR avatars to movie visual effects, all of which require a faithful reconstruction of an object in a new space and thus a thorough understanding of the object's visual properties. With the advent of Neural Radiance Fields (NeRFs) in the early 2020s, we witnessed the incredible photorealism provided by the method and its potential beyond. However, original NeRFs do not provide any information about the material and lighting of the objects in focus. Therefore, we propose to tackle the multiview photometric stereo problem using an extension of NeRFs. We provide three novel contributions through this work. First, the Relightable NeRF model, an extension of the original NeRF, where appearance is conditioned on a point light source direction. It provides two use cases: it is able to learn from varying lighting and to relight under arbitrary conditions. Second, the Neural BRDF Fields, which extend the relightable NeRF by introducing explicit models for surface reflectance and shadowing. The parameters of the BRDF are learnable as a neural field, enabling spatially varying reflectance. The local surface normal direction is likewise learned as a neural field. We experiment with both a fixed BRDF (Lambertian) and a learnable (i.e. neural) reflectance model that guarantees a realistic BRDF by tying the neural network to BRDF physical properties. In addition, it learns local shadowing as a function of light source direction, enabling the reconstruction of cast shadows. Finally, the Neural Implicit Fields for Merging Monocular Photometric Stereo switches from NeRF's volume density function to a signed distance function representation. This provides a straightforward means to compute the surface normal direction and thus ties normal-based losses directly to the geometry.
We use this representation to address the problem of merging the output of monocular photometric stereo methods into a single unified model: a neural SDF and a neural field capturing diffuse albedo, from which we can extract a textured mesh.
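The core conditioning idea of the Relightable NeRF (radiance as a function of both position and light direction) can be sketched with a single forward pass of a tiny MLP. This is an illustration of the conditioning only, not the full NeRF pipeline, and all weights and names are ours:

```python
import numpy as np

def relightable_field(x, light_dir, params):
    """Minimal sketch of the Relightable NeRF conditioning: radiance at
    a 3D point depends on both the point and the point-light direction.
    One hidden layer, untrained random weights, illustrative only."""
    inp = np.concatenate([x, light_dir])                 # (6,) position + light
    h = np.tanh(params["W1"] @ inp + params["b1"])       # hidden features
    rgb = 1.0 / (1.0 + np.exp(-(params["W2"] @ h + params["b2"])))  # sigmoid -> [0, 1]
    return rgb

rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(scale=0.5, size=(64, 6)), "b1": np.zeros(64),
    "W2": rng.normal(scale=0.5, size=(3, 64)), "b2": np.zeros(3),
}
x = np.array([0.1, 0.2, 0.3])
# The same 3D point queried under two light directions yields
# different colors, which is exactly what enables relighting.
c1 = relightable_field(x, np.array([0.0, 0.0, 1.0]), params)
c2 = relightable_field(x, np.array([1.0, 0.0, 0.0]), params)
```

Training such a field from images under varying lighting, and later swapping in explicit BRDF and shadow terms, is where the thesis's actual contributions lie.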
Advanced Sensing and Image Processing Techniques for Healthcare Applications
This Special Issue aims to attract the latest research and findings in the design, development and experimentation of healthcare-related technologies. This includes, but is not limited to, the use of novel sensing, imaging, data processing, machine learning, and artificially intelligent devices and algorithms to assist and monitor the elderly, patients, and the disabled population.