82 research outputs found
The Video Mesh: A Data Structure for Image-based Three-dimensional Video Editing
This paper introduces the video mesh, a data structure for representing video as 2.5D "paper cutouts." The video mesh allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. The video mesh sparsely encodes optical flow as well as depth, and handles occlusion using local layering and alpha mattes. Motion is described by a sparse set of points tracked over time. Each point also stores a depth value. The video mesh is a triangulation over this point set, and per-pixel information is obtained by interpolation. The user rotoscopes occluding contours and we introduce an algorithm to cut the video mesh along them. Object boundaries are refined with per-pixel alpha values. Because the video mesh is at its core a set of texture-mapped triangles, we leverage graphics hardware to enable interactive editing and rendering of a variety of effects. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depth-of-field manipulation, and 2D to 3D video conversion.
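The abstract says that per-pixel information is obtained by interpolation over the triangulation. One common way to do this is barycentric interpolation of per-vertex depth inside a single triangle, sketched below. The function names and the choice of barycentric weights are assumptions made for illustration and are not taken from the paper.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_depth(p, vertices, depths):
    """Interpolate per-vertex depth values at point p inside one triangle.

    vertices: three (x, y) tracked points (hypothetical mesh vertices).
    depths:   the depth value stored at each of those points.
    """
    wa, wb, wc = barycentric_weights(p, *vertices)
    return wa * depths[0] + wb * depths[1] + wc * depths[2]


if __name__ == "__main__":
    tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    print(interpolate_depth((1.0, 1.0), tri, [1.0, 2.0, 3.0]))
```

The same weights could be reused to interpolate any other per-vertex quantity, such as a motion vector, which is why a sparse point set can stand in for dense per-pixel data.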
A Novel Inpainting Framework for Virtual View Synthesis
Multi-view imaging has stimulated significant research to enhance the user experience of free viewpoint video, allowing interactive navigation between views and the freedom to select a desired view to watch. This usually involves transmitting both textural and depth information captured from different viewpoints to the receiver, to enable the synthesis of an arbitrary view. In rendering these virtual views, perceptual holes can appear due to certain regions, hidden in the original view by a closer object, becoming visible in the virtual view. To provide a high quality experience these holes must be filled in a visually plausible way, in a process known as inpainting. This is challenging because the missing information is generally unknown and the hole-regions can be large. Recently depth-based inpainting techniques have been proposed to address this challenge and while these generally perform better than non-depth assisted methods, they are not very robust and can produce perceptual artefacts.
This thesis presents a new inpainting framework that innovatively exploits depth and textural self-similarity characteristics to construct subjectively enhanced virtual viewpoints. The framework makes three significant contributions to the field: i) the exploitation of view information to jointly inpaint textural and depth hole regions; ii) the introduction of the novel concept of self-similarity characterisation which is combined with relevant depth information; and iii) an advanced self-similarity characterising scheme that automatically determines key spatial transform parameters for effective and flexible inpainting.
The presented inpainting framework has been critically analysed and shown to provide superior performance both perceptually and numerically compared to existing techniques, especially in terms of lower visual artefacts. It provides a flexible robust framework to develop new inpainting strategies for the next generation of interactive multi-view technologies
Subjective responses and eye fixations to visual displays of spatial sequences
In the selection of spatial sequences for this study, importance was given to an understanding of the architect's design intentions (Chapters IV and VIII). The displays of the Sydney Opera House have revealed an incomplete structure, and the contrast of some completed spaces with substantial environmental noise of scaffold and workmen's ramps has been an important part of the study. In many cases the provision of a temporary path through space has been accepted by a group of subjects, and its importance has been illustrated from eye fixations to the displays.
In Displays 3, 4, 6, 7, 8, 9, 11 and 12 a conflict has not generally arisen between the designer's intentions of subjective experience and the responses of the subjects. But in Displays 1, 2, 5 and 10 the architect's design for the completed building and the recording of space in the incomplete building are at variance.
The forty subjects who took part in the experiments represented a range of architectural training in addition to variations of personality and general conditioning. The group of subjects was not a sample of the general population, and for this reason the average creativity score was considerably higher than the average for the general population.
Generative RGB-D face completion for head-mounted display removal
Head-mounted displays (HMDs) are an essential display device for the observation of virtual reality (VR) environments. However, HMDs obstruct external capturing methods from recording the user's upper face. This severely impacts social VR applications, such as teleconferencing, which commonly rely on external RGB-D sensors to capture a volumetric representation of the user. In this paper, we introduce an HMD removal framework based on generative adversarial networks (GANs), capable of jointly filling in missing color and depth data in RGB-D face images. Our framework includes an RGB-based identity loss function for identity preservation and several components aimed at surface reproduction. Our results demonstrate that our framework is able to remove HMDs from synthetic RGB-D face images while preserving the subject's identity
Efficient and High-Quality Rendering of Higher-Order Geometric Data Representations
In computer-aided design (CAD), industrial products are designed using a virtual 3D model. A CAD model typically consists of curves and surfaces in a parametric representation, in most cases non-uniform rational B-splines (NURBS). The same representation is also used for the analysis, optimization and presentation of the model. In each phase of this process, different visualizations are required to provide appropriate user feedback. Designers work with illustrative and realistic renderings, engineers need a comprehensible visualization of the simulation results, and usability studies or product presentations benefit from using a 3D display. However, the interactive visualization of NURBS models and corresponding physical simulations is a challenging task because of the computational complexity and the limited graphics hardware support.
This thesis proposes four novel rendering approaches that improve the interactive visualization of CAD models and their analysis. The presented algorithms exploit the latest graphics hardware capabilities to advance the state of the art in terms of quality, efficiency and performance. In particular, two approaches describe the direct rendering of the parametric representation without precomputed approximations or time-consuming pre-processing steps. New data structures and algorithms are presented for the efficient partition, classification, tessellation, and rendering of trimmed NURBS surfaces, as well as the first direct isosurface ray-casting approach for NURBS-based isogeometric analysis. The other two approaches introduce the versatile concept of programmable order-independent semi-transparency for the illustrative and comprehensible visualization of depth-complex CAD models, and a novel method for the hybrid reprojection of opaque and semi-transparent image information to accelerate stereoscopic rendering. Both approaches are also applicable to standard polygonal geometry, which makes them a contribution to the computer graphics and virtual reality research communities as well.
The evaluation is based on real-world NURBS-based models and simulation data. The results show that rendering can be performed directly on the underlying parametric representation with interactive frame rates and subpixel-precise image results. The computational costs of additional visualization effects, such as semi-transparency and stereoscopic rendering, are reduced to maintain interactive frame rates. The benefit of this performance gain was confirmed by quantitative measurements and a pilot user study
Tele-Robotics VR with Holographic Vision in Immersive Video
We present a first-of-its-kind end-to-end tele-robotic VR system where the user operates a robot arm remotely, while being virtually immersed into the scene through force feedback and holographic vision. In contrast to stereoscopic head-mounted displays that only provide depth perception to the user, the holographic vision device projects a light field, additionally allowing the user to correctly accommodate his/her eyes to the perceived depth of the scene's objects. The highly improved immersive user experience results in less fatigue in the tele-operator's daily work, creating safer and/or longer working conditions. The core technology relies on recent advances in immersive video coding for audio-visual transmission developed within the MPEG standardization committee. Virtual viewpoints are synthesized for the tele-operator's viewing direction from a few fixed colour and depth video feeds. Besides the display hardware and its GPU-enabled view synthesis driver, the biggest challenge lies in obtaining high-quality and reliable depth images from low-cost depth sensing devices. Specialized depth refinement tools have been developed to run in real time at zero delay within the end-to-end tele-robotic immersive video pipeline, which must by nature remain interactive. Various modules work asynchronously and efficiently at their own pace, with the acquisition devices typically being limited to 30 frames per second (fps), while the holographic headset updates its projected light field at up to 240 fps. Such a modular approach ensures high genericity over a wide range of free navigation VR/XR applications, also beyond the tele-robotic one presented in this paper.
Automatic 2D to Stereoscopic Video Conversion for 3DTV
In this thesis we address the problem of automatically converting a video filmed with a single camera to stereoscopic content tailored for viewing on 3D TVs. We present two techniques: (a) a non-parametric approach which does not require extensive training and produces good results for simple rigid scenes, and (b) a deep learning approach able to handle dynamic changes in the scene. Both proposed solutions include two stages: depth generation and rendering. For the first stage, the non-parametric approach uses an energy-based optimization, and the deep learning approach uses a multi-scale convolutional neural network to address the complex problem of depth estimation from a single image. Depth maps are generated from the input RGB images. We reformulate and simplify the process of generating the virtual camera's depth map and show how this can be used to render an anaglyph image. Anaglyph stereo was used for demonstration only, because red/cyan glasses are easy to obtain and widely available; this does not limit the applicability of the proposed technique to other stereo forms. Finally, we have extensively tested the proposed approaches and present the results.
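The abstract mentions rendering a red/cyan anaglyph from a stereo pair. A minimal sketch of the usual colour-channel composition follows: the red channel comes from the left view and the green and blue channels from the right view. The image layout (nested lists of RGB tuples) and the function name are assumptions for illustration; the thesis may compose its anaglyphs differently.

```python
def make_anaglyph(left, right):
    """Combine two equally sized RGB images into a red/cyan anaglyph.

    left, right: images as lists of rows, each row a list of (r, g, b) tuples.
    The red channel is taken from the left view, green and blue from the right.
    """
    if len(left) != len(right) or any(
        len(lrow) != len(rrow) for lrow, rrow in zip(left, right)
    ):
        raise ValueError("left and right images must have the same size")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


if __name__ == "__main__":
    left_view = [[(200, 10, 10), (180, 20, 20)]]
    right_view = [[(5, 100, 150), (6, 110, 160)]]
    print(make_anaglyph(left_view, right_view))
```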
Deep Industrial Image Anomaly Detection: A Survey
The recent rapid development of deep learning has marked a milestone in industrial Image Anomaly Detection (IAD). In this paper, we provide a comprehensive review of deep learning-based image anomaly detection techniques, from the perspectives of neural network architectures, levels of supervision, loss functions, metrics and datasets. In addition, we derive a new setting from industrial manufacturing and review the current IAD approaches under our proposed setting. Moreover, we highlight several open challenges for image anomaly detection. The merits and downsides of representative network architectures under varying supervision are discussed. Finally, we summarize the research findings and point out future research directions. More resources are available at https://github.com/M-3LAB/awesome-industrial-anomaly-detection
Automatic 2D-to-3D conversion of single low depth-of-field images
This research presents a novel approach to the automatic rendering of 3D stereoscopic disparity image pairs from single 2D low depth-of-field (LDOF) images. Initially, a depth map is produced through the assignment of depth to every delineated object and region in the image. Subsequently, the left and right disparity images are produced through depth image-based rendering (DIBR). The objects and regions in the image are initially assigned to one of six proposed groups or labels. Labelling is performed in two stages. The first involves the delineation of the dominant object-of-interest (OOI). The second involves the global object and region grouping of the non-OOI regions. The matting of the OOI is also performed in two stages. Initially, the in-focus foreground or region-of-interest (ROI) is separated from the out-of-focus background. This is achieved through the correlation of edge, gradient and higher-order statistics (HOS) saliencies. Refinement of the ROI is performed using k-means segmentation and CIEDE2000 colour-difference matching. Subsequently, the OOI is extracted from within the ROI through analysis of the dominant gradients and edge saliencies together with k-means segmentation. Depth is assigned to each of the six labels by correlating Gestalt-based principles with vanishing point estimation, gradient plane approximation and depth from defocus (DfD). To minimise some of the dis-occlusions that are generated through the 3D warping sub-process within the DIBR process, the depth map is pre-smoothed using an asymmetric bilateral filter. Hole-filling of the remaining dis-occlusions is performed through nearest-neighbour horizontal interpolation, which incorporates depth as well as direction of warp. To minimise the effects of the lateral striations, specific directional Gaussian and circular averaging smoothing is applied independently to each view, with additional average filtering applied to the border transitions.
Each stage of the proposed model is benchmarked against data from several significant publications. Novel contributions are made in the sub-speciality fields of ROI estimation, OOI matting, LDOF image classification, Gestalt-based region categorisation, vanishing point detection, relative depth assignment and hole-filling or inpainting. An important contribution is made towards the overall knowledge base of automatic 2D-to-3D conversion techniques, through the collation of existing information, expansion of existing methods and development of newer concepts
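The DIBR warping and horizontal hole-filling steps described in this abstract can be illustrated on a single scanline. The sketch below is a simplified stand-in rather than the author's method. It assumes an 8-bit depth map in which larger values are nearer, a disparity proportional to depth, and a fill rule that copies from the nearest valid horizontal neighbour that lies farther from the camera. All names and parameters are hypothetical.

```python
def warp_scanline(colors, depths, max_disparity):
    """Forward-warp one scanline; nearer pixels (larger depth value) shift more.

    Returns warped colours and depths, with None marking dis-occluded holes.
    """
    n = len(colors)
    out_c = [None] * n
    out_d = [None] * n
    for x in range(n):
        shift = round(max_disparity * depths[x] / 255)
        tx = x + shift
        # Keep the nearest surface when several pixels land on one target.
        if 0 <= tx < n and (out_d[tx] is None or depths[x] > out_d[tx]):
            out_c[tx] = colors[x]
            out_d[tx] = depths[x]
    return out_c, out_d


def fill_holes(out_c, out_d):
    """Fill holes from the nearest valid horizontal neighbour.

    When valid neighbours exist on both sides, the farther one (smaller depth
    value) is used, since dis-occlusions expose background.
    """
    n = len(out_c)
    filled = list(out_c)
    for x in range(n):
        if out_c[x] is not None:
            continue
        left = next((i for i in range(x - 1, -1, -1) if out_c[i] is not None), None)
        right = next((i for i in range(x + 1, n) if out_c[i] is not None), None)
        candidates = [i for i in (left, right) if i is not None]
        if candidates:
            src = min(candidates, key=lambda i: out_d[i])
            filled[x] = out_c[src]
    return filled


if __name__ == "__main__":
    colors = ["a", "b", "F", "F", "c", "d"]
    depths = [0, 0, 255, 255, 0, 0]
    warped_c, warped_d = warp_scanline(colors, depths, max_disparity=2)
    print(warped_c)
    print(fill_holes(warped_c, warped_d))
```

In the example, the foreground pixels "F" move two positions to the right, leaving a two-pixel hole that is filled from the background pixel "b" rather than from the foreground.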