884 research outputs found
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
Recent advances in modeling 3D objects mostly rely on synthetic datasets due
to the lack of large-scale real-scanned 3D databases. To facilitate the
development of 3D perception, reconstruction, and generation in the real world,
we propose OmniObject3D, a large-vocabulary 3D object dataset with massive
high-quality real-scanned 3D objects. OmniObject3D has several appealing
properties: 1) Large Vocabulary: It comprises 6,000 scanned objects in 190
daily categories, sharing common classes with popular 2D datasets (e.g.,
ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations.
2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors,
providing textured meshes, point clouds, multiview rendered images, and
multiple real-captured videos. 3) Realistic Scans: The professional scanners
support high-quality object scans with precise shapes and realistic appearances.
With the vast exploration space offered by OmniObject3D, we carefully set up
four evaluation tracks: a) robust 3D perception, b) novel-view synthesis, c)
neural surface reconstruction, and d) 3D object generation. Extensive studies
are performed on these four benchmarks, revealing new observations, challenges,
and opportunities for future research in realistic 3D vision.
Comment: Project page: https://omniobject3d.github.io
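To make the listed modalities concrete, here is a hypothetical sketch of what one record in such a real-scanned object dataset could look like; the class and field names are illustrative placeholders, not the OmniObject3D data format.

```python
# Hypothetical record layout reflecting the modalities named in the abstract:
# textured mesh, point cloud, multi-view renders, and real-captured videos.
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

@dataclass
class ScannedObjectRecord:
    object_id: str
    category: str                  # one of the ~190 daily-object categories
    mesh_path: Path                # textured mesh from the 3D scanner
    point_cloud_path: Path         # sampled point cloud
    rendered_views: List[Path] = field(default_factory=list)   # multi-view renders
    video_paths: List[Path] = field(default_factory=list)      # real-captured videos

record = ScannedObjectRecord(
    object_id="toy_000123",
    category="toy",
    mesh_path=Path("meshes/toy_000123.obj"),
    point_cloud_path=Path("points/toy_000123.ply"),
)
print(record.category, record.mesh_path)
```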
Data-driven shape analysis and processing
Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between shapes. In contrast to traditional approaches that process shapes in isolation from each other, data-driven methods aggregate information from 3D model collections to improve the analysis, modeling, and editing of shapes. Through reviewing the literature, we provide an overview of the main concepts and components of these methods, and discuss their application to classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing.
State of the Art on Neural Rendering
Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.
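The key mechanism mentioned above, integrating differentiable rendering into network training, is easy to illustrate with a toy example: if the image-formation step is differentiable, an image loss can be backpropagated all the way to scene parameters. The sketch below is purely didactic (a single soft Gaussian "splat" standing in for a renderer), not any specific method from the report.

```python
# Toy differentiable rendering: gradients of an image loss flow back to the
# scene parameter (the blob centre), which is then fitted by gradient descent.
import torch

H = W = 64
ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                        torch.arange(W, dtype=torch.float32), indexing="ij")

def render(center, sigma=10.0):
    """Differentiable 'renderer': an image containing one soft Gaussian blob."""
    return torch.exp(-((xs - center[0])**2 + (ys - center[1])**2) / (2 * sigma**2))

target = render(torch.tensor([40.0, 20.0]))           # ground-truth image
center = torch.tensor([20.0, 45.0], requires_grad=True)
opt = torch.optim.Adam([center], lr=1.0)

for _ in range(300):                                   # fit the scene parameter
    opt.zero_grad()
    loss = ((render(center) - target)**2).mean()
    loss.backward()
    opt.step()
print(center.detach())  # moves toward (40, 20)
```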
Neuromorphic Visual Scene Understanding with Resonator Networks
Inferring the position of objects and their rigid transformations is still an
open problem in visual scene understanding. Here we propose a neuromorphic
solution that utilizes an efficient factorization network based on three key
concepts: (1) a computational framework based on Vector Symbolic Architectures
(VSA) with complex-valued vectors; (2) the design of Hierarchical Resonator
Networks (HRN) to deal with the non-commutative nature of translation and
rotation in visual scenes, when both are used in combination; (3) the design of
a multi-compartment spiking phasor neuron model for implementing complex-valued
vector binding on neuromorphic hardware. The VSA framework uses vector binding
operations to produce generative image models in which binding acts as the
equivariant operation for geometric transformations. A scene can therefore be
described as a sum of vector products, which in turn can be efficiently
factorized by a resonator network to infer objects and their poses. The HRN
enables the definition of a partitioned architecture in which vector binding is
equivariant for horizontal and vertical translation within one partition and
for rotation and scaling within the other partition. The spiking neuron model
allows mapping the resonator network onto efficient and low-power neuromorphic
hardware. In this work, we demonstrate our approach using synthetic scenes
composed of simple 2D shapes undergoing rigid geometric transformations and
color changes. A companion paper demonstrates this approach in real-world
application scenarios for machine vision and robotics.
Comment: 15 pages, 6 figures, minor changes
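The core VSA operation described above, binding complex-valued vectors, is element-wise multiplication of random unit phasors; a scene is then a sum of such bound pairs, and querying with the conjugate of a pose vector recovers the shape bound to it. A minimal numpy sketch (dimensionality and the toy scene encoding are illustrative, not the paper's implementation):

```python
# Complex-valued VSA binding: Hadamard product of unit phasors; unbinding uses
# the complex conjugate. A scene is a sum of (shape bound to pose) vectors.
import numpy as np

rng = np.random.default_rng(0)
D = 1024  # vector dimensionality

def random_phasor(d=D):
    return np.exp(1j * rng.uniform(0, 2 * np.pi, d))

def bind(a, b):       # binding = element-wise multiplication
    return a * b
def unbind(a, b):     # unbinding = binding with the conjugate
    return a * np.conj(b)
def sim(a, b):        # normalized similarity between two phasor vectors
    return np.abs(np.vdot(a, b)) / len(a)

shape_A, shape_B = random_phasor(), random_phasor()
pos_1, pos_2 = random_phasor(), random_phasor()

scene = bind(shape_A, pos_1) + bind(shape_B, pos_2)   # scene as a sum of products

query = unbind(scene, pos_1)                          # "what sits at pos_1?"
print(sim(query, shape_A), sim(query, shape_B))       # high vs. near-zero
```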
ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation
Text-guided 3D shape generation remains challenging due to the absence of
large paired text-shape data, the substantial semantic gap between these two
modalities, and the structural complexity of 3D shapes. This paper presents a
new framework called Image as Stepping Stone (ISS) for the task by introducing
2D image as a stepping stone to connect the two modalities and to eliminate the
need for paired text-shape data. Our key contribution is a two-stage
feature-space-alignment approach that maps CLIP features to shapes by
harnessing a pre-trained single-view reconstruction (SVR) model with multi-view
supervision: we first map the CLIP image feature to the detail-rich shape space
in the SVR model, then map the CLIP text feature to the shape space and
optimize the mapping by encouraging CLIP consistency between the input text and
the rendered images. Further, we formulate a text-guided shape stylization
module to dress up the output shapes with novel textures. Beyond existing works
on 3D shape generation from text, our new approach is general for creating
shapes in a broad range of categories, without requiring paired text-shape
data. Experimental results show that our approach outperforms
state-of-the-art methods and our baselines in terms of fidelity and consistency with
text. Further, our approach can stylize the generated shapes with both
realistic and fantasy structures and textures.
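A minimal sketch of the stage-1 alignment idea described above: a small mapper is trained to pull a frozen CLIP image embedding toward the latent code of a frozen, pre-trained SVR model for the same view. All names, dimensions, and the random tensors standing in for real features are placeholders, not the authors' code.

```python
# Hypothetical stage-1 feature-space alignment: CLIP image feature -> SVR latent.
import torch
import torch.nn as nn

class CLIPToShapeMapper(nn.Module):
    """Small MLP mapping a CLIP image feature to an SVR shape latent."""
    def __init__(self, clip_dim=512, shape_latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, 512), nn.ReLU(),
            nn.Linear(512, shape_latent_dim),
        )
    def forward(self, f_clip):
        return self.net(f_clip)

def stage1_alignment_step(mapper, clip_image_feat, svr_latent, opt):
    """One step: pull mapped CLIP features toward the frozen SVR latent codes."""
    pred = mapper(clip_image_feat)
    loss = nn.functional.mse_loss(pred, svr_latent)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

mapper = CLIPToShapeMapper()
opt = torch.optim.Adam(mapper.parameters(), lr=1e-4)
f_clip = torch.randn(8, 512)   # stand-in for frozen CLIP image embeddings
z_svr = torch.randn(8, 256)    # stand-in for frozen SVR latent codes
print(stage1_alignment_step(mapper, f_clip, z_svr, opt))
```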
Adversarial content manipulation for analyzing and improving model robustness
The recent rapid progress in machine learning systems has opened up many real-world applications, from recommendation engines on web platforms to safety-critical systems like autonomous vehicles. A model deployed in the real world will often encounter inputs far from its training distribution. For example, a self-driving car might come across a black stop sign in the wild. To ensure safe operation, it is vital to quantify the robustness of machine learning models to such out-of-distribution data before releasing them into the real world. However, the standard paradigm of benchmarking machine learning models with fixed-size test sets drawn from the same distribution as the training data is insufficient to identify these corner cases efficiently. In principle, if we could generate all valid variations of an input and measure the model response, we could quantify and guarantee model robustness locally. Yet, doing this with real-world data is not scalable. In this thesis, we propose an alternative: using generative models to create synthetic data variations at scale and testing the robustness of target models to these variations. We explore methods to generate semantic data variations in a controlled fashion across visual and text modalities. We build generative models capable of performing controlled manipulation of data, such as changing the visual context, editing the appearance of an object in images, or changing the writing style of text. Leveraging these generative models, we propose tools to study the robustness of computer vision systems to input variations and to systematically identify failure modes. In the text domain, we deploy these generative models to improve the diversity of image captioning systems and to perform writing-style manipulation to obfuscate private attributes of the user. Our studies quantifying model robustness explore two kinds of input manipulations, model-agnostic and model-targeted. The model-agnostic manipulations leverage human knowledge to choose the kinds of changes without considering the target model being tested. This includes automatically editing images to remove objects not directly relevant to the task and to create variations in visual context. Alternatively, in the model-targeted approach, the input variations are directly adversarially guided by the target model. For example, we adversarially manipulate the appearance of an object in the image to fool an object detector, guided by the gradients of the detector. Using these methods, we measure and improve the robustness of various computer vision systems, specifically image classification, segmentation, object detection, and visual question answering systems, to semantic input variations.
The rapid progress of machine learning methods has enabled many new applications, from recommender systems to safety-critical systems such as autonomous vehicles. In the real world, these systems are often confronted with inputs outside the distribution of the training data. For example, an autonomous vehicle might encounter a black stop sign. To ensure safe operation, it is crucial to quantify the robustness of these systems before they are deployed in practice. Currently, these models are evaluated on fixed inputs drawn from the same distribution as the training data. However, this strategy is insufficient for identifying such corner cases.
In principle, robustness could be determined "locally" by generating all admissible variations of an input and checking the system's output. However, this approach scales poorly to real data. In this thesis, we use generative models to create synthetic variations of inputs and thereby test the robustness of a model. We explore methods that allow us to make controlled semantic changes to image and text data. We learn generative models that enable controlled manipulation of data, for example changing the visual context, editing the appearance of an object, or changing the writing style of text. Building on these models, we develop new methods to study the robustness of image recognition systems with respect to variations in the inputs and to identify failure modes. In the text domain, we use these models to improve the diversity of image captioning models and to enable writing-style manipulation in order to obfuscate private attributes of the user. To quantify model robustness, two kinds of input manipulations are studied: model-agnostic and model-specific manipulations. Model-agnostic manipulations rely on human knowledge to select particular changes without taking the model under test into account. This includes removing task-irrelevant objects from images or varying the visual context. In the alternative, model-specific approach, changes are made that are as unfavorable as possible for the model. For example, we change the appearance of an object in order to fool an object detection model; this is made possible by the gradients of the model. With these tools, we can quantify and improve the robustness of systems for image classification and segmentation, object detection, and visual question answering.
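The model-targeted idea above, perturbing the input along the gradient of the target model's loss, can be illustrated with the generic FGSM recipe; this is a standard illustration of gradient-guided adversarial manipulation, not the thesis's appearance-editing pipeline.

```python
# Generic gradient-guided (FGSM-style) input manipulation for illustration.
import torch
import torch.nn as nn

def fgsm_attack(model, x, y, eps=0.03):
    """Return a perturbed copy of x, pushed along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Toy usage with a random classifier and random "images".
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max())  # perturbation bounded by eps
```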
A Census of Baryons and Dark Matter in an Isolated, Milky Way-sized Elliptical Galaxy
We present a study of the dark and luminous matter in the isolated elliptical
galaxy NGC720, based on deep X-ray observations made with Chandra and Suzaku.
The gas is reliably measured to ~R2500, allowing us to place good constraints
on the enclosed mass and baryon fraction (fb) within this radius
(M2500=1.6e12+/-0.2e12 Msun, fb(2500)=0.10+/-0.01; systematic errors are
<~20%). The data indicate that the hot gas is close to hydrostatic, which is
supported by good agreement with a kinematical analysis of the dwarf satellite
galaxies. We confirm a dark matter (DM) halo at ~20-sigma. Assuming an NFW DM
profile, our physical model for the gas distribution enables us to obtain
meaningful constraints at scales larger than R2500, revealing that most of the
baryons are in the hot gas. We find that fb within Rvir is consistent with the
cosmological value, confirming theoretical predictions that a ~Milky Way-mass
(Mvir=3.1e12+/-0.4e12 Msun) galaxy can sustain a massive, quasi-hydrostatic gas
halo. While fb is higher than the cold baryon fraction typically measured in
similar-mass spiral galaxies, both the gas fraction (fg) and fb in NGC720 are
consistent with an extrapolation of the trends with mass seen in massive galaxy
groups and clusters. After correcting for fg, the entropy profile is close to
the self-similar prediction of gravitational structure formation simulations,
as observed in galaxy clusters. Finally, we find a strong heavy metal abundance
gradient in the ISM similar to those observed in massive galaxy groups.
Comment: 23 pages, 13 figures, 4 tables. Accepted for publication in the Astrophysical Journal. Minor modifications to match accepted version. Conclusions unchanged.
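As a back-of-the-envelope check of the quantities quoted above: the baryon fraction is simply the baryonic (hot gas plus stars) mass over the total enclosed mass, f_b = M_baryon / M_total. The snippet below multiplies the quoted numbers out; the cosmic baryon fraction of ~0.16 (Omega_b/Omega_m) is an assumed external value, not taken from the abstract.

```python
# Baryon-fraction arithmetic from the quoted masses (solar-mass units).
M2500 = 1.6e12      # enclosed total mass within R2500
fb_2500 = 0.10      # measured baryon fraction within R2500
print(f"baryonic mass within R2500 ~ {fb_2500 * M2500:.1e} Msun")   # ~1.6e11 Msun

Mvir = 3.1e12       # virial mass
fb_cosmic = 0.16    # assumed cosmic value Omega_b / Omega_m
print(f"cosmic-fraction baryon budget within Rvir ~ {fb_cosmic * Mvir:.1e} Msun")
```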
Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions
We introduce visual deprojection: the task of recovering an image or video
that has been collapsed along a dimension. Projections arise in various
contexts, such as long-exposure photography, where a dynamic scene is collapsed
in time to produce a motion-blurred image, and corner cameras, where reflected
light from a scene is collapsed along a spatial dimension because of an edge
occluder to yield a 1D video. Deprojection is ill-posed: often there are many
plausible solutions for a given input. We first propose a probabilistic model
capturing the ambiguity of the task. We then present a variational inference
strategy using convolutional neural networks as functional approximators.
Sampling from the inference network at test time yields plausible candidates
from the distribution of original signals that are consistent with a given
input projection. We evaluate the method on several datasets for both spatial
and temporal deprojection tasks. We first demonstrate the method can recover
human gait videos and face images from spatial projections, and then show that
it can recover videos of moving digits from dramatically motion-blurred images
obtained via temporal projection.
Comment: ICCV 2019
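The forward operation being inverted above is just an average along one axis of the signal; the sketch below shows the temporal (long-exposure style) and spatial (corner-camera style) projections on a synthetic video, and makes clear why deprojection is ill-posed: many different videos collapse to the same projection.

```python
# Collapsing a T x H x W video along one dimension to form a projection.
import numpy as np

rng = np.random.default_rng(0)
video = rng.random((16, 64, 64))          # T x H x W synthetic video

temporal_projection = video.mean(axis=0)  # collapse over time -> motion-blurred image
spatial_projection = video.mean(axis=2)   # collapse over one spatial axis ->
                                          # a 1D signal per frame (a "1D video")

print(temporal_projection.shape, spatial_projection.shape)  # (64, 64) (16, 64)
```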