Robustness of 3D Deep Learning in an Adversarial Setting
Understanding the spatial arrangement and nature of real-world objects is of
paramount importance to many complex engineering tasks, including autonomous
navigation. Deep learning has revolutionized state-of-the-art performance for
tasks in 3D environments; however, relatively little is known about the
robustness of these approaches in an adversarial setting. The lack of
comprehensive analysis makes it difficult to justify deployment of 3D deep
learning models in real-world, safety-critical applications. In this work, we
develop an algorithm for analysis of pointwise robustness of neural networks
that operate on 3D data. We show that current approaches presented for
understanding the resilience of state-of-the-art models vastly overestimate
their robustness. We then use our algorithm to evaluate an array of
state-of-the-art models in order to demonstrate their vulnerability to
occlusion attacks. We show that, in the worst case, these networks can be
reduced to 0% classification accuracy after the occlusion of at most 6.5% of
the occupied input space.
Comment: 10 pages, 8 figures, 1 table
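The occlusion attack described above can be pictured, in greatly simplified form, as a greedy search over point removals. The sketch below is an illustrative assumption, not the paper's algorithm: the toy classifier, the greedy selection rule, and the confidence threshold are all stand-ins.

```python
# Hypothetical sketch of a greedy occlusion attack on a point-cloud
# classifier: repeatedly drop the point whose removal most reduces the
# model's confidence in the true class. The toy "classifier" below is a
# stand-in for a real network; the paper's exact algorithm may differ.

def toy_classifier(points):
    """Toy stand-in: confidence is the fraction of points in the
    positive octant (x, y, z all > 0)."""
    if not points:
        return 0.0
    hits = sum(1 for (x, y, z) in points if x > 0 and y > 0 and z > 0)
    return hits / len(points)

def greedy_occlusion_attack(points, classifier, threshold=0.5):
    """Occlude points one at a time until confidence < threshold.
    Returns the occluded cloud and the fraction of points removed."""
    cloud = list(points)
    original = len(cloud)
    while cloud and classifier(cloud) >= threshold:
        # Pick the point whose removal hurts confidence the most.
        best_i = min(
            range(len(cloud)),
            key=lambda i: classifier(cloud[:i] + cloud[i + 1:]),
        )
        cloud.pop(best_i)
    removed = (original - len(cloud)) / original
    return cloud, removed
```

Measuring the smallest such fraction over a test set is one way to quantify the "at most 6.5% of the occupied input space" style of worst-case result.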
DEVELOPMENT OF THE 3-DIMENSIONAL MIND MAPPING CUPBOARD (LEMAPING 3D) LEARNING MEDIA FOR FIRST-GRADE ELEMENTARY SCHOOL PROCEDURAL TEXT WRITING MATERIAL
The researcher conducted observations and interviews with the first-grade teacher at SDN Sumbersari 2 Malang. In the process of teaching Indonesian, the informant revealed that students had difficulty understanding simple procedural texts and composing sentences for the simple procedural text material. This was evidenced by the results of student worksheets (LKPD) and evaluation sheets that did not meet the minimum mastery criterion (KKM): out of 14 students, 8 were unable to compose simple procedural texts. The lack of media supporting writing skills further hindered the students' ability to compose sentences. In response to this issue, the researcher developed a 3D mind mapping cupboard learning media for the first-grade procedural text writing material in elementary school. The purpose of this study is to help students understand the material and write simple procedural texts in an enjoyable manner.
This study uses the ADDIE model, which consists of analysis, design, development, implementation, and evaluation. Data were collected through observation, interviews, and documentation, and both qualitative and quantitative data were processed. The results showed that the 3D mind mapping cupboard learning media for first-grade procedural text writing material was rated 81% by material experts ("valid and feasible to use") and 95% by media experts ("valid and feasible to use in learning"). Student responses reached 96% and teacher responses 97%, indicating that the 3D mind mapping cupboard media is feasible to use. In conclusion, the 3D mind mapping cupboard can help students understand the material and improve their skill in composing simple procedural texts in Indonesian.
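The expert-validation percentages reported above (81%, 95%, 96%, 97%) are conventionally computed as the obtained questionnaire score over the maximum possible score. A minimal sketch, assuming a five-point Likert scale and illustrative category bands (neither is taken from the study itself):

```python
# Sketch of how expert-validation percentages like those above are
# typically computed: obtained score over maximum possible score.
# The five-point scale and the category bands are illustrative
# assumptions, not the study's actual instrument.

def validity_percentage(scores, max_per_item=5):
    """Percentage of the maximum possible questionnaire score."""
    return 100 * sum(scores) / (max_per_item * len(scores))

def category(pct):
    """Illustrative qualitative banding for validation percentages."""
    if pct > 80:
        return "valid and feasible to use"
    if pct > 60:
        return "fairly valid, usable with minor revision"
    return "not valid, needs major revision"
```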
3D Representation Learning for Shape Reconstruction and Understanding
The real world we live in is inherently composed of multiple 3D objects. However, most existing work in computer vision focuses on images or videos, where 3D information is inevitably lost through camera projection. Traditional methods typically rely on hand-crafted algorithms and features, with many constraints and geometric priors, to understand the real world. Following the trend of deep learning, however, there has been exponential growth in research that uses deep neural networks to learn 3D representations of complex shapes and scenes, leading to many cutting-edge applications in augmented reality (AR), virtual reality (VR), and robotics, and making 3D representation learning one of the most important directions in computer vision and computer graphics.
This thesis aims to build an intelligent system with dynamic 3D representations that can change over time, understand and recover the real world with semantic, instance, and geometric information, and eventually bridge the gap between the real world and the digital world. As a first step towards these challenges, this thesis explores both explicit and implicit representations by directly addressing existing open problems in these areas. It starts from neural implicit representation learning for 3D scene representation and understanding, and moves on to a parametric-model-based explicit 3D reconstruction method. Extensive experimentation on benchmarks across various domains demonstrates the superiority of our methods over previous state-of-the-art approaches, enabling many real-world applications. Based on the proposed methods and observations of current open problems, the thesis concludes with potential future research directions.
MVTN: Learning Multi-View Transformations for 3D Understanding
Multi-view projection techniques have proven highly effective at achieving
top-performing results in 3D shape recognition. These methods learn how to
combine information from multiple view-points.
However, the camera view-points from which these views are obtained are often
fixed for all shapes. To overcome the static nature of current multi-view
techniques, we propose learning these view-points. Specifically, we introduce
the Multi-View Transformation Network (MVTN), which uses differentiable
rendering to determine optimal view-points for 3D shape recognition. As a
result, MVTN can be trained end-to-end with any multi-view network for 3D shape
classification. We integrate MVTN into a novel adaptive multi-view pipeline
that is capable of rendering both 3D meshes and point clouds. Our approach
demonstrates state-of-the-art performance in 3D classification and shape
retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55).
Further analysis indicates that our approach exhibits improved robustness to
occlusion compared to other methods. We also investigate additional aspects of
MVTN, such as 2D pretraining and its use for segmentation. To support further
research in this area, we have released MVTorch, a PyTorch library for 3D
understanding and generation using multi-view projections.
Comment: under review; journal extension of the ICCV 2021 paper arXiv:2011.1324
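MVTN's learned view-points can be contrasted with the common fixed "circular" baseline, where cameras sit at evenly spaced azimuths on a sphere around the object. The sketch below shows only that view-point parameterization; the function names, radius, and baseline elevation are illustrative assumptions, and MVTN's actual contribution is predicting per-shape angles through a differentiable renderer:

```python
import math

# Sketch of the viewpoint parameterization used by multi-view pipelines
# like MVTN: each view is an (azimuth, elevation) pair converted to a
# camera position on a sphere around the object. Function names and the
# fixed radius are illustrative assumptions, not MVTN's API.

def camera_position(azimuth_deg, elevation_deg, radius=2.0):
    """Place a camera on a sphere of the given radius around the origin."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return (x, y, z)

def circular_viewpoints(n_views, elevation_deg=30.0):
    """Fixed 'circular' baseline: n_views evenly spaced azimuths.
    MVTN replaces these fixed angles with per-shape predicted ones."""
    return [camera_position(i * 360.0 / n_views, elevation_deg)
            for i in range(n_views)]
```

Because the angle-to-position mapping is smooth, gradients from a downstream classifier can flow back through a differentiable renderer to the predicted angles, which is what enables end-to-end training.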
DeepVoxels: Learning Persistent 3D Feature Embeddings
In this work, we address the lack of 3D understanding of generative neural
networks by introducing a persistent 3D feature embedding for view synthesis.
To this end, we propose DeepVoxels, a learned representation that encodes the
view-dependent appearance of a 3D scene without having to explicitly model its
geometry. At its core, our approach is based on a Cartesian 3D grid of
persistent embedded features that learn to make use of the underlying 3D scene
structure. Our approach combines insights from 3D geometric computer vision
with recent advances in learning image-to-image mappings based on adversarial
loss functions. DeepVoxels is supervised using a 2D re-rendering loss, without
requiring a 3D reconstruction of the scene, and it enforces perspective and
multi-view geometry in a principled manner. We apply our persistent 3D scene
representation to the problem of novel view synthesis demonstrating
high-quality results for a variety of challenging scenes.
Comment: Video: https://www.youtube.com/watch?v=HM_WsZhoGXw Supplemental
material:
https://drive.google.com/file/d/1BnZRyNcVUty6-LxAstN83H79ktUq8Cjp/view?usp=sharing
Code: https://github.com/vsitzmann/deepvoxels Project page:
https://vsitzmann.github.io/deepvoxels
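The core data structure behind DeepVoxels, a Cartesian grid of persistent features read out at continuous 3D positions, can be sketched with plain trilinear interpolation. The scalar grid below is an illustrative stand-in for the learned feature embedding:

```python
# Sketch of the data structure at the core of DeepVoxels: a Cartesian
# 3D grid of persistent values, read out by trilinear interpolation at
# continuous positions. A real model stores a feature vector per cell
# and learns it end-to-end with a 2D re-rendering loss; this scalar
# version only illustrates the lookup.

def trilinear_sample(grid, x, y, z):
    """Trilinearly interpolate a scalar grid[i][j][k] at (x, y, z)."""
    x0, y0, z0 = int(x), int(y), int(z)
    dx, dy, dz = x - x0, y - y0, z - z0
    value = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                # Weight each of the 8 surrounding cells by its
                # proximity to the query point along every axis.
                weight = ((dx if i else 1 - dx)
                          * (dy if j else 1 - dy)
                          * (dz if k else 1 - dz))
                value += weight * grid[x0 + i][y0 + j][z0 + k]
    return value
```

Because the interpolation weights are differentiable in (x, y, z), gradients from a 2D image loss can flow back into the grid values, which is what makes such a representation trainable from re-rendering alone.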
SilNet : Single- and Multi-View Reconstruction by Learning from Silhouettes
The objective of this paper is 3D shape understanding from single and
multiple images. To this end, we introduce a new deep-learning architecture and
loss function, SilNet, that can handle multiple views in an order-agnostic
manner. The architecture is fully convolutional, and for training we use a
proxy task of silhouette prediction, rather than directly learning a mapping
from 2D images to 3D shape as has been the target in most recent work.
We demonstrate that with the SilNet architecture there is generalisation over
the number of views -- for example, SilNet trained on 2 views can be used with
3 or 4 views at test-time; and performance improves with more views.
We introduce two new synthetic datasets: a blobby object dataset useful for
pre-training, and a challenging and realistic sculpture dataset; and
demonstrate on these datasets that SilNet has indeed learnt 3D shape. Finally,
we show that SilNet exceeds the state of the art on the ShapeNet benchmark
dataset, and use SilNet to generate novel views of the sculpture dataset.
Comment: BMVC 2017; Best Poster
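The order-agnostic handling of views described above is commonly realized by pooling per-view features with a symmetric function such as an elementwise max, which is defined for any number of views. A minimal sketch (the convolutional feature extractor is omitted, and this is not necessarily SilNet's exact fusion layer):

```python
# Sketch of order-agnostic multi-view fusion: per-view feature vectors
# are combined with an elementwise max. The result is invariant to the
# order of the views and works for 2, 3, or 4 views alike, which is
# what allows training on 2 views and testing on more. The per-view
# features would come from a shared encoder in a real network.

def fuse_views(view_features):
    """Elementwise max over a list of equal-length feature vectors."""
    return [max(components) for components in zip(*view_features)]
```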
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
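The retrieval step in descriptor-based localization reduces, at its simplest, to nearest-neighbor matching of a query descriptor against a reference database. The sketch below assumes plain Euclidean vectors and illustrative place names; in the paper, the descriptors additionally encode learned 3D geometric and semantic information:

```python
import math

# Sketch of descriptor-based place retrieval: the query image's
# descriptor is matched to the closest reference descriptor. The
# database keys and vectors here are illustrative; the paper's
# descriptors are learned via semantic scene completion.

def match(query, database):
    """Return the key of the reference descriptor closest to the query."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    return min(database, key=lambda name: dist(query, database[name]))
```

Robustness to viewpoint or illumination change then amounts to the learned descriptors keeping matching images close under this metric even when pixel appearance differs drastically.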
SEGMENT3D: A Web-based Application for Collaborative Segmentation of 3D images used in the Shoot Apical Meristem
The quantitative analysis of 3D confocal microscopy images of the shoot
apical meristem helps in understanding the growth process of some plants. Cell
segmentation in these images is crucial for computational plant analysis, and
many automated methods have been proposed. However, variations in signal
intensity across the image undermine the effectiveness of those approaches,
with no easy way for users to correct the results. We propose a web-based collaborative 3D image
segmentation application, SEGMENT3D, to leverage automatic segmentation
results. The image is divided into 3D tiles that can be either segmented
interactively from scratch or corrected from a pre-existing segmentation.
Individual segmentation results per tile are then automatically merged via
consensus analysis and then stitched to complete the segmentation for the
entire image stack. SEGMENT3D is a comprehensive application that can be
applied to other 3D imaging modalities and general objects. It also provides an
easy way to create supervised data to advance segmentation using machine
learning models.
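The consensus step described above can be sketched as a per-voxel majority vote over the label maps that different users produced for the same tile. This simplified version treats a tile as a flat list of labels and is an illustrative assumption, not SEGMENT3D's exact merging procedure:

```python
from collections import Counter

# Sketch of consensus analysis for collaborative segmentation: several
# users label the same 3D tile, and a per-voxel majority vote merges
# their aligned label maps into one result. Tiles are flat label lists
# here for simplicity; real tiles are 3D arrays.

def consensus_merge(segmentations):
    """Per-voxel majority vote over aligned label maps of one tile."""
    merged = []
    for labels in zip(*segmentations):
        # Counter.most_common(1) returns the single most frequent label.
        merged.append(Counter(labels).most_common(1)[0][0])
    return merged
```

Merged tiles would then be stitched along their boundaries to recover the segmentation of the entire image stack.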