Project RISE: Recognizing Industrial Smoke Emissions
Industrial smoke emissions pose a significant concern to human health. Prior
works have shown that using Computer Vision (CV) techniques to identify smoke
as visual evidence can influence the attitude of regulators and empower
citizens to pursue environmental justice. However, existing datasets are not of
sufficient quality nor quantity to train the robust CV models needed to support
air quality advocacy. We introduce RISE, the first large-scale video dataset
for Recognizing Industrial Smoke Emissions. We adopted a citizen science
approach to collaborate with local community members to annotate whether a
video clip has smoke emissions. Our dataset contains 12,567 clips from 19
distinct views from cameras that monitored three industrial facilities. These
daytime clips span 30 days over two years, including all four seasons. We ran
experiments using deep neural networks to establish a strong performance
baseline and reveal smoke recognition challenges. Our survey study discussed
community feedback, and our data analysis revealed opportunities for
integrating citizen scientists and crowd workers into the application of
Artificial Intelligence for social good.
Comment: Technical report
Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
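The depth-aware gating idea above can be illustrated with a minimal sketch: depth selects a pooling window per pixel, with large windows for nearby pixels (large apparent scale) and small windows for distant ones. The thresholds, window sizes, and `max_depth` normalization here are illustrative assumptions, not the paper's actual parameterization, and the paper applies this inside a convolutional network rather than on raw 2-D arrays.

```python
import math

def gate_pool_size(depth, max_depth=50.0, sizes=(7, 5, 3, 1)):
    """Pick a pooling window size from a depth value: near pixels
    (small depth, large apparent object scale) get large windows,
    far pixels get small ones. Bucketing is an illustrative choice."""
    t = min(max(depth / max_depth, 0.0), 1.0 - 1e-9)
    return sizes[int(t * len(sizes))]

def depth_gated_avg_pool(feat, depth, max_depth=50.0):
    """feat, depth: 2-D lists (H x W). Returns per-pixel averages over
    a square window whose size is gated by the depth map."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            k = gate_pool_size(depth[i][j], max_depth) // 2
            vals = [feat[a][b]
                    for a in range(max(0, i - k), min(h, i + k + 1))
                    for b in range(max(0, j - k), min(w, j + k + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out
```

With a uniformly distant depth map the window degenerates to a single pixel and details are preserved; with a uniformly near one, the large window smooths over the whole neighborhood.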
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate that this approach achieves competitive semantic segmentation
performance with a substantially more compact model. We carry out detailed
analysis of this architecture, including variants that operate on monocular
RGB but use depth as side information during training, unsupervised gating as
a generic attentional mechanism, and multi-resolution gating. We find that
gated pooling for joint semantic segmentation and depth yields
state-of-the-art results for quantitative monocular depth estimation.
Deep learning methods for 360 monocular depth estimation and point cloud semantic segmentation
Monocular depth estimation and point cloud segmentation are essential tasks for 3D scene understanding in computer vision. Depth estimation for omnidirectional images is challenging due to spherical distortion and the scarcity of large-scale labeled datasets. We propose two separate works for 360 monocular depth estimation.

In the first work, we propose a novel, model-agnostic, two-stage pipeline for omnidirectional monocular depth estimation. Our framework, PanoDepth, takes one 360 image as input, produces one or more synthesized views in the first stage, and feeds the original image and the synthesized images into a subsequent stereo matching stage. By exploiting explicit stereo-based geometric constraints, PanoDepth generates dense, high-quality depth.

In the second work, we propose a 360 monocular depth estimation pipeline, OmniFusion, to tackle the spherical distortion issue. Our pipeline transforms a 360 image into less-distorted perspective patches (i.e., tangent images), obtains patch-wise predictions via a CNN, and then merges the patch-wise results into the final output. To handle the discrepancy between patch-wise predictions, a major issue affecting merging quality, we propose a new framework with (i) a geometry-aware feature fusion mechanism that combines 3D geometric features with 2D image features, (ii) a self-attention-based transformer architecture that conducts a global aggregation of patch-wise information, and (iii) an iterative depth refinement mechanism that further refines the estimated depth based on the more accurate geometric features. Experiments show that both PanoDepth and OmniFusion achieve state-of-the-art performance on several 360 monocular depth estimation benchmarks.

For point cloud analysis, we focus on defining effective local point convolution operators and propose two approaches, SPNet and Point-Voxel CNN.
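A primitive both pipelines rely on is relating a pixel of the 360 (equirectangular) image to a viewing direction on the unit sphere, whether for synthesizing views or for extracting tangent-image patches. A minimal sketch, under an assumed coordinate convention not specified in the abstract:

```python
import math

def equirect_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction vector.
    Convention (an assumption, not from the thesis): longitude spans
    [-pi, pi) left to right, latitude spans [pi/2, -pi/2] top to bottom,
    with pixel centers offset by half a pixel."""
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

The center pixel of the image maps to the forward direction (0, 0, 1) under this convention, and every returned ray has unit length.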
For the former, we propose a novel point convolution operator named Shell Point Convolution (SPConv) as the building block for shape encoding and local context learning. Specifically, SPConv splits the 3D neighborhood space into shells, aggregates local features on manually designed kernel points, and performs convolution on the shells. For the latter, we present a novel lightweight convolutional neural network that uses a point-voxel convolution (PVC) layer as its building block. Each PVC layer has two parallel branches: a voxel branch and a point branch. In the voxel branch, we aggregate local features on non-empty voxel centers to reduce the geometric information loss caused by voxelization, then apply volumetric convolutions to enhance local neighborhood geometry encoding. In the point branch, we use a multi-layer perceptron (MLP) to extract fine-detailed point-wise features. Outputs from the two branches are adaptively fused via a feature selection module. Experimental results show that SPConv and PVC layers are effective in local shape encoding, and our proposed networks perform well on semantic segmentation tasks.
Includes bibliographical references.
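The shell-splitting step of SPConv can be sketched as follows: each neighbor of a center point is binned into a concentric shell by radial distance, and features are pooled per shell. Even shell widths and max-pooling are illustrative assumptions here; the thesis's actual shell boundaries, kernel points, and aggregation may differ.

```python
import math

def shell_index(center, neighbor, num_shells=3, radius=1.0):
    """Assign a neighbor point to one of `num_shells` concentric
    shells around `center` by its radial distance. Even shell widths
    are an illustrative choice."""
    d = math.dist(center, neighbor)
    if d >= radius:
        return None  # outside the neighborhood
    return int(d / radius * num_shells)

def shell_pool(center, neighbors, feats, num_shells=3, radius=1.0):
    """Max-pool scalar per-point features within each shell, yielding
    one descriptor per shell for a subsequent convolution to consume."""
    shells = [[] for _ in range(num_shells)]
    for p, f in zip(neighbors, feats):
        s = shell_index(center, p, num_shells, radius)
        if s is not None:
            shells[s].append(f)
    return [max(fs) if fs else 0.0 for fs in shells]
```

Ordering features by shell gives the operator a fixed-length, rotation-insensitive layout over an unordered neighborhood, which is what makes a 1-D convolution over shells well defined.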
SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 8. Advisor: Kyoung Mu Lee.
We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects. We first transform a 3D input volume into a 2D planar image using stereographic projection. We then present a shallow 2D convolutional neural network (CNN) to estimate the object category, followed by a view ensemble that combines the responses from multiple views of the object to further enhance the predictions. Specifically, the proposed approach consists of four stages: (1) stereographic projection of a 3D object, (2) view-specific feature learning, (3) view selection, and (4) view ensemble. The proposed approach performs comparably to state-of-the-art methods while requiring substantially less GPU memory and fewer network parameters. Despite its lightness, experiments on 3D object classification and shape retrieval demonstrate the high performance of the proposed method.
1 INTRODUCTION
2 Related Work
2.1 Point cloud-based methods
2.2 3D model-based methods
2.3 2D/2.5D image-based methods
3 Proposed Stereographic Projection Network
3.1 Stereographic Representation
3.2 Network Architecture
3.3 View Selection
3.4 View Ensemble
4 Experimental Evaluation
4.1 Datasets
4.2 Training
4.3 Choice of Stereographic Projection
4.4 Test on View Selection Schemes
4.5 3D Object Classification
4.6 Shape Retrieval
4.7 Implementation
5 Conclusions
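The two pillars of the SPNet abstract above, stereographic projection and view ensembling, can be sketched minimally. Projecting from the north pole of the unit sphere onto the z = 0 plane is the standard formula; the simple score averaging is a stand-in assumption for the ensemble stage, whose actual combination rule the abstract does not specify.

```python
import math

def stereographic_project(x, y, z, eps=1e-9):
    """Project a unit-sphere point onto the z = 0 plane from the
    north pole (0, 0, 1): (x, y, z) -> (x/(1-z), y/(1-z)).
    The pole itself has no finite image."""
    if abs(1.0 - z) < eps:
        raise ValueError("north pole has no finite projection")
    return (x / (1.0 - z), y / (1.0 - z))

def view_ensemble(view_scores):
    """Average per-view class scores (an illustrative stand-in for
    the ensemble stage described in the abstract)."""
    n = len(view_scores)
    return [sum(col) / n for col in zip(*view_scores)]
```

For example, the south pole (0, 0, -1) maps to the origin and an equator point (1, 0, 0) maps to (1, 0); averaging then merges the class scores produced from each projected view.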