
    Project RISE: Recognizing Industrial Smoke Emissions

    Industrial smoke emissions pose a significant concern to human health. Prior works have shown that using Computer Vision (CV) techniques to identify smoke as visual evidence can influence the attitude of regulators and empower citizens to pursue environmental justice. However, existing datasets are neither of sufficient quality nor quantity to train the robust CV models needed to support air quality advocacy. We introduce RISE, the first large-scale video dataset for Recognizing Industrial Smoke Emissions. We adopted a citizen science approach, collaborating with local community members to annotate whether a video clip shows smoke emissions. Our dataset contains 12,567 clips from 19 distinct views from cameras that monitored three industrial facilities. These daytime clips span 30 days over two years, covering all four seasons. We ran experiments using deep neural networks to establish a strong performance baseline and to reveal the challenges of smoke recognition. Our survey study gathered community feedback, and our data analysis revealed opportunities for integrating citizen scientists and crowd workers into the application of Artificial Intelligence for social good.
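    As a rough illustration of the recognition task the baseline experiments address (a short clip in, a smoke / no-smoke decision out), below is a minimal PyTorch sketch of a binary clip classifier. It is not the authors' model; the architecture, clip shape, and hyperparameters are assumptions made only to show the task setup.

```python
# Minimal sketch of a binary smoke-recognition baseline on short video clips.
# NOT the RISE authors' model; shapes and hyperparameters are illustrative.
import torch
import torch.nn as nn

class ClipSmokeClassifier(nn.Module):
    def __init__(self, in_channels=3, hidden=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),          # halve T, H, W
            nn.Conv3d(hidden, hidden * 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),              # global spatio-temporal pooling
        )
        self.head = nn.Linear(hidden * 2, 1)      # single logit: smoke vs. no smoke

    def forward(self, clip):                      # clip: (B, C, T, H, W)
        x = self.features(clip).flatten(1)
        return self.head(x)                       # (B, 1) logits

# Example: a batch of two 16-frame 112x112 clips (sizes are assumptions).
logits = ClipSmokeClassifier()(torch.randn(2, 3, 16, 112, 112))
loss = nn.BCEWithLogitsLoss()(logits, torch.tensor([[1.0], [0.0]]))
```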

    Recurrent Scene Parsing with Perspective Understanding in the Loop

    Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to object scale (inversely proportional to depth), so that small details are preserved for distant objects while larger receptive fields are used for nearby ones. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate that this approach achieves competitive semantic segmentation performance with a substantially more compact model. We carry out extensive analysis of this architecture, including variants that operate on monocular RGB but use depth as side information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth estimation yields state-of-the-art results for quantitative monocular depth estimation.
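    The following PyTorch sketch illustrates the depth-aware gating idea described above: the same feature map is pooled at several field sizes, and a per-pixel gate driven by depth softly selects among them. It is an interpretation of the mechanism, not the paper's exact module; the pooling scales and the tiny gating network are assumptions.

```python
# Hedged sketch of depth-aware gated pooling: distant pixels favor small
# pooling fields, nearby pixels favor large ones. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthAwareGatedPooling(nn.Module):
    def __init__(self, channels, pool_sizes=(1, 3, 5, 7)):
        super().__init__()
        self.pool_sizes = pool_sizes
        # Tiny gating net: maps depth to per-pixel weights over the pooling scales.
        self.gate = nn.Conv2d(1, len(pool_sizes), kernel_size=3, padding=1)

    def forward(self, feats, depth):          # feats: (B, C, H, W), depth: (B, 1, H, W)
        pooled = [
            F.avg_pool2d(feats, k, stride=1, padding=k // 2) for k in self.pool_sizes
        ]                                                    # each: (B, C, H, W)
        weights = torch.softmax(self.gate(depth), dim=1)     # (B, S, H, W)
        stacked = torch.stack(pooled, dim=1)                 # (B, S, C, H, W)
        return (weights.unsqueeze(2) * stacked).sum(dim=1)   # (B, C, H, W)

# Example with random features and a random depth map.
out = DepthAwareGatedPooling(64)(torch.randn(2, 64, 48, 64), torch.rand(2, 1, 48, 64))
```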

    Deep learning methods for 360 monocular depth estimation and point cloud semantic segmentation

    Monocular depth estimation and point cloud segmentation are essential tasks for 3D scene understanding in computer vision. Depth estimation for omnidirectional images is challenging due to spherical distortion and the limited availability of large-scale labeled datasets. We propose two separate works for 360 monocular depth estimation. In the first work, we propose a novel, model-agnostic, two-stage pipeline for omnidirectional monocular depth estimation. Our proposed framework, PanoDepth, takes one 360 image as input, produces one or more synthesized views in the first stage, and feeds the original image and the synthesized images into a subsequent stereo matching stage. By exploiting explicit stereo-based geometric constraints, PanoDepth can generate dense, high-quality depth. In the second work, we propose a 360 monocular depth estimation pipeline, OmniFusion, to tackle the spherical distortion issue. Our pipeline transforms a 360 image into less-distorted perspective patches (i.e., tangent images) to obtain patch-wise predictions via a CNN, and then merges the patch-wise results into the final output. To handle the discrepancy between patch-wise predictions, the major issue affecting merging quality, we propose a new framework with (i) a geometry-aware feature fusion mechanism that combines 3D geometric features with 2D image features, (ii) a self-attention-based transformer architecture that performs global aggregation of patch-wise information, and (iii) an iterative depth refinement mechanism that further refines the estimated depth using the more accurate geometric features. Experiments show that both PanoDepth and OmniFusion achieve state-of-the-art performance on several 360 monocular depth estimation benchmarks.

    For point cloud analysis, we focus on defining effective local point convolution operators and propose two approaches, SPNet and a Point-Voxel CNN. For the former, we propose a novel point convolution operator named Shell Point Convolution (SPConv) as the building block for shape encoding and local context learning. Specifically, SPConv splits the 3D neighborhood space into shells, aggregates local features on manually designed kernel points, and performs convolution over the shells. For the latter, we present a novel lightweight convolutional neural network that uses a point-voxel convolution (PVC) layer as its building block. Each PVC layer has two parallel branches: a voxel branch and a point branch. In the voxel branch, we aggregate local features on non-empty voxel centers to reduce the geometric information loss caused by voxelization, then apply volumetric convolutions to enhance local neighborhood geometry encoding. In the point branch, we use a Multi-Layer Perceptron (MLP) to extract fine-detailed point-wise features. Outputs from the two branches are adaptively fused via a feature selection module. Experimental results show that SPConv and PVC layers are effective for local shape encoding, and our proposed networks perform well on semantic segmentation tasks.
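    To make the two-branch point-voxel (PVC) idea concrete, here is a hedged PyTorch sketch: a point branch (per-point MLP) runs in parallel with a voxel branch (scatter point features onto a coarse grid, apply a 3D convolution, gather back to points), and a learned gate fuses the two. The grid resolution, channel widths, and fusion rule are illustrative assumptions, not the thesis' exact design.

```python
# Hedged sketch of a two-branch point-voxel block. Illustrative only.
import torch
import torch.nn as nn

class PointVoxelBlock(nn.Module):
    def __init__(self, channels, grid=16):
        super().__init__()
        self.grid = grid
        self.point_mlp = nn.Sequential(nn.Linear(channels, channels), nn.ReLU())
        self.voxel_conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.select = nn.Linear(2 * channels, channels)   # feature-selection fusion

    def forward(self, xyz, feats):            # xyz: (N, 3) in [0, 1], feats: (N, C)
        n, c, g = feats.shape[0], feats.shape[1], self.grid
        idx = (xyz.clamp(0, 1 - 1e-6) * g).long()                 # voxel coords (N, 3)
        flat = idx[:, 0] * g * g + idx[:, 1] * g + idx[:, 2]      # flat voxel index (N,)
        grid_sum = feats.new_zeros(g * g * g, c).index_add_(0, flat, feats)
        counts = feats.new_zeros(g * g * g).index_add_(0, flat, feats.new_ones(n))
        grid_mean = grid_sum / counts.clamp(min=1).unsqueeze(1)   # mean over non-empty voxels
        vox = grid_mean.t().reshape(1, c, g, g, g)                # (1, C, G, G, G)
        vox = self.voxel_conv(vox).reshape(c, -1).t()             # back to (G^3, C)
        voxel_feat = vox[flat]                                    # gather per point (N, C)
        point_feat = self.point_mlp(feats)                        # (N, C)
        gate = torch.sigmoid(self.select(torch.cat([point_feat, voxel_feat], dim=1)))
        return gate * point_feat + (1 - gate) * voxel_feat        # fused (N, C)

# Example: 1024 random points with 32-dim features.
out = PointVoxelBlock(32)(torch.rand(1024, 3), torch.randn(1024, 32))
```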

    SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection

    Master's thesis, Department of Electrical and Computer Engineering, Seoul National University, August 2019. Advisor: Kyoung Mu Lee.

    We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects. We first transform a 3D input volume into a 2D planar image using stereographic projection. We then present a shallow 2D convolutional neural network (CNN) to estimate the object category, followed by a view ensemble that combines the responses from multiple views of the object to further enhance the predictions. Specifically, the proposed approach consists of four stages: (1) stereographic projection of a 3D object, (2) view-specific feature learning, (3) view selection, and (4) view ensemble. The proposed approach performs comparably to state-of-the-art methods while requiring substantially less GPU memory and fewer network parameters. Despite its lightness, experiments on 3D object classification and shape retrieval demonstrate the high performance of the proposed method.

    Contents: 1 Introduction; 2 Related Work (2.1 Point cloud-based methods; 2.2 3D model-based methods; 2.3 2D/2.5D image-based methods); 3 Proposed Stereographic Projection Network (3.1 Stereographic Representation; 3.2 Network Architecture; 3.3 View Selection; 3.4 View Ensemble); 4 Experimental Evaluation (4.1 Datasets; 4.2 Training; 4.3 Choice of Stereographic Projection; 4.4 Test on View Selection Schemes; 4.5 3D Object Classification; 4.6 Shape Retrieval; 4.7 Implementation); 5 Conclusions
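    The stereographic projection step can be sketched as follows: object points are normalized onto the unit sphere and mapped to a plane from the north pole via (x, y, z) -> (x / (1 - z), y / (1 - z)), then rasterized into a planar image that a shallow 2D CNN can consume. This is a minimal illustration; the image resolution, clipping radius, and binary occupancy rasterization are assumptions rather than the thesis' actual preprocessing.

```python
# Hedged sketch of stereographic projection of a 3D object onto a 2D image.
import numpy as np

def stereographic_image(points, size=64, max_radius=4.0):
    """points: (N, 3) array of surface points of a 3D object."""
    # Center the object and push every point onto the unit sphere.
    p = points - points.mean(axis=0)
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    keep = z < 1.0 - 1e-6                      # drop points at the projection pole
    u, v = x[keep] / (1.0 - z[keep]), y[keep] / (1.0 - z[keep])
    # Map plane coordinates into pixel indices and mark occupancy.
    u = np.clip((u / max_radius + 1.0) * 0.5 * (size - 1), 0, size - 1).astype(int)
    v = np.clip((v / max_radius + 1.0) * 0.5 * (size - 1), 0, size - 1).astype(int)
    image = np.zeros((size, size), dtype=np.float32)
    image[v, u] = 1.0                          # binary planar image of the object
    return image

# Example: project a random point cloud and feed the image to any 2D CNN.
img = stereographic_image(np.random.randn(2048, 3))
```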