Project RISE: Recognizing Industrial Smoke Emissions
Industrial smoke emissions pose a significant threat to human health. Prior
works have shown that using Computer Vision (CV) techniques to identify smoke
as visual evidence can influence the attitude of regulators and empower
citizens to pursue environmental justice. However, existing datasets are of
neither sufficient quality nor quantity to train the robust CV models needed to support
air quality advocacy. We introduce RISE, the first large-scale video dataset
for Recognizing Industrial Smoke Emissions. We adopted a citizen science
approach to collaborate with local community members to annotate whether a
video clip has smoke emissions. Our dataset contains 12,567 clips from 19
distinct views from cameras that monitored three industrial facilities. These
daytime clips span 30 days over two years, including all four seasons. We ran
experiments using deep neural networks to establish a strong performance
baseline and reveal smoke recognition challenges. Our survey study discussed
community feedback, and our data analysis revealed opportunities for
integrating citizen scientists and crowd workers into the application of
Artificial Intelligence for social good.
Comment: Technical report
Non-aligned supervision for Real Image Dehazing
Removing haze from real-world images is challenging because unpredictable
weather conditions make it difficult to capture aligned hazy and clear image pairs. In this
paper, we propose a non-aligned supervision framework that consists of three
networks - dehazing, airlight, and transmission. In particular, we explore a
non-alignment setting by utilizing a clear reference image that is not aligned
with the hazy input image to supervise the dehazing network through a
multi-scale reference loss that compares the features of the two images. Our
setting makes it easier to collect hazy/clear image pairs in real-world
environments, even under misalignment and shifted views. To
demonstrate this, we have created a new hazy dataset called "Phone-Hazy", which
was captured using mobile phones in both rural and urban areas. Additionally,
we present a mean and variance self-attention network to model the infinite
airlight using dark channel prior as position guidance, and employ a channel
attention network to estimate the three-channel transmission. Experimental
results show that our framework outperforms current state-of-the-art methods in
real-world image dehazing. Phone-Hazy and code will be available at
https://github.com/hello2377/NSDNet
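The airlight and transmission networks described above build on the standard atmospheric scattering model of haze formation. As a minimal sketch (illustrative only, not the paper's actual code), a hazy pixel I is formed from a clear pixel J, transmission t, and airlight A as I = J·t + A·(1 − t), and dehazing inverts this relation:

```python
# Sketch of the standard atmospheric scattering model underlying
# airlight/transmission-based dehazing (illustrative, not the paper's code).
# Forward model: I = J * t + A * (1 - t).

def dehaze_pixel(I, A, t, t_min=0.1):
    """Recover a clear pixel J from hazy pixel I given airlight A and
    transmission t, clamping t to avoid division blow-up at low transmission."""
    t = max(t, t_min)
    return (I - A) / t + A

# Round-trip check: synthesize a hazy pixel, then recover the clear value.
J, A, t = 0.6, 0.9, 0.5
I = J * t + A * (1 - t)           # hazy pixel from the forward model
restored = dehaze_pixel(I, A, t)
print(round(restored, 6))         # -> 0.6
```

In practice the paper estimates A with a mean-and-variance self-attention network guided by the dark channel prior and t with a channel attention network; the sketch only shows the shared physical model they plug into.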
Simulation-to-Real domain adaptation with teacher-student learning for endoscopic instrument segmentation
Purpose: Segmentation of surgical instruments in endoscopic videos is
essential for automated surgical scene understanding and process modeling.
However, relying on fully supervised deep learning for this task is challenging
because manual annotation consumes the valuable time of clinical experts.
Methods: We introduce a teacher-student learning approach that learns jointly
from annotated simulation data and unlabeled real data to tackle the erroneous
learning problem of the current consistency-based unsupervised domain
adaptation framework.
Results: Empirical results on three datasets highlight the effectiveness of
the proposed framework over current approaches for the endoscopic instrument
segmentation task. Additionally, we provide analysis of major factors affecting
the performance on all datasets to highlight the strengths and failure modes of
our approach.
Conclusion: We show that our proposed approach can successfully exploit the
unlabeled real endoscopic video frames and improve generalization performance
over pure simulation-based training and the previous state-of-the-art. This
takes us one step closer to effective segmentation of surgical tools in the
annotation-scarce setting.
Comment: Accepted at IPCAI202
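Consistency-based teacher-student frameworks of the kind this abstract builds on typically maintain the teacher as an exponential moving average (EMA) of the student's weights. The following is a hedged sketch of that generic update rule, not the paper's exact implementation:

```python
# Hedged sketch of the EMA teacher update common to consistency-based
# teacher-student domain adaptation (illustrative; not the paper's code).

def ema_update(teacher_weights, student_weights, momentum=0.99):
    """Move each teacher weight toward the corresponding student weight:
    w_t <- momentum * w_t + (1 - momentum) * w_s."""
    return [momentum * wt + (1 - momentum) * ws
            for wt, ws in zip(teacher_weights, student_weights)]

teacher = [0.0, 1.0]
student = [1.0, 1.0]
teacher = ema_update(teacher, student, momentum=0.9)
print([round(w, 6) for w in teacher])  # -> [0.1, 1.0]
```

The slowly moving teacher supplies stable pseudo-targets on the unlabeled real frames, while the student trains on both the annotated simulation data and a consistency loss against those targets.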
A comprehensive survey on recent deep learning-based methods applied to surgical data
Minimally invasive surgery is highly operator-dependent, and its lengthy
procedural time causes surgeon fatigue and poses risks to patients such as
organ injury, infection, bleeding, and complications of anesthesia. To mitigate
such risks, real-time systems that provide intra-operative guidance to surgeons
are desired. For example, an automated system for tool
localization, tool (or tissue) tracking, and depth estimation can enable a
clear understanding of surgical scenes preventing miscalculations during
surgical procedures. In this work, we present a systematic review of recent
machine learning-based approaches including surgical tool localization,
segmentation, tracking, and 3D scene perception. Furthermore, we provide a
detailed overview of publicly available benchmark datasets widely used for
surgical navigation tasks. While recent deep learning architectures have shown
promising results, there are still several open research problems such as a
lack of annotated datasets, the presence of artifacts in surgical scenes, and
non-textured surfaces that hinder 3D reconstruction of the anatomical
structures. Based on our comprehensive review, we present a discussion on
current gaps and needed steps to improve the adaptation of technology in
surgery.
Comment: This paper is to be submitted to International Journal of Computer Vision
Comparison of two deep learning methods for detecting fire hotspots
Every high-rise building must meet construction requirements, i.e. it must provide adequate safety measures to prevent unexpected events such as fire incidents. To avoid the occurrence of a larger fire, surveillance using closed-circuit television (CCTV) videos is necessary. However, it is impossible for security personnel to monitor the footage all day. One method that can assist security personnel is deep learning. In this study, we use two deep learning methods to detect fire hotspots: the you only look once (YOLO) method and the faster region-based convolutional neural network (faster R-CNN) method. In the first stage, we collected 100 images (70 training and 30 test images). The next stage is model training, which aims to enable the model to recognize fire. We then calculate precision, recall, accuracy, and F1 score to measure the performance of each model. If the F1 score is close to 1, the precision-recall balance is optimal. In our experiments, we found that YOLO achieves a precision of 100%, a recall of 54.54%, an accuracy of 66.67%, and an F1 score of 0.70583667, while faster R-CNN achieves a precision of 87.5%, a recall of 95.45%, an accuracy of 86.67%, and an F1 score of 0.913022.
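The F1 scores reported above follow directly from the stated precision and recall values, since F1 is their harmonic mean. A minimal check of the arithmetic:

```python
# F1 is the harmonic mean of precision and recall:
# F1 = 2 * P * R / (P + R).

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# YOLO: P = 100%, R = 54.54%  -> matches the reported 0.70583667
print(round(f1_score(1.0, 0.5454), 4))    # -> 0.7058
# Faster R-CNN: P = 87.5%, R = 95.45% -> matches the reported 0.913022
print(round(f1_score(0.875, 0.9545), 4))  # -> 0.913
```

Note how the harmonic mean penalizes YOLO's low recall despite its perfect precision, which is why faster R-CNN's more balanced precision/recall yields the higher F1.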