Project RISE: Recognizing Industrial Smoke Emissions
Industrial smoke emissions pose a significant concern to human health. Prior
works have shown that using Computer Vision (CV) techniques to identify smoke
as visual evidence can influence the attitude of regulators and empower
citizens to pursue environmental justice. However, existing datasets are not of
sufficient quality nor quantity to train the robust CV models needed to support
air quality advocacy. We introduce RISE, the first large-scale video dataset
for Recognizing Industrial Smoke Emissions. We adopted a citizen science
approach to collaborate with local community members to annotate whether a
video clip has smoke emissions. Our dataset contains 12,567 clips from 19
distinct views from cameras that monitored three industrial facilities. These
daytime clips span 30 days over two years, including all four seasons. We ran
experiments using deep neural networks to establish a strong performance
baseline and reveal smoke recognition challenges. Our survey study discussed
community feedback, and our data analysis displayed opportunities for
integrating citizen scientists and crowd workers into the application of
Artificial Intelligence for social good.
Comment: Technical report
Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features
Recognizing the phases of a laparoscopic surgery (LS) operation from its
video constitutes a fundamental step for efficient content representation,
indexing and retrieval in surgical video databases. In the literature, most
techniques focus on phase segmentation of the entire LS video using
hand-crafted visual features, instrument usage signals, and recently
convolutional neural networks (CNNs). In this paper we address the problem of
phase recognition of short video shots (10s) of the operation, without
utilizing information about the preceding/forthcoming video frames, their phase
labels or the instruments used. We investigate four state-of-the-art CNN
architectures (AlexNet, VGG19, GoogLeNet, and ResNet101), for feature
extraction via transfer learning. Visual saliency was employed for selecting
the most informative region of the image as input to the CNN. Video shot
representation was based on two temporal pooling mechanisms. Most importantly,
we investigate the role of 'elapsed time' (from the beginning of the
operation), and we show that inclusion of this feature can increase performance
dramatically (from 69% to 75% mean accuracy). Finally, a long short-term memory
(LSTM) network was trained for video shot classification based on the fusion of
CNN features with 'elapsed time', increasing the accuracy to 86%. Our results
highlight the prominent role of visual saliency, long-range temporal recursion
and 'elapsed time' (a feature so far ignored), for surgical phase recognition.Comment: 6 pages, 4 figures, 6 table
Recurrent 3D Pose Sequence Machines
3D human articulated pose recovery from monocular image sequences is very
challenging due to diverse appearances, viewpoints, and occlusions, and
because 3D pose is inherently ambiguous given monocular imagery alone. It is
thus critical to exploit rich spatial and temporal long-range dependencies
among body joints for accurate 3D pose sequence prediction. Existing approaches
usually manually design some elaborate prior terms and human body kinematic
constraints for capturing structures, which are often insufficient to exploit
all intrinsic structures and not scalable for all scenarios. In contrast, this
paper presents a Recurrent 3D Pose Sequence Machine (RPSM) to automatically
learn the image-dependent structural constraint and sequence-dependent temporal
context by using a multi-stage sequential refinement. At each stage, our RPSM
is composed of three modules to predict the 3D pose sequences based on the
previously learned 2D pose representations and 3D poses: (i) a 2D pose module
extracting the image-dependent pose representations, (ii) a 3D pose recurrent
module regressing 3D poses and (iii) a feature adaption module serving as a
bridge between modules (i) and (ii) to enable the representation transformation
from 2D to 3D domain. These three modules are then assembled into a sequential
prediction framework to refine the predicted poses with multiple recurrent
stages. Extensive evaluations on the Human3.6M dataset and HumanEva-I dataset
show that our RPSM outperforms all state-of-the-art approaches for 3D pose
estimation.
Comment: Published in CVPR 201
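The multi-stage sequential refinement described above can be sketched schematically: at each stage, image-dependent 2D pose features and the previous 3D estimate are fused into an updated 3D pose. The linear maps below are hypothetical stand-ins for the learned 2D-pose, feature-adaption, and 3D-recurrent modules, not the paper's trained networks:

```python
import numpy as np

def refine_pose(pose_2d: np.ndarray, n_joints: int = 17,
                n_stages: int = 3) -> np.ndarray:
    """Multi-stage refinement in the spirit of RPSM: each stage combines
    the previous 3D estimate with 2D pose features (assumed 17 joints)."""
    rng = np.random.default_rng(42)
    # Hypothetical fixed linear projections standing in for learned modules.
    W_feat = rng.standard_normal((n_joints * 3, n_joints * 2)) * 0.01
    W_prev = np.eye(n_joints * 3)
    pose_3d = np.zeros(n_joints * 3)          # stage-0 initial estimate
    for _ in range(n_stages):
        # Fuse image-dependent features with the previous 3D prediction,
        # mirroring the sequential prediction framework described above.
        pose_3d = W_prev @ pose_3d + W_feat @ pose_2d.ravel()
    return pose_3d.reshape(n_joints, 3)

pose_2d = np.random.default_rng(1).standard_normal((17, 2))
pose_3d = refine_pose(pose_2d)
print(pose_3d.shape)  # (17, 3)
```

The point of the sketch is the control flow: the 3D estimate is revisited at every stage rather than predicted once, which is what lets later stages correct earlier structural errors.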
UbiEar: Bringing location-independent sound awareness to the hard-of-hearing people with smartphones
Non-speech sound awareness is important for improving the quality of life of deaf and hard-of-hearing (DHH) people. DHH people, especially the young, are not always satisfied with their hearing aids. According to interviews with 60 young hard-of-hearing students, a ubiquitous sound-awareness tool for emergency and social events that works in diverse environments is desired. In this paper, we design UbiEar, a smartphone-based acoustic event sensing and notification system. The core techniques in UbiEar are a light-weight deep convolutional neural network that enables location-independent acoustic event recognition on commodity smartphones, and a set of mechanisms for prompt and energy-efficient acoustic sensing. We conducted both controlled experiments and user studies with 86 DHH students and showed that UbiEar can assist young DHH students in becoming aware of important acoustic events in their daily life.
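A light-weight CNN for acoustic event recognition of the kind UbiEar describes can be illustrated with a toy forward pass: one small convolution over a log-mel spectrogram, global average pooling, and a linear classifier. All shapes, filter counts, and weights here are assumptions for illustration; UbiEar's actual architecture is not reproduced:

```python
import numpy as np

def conv2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Naive single-channel 'valid' 2-D convolution, for illustration only."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def classify_event(spectrogram, kernels, W_out) -> int:
    """Tiny CNN sketch: conv + ReLU per filter, global average pooling,
    then a linear layer scoring each acoustic event class."""
    feats = np.array([np.maximum(conv2d_valid(spectrogram, k), 0).mean()
                      for k in kernels])      # global average pooling
    logits = W_out @ feats
    return int(np.argmax(logits))

rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 100))     # assumed log-mel spectrogram input
kernels = rng.standard_normal((8, 3, 3))  # 8 hypothetical 3x3 filters
W_out = rng.standard_normal((5, 8))       # 5 hypothetical event classes
print(classify_event(spec, kernels, W_out))
```

Keeping the filter bank small and pooling globally is the kind of design that keeps per-inference cost low enough for commodity smartphones.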
Deep Convolution and Correlated Manifold Embedded Distribution Alignment for Forest Fire Smoke Prediction
This paper proposes the deep convolution and correlated manifold embedded distribution alignment (DC-CMEDA) model, which enables transfer-learning classification across various small datasets and greatly shortens training time. First, a pre-trained ResNet50 network is used for feature transfer to extract smoke features, because training on a small forest-fire smoke dataset is difficult; second, correlated manifold embedded distribution alignment (CMEDA) is proposed to register the smoke features so that the input feature distributions of the source and target domains are aligned; and finally, a trainable network model is constructed. The model is evaluated on satellite remote-sensing and video image datasets. Compared with the deep convolutional integrated long short-term memory (DC-ILSTM) network, DC-CMEDA improves accuracy by 1.50% on video images and by 4.00% on satellite remote-sensing images. Compared with the ILSTM algorithm, CMEDA converges in 10 or fewer iterations and has lower algorithmic complexity, giving DC-CMEDA a clear advantage in convergence speed. The experimental results show that DC-CMEDA can solve the problem of detecting and recognizing smoke from small sample datasets.
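CMEDA itself is the paper's alignment algorithm and is not reproduced here; as a rough illustration of what "aligning the input feature distributions of the source and target domains" means, the sketch below uses a generic correlation-alignment (CORAL-style) transform: whiten the source features, then re-color them with the target covariance. Feature dimensions and data are invented for the example:

```python
import numpy as np

def coral_align(Xs: np.ndarray, Xt: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Map source features Xs toward the target distribution of Xt by
    matching second-order statistics (a CORAL-style stand-in, not CMEDA)."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])

    def mat_pow(C, p):
        # Symmetric matrix power via eigendecomposition.
        w, V = np.linalg.eigh(C)
        return (V * np.power(w, p)) @ V.T

    # Whiten with Cs^{-1/2}, re-color with Ct^{1/2}, shift to target mean.
    return (Xs - Xs.mean(0)) @ mat_pow(Cs, -0.5) @ mat_pow(Ct, 0.5) + Xt.mean(0)

rng = np.random.default_rng(0)
Xs = rng.standard_normal((200, 16)) * 2.0 + 1.0   # hypothetical source features
Xt = rng.standard_normal((300, 16)) * 0.5 - 1.0   # hypothetical target features
Xs_aligned = coral_align(Xs, Xt)
```

After alignment the transformed source features share the target's mean and (approximately) its covariance, so a classifier trained on them transfers more readily to target-domain inputs.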
Gas Detection and Identification Using Multimodal Artificial Intelligence Based Sensor Fusion
With the rapid industrialization and technological advancements, innovative
engineering technologies which are cost effective, faster and easier to
implement are essential. One such area of concern is the rising number of
accidents happening due to gas leaks at coal mines, chemical industries, home
appliances etc. In this paper we propose a novel approach to detect and
identify the gaseous emissions using the multimodal AI fusion techniques. Most
of the gases and their fumes are colorless, odorless, and tasteless, thereby
challenging our normal human senses. Sensing based on a single sensor may not
be accurate, and sensor fusion is essential for robust and reliable detection
in several real-world applications. We manually collected 6400 gas samples
(1600 samples per class for four classes) using two specific sensors: a
7-semiconductor gas sensor array and a thermal camera. The early fusion
method of multimodal AI is applied. The network architecture consists of a
feature extraction module for each modality, whose outputs are fused using a
merged layer followed by a dense layer, which provides a single output for
identifying the gas. We obtained the testing accuracy of 96% (for fused model)
as opposed to individual model accuracies of 82% (based on Gas Sensor data
using LSTM) and 93% (based on thermal images data using CNN model). Results
demonstrate that the fusion of multiple sensors and modalities outperforms the
outcome of a single sensor.
Comment: 14 pages, 9 figures
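The early-fusion architecture described above (a per-modality feature extractor, a merged layer, then a dense layer giving one gas prediction) can be sketched as follows. The hand-crafted extractors below are simplified stand-ins for the paper's LSTM and CNN branches, and all shapes and weights are assumptions:

```python
import numpy as np

def extract_sensor_feats(readings: np.ndarray) -> np.ndarray:
    """Stand-in for the LSTM branch: summary statistics over the
    7-channel gas-sensor time series (14-D output)."""
    return np.concatenate([readings.mean(0), readings.std(0)])

def extract_thermal_feats(image: np.ndarray) -> np.ndarray:
    """Stand-in for the CNN branch: coarse 4x4 average pooling of the
    thermal frame (16-D output)."""
    h, w = image.shape[0] // 4, image.shape[1] // 4
    return image[:4 * h, :4 * w].reshape(4, h, 4, w).mean(axis=(1, 3)).ravel()

def early_fusion_predict(readings, image, W, b) -> int:
    """Early fusion: concatenate both modality features into one merged
    vector, then apply a single dense layer scoring each gas class."""
    fused = np.concatenate([extract_sensor_feats(readings),
                            extract_thermal_feats(image)])   # 30-D
    return int(np.argmax(W @ fused + b))

rng = np.random.default_rng(0)
readings = rng.standard_normal((100, 7))  # 100 timesteps, 7 gas sensors
image = rng.standard_normal((32, 32))     # assumed thermal frame size
W, b = rng.standard_normal((4, 30)), np.zeros(4)  # 4 gas classes
print(early_fusion_predict(readings, image, W, b))
```

Concatenating before the dense layer (rather than averaging per-branch predictions) is what lets the classifier learn cross-modal interactions, which is where the reported fused-model gain over either single modality comes from.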