Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey
Future buildings will offer their residents new possibilities for convenience,
comfort, and efficiency. The way people live will change as technology is
woven into people's lives and information processing becomes fully integrated
into their daily living activities and objects. Smart buildings are expected
to make their residents' experience as easy and comfortable as possible. The
massive streaming data generated and
captured by smart building appliances and devices contains valuable information
that needs to be mined to facilitate timely actions and better decision making.
Machine learning and big data analytics will undoubtedly play a critical role
in enabling the delivery of such smart services. In this paper, we survey the
area of smart buildings, with a special focus on the role of techniques from
machine learning and big data analytics. The survey also reviews current
trends and challenges faced in the development of smart building services.
Collecting and Annotating the Large Continuous Action Dataset
We make available to the community a new dataset to support
action-recognition research. This dataset is different from prior datasets in
several key ways. It is significantly larger. It contains streaming video with
long segments containing multiple action occurrences that often overlap in
space and/or time. All actions were filmed in the same collection of
backgrounds, so the background gives little clue as to the action class. We
had five human annotators independently label the temporal extent and class
of each action occurrence and measured a surprisingly low level of intercoder
agreement. A baseline experiment shows that recent state-of-the-art methods
perform poorly on this dataset, suggesting that it will be a challenging
dataset for fostering advances in action-recognition research. This
manuscript describes the novel content and characteristics of the LCA
dataset, presents the design decisions made when filming the dataset, and
documents the novel methods employed to annotate it.
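The abstract does not say which agreement metric was used; one plausible way to quantify agreement on temporal extents is pairwise temporal intersection-over-union between annotators, as in the sketch below. The greedy matching scheme and the 0.5 threshold are assumptions for illustration, not details from the LCA paper.

    from itertools import combinations

    def temporal_iou(a, b):
        # a, b are (start, end, label) tuples; IoU over the time axis only.
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = max(a[1], b[1]) - min(a[0], b[0])
        return inter / union if union > 0 else 0.0

    def pairwise_agreement(annotations, iou_threshold=0.5):
        # annotations: one list of (start, end, label) tuples per annotator.
        # Returns the mean fraction of greedily matched segments per pair.
        scores = []
        for ann_a, ann_b in combinations(annotations, 2):
            matched, used = 0, set()
            for seg in ann_a:
                for j, other in enumerate(ann_b):
                    if (j not in used and seg[2] == other[2]
                            and temporal_iou(seg, other) >= iou_threshold):
                        matched += 1
                        used.add(j)
                        break
            denom = max(len(ann_a), len(ann_b))
            scores.append(matched / denom if denom else 1.0)
        return sum(scores) / len(scores)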
Detecting Temporally Consistent Objects in Videos through Object Class Label Propagation
Object proposals for detecting moving or static video objects need to address
issues such as speed, memory complexity and temporal consistency. We propose an
efficient Video Object Proposal (VOP) generation method and show its efficacy
in learning a better video object detector. A deep-learning based video object
detector learned using the proposed VOP achieves state-of-the-art detection
performance on the YouTube-Objects dataset. We further propose a clustering of
VOPs which can efficiently be used for detecting objects in video in a
streaming fashion. As opposed to applying per-frame convolutional neural
network (CNN) based object detection, our proposed method called Objects in
Video Enabler thRough LAbel Propagation (OVERLAP) needs to classify only a
small fraction of all candidate proposals in every video frame through
streaming clustering of object proposals and class-label propagation. Source
code will be made available soon.
Comment: Accepted for publication in WACV 201
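As a rough illustration of the classify-only-a-fraction idea, the sketch below clusters proposals by feature similarity as they stream in, runs the expensive classifier once per cluster representative, and propagates that label to the cluster's members. The similarity rule, the threshold, and the classifier interface are assumptions for illustration, not the paper's actual OVERLAP pipeline.

    import numpy as np

    def classify_with_label_propagation(proposal_features, classify_fn,
                                        sim_threshold=0.9):
        # Hypothetical sketch: classify one representative per streaming
        # cluster and propagate its class label to the remaining members.
        centroids, centroid_labels, labels = [], [], []
        for feat in proposal_features:          # proposals arrive as a stream
            feat = feat / (np.linalg.norm(feat) + 1e-8)
            if centroids:
                sims = np.array([c @ feat for c in centroids])
                best = int(np.argmax(sims))
                if sims[best] >= sim_threshold:  # reuse the cluster's label
                    labels.append(centroid_labels[best])
                    continue
            # New cluster: pay the CNN cost once for its representative.
            label = classify_fn(feat)
            centroids.append(feat)
            centroid_labels.append(label)
            labels.append(label)
        return labels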
Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning
We formulate tracking as an online decision-making process, where a tracking
agent must follow an object despite ambiguous image frames and a limited
computational budget. Crucially, the agent must decide where to look in the
upcoming frames, when to reinitialize because it believes the target has been
lost, and when to update its appearance model for the tracked object. Such
decisions are typically made heuristically. Instead, we propose to learn an
optimal decision-making policy by formulating tracking as a partially
observable Markov decision process (POMDP). We learn policies with deep
reinforcement learning algorithms that need supervision (a reward signal) only
when the track has gone awry. We demonstrate that sparse rewards allow us to
quickly train on massive datasets, several orders of magnitude larger than
those used in past work. Interestingly, by treating Internet videos as an
unlimited data stream, we both learn and evaluate our trackers in a single,
unified computational stream.
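A minimal sketch of the sparse-reward idea described above: a REINFORCE-style update driven only by an occasional negative reward when the track is judged to have gone awry. The policy network, the tracking-environment interface, and the reward convention are assumptions for illustration, not the authors' implementation.

    import torch

    def train_episode(policy, env, optimizer, gamma=0.99):
        # One episode of sparse-reward policy-gradient tracking (sketch).
        # `env` is a hypothetical tracking environment whose reward is 0 on
        # ordinary frames and -1 only when the track has gone awry.
        log_probs, rewards = [], []
        obs, done = env.reset(), False
        while not done:
            dist = torch.distributions.Categorical(logits=policy(obs))
            action = dist.sample()        # where to look / reinit / update
            log_probs.append(dist.log_prob(action))
            obs, reward, done = env.step(action.item())
            rewards.append(reward)
        # Discounted returns; nearly all supervision comes from rare failures.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()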
A Dynamic Service-Migration Mechanism in Edge Cognitive Computing
Driven by the vision of edge computing and the success of rich cognitive
services based on artificial intelligence, a new computing paradigm, edge
cognitive computing (ECC), has emerged as a promising approach that applies
cognitive computing at the edge of the network. ECC has the potential to provide the
cognition of users and network environmental information, and further to
provide elastic cognitive computing services to achieve a higher energy
efficiency and a higher Quality of Experience (QoE) compared to edge computing.
This paper first introduces our ECC architecture and then describes
its design issues in detail. Moreover, we propose an ECC-based dynamic service
migration mechanism to provide an insight into how cognitive computing is
combined with edge computing. To evaluate the proposed mechanism, a
practical platform for dynamic service migration is built, where services
are migrated based on the behavioral cognition of a mobile user. The
experimental results show that the proposed ECC architecture achieves
ultra-low latency and a high quality of user experience, while providing
better service to the user, saving computing resources, and achieving high
energy efficiency.
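One way to picture a dynamic service-migration decision of this kind is a simple cost comparison: migrate the service to the edge node closest to the user's predicted next location whenever the expected latency saving outweighs the one-off migration cost. The cost model and node interface below are a hypothetical sketch, not the paper's mechanism.

    from dataclasses import dataclass

    @dataclass
    class EdgeNode:
        name: str
        latency_ms: float  # expected user-to-node latency at the user's
                           # predicted next location (from behavior cognition)

    def choose_migration_target(current, candidates,
                                migration_cost_ms=20.0):
        # Migrate only if the predicted latency saving beats the
        # one-off migration cost (hypothetical decision rule).
        best = min(candidates, key=lambda n: n.latency_ms)
        saving = current.latency_ms - best.latency_ms
        return best if saving > migration_cost_ms else current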
Context-Aware Query Selection for Active Learning in Event Recognition
Activity recognition is a challenging problem with many practical
applications. In addition to the visual features, recent approaches have
benefited from the use of context, e.g., inter-relationships among the
activities and objects. However, these approaches require the data to be
labeled and entirely available beforehand, and they are not designed to be
updated continuously, which makes them unsuitable for surveillance
applications. In contrast, we
propose a continuous-learning framework for context-aware activity recognition
from unlabeled video, which has two distinct advantages over existing methods.
First, it employs a novel active-learning technique that not only exploits the
informativeness of the individual activities but also utilizes their contextual
information during query selection; this leads to significant reduction in
expensive manual annotation effort. Second, the learned models can be adapted
online as more data becomes available. We formulate a conditional random field model
that encodes the context and devise an information-theoretic approach that
utilizes entropy and mutual information of the nodes to compute the set of most
informative queries, which are labeled by a human. These labels are combined
with graphical inference techniques for incremental updates. We provide a
theoretical formulation of the active learning framework with an analytic
solution. Experiments on six challenging datasets demonstrate that our
framework achieves superior performance with significantly less manual
labeling.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
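The information-theoretic selection step described above can be pictured as scoring each unlabeled node by the entropy of its marginal plus its mutual information with neighboring nodes, then asking a human to label the top-scoring nodes. The sketch below operates on precomputed CRF marginals; the additive scoring rule and the pairwise-marginal interface are assumptions for illustration.

    import numpy as np

    def entropy(p):
        p = np.clip(p, 1e-12, 1.0)
        return -np.sum(p * np.log(p))

    def mutual_information(p_xy):
        # MI computed from a joint distribution over two nodes' labels.
        px, py = p_xy.sum(axis=1), p_xy.sum(axis=0)
        return entropy(px) + entropy(py) - entropy(p_xy.ravel())

    def select_queries(marginals, joint_marginals, k):
        # Score node i by H(x_i) + sum over CRF neighbors j of I(x_i; x_j),
        # then return the k most informative nodes for human labeling.
        scores = {}
        for i, p in marginals.items():
            mi = sum(mutual_information(p_xy)
                     for (a, b), p_xy in joint_marginals.items()
                     if i in (a, b))
            scores[i] = entropy(p) + mi
        return sorted(scores, key=scores.get, reverse=True)[:k]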
Lattice Long Short-Term Memory for Human Action Recognition
Human actions captured in video sequences are three-dimensional signals
characterizing visual appearance and motion dynamics. To learn action patterns,
existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and
RNNs). CNN based methods are effective in learning spatial appearances, but are
limited in modeling long-term motion dynamics. RNNs, especially Long Short-Term
Memory (LSTM), are able to learn temporal motion dynamics. However, naively
applying RNNs to video sequences in a convolutional manner implicitly assumes
that motions in videos are stationary across different spatial locations. This
assumption is valid for short-term motions but invalid when the duration of the
motion is long.
In this work, we propose Lattice-LSTM (L2STM), which extends LSTM by learning
independent hidden state transitions of memory cells for individual spatial
locations. This method effectively enhances the ability to model dynamics
across time and addresses the non-stationary issue of long-term motion dynamics
without significantly increasing the model complexity. Additionally, we
introduce a novel multi-modal training procedure for training our network.
Unlike traditional two-stream architectures, which use RGB and optical-flow
information as input, our two-stream model leverages both modalities to
jointly train both input gates and both forget gates in the network, rather
than treating the two streams as separate entities with no information about
each other. We apply this end-to-end system to benchmark datasets (UCF-101 and
HMDB-51) of human action recognition. Experiments show that, on both
datasets, our proposed method outperforms all existing LSTM- and/or CNN-based
methods of similar model complexity.
Comment: ICCV 201
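A minimal sketch of the core idea: learning an independent hidden-state transition for every spatial location, instead of one convolution shared across the whole lattice. The per-location elementwise transition below (in PyTorch) is a simplification for illustration; the paper's actual L2STM formulation differs in detail.

    import torch
    import torch.nn as nn

    class LatticeLSTMCell(nn.Module):
        # Sketch: the input path is a shared convolution, but the
        # hidden-to-gate transition has its own weights at every (h, w).
        def __init__(self, channels, height, width):
            super().__init__()
            self.input_conv = nn.Conv2d(channels, 4 * channels, 3, padding=1)
            self.hidden_weights = nn.Parameter(
                torch.randn(4 * channels, height, width) * 0.01)

        def forward(self, x, h, c):
            # Per-location (non-shared) recurrent transition on h.
            gates = self.input_conv(x) + self.hidden_weights * h.repeat(1, 4, 1, 1)
            i, f, g, o = gates.chunk(4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, c

In use, h and c would be initialized to zeros of shape (batch, channels, height, width) and the cell applied frame by frame over a video clip.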
EIQIS: Toward an Event-Oriented Indexable and Queryable Intelligent Surveillance System
Edge computing provides the ability to link distributed users to multimedia
content, while retaining the power of significant data storage and access at
a centralized computer. Two significant requirements are: what information
should be processed at the edge and how the content should be stored. Answers
to these questions require a combination of query-based search, access, and
response, as well as index-based processing, storage, and distribution. A
measure of intelligence is not what is known but what can be recalled; hence,
future edge intelligence must provide recalled information for dynamic
response. In this paper, a novel event-oriented indexable and queryable
intelligent surveillance (EIQIS) system is introduced that leverages on-site
edge devices to collect sensed information in the form of frames and extract
useful features to enhance situation awareness. The design principles are
discussed, and a preliminary proof-of-concept prototype is built that
validates the feasibility of the proposed idea.
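To picture what "indexable and queryable" might mean in practice, the sketch below maintains an inverted index from detected event labels to (camera, frame-range) entries, supporting later queries by event type and time window. The schema and API are hypothetical illustrations, not the EIQIS design.

    from collections import defaultdict

    class EventIndex:
        # Hypothetical inverted index: event label -> (camera, start, end).
        def __init__(self):
            self._index = defaultdict(list)

        def add(self, label, camera, start_frame, end_frame):
            # Called by an edge device after feature extraction and event
            # detection on a segment of frames.
            self._index[label].append((camera, start_frame, end_frame))

        def query(self, label, t0=0, t1=float("inf")):
            # Recall all occurrences of an event type within a frame window.
            return [(cam, s, e) for cam, s, e in self._index[label]
                    if s < t1 and e > t0]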
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
This paper presents a comprehensive literature review on applications of deep
reinforcement learning in communications and networking. Modern networks,
e.g., Internet of Things (IoT) and Unmanned Aerial Vehicle (UAV) networks,
are becoming more decentralized and autonomous. In such networks, network
entities need to make decisions locally to maximize network performance under
an uncertain network environment. Reinforcement learning has been used
effectively to enable network entities to obtain the optimal policy, i.e.,
decisions or actions given their states, when the state and action spaces are
small. However, in complex and large-scale networks, the state and action
spaces are usually large, and reinforcement learning may not be able to find
the optimal policy in a reasonable time. Therefore, deep reinforcement
learning, a combination of reinforcement learning and deep learning, has been
developed to overcome these shortcomings. In this survey, we first give a
tutorial on deep
reinforcement learning from fundamental concepts to advanced models. Then, we
review deep reinforcement learning approaches proposed to address emerging
issues in communications and networking. These issues include dynamic network
access, data rate control, wireless caching, data offloading, network
security, and connectivity preservation, all of which are important to
next-generation
networks such as 5G and beyond. Furthermore, we present applications of deep
reinforcement learning for traffic routing, resource sharing, and data
collection. Finally, we highlight important challenges, open issues, and future
research directions for applying deep reinforcement learning.
Comment: 37 pages, 13 figures, 6 tables, 174 reference papers
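The survey's core point, that deep RL replaces an intractable Q-table with a neural-network approximator, can be illustrated with a minimal DQN-style update for a network-access decision. The state size, action count, environment interface, and hyperparameters below (in PyTorch) are assumptions for illustration, not from the survey.

    import torch
    import torch.nn as nn

    # A small Q-network replaces the Q-table that would be intractable for
    # large state spaces (e.g., channel states of many network entities).
    q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    gamma = 0.99

    def dqn_update(state, action, reward, next_state, done):
        # One temporal-difference step on a single transition (sketch;
        # a real agent would add a replay buffer and a target network).
        q = q_net(state)[action]
        with torch.no_grad():
            target = reward + gamma * q_net(next_state).max() * (1 - done)
        loss = (q - target) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()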
Recurrent Convolutions for Causal 3D CNNs
Recently, three-dimensional (3D) convolutional neural networks (CNNs) have
emerged as dominant methods to capture spatiotemporal representations in
videos, by adding to pre-existing 2D CNNs a third, temporal dimension. Such 3D
CNNs, however, are anti-causal (i.e., they exploit information from both the
past and the future frames to produce feature representations, thus preventing
their use in online settings), constrain the temporal reasoning horizon to the
size of the temporal convolution kernel, and are not temporal
resolution-preserving for video sequence-to-sequence modelling, as, for
instance, in action detection. To address these serious limitations, here we
present a new 3D CNN architecture for the causal/online processing of videos.
Namely, we propose a novel Recurrent Convolutional Network (RCN), which
relies on recurrence to capture the temporal context across frames at each
network level. Our network decomposes 3D convolutions into (1) a 2D spatial
convolution component and (2) an additional hidden-state convolution applied
across time. The hidden state at any time t is assumed to depend on the
hidden state at time t-1 and on the current output of the spatial convolution
component. As a result, the proposed network: (i) produces causal
outputs, (ii) provides flexible temporal reasoning, (iii) preserves temporal
resolution. Our experiments on the large-scale Kinetics and MultiTHUMOS
datasets show that the proposed method performs comparably to anti-causal 3D
CNNs, while being causal and using fewer parameters.
Comment: Workshop on Large Scale Holistic Video Understanding, ICCVW, 201
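A minimal sketch of the causal decomposition described above: a 2D spatial convolution applied per frame, combined with a hidden-state convolution carried across time, so each output depends only on past and current frames. The channel sizes, activation, and combination rule are assumptions for illustration (in PyTorch), not the exact RCN architecture.

    import torch
    import torch.nn as nn

    class RecurrentConvUnit(nn.Module):
        # Causal, resolution-preserving stand-in for a 3D convolution.
        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.spatial = nn.Conv2d(in_channels, out_channels, 3, padding=1)
            self.recurrent = nn.Conv2d(out_channels, out_channels, 3, padding=1)

        def forward(self, clip):                 # clip: [B, C, T, H, W]
            h, outputs = None, []
            for t in range(clip.size(2)):
                x = self.spatial(clip[:, :, t])  # current frame only
                h = x if h is None else x + self.recurrent(h)  # uses h at t-1
                outputs.append(torch.tanh(h))
            # One output per input frame: temporal resolution is preserved.
            return torch.stack(outputs, dim=2)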