Estimating Fire Weather Indices via Semantic Reasoning over Wireless Sensor Network Data Streams
Wildfires are frequent, devastating events in Australia that regularly cause
significant loss of life and widespread property damage. Fire weather indices
are a widely-adopted method for measuring fire danger and they play a
significant role in issuing bushfire warnings and in anticipating demand for
bushfire management resources. Existing systems that calculate fire weather
indices are limited due to low spatial and temporal resolution. Localized
wireless sensor networks, on the other hand, gather continuous sensor data
measuring variables such as air temperature, relative humidity, rainfall and
wind speed at high resolutions. However, using wireless sensor networks to
estimate fire weather indices is a challenge due to data quality issues, lack
of standard data formats and lack of agreement on thresholds and methods for
calculating fire weather indices. In this paper, we propose a
standardized approach to calculating Fire Weather Indices (a.k.a. fire danger
ratings) and overcome a number of the challenges by applying Semantic Web
Technologies to the processing of data streams from a wireless sensor network
deployed in the Springbrook region of South East Queensland. This paper
describes the underlying ontologies, the semantic reasoning and the Semantic
Fire Weather Index (SFWI) system that we have developed to enable domain
experts to specify and adapt rules for calculating Fire Weather Indices. We
also describe the Web-based mapping interface that we have developed, that
enables users to improve their understanding of how fire weather indices vary
over time within a particular region. Finally, we discuss our evaluation
results, which indicate that the proposed system outperforms state-of-the-art
techniques in terms of accuracy, precision and query performance.
Comment: 20 pages, 12 figures
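The abstract does not reproduce the index formula, but a widely adopted Australian fire danger rating that combines exactly the sensor variables named above (temperature, relative humidity, wind speed, plus a fuel-dryness term) is the McArthur Mk5 Forest Fire Danger Index. A minimal sketch in Python, assuming the standard published coefficients (Noble et al., 1980); the SFWI system described in the paper layers ontologies and semantic rules on top of calculations like this:

```python
import math

def ffdi(temp_c, rel_humidity, wind_kmh, drought_factor):
    """McArthur Mk5 Forest Fire Danger Index (Noble et al., 1980).

    temp_c         -- air temperature in degrees Celsius
    rel_humidity   -- relative humidity in percent
    wind_kmh       -- wind speed in km/h
    drought_factor -- fuel dryness on a 0-10 scale
    """
    return 2.0 * math.exp(
        -0.450
        + 0.987 * math.log(drought_factor)
        - 0.0345 * rel_humidity
        + 0.0338 * temp_c
        + 0.0234 * wind_kmh
    )

# A hot, dry, windy day over fully dried fuel rates as Extreme on the old scale
print(round(ffdi(temp_c=38.0, rel_humidity=12.0, wind_kmh=45.0, drought_factor=10.0), 1))
```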
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning
Video captioning is in essence a complex natural process, affected by various
uncertainties stemming from video content, subjective judgment, and other
factors.
In this paper, we build on recent progress in using the encoder-decoder
framework for video captioning and address a critical deficiency of existing
methods: most decoders propagate deterministic hidden states, and such complex
uncertainty cannot be modeled efficiently by deterministic models. We therefore
propose a generative approach, referred to as the multi-modal stochastic RNN
network (MS-RNN), which
models the uncertainty observed in the data using latent stochastic variables.
Therefore, MS-RNN can improve the performance of video captioning, and generate
multiple sentences to describe a video considering different random factors.
Specifically, a multi-modal LSTM (M-LSTM) is first proposed to interact with
both visual and textual features to capture a high-level representation. Then,
a backward stochastic LSTM (S-LSTM) is proposed to support uncertainty
propagation by introducing latent variables. Experimental results on the
challenging MSVD and MSR-VTT datasets show that our proposed MS-RNN approach
outperforms state-of-the-art video captioning methods.
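To illustrate the stochastic LSTM idea, a single recurrence step can inject a latent Gaussian variable into the hidden-state update via the reparameterization trick. A minimal PyTorch sketch; the inference network and the way z enters the recurrence are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class StochasticLSTMCell(nn.Module):
    """One LSTM step whose update is perturbed by a latent Gaussian z,
    sampled with the reparameterization trick (a sketch, not MS-RNN's
    exact design)."""

    def __init__(self, input_dim, hidden_dim, latent_dim):
        super().__init__()
        self.lstm = nn.LSTMCell(input_dim + latent_dim, hidden_dim)
        # Inference network: parameters of q(z | h, x)
        self.to_mu = nn.Linear(hidden_dim + input_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim + input_dim, latent_dim)

    def forward(self, x, state):
        h, c = state
        stats_in = torch.cat([h, x], dim=-1)
        mu, logvar = self.to_mu(stats_in), self.to_logvar(stats_in)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        h, c = self.lstm(torch.cat([x, z], dim=-1), (h, c))
        return (h, c), (mu, logvar)  # (mu, logvar) feed a KL term in the loss
```

Sampling a fresh z at decoding time is what allows the model to emit multiple distinct captions for the same video.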
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Recent progress has been made in using attention-based encoder-decoder
frameworks for video captioning. However, most existing decoders apply the
attention mechanism to every generated word, including both visual words (e.g.,
"gun" and "shooting") and non-visual words (e.g., "the", "a"). Yet non-visual
words can be easily predicted by a natural language model without considering
visual signals or attention, and imposing the attention mechanism on them can
mislead the decoder and degrade the overall performance of video captioning. To
address this issue, we propose a hierarchical LSTM with adjusted
temporal attention (hLSTMat) approach for video captioning. Specifically, the
proposed framework utilizes the temporal attention for selecting specific
frames to predict the related words, while the adjusted temporal attention is
for deciding whether to depend on the visual information or the language
context information. In addition, hierarchical LSTMs are designed to
simultaneously consider both low-level visual information and high-level
language context information to support video caption generation. To
demonstrate the
effectiveness of our proposed framework, we test our method on two prevalent
datasets: MSVD and MSR-VTT, and experimental results show that our approach
outperforms the state-of-the-art methods on both datasets.
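The adjusted attention can be pictured as ordinary temporal attention plus a scalar gate on the visual context. A minimal PyTorch sketch of the idea; the scoring functions and dimensions are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjustedTemporalAttention(nn.Module):
    """Temporal attention over frame features plus a gate deciding how much
    the next word should rely on visual context versus the language state."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)  # frame relevance
        self.gate = nn.Linear(hidden_dim, 1)              # visual vs. language

    def forward(self, frames, h):
        # frames: (B, T, feat_dim); h: (B, hidden_dim) decoder state
        T = frames.size(1)
        h_exp = h.unsqueeze(1).expand(-1, T, -1)
        scores = self.score(torch.cat([frames, h_exp], dim=-1)).squeeze(-1)
        alpha = F.softmax(scores, dim=1)                    # attention over frames
        visual_ctx = (alpha.unsqueeze(-1) * frames).sum(1)  # (B, feat_dim)
        beta = torch.sigmoid(self.gate(h))  # near 0 for non-visual words like "the"
        return beta * visual_ctx            # blended with h by the decoder
```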
Class Gradient Projection For Continual Learning
Catastrophic forgetting is one of the most critical challenges in Continual
Learning (CL). Recent approaches tackle this problem by projecting the gradient
update orthogonal to the gradient subspace of existing tasks. While the results
are remarkable, those approaches ignore the fact that these calculated
gradients are not guaranteed to be orthogonal to the gradient subspace of each
class due to the class deviation in tasks, e.g., distinguishing "Man" from
"Sea" v.s. differentiating "Boy" from "Girl". Therefore, this strategy may
still cause catastrophic forgetting for some classes. In this paper, we propose
Class Gradient Projection (CGP), which calculates the gradient subspace from
individual classes rather than tasks. Gradient update orthogonal to the
gradient subspace of existing classes can be effectively utilized to minimize
interference from other classes. To improve the generalization and efficiency,
we further design a Base Refining (BR) algorithm to combine similar classes and
refine class bases dynamically. Moreover, we leverage a contrastive learning
method to improve the model's ability to handle unseen tasks. Extensive
experiments on benchmark datasets demonstrate the effectiveness of our proposed
approach. It improves the previous methods by 2.0% on the CIFAR-100 dataset.
Comment: MM '22: Proceedings of the 30th ACM International Conference on
Multimedia
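The projection step itself is compact: collect an orthonormal basis for each existing class's feature subspace (for example via SVD of that class's feature matrix) and subtract the gradient's components along those bases. A minimal sketch, with the per-class rank and the flattened-gradient view as assumptions:

```python
import torch

def class_basis(features, rank):
    """Top right-singular directions of one class's feature matrix
    (rows = samples); returns (rank, dim) with orthonormal rows."""
    _, _, Vh = torch.linalg.svd(features, full_matrices=False)
    return Vh[:rank]

def project_out(grad, bases):
    """Remove the components of a flattened gradient that lie in existing
    classes' subspaces, so the update interferes minimally with them."""
    g = grad.clone()
    for B in bases:            # each B: (rank, dim)
        g = g - B.T @ (B @ g)  # subtract the projection onto span(B)
    return g
```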
MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation
Zero-shot text-to-video synthesis generates videos from prompts without using
any videos. In the absence of motion information from videos, the motion priors
implied in prompts become vital guidance. For example, the prompt "airplane
landing on the
runway" indicates motion priors that the "airplane" moves downwards while the
"runway" stays static. Whereas the motion priors are not fully exploited in
previous approaches, thus leading to two nontrivial issues: 1) the motion
variation pattern remains unaltered and prompt-agnostic for disregarding motion
priors; 2) the motion control of different objects is inaccurate and entangled
without considering the independent motion priors of different objects. To
tackle the two issues, we propose a prompt-adaptive and disentangled motion
control strategy coined MotionZero, which derives the motion priors of
different objects from prompts via Large Language Models and accordingly
applies disentangled motion control of each object to its corresponding region.
Furthermore, to support videos with varying degrees of motion amplitude, we
propose a Motion-Aware Attention scheme which adjusts attention among frames by
motion amplitude. Extensive experiments demonstrate that our strategy can
correctly control the motion of different objects and supports versatile
applications, including zero-shot video editing.
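The Motion-Aware Attention scheme can be pictured as standard cross-frame attention whose sharpness depends on motion amplitude. The sketch below is a guess at one plausible modulation (larger motion loosens temporal coupling); the actual scheme in MotionZero may adjust attention differently:

```python
import torch
import torch.nn.functional as F

def motion_aware_attention(q, k, v, motion_amplitude):
    """Cross-frame attention modulated by a motion-amplitude scalar.
    The division by (1 + amplitude) is an assumed modulation, not
    MotionZero's exact rule."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (B, T, T) frame-to-frame
    scores = scores / (1.0 + motion_amplitude)    # large motion -> flatter attention
    return F.softmax(scores, dim=-1) @ v
```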
Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks
Transferring a pretrained model to a downstream task can be as easy as
conducting linear probing with target data, that is, training a linear
classifier upon frozen features extracted from the pretrained model. As there
may exist significant gaps between pretraining and downstream datasets, one may
ask whether all dimensions of the pretrained features are useful for a given
downstream task. We show that, for linear probing, the pretrained features can
be extremely redundant when the downstream data is scarce, or few-shot. For
some cases such as 5-way 1-shot tasks, using only 1% of the most important
feature dimensions is able to recover the performance achieved by using the
full representation. Interestingly, most dimensions are redundant only under
few-shot settings and gradually become useful when the number of shots
increases, suggesting that feature redundancy may be the key to characterizing
the "few-shot" nature of few-shot transfer problems. We give a theoretical
understanding of this phenomenon and show how dimensions with high variance and
small distance between class centroids can serve as confounding factors that
severely disturb classification results under few-shot settings. As an attempt
at solving this problem, we find that the redundant features are difficult to
identify accurately with a small number of training samples, but we can instead
adjust feature magnitude with a soft mask based on estimated feature
importance. We show that this method can generally improve few-shot transfer
performance across various pretrained models and downstream datasets.
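A sketch of the soft-masking idea: score each feature dimension by how well it separates class centroids relative to its within-class spread, then rescale features instead of hard-selecting them. The importance ratio used here is an assumption; the paper's estimator may differ:

```python
import numpy as np

def soft_masked_features(feats, labels, temperature=1.0):
    """Rescale each feature dimension by an estimated importance
    (between-class centroid variance over within-class variance)."""
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(0) for c in classes])  # (C, D)
    between = centroids.var(axis=0)                                      # (D,)
    within = np.mean([feats[labels == c].var(axis=0) for c in classes],
                     axis=0) + 1e-8
    importance = between / within
    # Soft mask: a sigmoid centered on the mean importance
    mask = 1.0 / (1.0 + np.exp(-(importance - importance.mean()) / temperature))
    return feats * mask  # pass these rescaled features to the linear probe
```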
DePT: Decoupled Prompt Tuning
This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning,
i.e., the better the tuned model generalizes to the base (or target) task, the
worse it generalizes to new tasks, and vice versa. Specifically, through an
in-depth analysis of the learned features of the base and new tasks, we observe
that the BNT stems from a channel bias issue, i.e., the vast majority of
feature channels are occupied by base-specific knowledge, resulting in the
collapse of task-shared knowledge important to new tasks. To address this, we
propose the Decoupled Prompt Tuning (DePT) framework, which decouples
base-specific knowledge from feature channels into an isolated feature space
during prompt tuning, so as to maximally preserve task-shared knowledge in the
original feature space for achieving better zero-shot generalization on new
tasks. Importantly, our DePT is orthogonal to existing prompt tuning methods,
hence it can improve all of them. Extensive experiments on 11 datasets show the
strong flexibility and effectiveness of DePT. Our code and pretrained models
are available at https://github.com/Koorye/DePT.
Comment: 13 pages
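The decoupling can be pictured as routing base-task supervision through an extra linear head while zero-shot text-image matching stays in the original feature space. A minimal PyTorch sketch of that idea; the head design and how the two objectives are weighted are assumptions, not DePT's exact formulation:

```python
import torch
import torch.nn as nn

class DecoupledHeads(nn.Module):
    """Base-task logits come from an isolated linear head; new-task
    (zero-shot) logits come from plain feature-text matching."""

    def __init__(self, feat_dim, num_base_classes):
        super().__init__()
        self.base_head = nn.Linear(feat_dim, num_base_classes)

    def forward(self, image_feats, text_feats):
        base_logits = self.base_head(image_feats)  # base-specific space
        zs_logits = image_feats @ text_feats.t()   # original, task-shared space
        return base_logits, zs_logits
```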