Seeing What You're Told: Sentence-Guided Activity Recognition In Video
We present a system that demonstrates how the compositional structure of
events, in concert with the compositional structure of language, can interplay
with the underlying focusing mechanisms in video action recognition, thereby
providing a medium, not only for top-down and bottom-up integration, but also
for multi-modal integration between vision and language. We show how the roles
played by participants (nouns), their characteristics (adjectives), the actions
performed (verbs), the manner of such actions (adverbs), and changing spatial
relations between participants (prepositions) in the form of whole sentential
descriptions mediated by a grammar, guide the activity-recognition process.
Further, the utility and expressiveness of our framework are demonstrated by
performing three separate tasks in the domain of multi-activity videos:
sentence-guided focus of attention, generation of sentential descriptions of
video, and query-based video search, simply by leveraging the framework in
different manners.
Comment: To appear in CVPR 201
Saying What You're Looking For: Linguistics Meets Video Search
We present an approach to searching large video corpora for video clips which
depict a natural-language query in the form of a sentence. This approach uses
compositional semantics to encode subtle meaning that is lost in other systems,
such as the difference between two sentences which have identical words but
entirely different meaning: "The person rode the horse" vs. "The horse
rode the person". Given a video-sentence pair and a natural-language parser,
along with a grammar that describes the space of sentential queries, we produce
a score which indicates how well the video depicts the sentence. We produce
such a score for each video clip in a corpus and return a ranked list of clips.
Furthermore, this approach addresses two fundamental problems simultaneously:
detecting and tracking objects, and recognizing whether those tracks depict the
query. Because both tracking and object detection are unreliable, the approach uses
knowledge about the intended sentential query to focus the tracker on the
relevant participants and ensures that the resulting tracks are described by
the sentential query. While earlier work was limited to single-word queries
which correspond to either verbs or nouns, we show how one can search for
complex queries which contain multiple phrases, such as prepositional phrases,
and modifiers, such as adverbs. We demonstrate this approach by searching for
141 queries involving people and horses interacting with each other in 10
full-length Hollywood movies.
Comment: 13 pages, 8 figures
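The word-order point in this abstract can be illustrated with a minimal Python sketch. It is not the authors' system: `roles` is a hypothetical toy parser that only handles sentences of the form "The X rode the Y", and the clip scores passed to `rank_clips` are made-up placeholders standing in for the video-sentence score the abstract describes.

```python
from collections import Counter


def bag_of_words(sentence):
    """Word counts only; word order and grammatical roles are discarded."""
    return Counter(sentence.lower().rstrip(".").split())


def roles(sentence):
    """Hypothetical toy parser for 'The <agent> <verb> the <patient>' only."""
    words = sentence.lower().rstrip(".").split()
    return {"agent": words[1], "verb": words[2], "patient": words[4]}


def rank_clips(clip_scores):
    """Return clip names ordered from highest to lowest score."""
    return sorted(clip_scores, key=clip_scores.get, reverse=True)


a = "The person rode the horse"
b = "The horse rode the person"

# Identical words, so a bag-of-words representation cannot tell them apart.
same_bag = bag_of_words(a) == bag_of_words(b)
# The role assignment differs, which is what compositional semantics keeps.
same_roles = roles(a) == roles(b)

# Placeholder scores (assumed values, not results from the paper).
ranking = rank_clips({"clip_a": 0.2, "clip_b": 0.9, "clip_c": 0.5})
```

Here `same_bag` is true while `same_roles` is false, and `ranking` lists the clips from best to worst score, mirroring the ranked list the abstract mentions.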
Control Synthesis for an Underactuated Cable Suspended System Using Dynamic Decoupling
This article studies the dynamics and control of a novel underactuated
system in which a plate, carrying a freely moving mass (a ball) on top, is
suspended by cables whose other ends are attached to three quadrotors. The
goal is to stabilize the plate horizontally at a certain height, with the ball
positioned at the center of mass of the plate. The freely moving mass introduces
two degrees of underactuation into the system. The design proceeds through a decoupling of the
quadrotors and the plate dynamics. Through a partial feedback linearization
approach, the attitude and the translational height of the plate
are initially controlled, while maintaining a bounded velocity along the and
directions. These inputs are then synthesized through the quadrotors with a
backstepping and timescale separation argument based on Tikhonov's theorem.
Correlating LIBS Coal Data for Coal Property Prediction
This report presents correlations between coal data derived from laboratory analysis and from laser-induced breakdown spectroscopy (LIBS) analysis. Artificial neural network models use the LIBS data to predict higher-order coal properties, such as heating value and ash fusion temperature, and the predictions are compared against standard laboratory measurements. Selected formulas for the prediction of coal properties are also presented and compared against the neural network and laboratory results.
Electrical Properties of Atomic Layer Deposited Aluminum Oxide on Gallium Nitride
We report on our investigation of the electrical properties of
metal/Al2O3/GaN metal-insulator-semiconductor (MIS) capacitors. We determined
the conduction band offset and interface charge density of the alumina/GaN
interface by analyzing capacitance-voltage characteristics of atomic layer
deposited Al2O3 films on GaN substrates. The conduction band offset at the
Al2O3/GaN interface was calculated to be 2.13 eV, in agreement with theoretical
predictions. A non-zero field of 0.93 MV/cm in the oxide under flat-band
conditions in the GaN was inferred, which we attribute to a fixed net positive
charge density of magnitude 4.60 × 10^12 cm^-2 at the Al2O3/GaN interface. We
provide hypotheses to explain the origin of this charge by analyzing the energy
band line-up.
Comment: 8 pages, 4 figures, Applied Physics Letters
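As a rough consistency check on the two numbers quoted in this abstract, the oxide field under flat-band conditions can be related to an interface sheet charge through Gauss's law, sigma = eps0 * eps_r * E / q. The sketch below assumes a relative permittivity of about 9 for the Al2O3 film; that value is an assumption for illustration and is not stated in the abstract.

```python
# Physical constants, expressed per centimetre to match the abstract's units.
EPS0_F_PER_CM = 8.8541878128e-14  # vacuum permittivity, F/cm
Q_E = 1.602176634e-19             # elementary charge, C

# ASSUMPTION: relative permittivity of the Al2O3 film (not given in the abstract).
ASSUMED_EPS_R = 9.0


def sheet_charge_density(field_v_per_cm, eps_r):
    """Sheet charge (charges per cm^2) that terminates a given oxide field."""
    return EPS0_F_PER_CM * eps_r * field_v_per_cm / Q_E


# 0.93 MV/cm is the flat-band oxide field reported in the abstract.
sigma = sheet_charge_density(0.93e6, ASSUMED_EPS_R)
print(f"{sigma:.2e} cm^-2")
```

Under this assumed permittivity the result lands within about 1% of the reported 4.60 × 10^12 cm^-2, which suggests the two quoted figures are mutually consistent.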
StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure
This work presents StrAE: a Structured Autoencoder framework that, through
strict adherence to explicit structure and use of a novel contrastive
objective over tree-structured representations, enables effective learning of
multi-level representations. Through comparison over different forms of
structure, we verify that our results are directly attributable to the
informativeness of the structure provided as input, and show that this is not
the case for existing tree models. We then further extend StrAE to allow the
model to define its own compositions using a simple localised-merge algorithm.
This variant, called Self-StrAE, outperforms baselines that don't involve
explicit hierarchical compositions, and is comparable to models given
informative structure (e.g. constituency parses). Our experiments are conducted
in a data-constrained (circa 10M tokens) setting to help tease apart the
contribution of the inductive bias to effective learning. However, we find that
this framework can be robust to scale, and when extended to a much larger
dataset (circa 100M tokens), our 430-parameter model performs comparably to a
6-layer RoBERTa many orders of magnitude larger in size. Our findings support
the utility of incorporating explicit composition as an inductive bias for
effective representation learning.
Comment: EMNLP 2023 Mai
Autoencoding Conditional Neural Processes for Representation Learning
Conditional neural processes (CNPs) are a flexible and efficient family of
models that learn to learn a stochastic process from observations. In the
visual domain, they have seen particular application in contextual image
completion - observing pixel values at some locations to predict a distribution
over values at other unobserved locations. However, the choice of pixels in
learning such a CNP is typically either random or derived from a simple
statistical measure (e.g. pixel variance). Here, we turn the problem on its
head and ask: which pixels would a CNP like to observe? That is, which pixels
allow fitting a CNP, and do such pixels tell us something about the underlying
image? Viewing the context provided to the CNP as fixed-size latent
representations, we construct an amortised variational framework, Partial Pixel
Space Variational Autoencoder (PPS-VAE), for predicting this context
simultaneously with learning a CNP. We evaluate PPS-VAE on a set of vision
datasets, and find that not only is it possible to learn context points while
also fitting CNPs, but that their spatial arrangement and values provide a
strong signal for the information contained in the image - evaluated through
the lens of classification. We believe the PPS-VAE provides a promising avenue
to explore learning interpretable and effective visual representations.