    Seeing What You're Told: Sentence-Guided Activity Recognition In Video

    We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, providing a medium not only for top-down and bottom-up integration but also for multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions), in the form of whole sentential descriptions mediated by a grammar, guide the activity-recognition process. Further, the utility and expressiveness of our framework are demonstrated by performing three separate tasks in the domain of multi-activity videos: sentence-guided focus of attention, generation of sentential descriptions of video, and query-based video search, simply by leveraging the framework in different manners. Comment: To appear in CVPR 201

    Saying What You're Looking For: Linguistics Meets Video Search

    We present an approach to searching large video corpora for video clips that depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences that have identical words but entirely different meanings: "The person rode the horse" vs. "The horse rode the person". Given a video-sentence pair and a natural-language parser, along with a grammar that describes the space of sentential queries, we produce a score that indicates how well the video depicts the sentence. We produce such a score for each video clip in a corpus and return a ranked list of clips. Furthermore, this approach addresses two fundamental problems simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, the approach uses knowledge about the intended sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While earlier work was limited to single-word queries that correspond to either verbs or nouns, we show how one can search for complex queries that contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 141 queries involving people and horses interacting with each other in 10 full-length Hollywood movies. Comment: 13 pages, 8 figure
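The retrieval step described in this abstract (score every clip against the sentence, then return a ranked list) can be sketched as follows. This is only an illustration of the ranking loop: the names `rank_clips` and `toy_score`, the clip identifiers, and the score values are hypothetical, and the lookup table stands in for the paper's actual video-sentence scoring, which is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple


def rank_clips(
    clips: List[str],
    sentence: str,
    score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each clip against the query sentence and sort best-first."""
    scored = [(clip, score(clip, sentence)) for clip in clips]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in scores: a real system would compute these from
# the video and the parsed sentence, not read them from a table.
_TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("clip_a", "The person rode the horse"): 0.91,
    ("clip_b", "The person rode the horse"): 0.12,
    ("clip_c", "The person rode the horse"): 0.47,
}


def toy_score(clip: str, sentence: str) -> float:
    """Look up an assumed score; unknown pairs get the lowest score."""
    return _TOY_SCORES.get((clip, sentence), 0.0)


ranking = rank_clips(
    ["clip_a", "clip_b", "clip_c"], "The person rode the horse", toy_score
)
```

Because the scorer takes the whole sentence rather than a bag of words, a query with the same words in a different order can rank the clips differently, which is the distinction the abstract emphasizes.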

    Control Synthesis for an Underactuated Cable Suspended System Using Dynamic Decoupling

    This article studies the dynamics and control of a novel underactuated system in which a plate, carrying a freely moving mass (a ball) on top, is suspended by cables whose other ends are attached to three quadrotors. The goal is to stabilize the plate horizontally at a given height, with the ball positioned at the center of mass of the plate. The freely moving mass introduces two degrees of underactuation into the system. The design proceeds through a decoupling of the quadrotor and plate dynamics. Through a partial feedback linearization approach, the attitude and the translational height of the plate are initially controlled, while maintaining a bounded velocity along the y and x directions. These inputs are then synthesized through the quadrotors with a backstepping and timescale separation argument based on Tikhonov's theorem.

    Correlating LIBS Coal Data for Coal Property Prediction

    This report presents results for correlations between coal data derived from laboratory analysis and from Laser-Induced Breakdown Spectroscopy (LIBS) analysis. LIBS data were used to predict higher order properties of coal using artificial neural network models. Higher order coal properties such as heating value and ash fusion temperature are predicted using LIBS analysis and compared against standard laboratory measurements. Selected formulas for the prediction of coal properties are also presented and compared against the neural network and laboratory results.

    Electrical Properties of Atomic Layer Deposited Aluminum Oxide on Gallium Nitride

    We report on our investigation of the electrical properties of metal/Al2O3/GaN metal-insulator-semiconductor (MIS) capacitors. We determined the conduction band offset and interface charge density of the alumina/GaN interface by analyzing capacitance-voltage characteristics of atomic layer deposited Al2O3 films on GaN substrates. The conduction band offset at the Al2O3/GaN interface was calculated to be 2.13 eV, in agreement with theoretical predictions. A non-zero field of 0.93 MV/cm in the oxide under flat-band conditions in the GaN was inferred, which we attribute to a fixed net positive charge density of magnitude 4.60 x 10^12 cm^-2 at the Al2O3/GaN interface. We provide hypotheses to explain the origin of this charge by analyzing the energy band line-up. Comment: 8 pages, 4 figures, Applied Physics Letter
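The link between the reported oxide field and the interface charge density can be checked with Gauss's law. The sketch below assumes a sheet of fixed charge at the interface, zero field in the GaN at flat band, and a relative permittivity of about 9 for the Al2O3 film. The permittivity is an assumption chosen for illustration (the abstract does not state it), and the function name is hypothetical.

```python
# Physical constants in SI units.
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
Q_E = 1.602176634e-19         # elementary charge, C


def sheet_charge_density_cm2(field_mv_per_cm: float, rel_permittivity: float) -> float:
    """Sheet charge (charges per cm^2) that terminates a given oxide field.

    Gauss's law with zero field on the semiconductor side:
        sigma = eps_0 * eps_r * E_oxide
    """
    field_v_per_m = field_mv_per_cm * 1e8  # MV/cm -> V/m (1e6 V/MV * 100 cm/m)
    sigma = EPSILON_0 * rel_permittivity * field_v_per_m  # C/m^2
    return (sigma / Q_E) * 1e-4  # charges per m^2 -> charges per cm^2


# Assumed eps_r = 9.0 for the Al2O3 film; the field is the abstract's 0.93 MV/cm.
density = sheet_charge_density_cm2(0.93, 9.0)
```

Under these assumptions the result is about 4.6 x 10^12 cm^-2, consistent with the magnitude quoted in the abstract. A different permittivity would shift the number proportionally.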

    StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure

    This work presents StrAE: a Structured Autoencoder framework that, through strict adherence to explicit structure and use of a novel contrastive objective over tree-structured representations, enables effective learning of multi-level representations. Through comparison over different forms of structure, we verify that our results are directly attributable to the informativeness of the structure provided as input, and show that this is not the case for existing tree models. We then further extend StrAE to allow the model to define its own compositions using a simple localised-merge algorithm. This variant, called Self-StrAE, outperforms baselines that don't involve explicit hierarchical compositions, and is comparable to models given informative structure (e.g. constituency parses). Our experiments are conducted in a data-constrained (circa 10M tokens) setting to help tease apart the contribution of the inductive bias to effective learning. However, we find that this framework can be robust to scale, and when extended to a much larger dataset (circa 100M tokens), our 430 parameter model performs comparably to a 6-layer RoBERTa many orders of magnitude larger in size. Our findings support the utility of incorporating explicit composition as an inductive bias for effective representation learning. Comment: EMNLP 2023 Mai

    Autoencoding Conditional Neural Processes for Representation Learning

    Conditional neural processes (CNPs) are a flexible and efficient family of models that learn to learn a stochastic process from observations. In the visual domain, they have seen particular application in contextual image completion - observing pixel values at some locations to predict a distribution over values at other unobserved locations. However, the choice of pixels in learning such a CNP is typically either random or derived from a simple statistical measure (e.g. pixel variance). Here, we turn the problem on its head and ask: which pixels would a CNP like to observe? That is, which pixels allow fitting a CNP, and do such pixels tell us something about the underlying image? Viewing the context provided to the CNP as fixed-size latent representations, we construct an amortised variational framework, Partial Pixel Space Variational Autoencoder (PPS-VAE), for predicting this context simultaneously with learning a CNP. We evaluate PPS-VAE on a set of vision datasets, and find that not only is it possible to learn context points while also fitting CNPs, but that their spatial arrangement and values provide a strong signal for the information contained in the image - evaluated through the lens of classification. We believe the PPS-VAE provides a promising avenue to explore learning interpretable and effective visual representations.
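For reference, the random baseline mentioned in this abstract (picking context pixels uniformly at random and treating the rest as targets) can be sketched as below. This illustrates only that baseline split, not PPS-VAE's learned context selection; the function name and parameters are hypothetical.

```python
import random
from typing import List, Tuple

Coord = Tuple[int, int]


def split_context_target(
    height: int, width: int, num_context: int, seed: int = 0
) -> Tuple[List[Coord], List[Coord]]:
    """Randomly choose context pixel locations; the remainder become targets."""
    rng = random.Random(seed)  # seeded for reproducible splits
    coords = [(row, col) for row in range(height) for col in range(width)]
    context = rng.sample(coords, num_context)  # sampled without replacement
    context_set = set(context)
    target = [coord for coord in coords if coord not in context_set]
    return context, target


# Example: observe 16 of the 64 pixel locations in an 8x8 image.
context, target = split_context_target(8, 8, 16)
```

A model would then be conditioned on the pixel values at `context` and asked to predict a distribution over the values at `target`.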