
    The Long-Short Story of Movie Description

    Generating descriptions for videos has many applications, including assisting blind people and human-robot interaction. The recent advances in image captioning, as well as the release of large-scale movie description datasets such as MPII Movie Description, make it possible to study this task in more depth. Many of the proposed methods for image captioning rely on pre-trained object classifier CNNs and Long-Short Term Memory recurrent networks (LSTMs) for generating descriptions. While image description focuses on objects, we argue that it is important to distinguish verbs, objects, and places in the challenging setting of movie description. In this work we show how to learn robust visual classifiers from the weak annotations of the sentence descriptions. Based on these visual classifiers we learn how to generate a description using an LSTM. We explore different design choices to build and train the LSTM and achieve the best performance to date on the challenging MPII-MD dataset. We compare and analyze our approach and prior work along various dimensions to better understand the key challenges of the movie description task.

    Move Forward and Tell: A Progressive Generator of Video Descriptions

    We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. They typically treat an entire video as a whole and generate the caption conditioned on a single embedding. In contrast, we consider videos with rich temporal structures and aim to generate paragraph descriptions that can preserve the story flow while being coherent and concise. Towards this goal, we propose a new approach, which produces a descriptive paragraph by assembling temporally localized descriptions. Given a video, it selects a sequence of distinctive clips and generates sentences thereon in a coherent manner. In particular, the selection of clips and the production of sentences are done jointly and progressively, driven by a recurrent network: what to describe next depends on what has been said before. Here, the recurrent network is learned via self-critical sequence training with both sentence-level and paragraph-level rewards. On the ActivityNet Captions dataset, our method demonstrated the capability of generating high-quality paragraph descriptions for videos. Compared to those by other methods, the descriptions produced by our method are often more relevant, more coherent, and more concise. Comment: Accepted by ECCV 201

    Effective Monte Carlo simulation on System-V massively parallel associative string processing architecture

    We show that the latest version of the massively parallel associative string processing architecture (System-V) is applicable for fast Monte Carlo simulation if an effective on-processor random number generator is implemented. Our lagged Fibonacci generator can produce $10^8$ random numbers on a processor string of 12K PE-s. The time-dependent Monte Carlo algorithm of the one-dimensional non-equilibrium kinetic Ising model performs 80 times faster than the corresponding serial algorithm on a 300 MHz UltraSparc. Comment: 8 pages, 9 color ps figures embedde

    Behavior and Chemical Signals as Markers of Colony Identification in Argentine Ants (Linepithema Humile)

    Argentine ants, Linepithema humile, are a highly successful invasive species around the globe and are especially prominent in regions such as California and the southeastern United States. L. humile have a unique form of unicoloniality, forming what are called “supercolonies”. L. humile can detect colonymates through scent markers in their outer cuticle, known as cuticular hydrocarbons (CHCs). Ants whose chemical markers smell different from one another exhibit high aggression. In our study, we performed aggression assays among ten different nest sites and analyzed their CHCs through gas chromatography-mass spectrometry (GC-MS). In the behavioral results, within-nest interactions displayed low aggression, as we expected, but we also observed one potential colony composed of three of the collected nests. Through GC-MS analysis, we detected 58 unique CHC compounds within the ten nest samples, but we could not determine any statistically significant patterns in the data that would explain the unexpected friendly behavior between nests that were far apart. We observed high variation not only between the nests collected, but also between samples derived from within the same nest. This high variation may indicate that the colonies in Georgia present a more complex relationship between CHCs and colony identity than is seen in other introduced populations, such as those in California, and that a much smaller subset of these CHC compounds is likely involved in colony recognition.

    (Dis)harmony in times of crisis? An analysis of COVID-related strategic communication by Swiss public health institutions.

    OBJECTIVES This study aims to assess COVID-related communication by Swiss public health institutions (PHI) as well as the challenges they faced in implementing their communication strategies. STUDY DESIGN This study uses a two-part mixed methods design, combining automated content analysis of press releases by PHI and semi-structured interviews with PHI communication experts. METHODS The automated content analysis uses natural language processing techniques to measure semantic themes and linguistic properties of 1882 press releases from national and regional PHI during the first year of the COVID-19 pandemic. The semi-structured interviews with 25 communication experts from key PHI explore the challenges faced in implementing their communication strategies. RESULTS The content analysis reveals key themes in press releases, including non-pharmaceutical interventions, quarantine, testing, contact tracing, hospital situations, and the pandemic's impact on the economy. The linguistic measures indicated a decrease in complexity and readability over time, with no significant differences between national and regional PHI. Interviews revealed challenges arising from organizational structures, the multi-systemic nature of the pandemic, and the expectations of the public. CONCLUSIONS The study highlights the importance of agility in public health communication and the need for efficient coordination within and between PHI. Organizational structures should be adapted to allow for more agile modes of operation during crises. Policymakers should clarify the roles and responsibilities of different actors in public health frameworks to ensure streamlined communication. Understanding the communication efforts and challenges faced by PHI during the pandemic helps prepare for future health crises and improve public health communication practices.

    Conditional Image-Text Embedding Networks

    This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain improvements of 4%, 3%, and 4%, respectively, in grounding performance over a strong region-phrase embedding baseline. Comment: ECCV 2018 accepted pape

    Tilt angle dependent three-dimensional position detection of a trapped cylindrical particle in a focused laser beam

    We investigated theoretically the applicability of an optically trapped cylindrical particle as a local probe in photonic force microscopy. To do this, we calculated the far-field scattering from a subwavelength-sized dielectric cylinder in a highly focused laser field. From this we obtained interferometric three-dimensional position detection signals and compared these to signals calculated for a spherical particle. We calculated the accuracy to which the position of an optically trapped cylinder can be determined as a function of the cylinder’s orientational fluctuations. The position accuracy is better than a few nanometers for tilt angle fluctuations up to several degrees. Our study is relevant for trapping experiments where the influence of angle fluctuations needs to be estimated.

    Learning Visual Question Answering by Bootstrapping Hard Attention

    Attention mechanisms in biological perception are thought to select subsets of perceptual information for more sophisticated processing which would be prohibitive to perform on all sensory inputs. In computer vision, however, there has been relatively little exploration of hard attention, where some information is selectively ignored, in spite of the success of soft attention, where information is re-weighted and aggregated, but never filtered out. Here, we introduce a new approach for hard attention and find it achieves very competitive performance on recently released visual question answering datasets, equalling and in some cases surpassing similar soft attention architectures while entirely ignoring some features. Even though the hard attention mechanism is thought to be non-differentiable, we found that the feature magnitudes correlate with semantic relevance, and provide a useful signal for our mechanism's attentional selection criterion. Because hard attention selects important features of the input information, it can also be more efficient than analogous soft attention mechanisms. This is especially important for recent approaches that use non-local pairwise operations, whereby computational and memory costs are quadratic in the size of the set of features. Comment: ECCV 201