1,325 research outputs found
The Long-Short Story of Movie Description
Generating descriptions for videos has many applications including assisting
blind people and human-robot interaction. The recent advances in image
captioning as well as the release of large-scale movie description datasets
such as MPII Movie Description allow this task to be studied in more depth. Many of
the proposed methods for image captioning rely on pre-trained object classifier
CNNs and Long Short-Term Memory recurrent networks (LSTMs) for generating
descriptions. While image description focuses on objects, we argue that it is
important to distinguish verbs, objects, and places in the challenging setting
of movie description. In this work we show how to learn robust visual
classifiers from the weak annotations of the sentence descriptions. Based on
these visual classifiers we learn how to generate a description using an LSTM.
We explore different design choices to build and train the LSTM and achieve the
best performance to date on the challenging MPII-MD dataset. We compare and
analyze our approach and prior work along various dimensions to better
understand the key challenges of the movie description task.
Move Forward and Tell: A Progressive Generator of Video Descriptions
We present an efficient framework that can generate a coherent paragraph to
describe a given video. Previous works on video captioning usually focus on
video clips. They typically treat an entire video as a whole and generate the
caption conditioned on a single embedding. On the contrary, we consider videos
with rich temporal structures and aim to generate paragraph descriptions that
can preserve the story flow while being coherent and concise. Towards this
goal, we propose a new approach, which produces a descriptive paragraph by
assembling temporally localized descriptions. Given a video, it selects a
sequence of distinctive clips and generates sentences thereon in a coherent
manner. Particularly, the selection of clips and the production of sentences
are done jointly and progressively driven by a recurrent network -- what to
describe next depends on what has been said before. Here, the recurrent
network is learned via self-critical sequence training with both sentence-level
and paragraph-level rewards. On the ActivityNet Captions dataset, our method
demonstrated the capability of generating high-quality paragraph descriptions
for videos. Compared to those by other methods, the descriptions produced by
our method are often more relevant, more coherent, and more concise. Comment: Accepted by ECCV 201
Effective Monte Carlo simulation on System-V massively parallel associative string processing architecture
We show that the latest version of massively parallel processing associative
string processing architecture (System-V) is applicable for fast Monte Carlo
simulation if an effective on-processor random number generator is implemented.
Our lagged Fibonacci generator can produce random numbers on a processor
string of 12K PEs. The time-dependent Monte Carlo algorithm of the
one-dimensional non-equilibrium kinetic Ising model performs 80 times faster than the
corresponding serial algorithm on a 300 MHz UltraSparc. Comment: 8 pages, 9 color ps figures embedde
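As a rough illustration of the kind of generator this abstract mentions, here is a minimal serial sketch of an additive lagged Fibonacci generator. The abstract does not give the lags, word size, or seeding scheme, so the lags (24, 55), the 32-bit modulus, and the linear congruential seeding below are all assumptions chosen for illustration. Nothing here reflects the System-V implementation or its parallel layout.

```python
from collections import deque


class LaggedFibonacci:
    """Additive lagged Fibonacci generator (illustrative sketch).

    Recurrence: x[n] = (x[n - short_lag] + x[n - long_lag]) mod 2**32.
    The lags (24, 55) and the 32-bit word size are assumptions,
    not values taken from the abstract.
    """

    MASK = 0xFFFFFFFF  # keeps every value within 32 bits

    def __init__(self, seed=12345, short_lag=24, long_lag=55):
        if not 0 < short_lag < long_lag:
            raise ValueError("need 0 < short_lag < long_lag")
        self.short_lag = short_lag
        # Assumed seeding scheme: fill the initial state with a simple
        # linear congruential generator.
        state = []
        x = seed & self.MASK
        for _ in range(long_lag):
            x = (1664525 * x + 1013904223) & self.MASK
            state.append(x)
        # An additive generator needs at least one odd word in its state.
        if all(v % 2 == 0 for v in state):
            state[0] |= 1
        # maxlen makes the deque drop its oldest entry on every append.
        self.state = deque(state, maxlen=long_lag)

    def next_int(self):
        # state[0] is x[n - long_lag]; state[-short_lag] is x[n - short_lag].
        value = (self.state[-self.short_lag] + self.state[0]) & self.MASK
        self.state.append(value)
        return value

    def random(self):
        """Return a float in [0, 1)."""
        return self.next_int() / (self.MASK + 1)
```

The state is only a fixed window of previous outputs and each step is one addition, which is presumably part of why this family of generators suits simple processing elements.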
Behavior and Chemical Signals as Markers of Colony Identification in Argentine Ants (Linepithema Humile)
Argentine ants, Linepithema humile, are a highly successful invasive species around the globe and are especially prominent in regions such as California and the southeastern United States. L. humile have a unique form of unicoloniality, called “supercolonies”. L. humile can detect colonymates through scent markers in their outer cuticle, known as cuticular hydrocarbons (CHCs). With these chemical markers, ants will exhibit high aggression if they smell different from one another. In our study, we performed aggression assays among ten different nest sites and analyzed their CHCs through gas chromatography-mass spectrometry (GC-MS). In our behavioral results, within-nest interactions displayed low aggression, as we expected, but we also observed one potential colony composed of three of the collected nests. Through GC-MS analysis, we detected 58 unique CHC compounds within the ten nest samples but could not determine any statistically significant patterns in the data that would explain the unexpected lack of aggression between nests that were far apart. The samples showed high variation not only between nests, but also between samples taken from within the same nest. This high variation may indicate that the colonies in Georgia present a more complex relationship between CHCs and colony identity than that seen in other introduced colonies, such as those in California, and that a much smaller subset of these CHC compounds is likely involved in colony recognition.
(Dis)harmony in times of crisis? An analysis of COVID-related strategic communication by Swiss public health institutions.
OBJECTIVES
This study aims to assess COVID-related communication by Swiss public health institutions (PHI) as well as the challenges they faced in implementing their communication strategies.
STUDY DESIGN
This study uses a two-part mixed methods design, combining automated content analysis of press releases by PHI and semi-structured interviews with PHI communication experts.
METHODS
The automated content analysis uses natural language processing techniques to measure semantic themes and linguistic properties of 1882 press releases from national and regional PHI during the first year of the COVID-19 pandemic. The semi-structured interviews with 25 communication experts from key PHI explore the challenges faced in implementing their communication strategies.
RESULTS
The content analysis reveals key themes in press releases, including non-pharmaceutical interventions, quarantine, testing, contact tracing, hospital situations, and the pandemic's impact on the economy. The linguistic measures indicate a decrease in complexity and readability over time, with no significant differences between national and regional PHI. Interviews reveal challenges arising from organizational structures, the multi-systemic nature of the pandemic, and the expectations of the public.
CONCLUSIONS
The study highlights the importance of agility in public health communication and the need for efficient coordination within and between PHI. Organizational structures should be adapted to allow for more agile modes of operation during crises. Policymakers should clarify roles and responsibilities of different actors in public health frameworks to ensure streamlined communication. Understanding the communication efforts and challenges faced by PHI during the pandemic helps prepare for future health crises and improve public health communication practices.
Conditional Image-Text Embedding Networks
This paper presents an approach for grounding phrases in images which jointly
learns multiple text-conditioned embeddings in a single end-to-end model. In
order to differentiate text phrases into semantically distinct subspaces, we
propose a concept weight branch that automatically assigns phrases to
embeddings, whereas prior works predefine such assignments. Our proposed
solution simplifies the representation requirements for individual embeddings
and allows the underrepresented concepts to take advantage of the shared
representations before feeding them into concept-specific layers. Comprehensive
experiments verify the effectiveness of our approach across three phrase
grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where
we obtain improvements of 4%, 3%, and 4%, respectively, in grounding performance over a
strong region-phrase embedding baseline. Comment: ECCV 2018 accepted pape
Tilt angle dependent three-dimensional position detection of a trapped cylindrical particle in a focused laser beam
We investigated theoretically the applicability of an optically trapped cylindrical particle as a local probe in photonic force microscopy. To do this we calculated the far-field scattering from a subwavelength-sized dielectric cylinder in a highly focused laser field. From this we obtained interferometric three-dimensional position detection signals and compared these to signals calculated for a spherical particle. We have calculated the accuracy to which the position of an optically trapped cylinder can be determined, as a function of the cylinder’s orientational fluctuations. The position accuracy is better than a few nanometers for tilt angle fluctuations up to several degrees. Our study is relevant for trapping experiments, where the influence of angle fluctuations needs to be estimated.
Learning Visual Question Answering by Bootstrapping Hard Attention
Attention mechanisms in biological perception are thought to select subsets
of perceptual information for more sophisticated processing which would be
prohibitive to perform on all sensory inputs. In computer vision, however,
there has been relatively little exploration of hard attention, where some
information is selectively ignored, in spite of the success of soft attention,
where information is re-weighted and aggregated, but never filtered out. Here,
we introduce a new approach for hard attention and find it achieves very
competitive performance on recently released visual question answering
datasets, equalling and in some cases surpassing similar soft attention
architectures while entirely ignoring some features. Even though the hard
architectures while entirely ignoring some features. Even though the hard
attention mechanism is thought to be non-differentiable, we found that the
feature magnitudes correlate with semantic relevance, and provide a useful
signal for our mechanism's attentional selection criterion. Because hard
attention selects important features of the input information, it can also be
more efficient than analogous soft attention mechanisms. This is especially
important for recent approaches that use non-local pairwise operations, whereby
computational and memory costs are quadratic in the size of the set of
features. Comment: ECCV 201
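To make the idea of magnitude-based hard attention concrete, the sketch below keeps only the feature vectors with the largest magnitudes and discards the rest. The abstract says only that feature magnitudes provide a signal for the selection criterion. The specific rule used here (keep the top `k` vectors by L2 norm) and the function name `hard_attention` are assumptions for illustration, not the authors' exact mechanism.

```python
import math


def hard_attention(features, k):
    """Keep the k feature vectors with the largest L2 norm (illustrative).

    features: list of equal-length lists of floats, one per spatial location.
    Returns (kept_indices, kept_features), with indices in their original
    order. Top-k selection by L2 norm is an assumed criterion.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    # Magnitude of each feature vector.
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in features]
    # Rank locations from largest to smallest magnitude.
    ranked = sorted(range(len(features)), key=lambda i: norms[i], reverse=True)
    # Keep the top k and restore the original spatial order.
    kept = sorted(ranked[:k])
    return kept, [features[i] for i in kept]


if __name__ == "__main__":
    feats = [[0.1, 0.0], [3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]
    indices, selected = hard_attention(feats, k=2)
    print(indices, selected)
```

Unlike soft attention, the discarded vectors never reach later layers. A pairwise operation over the kept set therefore touches on the order of k² pairs rather than n², which is the efficiency argument the abstract makes.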