Coherent Multi-Sentence Video Description with Variable Level of Detail
Humans can easily describe what they see in a coherent way and at varying
levels of detail. However, existing approaches for automatic video description
are mainly focused on single sentence generation and produce descriptions at a
fixed level of detail. In this paper, we address both of these limitations: for
a variable level of detail we produce coherent multi-sentence descriptions of
complex videos. We follow a two-step approach where we first learn to predict a
semantic representation (SR) from video and then generate natural language
descriptions from the SR. To produce consistent multi-sentence descriptions, we
model across-sentence consistency at the level of the SR by enforcing a
consistent topic. We also contribute to the visual recognition of objects by
proposing a hand-centric approach, as well as to the robust generation of
sentences using a word lattice. Human judges rate our multi-sentence
descriptions as more readable, correct, and relevant than related work. To
understand the difference between more detailed and shorter descriptions, we
collect and analyze a video description corpus of three levels of detail.
Comment: 10 pages
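The two-step approach above can be sketched in miniature. This is a hedged illustration, not the paper's models: a list of per-segment semantic representations (SRs) stands in for step one, a template realizer stands in for step two, and across-sentence consistency is enforced by projecting every segment onto the majority topic. All field names and example values are invented.

```python
# Minimal sketch of SR-based multi-sentence generation with topic consistency.
# The SRs would come from a learned video model in the actual approach.
from collections import Counter

def enforce_consistent_topic(srs):
    """Replace each segment's topic with the majority topic across segments."""
    majority = Counter(sr["topic"] for sr in srs).most_common(1)[0][0]
    return [{**sr, "topic": majority} for sr in srs]

def realize(sr):
    """Render one sentence from an (activity, object, tool) SR."""
    return f"The person {sr['activity']} the {sr['object']} with a {sr['tool']}."

srs = [
    {"topic": "cooking", "activity": "washes", "object": "carrot", "tool": "sponge"},
    {"topic": "cleaning", "activity": "peels", "object": "carrot", "tool": "peeler"},
    {"topic": "cooking", "activity": "cuts", "object": "carrot", "tool": "knife"},
]
description = " ".join(realize(sr) for sr in enforce_consistent_topic(srs))
```

Enforcing one topic at the SR level, before any words are generated, is what keeps the resulting sentences from drifting between activities mid-description.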
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions
Human speakers can generate descriptions of perceptual concepts, abstracted
from the instance-level. Moreover, such descriptions can be used by other
speakers to learn provisional representations of those concepts. Learning and
using abstract perceptual concepts is under-investigated in the
language-and-vision field. The problem is also highly relevant to the field of
representation learning in multi-modal NLP. In this paper, we introduce a
framework for testing category-level perceptual grounding in multi-modal
language models. In particular, we train separate neural networks to generate
and interpret descriptions of visual categories. We measure the communicative
success of the two models with the zero-shot classification performance of the
interpretation model, which we argue is an indicator of perceptual grounding.
Using this framework, we compare the performance of prototype- and
exemplar-based representations. Finally, we show that communicative success
exposes performance issues in the generation model, not captured by traditional
intrinsic NLG evaluation metrics, and argue that these issues stem from a
failure to properly ground language in vision at the category level.
Comment: To appear in Proceedings of the 2023 Conference on Empirical Methods
in Natural Language Processing (EMNLP, Main Conference)
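The prototype-versus-exemplar comparison and the use of zero-shot classification as a proxy for communicative success can be illustrated with a toy sketch. This is not the paper's neural setup: categories here are small sets of feature vectors, the "generator" is either a mean prototype or the raw exemplar set, and the "interpreter" classifies unseen instances by distance. All data and thresholds are invented.

```python
# Toy comparison of prototype- vs exemplar-based category representations,
# scored by zero-shot classification accuracy (a stand-in for the paper's
# communicative-success measure).
import math

def prototype(exemplars):
    """Collapse a category's exemplars into a single mean vector."""
    n = len(exemplars)
    return tuple(sum(e[i] for e in exemplars) / n for i in range(len(exemplars[0])))

def classify_prototype(instance, reps):
    """Interpreter holding one prototype per category."""
    return min(reps, key=lambda c: math.dist(instance, reps[c]))

def classify_exemplar(instance, reps):
    """Interpreter keeping every exemplar and scoring by the nearest one."""
    return min(reps, key=lambda c: min(math.dist(instance, e) for e in reps[c]))

def accuracy(classifier, reps, test_set):
    """Zero-shot accuracy over held-out (instance, label) pairs."""
    return sum(classifier(x, reps) == y for x, y in test_set) / len(test_set)

categories = {
    "auklet": [(0.9, 0.1), (0.7, 0.3)],
    "puffin": [(0.1, 0.9), (0.3, 0.7)],
}
test_set = [((0.8, 0.2), "auklet"), ((0.2, 0.8), "puffin")]
proto_acc = accuracy(classify_prototype,
                     {c: prototype(e) for c, e in categories.items()}, test_set)
exem_acc = accuracy(classify_exemplar, categories, test_set)
```

The point of the framework is that a description only "succeeds" if a separate interpreter can act on it, which is a stricter check than intrinsic NLG metrics applied to the generator alone.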
Modelling Digital Logic in SDL
The specification of digital logic in SDL (Specification and Description Language) is investigated. A specification approach is proposed for multi-level descriptions of hardware behaviour and structure. The modelling method exploits features introduced in SDL-92. The approach also deals with the specification, analysis and simulation of timing aspects at any level in the specification of digital logic.
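The idea of describing the same block at both a behavioural and a structural level, and checking one against the other, can be shown outside SDL itself. The following is a hedged Python sketch, not SDL code: a half-adder is given once as a direct function of its inputs and once as a composition of gate sub-blocks, with illustrative (invented) per-gate delays standing in for the timing aspects.

```python
# Behavioural vs. structural descriptions of a half-adder, plus a crude
# timing figure for the structural level. Delay values are illustrative only.
def half_adder_behaviour(a, b):
    """Behavioural level: outputs defined directly as (sum, carry)."""
    return (a ^ b, a & b)

def xor_gate(a, b): return a ^ b
def and_gate(a, b): return a & b

def half_adder_structure(a, b):
    """Structural level: the same block composed from gate sub-blocks."""
    return (xor_gate(a, b), and_gate(a, b))

GATE_DELAY_NS = {"xor": 2, "and": 1}  # invented timing annotations

def half_adder_delay_ns():
    """Worst-case propagation delay: the slowest input-to-output path."""
    return max(GATE_DELAY_NS["xor"], GATE_DELAY_NS["and"])

# The two levels of description must agree on every input combination.
for a in (0, 1):
    for b in (0, 1):
        assert half_adder_behaviour(a, b) == half_adder_structure(a, b)
```

In SDL the two levels would be a process with a behavioural transition graph versus a block decomposed into sub-blocks; the cross-level consistency check plays the same role here.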
McRunjob: A High Energy Physics Workflow Planner for Grid Production Processing
McRunjob is a powerful grid workflow manager used to manage the generation of
large numbers of production processing jobs in High Energy Physics. In use at
both the DZero and CMS experiments, McRunjob has been used to manage large
Monte Carlo production processing since 1999 and is being extended to uses in
regular production processing for analysis and reconstruction. Described at
CHEP 2001, McRunjob converts core metadata into jobs submittable in a variety
of environments. The powerful core metadata description language includes
methods for converting the metadata into persistent forms, job descriptions,
multi-step workflows, and data provenance information. The language features
allow for structure in the metadata by including full expressions, namespaces,
functional dependencies, site specific parameters in a grid environment, and
ontological definitions. It also has simple control structures for
parallelization of large jobs. McRunjob features a modular design which allows
for easy expansion to new job description languages or new application level
tasks.
Comment: CHEP 2003 serial number TUCT00
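The core idea of converting one metadata description into many submittable job descriptions can be sketched as follows. This is a hypothetical illustration, not McRunjob's actual language: the `${job}` expression syntax, the field names, and the provenance record are all invented for the example.

```python
# Invented sketch: expand one metadata record into per-job descriptions,
# resolving a simple ${job} expression and attaching provenance information.
def expand(metadata, n_jobs):
    """Produce n_jobs concrete job descriptions from one metadata record."""
    jobs = []
    for i in range(n_jobs):
        job = {k: v.replace("${job}", str(i)) if isinstance(v, str) else v
               for k, v in metadata.items()}
        # Data-provenance record: which metadata produced this job, and where
        # it sits in the parallelized batch.
        job["provenance"] = {"source": metadata["dataset"], "index": i}
        jobs.append(job)
    return jobs

meta = {"dataset": "mc.run1999", "output": "/store/${job}.root", "events": 1000}
jobs = expand(meta, 3)
```

A real workflow planner would then hand each expanded description to a backend-specific writer (one per submission environment), which is the role the modular design described above plays.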
PoseScript: Linking 3D Human Poses and Natural Language
Natural language plays a critical role in many computer vision applications,
such as image captioning, visual question answering, and cross-modal retrieval,
to provide fine-grained semantic information. Unfortunately, while human pose
is key to human understanding, current 3D human pose datasets lack detailed
language descriptions. To address this issue, we have introduced the PoseScript
dataset. This dataset pairs more than six thousand 3D human poses from AMASS
with rich human-annotated descriptions of the body parts and their spatial
relationships. Additionally, to increase the size of the dataset to a scale
that is compatible with data-hungry learning algorithms, we have proposed an
elaborate captioning process that generates automatic synthetic descriptions in
natural language from given 3D keypoints. This process extracts low-level pose
information, known as "posecodes", using a set of simple but generic rules on
the 3D keypoints. These posecodes are then combined into higher level textual
descriptions using syntactic rules. With automatic annotations, the amount of
available data significantly scales up (100k), making it possible to
effectively pretrain deep models for finetuning on human captions. To showcase
the potential of annotated poses, we present three multi-modal learning tasks
that utilize the PoseScript dataset. Firstly, we develop a pipeline that maps
3D poses and textual descriptions into a joint embedding space, allowing for
cross-modal retrieval of relevant poses from large-scale datasets. Secondly, we
establish a baseline for a text-conditioned model generating 3D poses. Thirdly,
we present a learned process for generating pose descriptions. These
applications demonstrate the versatility and usefulness of annotated poses in
various tasks and pave the way for future research in the field.
Comment: Extended version of the ECCV 2022 paper
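The posecode pipeline (generic rules over 3D keypoints, then syntactic combination into text) can be sketched in miniature. This is a hedged illustration, not PoseScript's actual rule set: the joint names, angle thresholds, and wording templates below are invented for the example.

```python
# Toy posecode extraction: an angle rule over three 3D keypoints yields a
# low-level code, and a syntactic rule turns codes into text. Thresholds
# and vocabulary are illustrative only.
import math

def angle(a, b, c):
    """Angle in degrees at joint b, formed by keypoints a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    cos = sum(x * y for x, y in zip(v1, v2)) / (
        math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def angle_posecode(joint, deg):
    """Map a joint angle to a discrete low-level posecode."""
    if deg < 75:
        return (joint, "bent")
    if deg > 135:
        return (joint, "straight")
    return (joint, "slightly bent")

def realize(codes):
    """Combine posecodes into sentences with a trivial syntactic rule."""
    return " ".join(f"The {joint} is {state}." for joint, state in codes)

shoulder, elbow, wrist = (0.0, 1.4, 0.0), (0.3, 1.4, 0.0), (0.3, 1.1, 0.0)
codes = [angle_posecode("left elbow", angle(shoulder, elbow, wrist))]
```

Because the rules operate on keypoints rather than images, the same machinery scales to any pose in AMASS, which is how the automatic annotations reach the 100k scale mentioned above.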