Generating collaborative systems for digital libraries: A model-driven approach
The design and development of a digital library involves different stakeholders, such as information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. To this end, high-level, language-neutral models have to be devised. Metamodeling techniques favor the definition of domain-specific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. This paper describes CRADLE (Cooperative-Relational Approach to Digital Library Environments), a metamodel-based framework and visual language for defining the notions and services involved in developing digital libraries. A collection of tools allows the automatic generation of several services defined with the CRADLE visual language, together with the graphical user interfaces that give end users access to them. The effectiveness of the approach is illustrated by presenting digital libraries generated with CRADLE, and the CRADLE environment has been evaluated using the cognitive dimensions framework
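The listing does not show CRADLE's generator internals. As a rough illustration of the general metamodel-to-service idea the abstract describes, the Python sketch below defines a toy metamodel and emits a search-service stub from it; every name here (Entity, Attribute, generate_search_service) is hypothetical and unrelated to CRADLE's actual tooling.

```python
from dataclasses import dataclass, field

# Hypothetical, minimal metamodel; CRADLE's actual metamodel is far richer.
@dataclass
class Attribute:
    name: str
    type: str

@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)

def generate_search_service(entity: Entity) -> str:
    """Emit a Python stub for a search service over one entity type."""
    params = ", ".join(f"{a.name}=None" for a in entity.attributes)
    checks = " and ".join(
        f"({a.name} is None or item['{a.name}'] == {a.name})"
        for a in entity.attributes
    )
    return (
        f"def search_{entity.name.lower()}(catalog, {params}):\n"
        f"    return [item for item in catalog if {checks}]\n"
    )

book = Entity("Book", [Attribute("title", "str"), Attribute("year", "int")])
print(generate_search_service(book))   # prints the generated service stub
```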
PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires the agent to follow language instructions to navigate through 3D environments. One main challenge in VLN is the limited availability of photorealistic training environments, which makes it hard to generalize to new and unseen environments. To address this problem, we propose PanoGen, a generation method that can potentially create an infinite number of diverse panoramic environments conditioned on text. Specifically, we collect room descriptions by captioning the room images in existing Matterport3D environments, and leverage a state-of-the-art text-to-image diffusion model to generate the new panoramic environments. We use recursive outpainting over the generated images to create consistent 360-degree panorama views. By conditioning on text descriptions, our new panoramic environments share similar semantic information with the original environments, which ensures that the co-occurrence of objects in the panorama follows human intuition, while image outpainting creates enough diversity in room appearance and layout. Lastly, we explore two ways of utilizing PanoGen in VLN pre-training and fine-tuning. We generate instructions for paths in our PanoGen environments with a speaker built on a pre-trained vision-and-language model for VLN pre-training, and augment the visual observations with our panoramic environments during agents' fine-tuning to avoid overfitting to seen environments. Empirically, learning with our PanoGen environments achieves a new state of the art on the Room-to-Room, Room-for-Room, and CVDN datasets. Pre-training with our PanoGen speaker data is especially effective for CVDN, which has under-specified instructions and needs commonsense knowledge. Finally, we show that the agent can benefit from training with more generated panoramic environments, suggesting promising results for scaling up the PanoGen environments
Comment: Project Webpage: https://pano-gen.github.io
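PanoGen's own pipeline is described on the project webpage; the snippet below is only a rough sketch of the recursive-outpainting idea, assuming the Hugging Face diffusers inpainting pipeline. The model checkpoint, caption, shift size, and step count are all illustrative, not PanoGen's actual settings.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Assumption: a generic SD inpainting checkpoint, not the model PanoGen uses.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def outpaint_right(view: Image.Image, caption: str, shift: int = 256) -> Image.Image:
    """Slide the view left and let the model fill the exposed right strip."""
    w, h = view.size
    canvas = Image.new("RGB", (w, h))
    canvas.paste(view.crop((shift, 0, w, h)), (0, 0))   # kept visual context
    mask = Image.new("L", (w, h), 0)
    mask.paste(255, (w - shift, 0, w, h))               # strip to generate
    return pipe(prompt=caption, image=canvas, mask_image=mask).images[0]

caption = "a cozy living room with a fireplace and large windows"  # hypothetical
# Initial 512x512 view: inpainting with a fully white mask acts as text-to-image.
view = pipe(prompt=caption, image=Image.new("RGB", (512, 512)),
            mask_image=Image.new("L", (512, 512), 255)).images[0]
strips = [view]
for _ in range(5):                        # repeat until the panorama is wide enough
    view = outpaint_right(view, caption)
    strips.append(view.crop((256, 0, 512, 512)))        # keep only the new strip
```

Stitching the collected strips side by side approximates one wide, text-consistent panorama view.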
Creating and controlling visual environments using BonVision
Real-time rendering of closed-loop visual environments is important for next-generation understanding of brain function and behaviour, but is often prohibitively difficult for non-experts to implement and is limited to a few laboratories worldwide. We developed BonVision as an easy-to-use open-source software for the display of virtual or augmented reality, as well as standard visual stimuli. BonVision has been tested on humans and mice, and is capable of supporting new experimental designs in other animal models of vision. As the architecture is based on the open-source Bonsai graphical programming language, BonVision benefits from native integration with experimental hardware. BonVision therefore enables easy implementation of closed-loop experiments, including real-time interaction with deep neural networks, and communication with behavioural and physiological measurement and manipulation devices
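BonVision itself is assembled graphically in the Bonsai language rather than written as Python, so the following is not BonVision code. It is only a minimal sketch of the closed-loop pattern the abstract describes (measure behaviour, update the virtual environment, render), with read_tracker and render as hypothetical stubs.

```python
import time

# Hypothetical stand-ins: BonVision wires these stages up graphically in Bonsai;
# none of the names below come from the BonVision API.
def read_tracker() -> float:
    """Return the subject's current position (stubbed here)."""
    return time.time() % 1.0

def render(stimulus_phase: float) -> None:
    """Draw one frame of the stimulus (stubbed here)."""
    pass

GAIN = 0.5                      # how strongly behaviour drives the environment
frame_interval = 1.0 / 60.0     # target a 60 Hz refresh
phase = 0.0

while True:
    start = time.perf_counter()
    position = read_tracker()                 # measure behaviour
    phase = (phase + GAIN * position) % 1.0   # update the virtual environment
    render(phase)                             # present the new frame
    # Sleep off the remainder of the frame to hold the loop near 60 Hz.
    time.sleep(max(0.0, frame_interval - (time.perf_counter() - start)))
```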
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Recent studies on dense captioning and visual grounding in 3D have achieved impressive results. Despite developments in both areas, the limited amount of available 3D vision-language data causes overfitting issues for 3D visual grounding and 3D dense captioning methods. Also, how to discriminatively describe objects in complex 3D environments is not yet fully studied. To address these challenges, we present D3Net, an end-to-end neural speaker-listener architecture that can detect, describe, and discriminate. Our D3Net unifies dense captioning and visual grounding in 3D in a self-critical manner. This self-critical property of D3Net also introduces discriminability during object caption generation and enables semi-supervised training on ScanNet data with partially annotated descriptions. Our method outperforms SOTA methods in both tasks on the ScanRefer dataset, surpassing the SOTA 3D dense captioning method by a significant margin
Comment: Project website: https://daveredrum.github.io/D3Net
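The abstract names self-critical training without detailing the reward. As a sketch, the snippet below implements the generic self-critical sequence-training loss (Rennie et al., 2017) in PyTorch; D3Net's actual reward, which couples captioning and grounding, is specified in the paper, not here.

```python
import torch

def self_critical_loss(log_probs: torch.Tensor,
                       sampled_reward: torch.Tensor,
                       greedy_reward: torch.Tensor) -> torch.Tensor:
    """Generic self-critical sequence-training loss.

    log_probs:      (batch,) summed log-probabilities of the *sampled* captions
    sampled_reward: (batch,) reward of the sampled captions (e.g. CIDEr)
    greedy_reward:  (batch,) reward of greedily decoded captions (the baseline)
    """
    # Baseline-subtracted reward: captions better than greedy get reinforced,
    # worse ones get suppressed.
    advantage = sampled_reward - greedy_reward
    return -(advantage.detach() * log_probs).mean()

# Toy usage with made-up numbers.
lp = torch.tensor([-12.3, -9.8], requires_grad=True)
loss = self_critical_loss(lp, torch.tensor([0.8, 0.4]), torch.tensor([0.6, 0.5]))
loss.backward()
```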
Adding Rule-Based Model Transformation to Modelling Languages in MetaEdit+
MetaEdit+ is a commercial tool by MetaCase for creating domain-specific, syntax-directed visual modelling environments. MetaEdit+ synthesizes such environments from user-provided metamodels and contains a Generator Editor for code/report generation. An API allows external manipulation of models through SOAP. Currently, the MetaEdit+ tool does not natively support rule-based model-to-model transformation. Such transformations are useful because they allow domain experts to use domain-specific notations to intuitively model either the operational semantics of a modelling language (a simulator) or its denotational semantics (a model-to-model transformation onto a model in a known formalism). We will demonstrate how to add rule-based operational semantics to modelling languages in MetaEdit+. In our approach, transformation rules are visually created in MetaEdit+. The rule editor is synthesized using modified versions of the original language's metamodel; this modification is performed in a structured fashion using a process called RAMification. Both the model and the rules are exported from MetaEdit+ to Python code. This code is combined with Py-T-Core, our library of transformation language primitives, to apply the rules to the model. Our demonstration has a client-server architecture, with the MetaEdit+ visual modelling environment as the client and the transformation engine as the server. After each transformation step, in-place changes to the model are propagated to MetaEdit+ for visualization through the SOAP API. A simple (manufacturing) Production System modelling language is used as an example
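The Py-T-Core API is not shown in this listing, so the sketch below only illustrates the generic match-and-rewrite loop that such a rule-based engine runs over an exported model. The dictionary encoding, the production-system rule, and all function names are hypothetical.

```python
# Hypothetical in-place transformation step over an exported model;
# this does not use the real Py-T-Core API.
model = {"queue": ["part1", "part2"], "machine": None}

def lhs_matches(m: dict) -> bool:
    """LHS: a part is waiting and the machine is idle."""
    return bool(m["queue"]) and m["machine"] is None

def apply_rhs(m: dict) -> None:
    """RHS: move the first waiting part onto the machine."""
    m["machine"] = m["queue"].pop(0)

def step(m: dict) -> bool:
    """One transformation step: rewrite in place if the rule matches.
    A real engine would then push the change back to the client
    (here: MetaEdit+ over SOAP) for visualization."""
    if lhs_matches(m):
        apply_rhs(m)
        return True
    return False

while step(model):       # run the operational semantics to quiescence
    print(model)
```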
Exploiting visual salience for the generation of referring expressions
In this paper we present a novel approach to generating referring expressions (GRE) that is tailored to a model of the visual context the user is attending to. The approach integrates a new computational model of visual salience in simulated 3-D environments with Dale and Reiter's (1995) Incremental Algorithm. The advantages of our GRE framework are: (1) the context set used by the GRE algorithm is dynamically computed by the visual salience algorithm as a user navigates through a simulation; (2) integrating visual salience into the generation process means that in some instances the framework generates underspecified but sufficiently detailed descriptions of the target object that are shorter than those produced by GRE algorithms focusing purely on adjectival and type attributes; (3) it also means that our GRE algorithm will in some instances succeed in generating a description of the target object in situations where GRE algorithms that focus purely on adjectival and type attributes fail
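Dale and Reiter's (1995) Incremental Algorithm is standard and can be sketched directly; the salience model that is this paper's contribution is reduced here to the assumption that the context set has already been filtered to visually salient objects. The attribute preference order and the toy scene are assumptions.

```python
# Sketch of Dale and Reiter's (1995) Incremental Algorithm over a
# salience-filtered context set (simplified: the classic algorithm also
# always includes the type attribute in the final description).
PREFERENCE_ORDER = ["type", "colour", "size"]   # assumed attribute ordering

def incremental_algorithm(target: dict, context: list[dict]) -> dict:
    """Pick attributes of `target` until all distractors are ruled out."""
    distractors = list(context)
    description = {}
    for attr in PREFERENCE_ORDER:
        value = target.get(attr)
        ruled_out = [d for d in distractors if d.get(attr) != value]
        if ruled_out:                   # attribute has discriminatory power
            description[attr] = value
            distractors = [d for d in distractors if d.get(attr) == value]
        if not distractors:
            return description
    return description                  # may be underspecified

# Assumption: objects below a salience threshold were dropped before this
# call, so `context` holds only the visually salient distractors.
target = {"type": "box", "colour": "red", "size": "large"}
context = [{"type": "box", "colour": "blue", "size": "large"},
           {"type": "ball", "colour": "red", "size": "small"}]
print(incremental_algorithm(target, context))   # {'type': 'box', 'colour': 'red'}
```

A smaller salience-filtered context set is what lets the algorithm stop early and emit the shorter, underspecified descriptions the abstract mentions.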
- …