Search CORE

10,339 research outputs found

Semantic Visual Localization

Author: Geiger Andreas
Pollefeys Marc
Sattler Torsten
Schönberger Johannes L.
Publication venue
Publication date: 01/01/2018
Field of study

Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.

arXiv.org e-Print Archive

MPG.PuRe

Systems Biology Graphical Notation: Activity Flow language Level 1

Author: Anatoly Sorokin
Falk Schreiber
Huaiyu Mi
Nicolas Le Novère
Stuart Moodie
Publication venue
Publication date: 05/09/2009
Field of study

Standard graphical representations have played a crucial role in science and engineering throughout the last century. Without electrical symbolism, it is very likely that our industrial society would not have evolved at the same pace. Similarly, specialized notations such as the Feynman notation or process flow diagrams did much to promote the adoption of concepts in their own fields. With the advent of Systems Biology, and more recently of Synthetic Biology, the need for precise and unambiguous descriptions of biochemical interactions has become more pressing. While some ideas have been advanced over the last decade, with a few detailed proposals, no actual community standard has emerged. The Systems Biology Graphical Notation (SBGN) is a graphical representation crafted over several years by a community of biochemists, modellers and computer scientists. Three orthogonal and complementary languages have been created: the Process Descriptions, the Entity Relationships and the Activity Flows. Using these three idioms, a scientist can represent any network of biochemical interactions, which can then be interpreted in an unambiguous way. The set of symbols used is limited, and the grammar quite simple, to allow its use in contexts ranging from textbooks and high-school teaching to peer-reviewed articles in scientific journals. The first level of the SBGN Activity Flow language has been publicly released. Shared by the communities of biochemists, genomic scientists, theoreticians and computational biologists, SBGN languages will foster efficient storage, exchange and reuse of information on signaling pathways, metabolic networks and gene regulatory maps.

Crossref

Nature Precedings

Visual Question Answering: A Survey of Methods and Datasets

Author: Dick Anthony
Hengel Anton van den
Shen Chunhua
Teney Damien
Wang Peng
Wu Qi
Publication venue
Publication date: 20/07/2016
Field of study

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this survey, we examine the state of the art by comparing modern approaches to the problem. We classify methods by their mechanism to connect the visual and textual modalities. In particular, we examine the common approach of combining convolutional and recurrent neural networks to map images and questions to a common feature space. We also discuss memory-augmented and modular architectures that interface with structured knowledge bases. In the second part of this survey, we review the datasets available for training and evaluating VQA systems. The various datasets contain questions at different levels of complexity, which require different capabilities and types of reasoning. We examine in depth the question/answer pairs from the Visual Genome project, and evaluate the relevance of the structured annotations of images with scene graphs for VQA. Finally, we discuss promising future directions for the field, in particular the connection to structured knowledge bases and the use of natural language processing models. Comment: 25 pages

arXiv.org e-Print Archive

Adelaide Research & Scholarship

Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping

Author: Chen Lawrence
Goldberg Ken
Kanazawa Angjoo
Kerr Justin
Kim Chung Min
Rashid Adam
Sharma Satvik
Publication venue
Publication date: 18/09/2023
Field of study

Grasping objects by a specific part is often crucial for safety and for executing downstream tasks. Yet, learning-based grasp planners lack this behavior unless they are trained on specific object part data, making it a significant challenge to scale to diverse objects. Instead, we propose LERF-TOGO, Language Embedded Radiance Fields for Task-Oriented Grasping of Objects, which uses vision-language models zero-shot to output a grasp distribution over an object given a natural language query. To accomplish this, we first reconstruct a LERF of the scene, which distills CLIP embeddings into a multi-scale 3D language field queryable with text. However, LERF has no sense of objectness, meaning its relevancy outputs often return incomplete activations over an object which are insufficient for subsequent part queries. LERF-TOGO mitigates this lack of spatial grouping by extracting a 3D object mask via DINO features and then conditionally querying LERF on this mask to obtain a semantic distribution over the object with which to rank grasps from an off-the-shelf grasp planner. We evaluate LERF-TOGO's ability to grasp task-oriented object parts on 31 different physical objects, and find it selects grasps on the correct part in 81% of all trials and grasps successfully in 69%. See the project website at: lerftogo.github.io.

arXiv.org e-Print Archive
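The LERF-TOGO abstract says grasps from an off-the-shelf planner are ranked using a semantic distribution computed over a DINO-derived 3D object mask, but it does not give the scoring rule. The sketch below is a hypothetical illustration of that ranking step only, assuming per-point relevancy scores and a boolean object mask are already available; scoring each grasp by the mean relevancy of masked points within a fixed radius is our own assumption, and the function names are hypothetical. It does not call LERF, CLIP, DINO, or any real grasp planner.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def grasp_semantic_score(grasp_center: Point, points: Sequence[Point],
                         relevancy: Sequence[float], in_mask: Sequence[bool],
                         radius: float = 0.02) -> float:
    """Hypothetical score: mean relevancy of object-mask points within
    `radius` of the grasp center (0.0 if no masked point is that close)."""
    nearby = [r for p, r, m in zip(points, relevancy, in_mask)
              if m and math.dist(p, grasp_center) <= radius]
    return sum(nearby) / len(nearby) if nearby else 0.0


def rank_grasps(grasp_centers: Sequence[Point], points: Sequence[Point],
                relevancy: Sequence[float], in_mask: Sequence[bool],
                radius: float = 0.02) -> List[Tuple[int, float]]:
    """Return (grasp index, score) pairs, highest semantic score first."""
    scores = [grasp_semantic_score(g, points, relevancy, in_mask, radius)
              for g in grasp_centers]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy scene (metres): two points on the queried part (e.g. a handle), two on the
# rest of the object, and one high-relevancy background point outside the mask.
points = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.1, 0.0, 0.0),
          (0.11, 0.0, 0.0), (0.5, 0.0, 0.0)]
relevancy = [0.9, 0.8, 0.2, 0.1, 0.95]
in_mask = [True, True, True, True, False]
grasps = [(0.105, 0.0, 0.0), (0.005, 0.0, 0.0), (0.5, 0.0, 0.0)]
print([i for i, _ in rank_grasps(grasps, points, relevancy, in_mask)])  # [1, 0, 2]
```

The mask matters in the example: the background point has the highest raw relevancy, but because it lies outside the object mask, the grasp placed on it scores 0.0 and ranks last, which mirrors the abstract's motivation for conditioning the query on the object.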

Knowledge-rich Image Gist Understanding Beyond Literal Meaning

Author: Dietz Laura
Effelsberg Wolfgang
Hulpus Ioana
Ponzetto Simone Paolo
Weiland Lydia
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018
Field of study

We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or in news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result, a Mean Average Precision (MAP) of 0.69, indicates that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., using automatically generated image tags or generated captions, and prove the feasibility of an end-to-end automated process.

arXiv.org e-Print Archive

MAnnheim DOCument Server
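The gist-detection paper casts the task as concept ranking with image-caption pairs as queries and reports Mean Average Precision (MAP). As a minimal sketch, assuming the standard information-retrieval definition of average precision (the abstract does not state the exact variant or any rank cutoff), MAP over concept rankings can be computed as follows. The concept labels in the example are hypothetical and are not drawn from the paper's gold standard.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked concept list for one query.

    Sums precision@k at every rank k that holds a relevant concept and divides
    by the total number of relevant concepts, so relevant concepts that never
    appear in the ranking contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, concept in enumerate(ranked, start=1):
        if concept in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: mean of per-query average precision (one query per image-caption pair)."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical example: two image-caption pairs, each with a system ranking of
# concepts and a set of gold gist concepts.
runs = [
    (["war", "peace", "flag", "soldier"], {"war", "soldier"}),  # AP = (1/1 + 2/4) / 2 = 0.75
    (["sport", "team"], {"team"}),                              # AP = (1/2) / 1 = 0.5
]
print(mean_average_precision(runs))  # 0.625
```

Because each query contributes equally to the mean, a MAP of 0.69 summarizes ranking quality across all image-caption pairs rather than over the pooled set of annotations.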

Seeking meaning: Examining a cross-situational solution to learn action verbs using human simulation paradigm

Author: Amatuni A.
Crain E.
Yu C.
Zhang Y.
Publication venue
Publication date: 01/07/2020
Field of study

To acquire the meaning of a verb, language learners not only need to find the correct mapping between a specific verb and an action or event in the world, but also infer the underlying relational meaning that the verb encodes. Most verb naming instances in naturalistic contexts are highly ambiguous, as many possible actions can be embedded in the same scenario and many possible verbs can be used to describe those actions. To understand whether learners can find the correct verb meaning from referentially ambiguous learning situations, we conducted three experiments using the Human Simulation Paradigm with adult learners. Our results suggest that although finding the right verb meaning from one learning instance is hard, there is a statistical solution to this problem. When provided with multiple verb learning instances all referring to the same verb, learners are able to aggregate information across situations and gradually converge to the correct semantic space. Even in cases where they may not guess the exact target verb, they can still discover the right meaning by guessing a similar verb that is semantically close to the ground truth.

MPG.PuRe
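The verb-learning abstract describes a statistical, cross-situational solution: a single scene is ambiguous, but aggregating across scenes that share the same verb lets the learner converge. The sketch below illustrates that idea with a simple co-occurrence-counting model. It is a generic textbook formulation of cross-situational learning, not the authors' experimental procedure (their studies used adult participants, not a computational model), and the pseudo-verb "dax" and the action labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# One learning instance: the verb the learner hears, plus the candidate
# actions in the scene that it could plausibly refer to.
Situation = Tuple[str, List[str]]


def cross_situational_counts(situations: List[Situation]) -> Dict[str, Counter]:
    """Accumulate verb-action co-occurrence counts across learning instances."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for verb, candidate_actions in situations:
        # Count each candidate at most once per situation, so one busy scene
        # cannot dominate the statistics.
        counts[verb].update(set(candidate_actions))
    return counts


def best_guess(counts: Dict[str, Counter], verb: str) -> Optional[str]:
    """Return the action that co-occurred most often with `verb`, if any."""
    if verb not in counts or not counts[verb]:
        return None
    return counts[verb].most_common(1)[0][0]


# After the first scene alone, "push", "roll" and "look" are tied; only the
# aggregate over all three scenes singles out "push".
situations = [
    ("dax", ["push", "roll", "look"]),
    ("dax", ["push", "throw"]),
    ("dax", ["push", "look", "hold"]),
]
counts = cross_situational_counts(situations)
print(best_guess(counts, "dax"))  # push
```

This count-based model only captures the aggregation step. The abstract's further finding, that learners often land on a semantically similar verb, would require an additional notion of similarity between meanings that this sketch does not model.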

Gesture production and comprehension in children with specific language impairment

Author: Gary Morgan
Marguerite Gaynor
Nicholas Riches
Nicola Botting
Publication venue: 'Wiley'
Publication date: 01/03/2010
Field of study

Children with specific language impairment (SLI) have difficulties with spoken language. However, some recent research suggests that these impairments reflect underlying cognitive limitations. Studying gesture may inform us clinically and theoretically about the nature of the association between language and cognition. A total of 20 children with SLI and 19 typically developing (TD) peers were assessed on a novel measure of gesture production. Children were also assessed for sentence comprehension errors in a speech-gesture integration task. Children with SLI performed as well as their peers on gesture production but performed less well when comprehending integrated speech and gesture. Error patterns revealed a significant group interaction: children with SLI made more gesture-based errors, whilst TD children made semantically based ones. Children with SLI accessed and produced lexically encoded gestures despite having impaired spoken vocabulary, and this group also showed stronger associations between gesture and language than TD children. When SLI comprehension breaks down, gesture may be relied on over speech, whilst TD children have a preference for spoken cues. The findings suggest that for children with SLI, gesture scaffolds remain more closely related to language development than they are for TD peers, who have outgrown their earlier reliance on gestures. Future clinical implications may include standardized assessment of symbolic gesture and classroom-based gesture support for clinical groups.

City Research Online

Crossref