Developmental Robots - A New Paradigm

Author: Weng Juyang
Zhang Yilu
Publication venue: Lund University Cognitive Studies
Publication date: 01/01/2002

It has proved extremely challenging for humans to program a robot well enough for it to act properly in a typical, unknown human environment. This is especially true for a humanoid robot, because of the very large number of redundant degrees of freedom and the large number of sensors that a humanoid needs in order to work safely and effectively in a human environment. How can we address this fundamental problem? Motivated by human mental development from infancy to adulthood, we present a theory, an architecture, and some experimental results showing how to enable a robot to develop its mind automatically, through online, real-time interactions with its environment. Humans mentally “raise” the robot through “robot sitting” and “robot schools” instead of task-specific robot programming.

CiteSeerX

Transductive Visual Verb Sense Disambiguation

Author: Aslan Sinem
Bigaglia Gianluca
Giudice Lorenzo
Pelillo Marcello
Vascon Sebastiano
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2021

Verb Sense Disambiguation is a well-known task in NLP whose aim is to find the correct sense of a verb in a sentence. Recently, this problem has been extended to a multimodal scenario that exploits both textual and visual features of ambiguous verbs, leading to a new problem, Visual Verb Sense Disambiguation (VVSD). Here, the sense of a verb is assigned by considering the content of an image paired with it rather than a sentence in which the verb appears. Annotating a dataset for this task is more complex than for textual disambiguation, because assigning the correct sense to an image-verb pair requires both non-trivial linguistic and visual skills. In this work, unlike previous work in the literature, the VVSD task is performed in a transductive semi-supervised learning (SSL) setting, in which only a small amount of labeled information is required, greatly reducing the need for annotated data. The disambiguation process is based on a graph-based label propagation method that takes into account unimodal or multimodal representations of image-verb pairs. Experiments were carried out on the recently published VerSe dataset, the only available dataset for this task. The results outperform the current state of the art by a large margin while using only a small fraction of labeled samples per sense.
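
Graph-based label propagation can be illustrated with a minimal sketch. It is a generic iterative scheme (each unlabeled node repeatedly takes the weighted average of its neighbors' class distributions while the labeled seeds stay fixed), not necessarily the method used in the paper; the function name `propagate_labels`, the toy similarity matrix and the iteration count are assumptions for illustration.

```python
from typing import Dict, List


def propagate_labels(
    weights: List[List[float]],
    seeds: Dict[int, int],
    n_classes: int,
    n_iter: int = 200,
) -> List[int]:
    """Iterative label propagation on a weighted graph (illustrative sketch).

    weights[i][j] is the similarity between nodes i and j (e.g. two
    image-verb pairs); seeds maps a labeled node index to its sense id.
    """
    n = len(weights)
    # Labeled nodes start as one-hot vectors, unlabeled ones as uniform.
    dist = []
    for i in range(n):
        if i in seeds:
            row = [0.0] * n_classes
            row[seeds[i]] = 1.0
        else:
            row = [1.0 / n_classes] * n_classes
        dist.append(row)

    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if i in seeds:  # clamp labeled nodes to their known sense
                updated.append(dist[i])
                continue
            acc = [0.0] * n_classes
            total = 0.0
            for j in range(n):
                w = weights[i][j]
                if j == i or w == 0.0:
                    continue
                total += w
                for c in range(n_classes):
                    acc[c] += w * dist[j][c]
            # Weighted average of neighbors; isolated nodes keep their prior.
            updated.append([a / total for a in acc] if total > 0 else dist[i])
        dist = updated

    # Predict the most probable sense for every node.
    return [max(range(n_classes), key=lambda c: row[c]) for row in dist]


# Two tightly connected pairs (0-1 and 2-3) joined by a weak edge 1-2.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(propagate_labels(W, seeds={0: 0, 3: 1}, n_classes=2))  # [0, 0, 1, 1]
```

With only one labeled node per sense, the unlabeled nodes inherit the sense of the cluster they are most strongly connected to, which is the intuition behind needing only a small fraction of labeled samples.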

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Disambiguating Visual Verbs

Author: Gella Spandana
Keller Frank
Lapata Mirella
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2019

Edinburgh Research Explorer

Multi modal multi-semantic image retrieval

Author: Kesorn Kraisak
Publication date: 01/01/2010

PhD thesis

The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users’ ability to find and access the specific visual information that interests them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in an attempt to extract knowledge from these images and so enhance retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared, supporting multiple semantics (polysemy) for concepts. The framework builds an effective knowledge base for a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to ‘unannotated’ images. Local features of visual content, namely Scale Invariant Feature Transform (SIFT) descriptors, have been deployed in the ‘Bag of Visual Words’ (BVW) model as an effective way to represent visual content and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, because they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based on unstructured visual words and a (structured) hierarchical ontology KB model. Compared with a vector space model, the structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content by exploiting local conceptual structures and their relationships.
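
The ‘Bag of Visual Words’ representation can be sketched as follows: each local descriptor is assigned to its nearest codeword (visual word), and the image is represented by the normalised histogram of those assignments. This is a minimal sketch assuming a codebook that has already been learned (commonly by clustering descriptors, e.g. with k-means, rather than by the SLAC algorithm described in the thesis); the 2-D descriptors stand in for 128-dimensional SIFT vectors, and all names and values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    best_index, best_dist = 0, math.inf
    for k, centre in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, centre))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def bag_of_visual_words(
    descriptors: Sequence[Vector], codebook: Sequence[Vector]
) -> List[float]:
    """Normalised histogram of visual-word occurrences for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical 2-D "descriptors" and a two-word codebook.
codebook = [(0.0, 0.0), (10.0, 10.0)]
image_descriptors = [(1.0, 1.0), (0.0, 2.0), (9.0, 9.0)]
print(bag_of_visual_words(image_descriptors, codebook))
```

Two images can then be compared through their histograms in a vector space; the abstract's point is that this flat representation loses structure that the ontology-based model aims to recover.
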
The key contributions of this framework in using local features for image representation are as follows. First, a method generates visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weights and the spatial locations of keypoints into account, so that semantic information is preserved. Second, a technique detects domain-specific ‘non-informative visual words’, which are ineffective at representing the content of visual data and degrade its categorisation. Third, a method is proposed that combines an ontology model with a visual word model to resolve synonymy (visual heterogeneity) and polysemy problems. The experimental results show that this approach can discover semantically meaningful visual content descriptions and efficiently recognise specific events, e.g. sports events, depicted in images. Since discovering the semantics of an image is an extremely challenging problem, one promising way to enhance visual content interpretation is to use any textual information that accompanies an image as a cue to predict its meaning, by transforming this text into a structured annotation for the image, e.g. using XML, RDF, OWL or MPEG-7. Although text and images are distinct types of information representation and modality, there are some strong, invariant, implicit connections between images and any accompanying text. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is first used to extract concepts from image captions. Next, an ontology-based knowledge model is deployed to resolve natural language ambiguities. To deal with the accompanying text information, two methods of extracting knowledge from textual information are proposed.
First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of Latent Semantic Indexing (LSI) together with a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) in metadata. The ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and to leverage them to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and narrows the semantic gap between lower-level machine-derived and higher-level human-understandable conceptualisation.
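
For reference, the standard formulation of Latent Semantic Indexing is a rank-k truncated singular value decomposition of a term-document matrix. The notation below (A, k, q, d_j) is generic and is not taken from the thesis, which combines LSI with its ontology model in ways the abstract does not detail.

```latex
% Term-document matrix A (m terms x n documents, e.g. caption terms x images)
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\;
V_k \in \mathbb{R}^{n \times k}

% A query (or new caption) term vector q is folded into the k-dimensional latent space
\hat{q} = \Sigma_k^{-1} U_k^{\top} q

% Documents are ranked by cosine similarity to the query in that space,
% where \hat{d}_j is row j of V_k (the folded-in coordinates of document j)
\operatorname{sim}(q, d_j) = \frac{\hat{q}^{\top} \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Because terms that co-occur across captions tend to load on the same latent dimensions, a query can match a caption that shares few literal terms with it, which is one reason LSI is used to tolerate variation in metadata.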

Queen Mary Research Online

Bitter taste stimuli induce differential neural codes in mouse brain.

Author: Boughter John D
Lemon Christian H
Wilson David M
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

A growing literature suggests that taste stimuli commonly classified as "bitter" induce heterogeneous neural and perceptual responses. Here, the central processing of bitter stimuli was studied in mice with genetically controlled bitter taste profiles. Using these mice removed genetic heterogeneity as a factor influencing gustatory neural codes for bitter stimuli. Electrophysiological activity (spikes) was recorded from single neurons in the nucleus tractus solitarius during oral delivery of taste solutions (26 in total), including concentration series of the bitter tastants quinine, denatonium benzoate, cycloheximide, and sucrose octaacetate (SOA), each presented to the whole mouth for 5 s. Seventy-nine neurons were sampled; in many cases multiple cells (2 to 5) were recorded from one mouse. Results showed that bitter stimuli induced variable gustatory activity. For example, although some neurons responded robustly to quinine and cycloheximide, others displayed concentration-dependent activity (p<0.05) to quinine but not to cycloheximide. Differential activity to bitter stimuli was observed across multiple neurons recorded from one animal in several mice. Across all cells, quinine and denatonium induced correlated spatial responses that differed (p<0.05) from those to cycloheximide and SOA. Modeling of spatiotemporal neural ensemble activity revealed that responses to quinine/denatonium and cycloheximide/SOA diverged only during an early period of the taste response, at least 1 s wide. Our findings highlight how temporal features of sensory processing contribute to differences among bitter taste codes, and they build on data suggesting heterogeneity among "bitter" stimuli, data that challenge a strict monogeusia model for the bitter quality.
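
One common way to compare "spatial" (across-neuron) response patterns is to correlate the vectors of firing rates that two stimuli evoke across the same set of neurons. The sketch below computes a Pearson correlation on invented firing rates for five hypothetical neurons; the numbers are purely illustrative and are not data from the study, and the abstract does not state which similarity measure the authors used.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length response profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented mean firing rates (spikes/s) of five neurons per stimulus.
responses: Dict[str, Sequence[float]] = {
    "quinine": [12.0, 3.0, 8.0, 0.0, 5.0],
    "denatonium": [10.0, 2.0, 9.0, 1.0, 4.0],
    "cycloheximide": [1.0, 9.0, 2.0, 7.0, 0.0],
}

for a, b in [("quinine", "denatonium"), ("quinine", "cycloheximide")]:
    print(f"{a} vs {b}: r = {pearson(responses[a], responses[b]):.2f}")
```

A high correlation means the two stimuli drive the same neurons in similar proportions; a low or negative one means they recruit different subsets of the sampled population.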

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Languages adapt to their contextual niche

Author: Kirby Simon
Smith Kenny
Winters James
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2014

Edinburgh Research Explorer

Perceptual Strategies and Neuronal Underpinnings underlying Pattern Recognition through Visual and Tactile Sensory Modalities in Rats

Author: Di Filippo Alessandro
Publication venue: Trieste
Publication date: 29/05/2015

The aim of my PhD project was to investigate multisensory perception and multimodal recognition abilities in the rat, to better understand the underlying perceptual strategies and neuronal mechanisms. I chose to carry out this project on the laboratory rat for two reasons. First, the rat is a flexible and highly accessible experimental model in which it is possible to combine state-of-the-art neurophysiological approaches (such as multi-electrode neuronal recordings) with behavioral investigation of perception and, more generally, cognition. Second, extensive research on multimodal integration has already been conducted in this species, at both the neurophysiological and the behavioral level. My thesis work was organized into two projects: a psychophysical assessment of object categorization abilities in rats, and a neurophysiological study of neuronal tuning in the primary visual cortex of anaesthetized rats. In both experiments, unisensory (visual and tactile) and multisensory (visuo-tactile) stimulation was used for training and testing, depending on the task. The first project required the development of a new experimental rig for studying object categorization in rats using solid objects, so that their recognition abilities could be assessed under different modalities: vision, touch, and both together. The second project involved an electrophysiological study of rat primary visual cortex during visual, tactile and visuo-tactile stimulation, with the aim of understanding whether any interaction between these modalities exists in an area that is mainly devoted to one of them. The results of both studies are still preliminary, but they already offer some interesting insights into the defining features of these abilities.

Sissa Digital Library


Continually improving grounded natural language understanding through human-robot dialog

Author: Thomason Jesse David
Publication date: 23/08/2018

As robots become ubiquitous in homes and in workplaces such as hospitals and factories, they must be able to communicate with humans. Several kinds of knowledge are required to understand and respond to a human's natural language commands and questions. If a person asks an assistant robot to "take me to Alice's office", the robot must know that Alice is a person who owns a unique office, and that "take me" means it should navigate there. Similarly, if a person asks it to "bring me the heavy, green mug", the robot must have accurate mental models of the physical concepts "heavy", "green", and "mug". To avoid forcing humans to use key phrases or words that robots already know, this thesis focuses on helping robots understand new language constructs through interactions with humans and with the world around them. To understand a command in natural language, a robot must first convert that command to an internal representation that it can reason with. Semantic parsing is a method for performing this conversion, and the target representation is often a semantic form expressed in predicate logic with lambda calculus. Traditional semantic parsing relies on hand-crafted resources from a human expert: an ontology of concepts, a lexicon connecting language to those concepts, and training examples of language paired with abstract meanings. One thrust of this thesis is to perform semantic parsing with sparse initial data. We use conversations between a robot and human users to induce pairs of natural language utterances and the target semantic forms that the robot discovers through its questions, reducing the annotation effort of creating training examples for parsing. We use this data to build more dialog-capable robots in new domains with much less expert human effort (Thomason et al., 2015; Padmakumar et al., 2017). Meanings of many language concepts are bound to the physical world.
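
To make the idea of a semantic form concrete, the sketch below writes the command "take me to Alice's office" as a predicate-logic expression with a lambda abstraction, roughly navigate(the(λx. office(x) ∧ owns(alice, x))), and evaluates it against a toy world model using Python lambdas. The predicates `office` and `owns`, the definite-description operator `the`, the `navigate` action and the room identifiers are all hypothetical; they are not the ontology or parser output used in the thesis.

```python
from typing import Callable, FrozenSet, Tuple

# Toy world model (hypothetical entities and facts).
PEOPLE: FrozenSet[str] = frozenset({"alice", "bob"})
OFFICES: FrozenSet[str] = frozenset({"room_3", "room_7"})
OWNS: FrozenSet[Tuple[str, str]] = frozenset({("alice", "room_3"), ("bob", "room_7")})
ENTITIES = PEOPLE | OFFICES


def office(x: str) -> bool:
    return x in OFFICES


def owns(person: str, x: str) -> bool:
    return (person, x) in OWNS


def the(pred: Callable[[str], bool]) -> str:
    """Definite description: the unique entity satisfying pred."""
    matches = sorted(e for e in ENTITIES if pred(e))
    if len(matches) != 1:
        raise ValueError(f"expected a unique referent, found {matches}")
    return matches[0]


def navigate(target: str) -> Tuple[str, str]:
    """Stand-in for a robot action: return the action and its goal."""
    return ("navigate", target)


# "take me to Alice's office"  ->  navigate(the(λx. office(x) ∧ owns(alice, x)))
command = navigate(the(lambda x: office(x) and owns("alice", x)))
print(command)  # ('navigate', 'room_3')
```

A semantic parser's job is to produce such an expression from the words. In this sketch `the` raises an error when a description does not pick out exactly one referent (e.g. `the(office)`), which is the kind of situation where a dialog agent might ask a clarification question instead of acting.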
Understanding object properties and categories such as "heavy", "green", and "mug" requires interacting with and perceiving the physical world. Embodied robots can use manipulation capabilities, such as pushing, picking up, and dropping objects, to gather sensory data about them. This data can be used to understand non-visual concepts like "heavy" and "empty" (e.g. "get the empty carton of milk from the fridge"), and to assist with concepts that have both visual and non-visual expression (e.g. tall things look big and also exert force sooner than short things when pressed down on). A second thrust of this thesis focuses on strategies for learning these concepts using multi-modal sensory information. We use human-in-the-loop learning to obtain labels linking concept words to actual objects in the environment (Thomason et al., 2016, 2017). We also explore ways to tease apart polysemy and synonymy in concept words (Thomason and Mooney, 2017), such as "light", which can refer to a weight or a color, the latter sense being synonymous with "pale". Additionally, because pushing, picking up, and dropping objects to gather sensory information is prohibitively time-consuming, we investigate strategies for using linguistic information and human input to expedite exploration when learning a new concept (Thomason et al., 2018). Finally, we build an integrated agent with both parsing and perception capabilities that learns from conversations with users to improve both components over time. We demonstrate that parser learning from conversations (Thomason et al., 2015) can be combined with multi-modal perception (Thomason et al., 2016), using predicate-object labels gathered through opportunistic active learning (Thomason et al., 2017) during those conversations, to improve performance in understanding natural language commands from humans. Human users also qualitatively rate this integrated learning agent as more usable after it has improved through conversation-based learning.

Field of study: Computer Science
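
Learning a concept word from several sensory modalities can be sketched as one small classifier per modality plus a reliability-weighted vote. The sketch below uses nearest-centroid classifiers and hand-set reliability weights; this late-fusion design, the feature values and the names `modality_vote` and `concept_applies` are assumptions for illustration and are not the models reported in the thesis.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def modality_vote(x: Vector, pos_centroid: Vector, neg_centroid: Vector) -> int:
    """+1 if x is closer to the positive examples' centroid, else -1."""
    d_pos = math.dist(x, pos_centroid)
    d_neg = math.dist(x, neg_centroid)
    return 1 if d_pos < d_neg else -1


def concept_applies(
    features: Dict[str, Vector],
    centroids: Dict[str, Tuple[Vector, Vector]],
    reliability: Dict[str, float],
) -> bool:
    """Reliability-weighted vote over the modalities observed for an object."""
    score = 0.0
    for modality, x in features.items():
        pos, neg = centroids[modality]
        score += reliability.get(modality, 0.0) * modality_vote(x, pos, neg)
    return score > 0


# Hypothetical centroids for the concept "heavy": haptic force readings are
# informative, colour features are not, so vision gets a low weight.
heavy_centroids = {
    "haptic": ((5.0,), (1.0,)),
    "vision": ((0.2,), (0.8,)),
}
heavy_reliability = {"haptic": 0.9, "vision": 0.2}

mug = {"haptic": (4.5,), "vision": (0.9,)}
print(concept_applies(mug, heavy_centroids, heavy_reliability))  # True
```

Here the unreliable visual vote is outweighed by the haptic one, mirroring the abstract's point that non-visual concepts such as "heavy" need manipulation data; in practice the per-modality weights would more plausibly be estimated from labeled examples than set by hand.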

Texas ScholarWorks