
A Review of Codebook Models in Patch-Based Visual Object Recognition

Author: Niranjan Mahesan
Ramanan Amirthalingam
Publication date: 22/09/2011

The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to provide a way to map the low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step, which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. Clustering is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook for constructing a discriminant codebook in a one-pass design procedure that slightly outperforms more traditional approaches at drastically reduced computing times. In this review we survey several approaches that have been proposed over the last decade, covering their use of feature detectors, descriptors, codebook construction schemes, choice of classifiers in recognising objects, and the datasets used to evaluate the proposed methods.

Southampton (e-Prints Soton)
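The codebook pipeline summarised in the review above (cluster local descriptors into codewords, then map each image's descriptors to a fixed-length histogram) can be sketched as follows. This is a minimal illustration assuming plain k-means clustering with hard assignment and Euclidean distance, i.e. the traditional clustering route the review contrasts with, not the authors' resource-allocating codebook. The function names `build_codebook` and `encode` are hypothetical, and random arrays stand in for real SIFT-like descriptors.

```python
import numpy as np

def build_codebook(descriptors: np.ndarray, k: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Cluster local descriptors (N x D) into k codewords with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise the codewords with k distinct descriptors chosen at random.
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every descriptor to its nearest codeword (squared Euclidean distance).
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each codeword to the mean of its assigned descriptors; empty clusters stay put.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

def encode(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map a variable number of descriptors to a fixed-length, L1-normalised histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # hard vector quantisation: one visual word per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

Because `encode` always returns a vector of length k, images with different numbers of detected keypoints become directly comparable by a standard classifier. The N-by-k distance computation repeated in every k-means iteration also hints at why clustering becomes a bottleneck on large descriptor sets.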

Unitization during Category Learning

Author: Goldstone Robert Lee
Publication date: 01/01/2000

Five experiments explored whether new perceptual units can be developed if they are diagnostic for a category learning task and, if so, what the constraints on this unitization process are. During category learning, participants were required to attend to either a single component or a conjunction of five components in order to categorize an object correctly. In Experiments 1-4, some evidence for unitization was found: the conjunctive task became much easier with practice, and this improvement was not found for the single-component task or for conjunctive tasks where the components could not be unitized. Influences of component order (Experiment 1), component contiguity (Experiment 2), component proximity (Experiment 3), and number of components (Experiment 4) on practice effects were found. Using a Fourier transformation method for deconvolving response times (Experiment 5), prolonged practice was found to yield responses that were faster than predicted by analytic models that integrate evidence from independently perceived components.

Crossref

CogPrints Cognitive Sciences Eprint Archive
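The deconvolution in Experiment 5 of the unitization study above rests on a standard property: if a response time is the sum of independent stage durations, its distribution is the convolution of the stage distributions, and a Fourier transform turns that convolution into a pointwise product that can be divided out. The sketch below illustrates only this general principle on toy discretised distributions. It assumes exact, noise-free inputs and is not the procedure used in the paper; real response-time data would need smoothing or regularisation, since dividing spectra amplifies noise. `predicted_total_rt` and `deconvolve` are hypothetical names.

```python
import numpy as np

def predicted_total_rt(*stage_pmfs: np.ndarray) -> np.ndarray:
    """Distribution of a sum of independent stage durations: the convolution of their PMFs."""
    total = np.array([1.0])
    for pmf in stage_pmfs:
        total = np.convolve(total, pmf)
    return total

def deconvolve(total_pmf: np.ndarray, known_pmf: np.ndarray, n_out: int) -> np.ndarray:
    """Recover an unknown stage PMF from the total and one known stage by spectral division.

    Assumes noise-free data and that the known stage's spectrum has no zeros.
    """
    n = len(total_pmf)  # long enough that linear and circular convolution coincide
    spectrum = np.fft.fft(total_pmf, n) / np.fft.fft(known_pmf, n)
    return np.fft.ifft(spectrum).real[:n_out]
```

For example, convolving a hypothetical two-stage model and then dividing out one stage returns the other stage's distribution, which is the sense in which an "independent components" model makes a testable prediction about the shape of the overall response-time distribution.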

Symbol Emergence in Robotics: A Survey

Author: Asoh Hideki
Iwahashi Naoto
Nagai Takayuki
Nakamura Tomoaki
Ogata Tetsuya
Taniguchi Tadahiro
Publication date: 29/09/2015

Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions, and developing a robot that can smoothly communicate with human users in the long term, requires an understanding of the dynamics of symbol systems and is crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-the-art research topics concerning SER, e.g., multimodal categorization, word discovery, and double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.

Comment: submitted to Advanced Robotics

arXiv.org e-Print Archive

Deep filter banks for texture recognition, description, and segmentation

Author: Cimpoi Mircea
Kokkinos Iasonas
Maji Subhransu
Vedaldi Andrea
Publication date: 18/11/2015

Visual textures have played a key role in image understanding because they convey important semantics of images, and because texture representations that pool local image descriptors in an orderless manner have had a tremendous impact in diverse applications. In this paper we make several contributions to texture understanding. First, instead of focusing on texture instance and material category recognition, we propose a human-interpretable vocabulary of texture attributes to describe common texture patterns, complemented by a new describable texture dataset for benchmarking. Second, we look at the problem of recognizing materials and texture attributes in realistic imaging conditions, including when textures appear in clutter, developing corresponding benchmarks on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic texture representations, including bag-of-visual-words and the Fisher vector, in the context of deep learning and show that these have excellent efficiency and generalization properties if the convolutional layers of a deep model are used as filter banks. In this manner we obtain state-of-the-art performance on numerous datasets well beyond textures, an efficient method to apply deep features to image regions, and a benefit in transferring features from one domain to another.

Comment: 29 pages; 13 figures; 8 tables

arXiv.org e-Print Archive

HAL-CentraleSupelec

Springer - Publisher Connector

INRIA a CCSD electronic archive server

UCL Discovery

PubMed Central

Oxford University Research Archive

HAL-Rennes 1
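The "convolutional layers as filter banks" idea in the texture paper above amounts to treating the activations at every spatial location of a convolutional feature map as a local descriptor and pooling them orderlessly, for instance with a Fisher vector. The sketch below computes a Fisher vector (gradients with respect to the means and standard deviations of a diagonal-covariance Gaussian mixture, followed by the commonly used signed square-root and L2 normalisation). It assumes the mixture parameters are already given (in practice they would be fitted with EM on training descriptors), a random array stands in for real CNN activations, and `fisher_vector` is a hypothetical name rather than the authors' code.

```python
import numpy as np

def fisher_vector(x, weights, means, sigmas):
    """Fisher vector of local descriptors x (N x D) under a diagonal GMM.

    weights: (K,), means and sigmas: (K, D). Returns a vector of length 2*K*D.
    """
    n, d = x.shape
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # N x K x D
    # Log of weight * Gaussian density for each descriptor and component.
    log_p = (np.log(weights)[None, :]
             - 0.5 * (diff ** 2).sum(axis=2)
             - np.log(sigmas).sum(axis=1)[None, :]
             - 0.5 * d * np.log(2 * np.pi))
    # Posterior (soft assignment) of each descriptor to each component, computed stably.
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)  # N x K
    # First- and second-order statistics averaged over all descriptors (orderless pooling).
    u = (gamma[:, :, None] * diff).sum(axis=0) / (n * np.sqrt(weights)[:, None])
    v = (gamma[:, :, None] * (diff ** 2 - 1)).sum(axis=0) / (n * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([u.ravel(), v.ravel()])
    # Signed square-root ("power") normalisation, then L2 normalisation.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / np.linalg.norm(fv)
```

Usage with an H x W x C feature map is `fisher_vector(feature_map.reshape(-1, C), weights, means, sigmas)`. The output length 2*K*C does not depend on H or W, and shuffling the spatial locations leaves the result unchanged, which is what lets the same encoder be applied to image regions of arbitrary size.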

Multi modal multi-semantic image retrieval

Author: Kesorn Kraisak
Publication date: 01/01/2010

PhD thesis. The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in an attempt to extract knowledge from these images and so enhance retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared that supports multi-semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’ (BVW) model as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of an unstructured visual word model and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, by exploiting local conceptual structures and their relationships.
The key contributions of this framework in using local features for image representation are as follows. First, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weight and the spatial locations of keypoints into account so that semantic information is preserved. Second, a technique to detect the domain-specific ‘non-informative visual words’, which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with a visual word model to resolve synonym (visual heterogeneity) and polysemy problems. The experimental results show that this approach can discover semantically meaningful visual content descriptions and efficiently recognise specific events, e.g. sports events, depicted in images. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any textual information that accompanies an image as a cue to predict its meaning, by transforming this textual information into a structured annotation for the image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are some strong, invariant, implicit connections between images and any accompanying text. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is first used to extract concepts from image captions. Next, an ontology-based knowledge model is deployed to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information have been proposed.
First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of Latent Semantic Indexing (LSI) in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) of metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and helps narrow the semantic gap between lower-level machine-derived and higher-level human-understandable conceptualisation.

Queen Mary Research Online
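The thesis above pairs Latent Semantic Indexing with an ontology model so that retrieval tolerates variation in caption wording. Plain LSI on its own can be sketched as a rank-k truncated SVD of a term-by-caption matrix, with queries folded into the same latent space. The vocabulary and captions below are invented for illustration, the ontology component is not modelled at all, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical caption vocabulary and four toy captions (one document per caption).
TERMS = ["football", "goal", "striker", "soccer", "tennis", "racket", "serve"]
CAPTIONS = ["football goal striker", "soccer goal", "tennis racket serve", "tennis serve"]

def term_doc_matrix(captions, terms):
    """Raw term counts: rows are terms, columns are captions (assumes every word is in `terms`)."""
    index = {t: i for i, t in enumerate(terms)}
    a = np.zeros((len(terms), len(captions)))
    for j, caption in enumerate(captions):
        for word in caption.split():
            a[index[word], j] += 1
    return a

def lsi(a, k):
    """Rank-k truncated SVD: returns U_k, the top-k singular values, and document coordinates V_k."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :].T

def fold_in(query, terms, u_k, s_k):
    """Project a query string into the latent space as q^T U_k S_k^{-1}."""
    q = term_doc_matrix([query], terms)[:, 0]
    return q @ u_k / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

A = term_doc_matrix(CAPTIONS, TERMS)
U_k, S_k, docs_k = lsi(A, k=2)
q_hat = fold_in("soccer", TERMS, U_k, S_k)
scores = [cosine(q_hat, d) for d in docs_k]
```

Caption 0 ("football goal striker") shares no word with the query "soccer", yet it scores as highly similar in the two-dimensional latent space because "goal" links it to the caption that does mention soccer, while the tennis captions score near zero. This is the kind of tolerance to wording variation that the thesis attributes to LSI.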

The Descriptive Challenges of Fiber Art

Author: Lunin Lois F.
Publication venue: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Publication date: 01/01/1990

published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository


Error analysis of expressive analogy task in Spanish-English bilingual school age children with and without specific language impairment

Author: Moreno Beverly
Publication date: 20/10/2015

Purpose: The relational shift hypothesis (RSH) states that, as children age, the way in which they interpret analogies shifts from a focus on object similarities to relational aspects of objects. This study investigated the validity of the RSH by describing the error patterns of typically developing (TD), low normal (LN), and language impaired (LI) bilingual school-age children when completing an expressive analogy task in A:B::C:D format (e.g. good:bad::happy:_____) in English and Spanish. Method: Participants included a total of 49 Spanish-English bilingual children between the ages of 7;4 and 8;9 (mean = 8;1). Ten children were identified as LI, ten scored in the LN range, and 29 were TD. Children were administered English and Spanish versions of the task twice: initially during the second grade and again approximately one year later. Responses were recorded verbatim and coded as correct (C), thematic/category error (THEM/CAT), wrong object, correct relationship error (WO-CR), unrelated error (UNREL), or repetition/no response (REP/NR). Results: A repeated measures ANOVA was used to compare children’s analogy scores by time, ability, and language. Results demonstrated significant differences for ability. Four chi-square tests investigated the error patterns of TD, LN, and LI bilingual children in English and Spanish. We compared the responses provided by children by response type (C, THEM/CAT, WO-CR, UNREL, or REP/NR). Results from the Spanish analogical reasoning task indicated a decrease in THEM/CAT errors with age for the LN and TD children. Results from the English analogical reasoning task were inconsistent. Conclusions: Results provide partial support for the RSH in LN and TD children, but not in children with LI. This difference in error patterns may provide insight into the validity of the RSH in bilingual children with specific language impairment and typically developing second language learners.

Communication Sciences and Disorders

Texas ScholarWorks
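The error-pattern comparison in the analogy study above relies on chi-square tests. A Pearson chi-square test of independence on a group-by-response-type contingency table can be sketched as follows. The counts are invented placeholders, not the study's data, and `chi_square_independence` is a hypothetical helper that returns only the statistic and degrees of freedom (no continuity correction).

```python
import numpy as np

def chi_square_independence(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    obs = np.asarray(observed, dtype=float)
    total = obs.sum()
    # Expected counts under independence: row total * column total / grand total.
    expected = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0, keepdims=True) / total
    stat = float(((obs - expected) ** 2 / expected).sum())
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return stat, dof

# Invented counts for illustration only: rows = ability group (TD, LN, LI),
# columns = response type (C, THEM/CAT, WO-CR, UNREL, REP/NR).
counts = [
    [120, 30, 15, 10, 5],
    [35, 15, 8, 8, 4],
    [25, 15, 10, 12, 8],
]
stat, dof = chi_square_independence(counts)
```

If SciPy is available, `scipy.stats.chi2.sf(stat, dof)` converts the statistic to a p-value. With small groups such as ten children per category, expected counts in some cells can fall below the usual rule-of-thumb minimum of about five, which weakens the chi-square approximation.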