
421 research outputs found

NON-LINEAR AND SPARSE REPRESENTATIONS FOR MULTI-MODAL RECOGNITION

Author: Nguyen Hien Van
Publication date: 01/01/2013

In the first part of this dissertation, we address the problem of representing 2D and 3D shapes. In particular, we introduce a novel implicit shape representation based on Support Vector Machine (SVM) theory. Each shape is represented by an analytic decision function obtained by training an SVM, with a Radial Basis Function (RBF) kernel, so that the interior shape points are given higher values. This gives the support vector shape (SVS) representation several advantages. First, the representation uses a sparse subset of feature points determined by the support vectors, which significantly improves the discriminative power against noise, fragmentation and other artifacts that often come with the data. Second, the use of the RBF kernel provides scale, rotation, and translation invariant features, and allows a shape to be represented accurately regardless of its complexity. Finally, the decision function can be used to select reliable feature points. These features are described using gradients computed from highly consistent decision functions instead of conventional edges. Our experiments on 2D and 3D shapes demonstrate promising results. The availability of inexpensive 3D sensors like Kinect necessitates the design of new representations for this type of data. We present a 3D feature descriptor that represents local topologies within a set of folded concentric rings by distances from local points to a projection plane. This feature, called Concentric Ring Signature (CORS), possesses similar computational advantages to point signatures yet provides more accurate matches. CORS produces compact and discriminative descriptors, which makes it more robust to noise and occlusions. It is also well known to computer vision researchers that there is no universal representation that is optimal for all types of data or tasks. Sparsity has proved to be a good criterion for working with natural images.
This motivates us to develop efficient sparse and non-linear learning techniques for automatically extracting useful information from visual data. Specifically, we present dictionary learning methods for sparse and redundant representations in a high-dimensional feature space. Using the kernel method, we describe how well-known dictionary learning approaches such as the method of optimal directions and K-SVD can be made non-linear. We analyse their kernel constructions and demonstrate their effectiveness through several experiments on classification problems. It is shown that non-linear dictionary learning approaches can provide significantly better discrimination compared to their linear counterparts and kernel PCA, especially when the data is corrupted by different types of degradations. Visual descriptors are often high dimensional. This results in high computational complexity for sparse learning algorithms. Motivated by this observation, we introduce a novel framework, called sparse embedding (SE), for simultaneous dimensionality reduction and dictionary learning. We formulate an optimization problem for learning a transformation from the original signal domain to a lower-dimensional one in a way that preserves the sparse structure of data. We propose an efficient optimization algorithm and present its non-linear extension based on kernel methods. One of the key features of our method is that it is computationally efficient, as the learning is done in the lower-dimensional space and it discards the irrelevant part of the signal that derails the dictionary learning process. Various experiments show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive algorithms on signal recovery and object classification tasks.
In many practical applications, we are often confronted with the situation where the data used to train our models differ from those presented during testing. In the final part of this dissertation, we present a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N), which makes use of the old data to improve the performance of a system operating on a new domain. Our network jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that can prevent the data dimension from increasing too fast as one traverses deeper into the hierarchy. Experimental results show that our method consistently outperforms the current state-of-the-art by a significant margin. Moreover, we found that a multi-layer DASH-N has an edge over the single-layer DASH-N.

Digital Repository at the University of Maryland

Recommended from our members

Stretching Method-Based Operational Modal Analysis of An Old Masonry Lighthouse.

Author: Daskalakis Emmanouil
Kalogeras Ioannis
Melis Nikolaos S
Panagiotopoulos Christos G
Tsogka Chrysoula
Publication venue: eScholarship, University of California
Publication date: 01/08/2019

We present in this paper a structural health monitoring study of the Egyptian lighthouse of Rethymnon in Crete, Greece. Using structural vibration data collected on a limited number of sensors during a 3-month period, we illustrate the potential of the stretching method for monitoring variations in the natural frequencies of the structure. The stretching method compares two signals: the current one, which reflects the present state of the structure, and the reference one, which characterizes the structure in a healthy reference condition. For the structure under study, an 8-day time interval is used for the reference quantity while the current quantity is computed using a time window of 24 h. Our results indicate that frequency shifts of 1% can be detected with high accuracy, allowing for early damage assessment. We also provide a simple numerical model that is calibrated to match the natural frequencies estimated using the stretching method. The model is used to produce possible damage scenarios that correspond to a 1% shift in the first natural frequencies. Although simple in nature, this model seems to deliver a realistic response of the structure. This is shown by comparing the response at the top of the structure to the actual measurement during a small earthquake. This is a preliminary study indicating the potential of the stretching method for structural health monitoring of historical monuments. The results are very promising. Further analysis is necessary, requiring the deployment of the instrumentation (possibly with additional instruments) for a longer period of time.
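
The comparison of a current signal against a reference one can be illustrated with a minimal sketch. It is not the authors' implementation: the grid search over candidate stretch factors, the time-domain formulation, the synthetic sine signals, and all function names (`interp_uniform`, `pearson`, `estimate_stretch`) are assumptions made here for illustration. The idea shown is to stretch the reference axis by a factor (1 + eps) and keep the eps that maximizes the correlation with the current signal.

```python
import math


def interp_uniform(x, t, y):
    """Linearly interpolate samples y, taken at uniformly spaced times t, at time x."""
    dt = t[1] - t[0]
    pos = (x - t[0]) / dt
    i = min(int(pos), len(t) - 2)  # left neighbour, clamped at the last interval
    frac = pos - i
    return y[i] * (1.0 - frac) + y[i + 1] * frac


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def estimate_stretch(t, reference, current, candidates):
    """Grid search (an assumption of this sketch) for the stretch factor eps
    such that reference(t * (1 + eps)) best correlates with current(t)."""
    best_eps, best_corr = None, -2.0
    for eps in candidates:
        # Keep only samples whose stretched time stays inside the record.
        keep = [k for k, tk in enumerate(t) if t[0] <= tk * (1.0 + eps) <= t[-1]]
        stretched = [interp_uniform(t[k] * (1.0 + eps), t, reference) for k in keep]
        corr = pearson(stretched, [current[k] for k in keep])
        if corr > best_corr:
            best_eps, best_corr = eps, corr
    return best_eps, best_corr


# Synthetic check: the "current" signal oscillates 1% faster than the reference.
dt, n, f0, true_shift = 0.01, 2000, 2.0, 0.01
t = [k * dt for k in range(n)]
reference = [math.sin(2 * math.pi * f0 * tk) for tk in t]
current = [math.sin(2 * math.pi * f0 * (1 + true_shift) * tk) for tk in t]
candidates = [k * 0.001 for k in range(-50, 51)]  # eps from -5% to +5%
eps, corr = estimate_stretch(t, reference, current, candidates)
print(eps, corr)
```

In this toy setting the recovered stretch factor corresponds to the relative frequency shift between the two signals. A real deployment would work with measured vibration data and would need to handle noise and windowing, which this sketch ignores.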

eScholarship - University of California

Remembrance of Odors Past: Human Olfactory Cortex in Cross-Modal Recognition Memory

Author: Gottfried Jay A
Smith Adam P.R
Rugg Michael D
Dolan Raymond J
Publication venue: Cell Press.
Publication date: 01/01/2004

Episodic memory is often imbued with multisensory richness, such that the recall of an event can be endowed with the sights, sounds, and smells of its prior occurrence. While hippocampus and related medial temporal structures are implicated in episodic memory retrieval, the participation of sensory-specific cortex in representing the qualities of an episode is less well established. We combined functional magnetic resonance imaging (fMRI) with a cross-modal paradigm, where objects were presented with odors during memory encoding. We then examined the effect of odor context on neural responses at retrieval when these same objects were presented alone. Primary olfactory (piriform) cortex, as well as anterior hippocampus, was activated during the successful retrieval of old (compared to new) objects. Our findings indicate that sensory features of the original engram are preserved in unimodal olfactory cortex. We suggest that reactivation of memory traces distributed across modality-specific brain areas underpins the sensory qualities of episodic memories.

Elsevier - Publisher Connector

Crossref

CGSpace

Cross-Modal Object Recognition Is Viewpoint-Independent

Author: A Amedi
A Pasqualotto
Andrew Peters
D Freides
DI Perrett
EW Bushnell
FN Newell
I Biederman
I Gauthier
JF Norman
JM Reales
Justin Harris
K Grill-Spector
K. Sathian
KM Newell
M Kozhevnikov
M Riesenhuber
M Zhang
MA Heller
MJ Tarr
NK Logothetis
O Blajenkova
P Jolicoeur
RL Klatzky
S Lacey
S Peltier
Simon Lacey
SJ Casey
TW James
Publication venue: Public Library of Science
Publication date: 11/09/2007

BACKGROUND: Previous research suggests that visual and haptic object recognition are viewpoint-dependent both within- and cross-modally. However, this conclusion may not be generally valid as it was reached using objects oriented along their extended y-axis, resulting in differential surface processing in vision and touch. In the present study, we removed this differential by presenting objects along the z-axis, thus making all object surfaces more equally available to vision and touch. METHODOLOGY/PRINCIPAL FINDINGS: Participants studied previously unfamiliar objects, in groups of four, using either vision or touch. Subsequently, they performed a four-alternative forced-choice object identification task with the studied objects presented in both unrotated and rotated (180 degrees about the x-, y-, and z-axes) orientations. Rotation impaired within-modal recognition accuracy in both vision and touch, but not cross-modal recognition accuracy. Within-modally, visual recognition accuracy was reduced by rotation about the x- and y-axes more than the z-axis, whilst haptic recognition was equally affected by rotation about all three axes. Cross-modal (but not within-modal) accuracy correlated with spatial (but not object) imagery scores. CONCLUSIONS/SIGNIFICANCE: The viewpoint-independence of cross-modal object identification points to its mediation by a high-level abstract representation. The correlation between spatial imagery scores and cross-modal performance suggests that construction of this high-level representation is linked to the ability to perform spatial transformations. Within-modal viewpoint-dependence appears to have a different basis in vision than in touch, possibly due to surface occlusion being important in vision but not touch.

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

Multisensory Motion Perception in 3–4 Month-Old Infants

Author: Ashmead
Baart
Bahrick
Bertenthal
Bremner
Csibra
De Hevia
Dolscheid
Filippetti
Gogate
Johnson
Kellman
Lewkowicz
Nava
Otsuka
Parise
Pickens
Pitteri
Rochat
Rose
Rusconi
Sann
Schlack
Shepard
Streri
Tomalski
Walker
Publication venue: Frontiers Media SA
Publication date: 01/01/2017

Human infants begin very early in life to take advantage of multisensory information by extracting the invariant amodal information that is conveyed redundantly by multiple senses. Here we addressed the question as to whether infants can bind multisensory moving stimuli, and whether this occurs even if the motion produced by the stimuli is only illusory. Three- to 4-month-old infants were presented with two bimodal pairings: visuo-tactile and audio-visual. Visuo-tactile pairings consisted of apparently vertically moving bars (the Barber Pole illusion) moving in either the same or opposite direction with a concurrent tactile stimulus consisting of strokes given on the infant's back. Audio-visual pairings consisted of the Barber Pole illusion in its visual and auditory version, the latter giving the impression of a continuous rising or ascending pitch. We found that infants were able to discriminate congruently (same direction) vs. incongruently moving (opposite direction) pairs irrespective of modality (Experiment 1). Importantly, we also found that congruently moving visuo-tactile and audio-visual stimuli were preferred over incongruently moving bimodal stimuli (Experiment 2). Our findings suggest that very young infants are able to extract motion as an amodal component and use it to match stimuli that only apparently move in the same direction.

Crossref

Archivio istituzionale della ricerca - Università di Padova

Towards Simulating Humans in Augmented Multi-party Interaction

Author: Nijholt Anton
Publication date: 01/01/2005

Human-computer interaction requires modeling of the user. A user profile typically contains preferences, interests, characteristics, and interaction behavior. However, in multimodal interaction with a smart environment, the user displays characteristics that show how the user, not necessarily consciously, provides the smart environment with useful verbal and nonverbal input and feedback. Especially in ambient intelligence environments we encounter situations where the environment supports interaction between the environment, smart objects (e.g., mobile robots, smart furniture) and human participants in the environment. Therefore it is useful for the profile to contain a physical representation of the user obtained by multi-modal capturing techniques. We discuss the modeling and simulation of interacting participants in the European AMI research project.

University of Twente Research Information

Face Recognition using 3D Facial Shape and Color Map Information: Comparison and Combination

Author: Godil Afzal
Grother Patrick
Ressler Sandy
Publication venue: SPIE-Intl Soc Optical Eng
Publication date: 13/05/2011

In this paper, we investigate the use of 3D surface geometry for face recognition and compare it to one based on color map information. The 3D surface and color map data are from the CAESAR anthropometric database. We find that the recognition performance is not very different between 3D surface and color map information using a principal component analysis algorithm. We also discuss the different techniques for the combination of the 3D surface and color map information for multi-modal recognition by using different fusion approaches and show that there is a significant improvement in results. The effectiveness of various techniques is compared and evaluated on a dataset with 200 subjects in two different positions. Comment: Proceedings of SPIE Vol. 5404 Biometric Technology for Human Identification, Anil K. Jain; Nalini K. Ratha, Editors, pp.351-361, ISBN: 9780819453273 Date: 25 August 200

arXiv.org e-Print Archive

Crossref