
    A Deep Representation for Invariance And Music Classification

    Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream: modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose the use of such computational modules for extracting invariant and discriminative audio representations. Building on a theory of invariance in hierarchical architectures, we propose a novel, mid-level representation for acoustical signals, using the empirical distributions of projections on a set of templates and their transformations. Under the assumption that, by construction, this dictionary of templates is composed from similar classes, and samples the orbit of variance-inducing signal transformations (such as shift and scale), the resulting signature is theoretically guaranteed to be unique, invariant to transformations and stable to deformations. Modules of projection and pooling can then constitute layers of deep networks, for learning composite representations. We present the main theoretical and computational aspects of a framework for unsupervised learning of invariant audio representations, empirically evaluated on music genre classification. Comment: 5 pages, CBMM Memo No. 002, (to appear) IEEE 2014 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014)
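The projection-and-pooling signature described in this abstract can be sketched in a few lines. This is a minimal illustration under assumptions of our own: the function names are invented, circular shifts stand in for the orbit of variance-inducing transformations, and a normalized histogram serves as the empirical distribution — it is not the authors' implementation.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def shifts(template, n_shifts):
    """Orbit of a template under circular shifts (the variance-inducing
    transformation sampled in this sketch)."""
    return [template[s:] + template[:s] for s in range(n_shifts)]

def signature(x, templates, n_shifts=4, n_bins=5):
    """Empirical distribution (normalized histogram) of the projections of x
    onto each template's transformation orbit, concatenated per template."""
    sig = []
    for t in templates:
        projs = [dot(x, ts) for ts in shifts(t, n_shifts)]
        lo, hi = min(projs), max(projs)
        width = (hi - lo) / n_bins or 1.0  # guard against a degenerate range
        hist = [0] * n_bins
        for p in projs:
            idx = min(int((p - lo) / width), n_bins - 1)
            hist[idx] += 1
        sig.extend(h / len(projs) for h in hist)  # pooling: normalize counts
    return sig
```

When the orbit covers all circular shifts, projecting a shifted signal permutes the same set of projection values, so the histogram — and hence the signature — is shift-invariant, which is the property the abstract appeals to.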

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State vowel Categorization

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)

    Representing an Object by Interchanging What with Where

    Exploring representations is a fundamental step towards understanding vision. The visual system carries two types of information along separate pathways: one is about what it is and the other is about where it is. Initially, the what is represented by a pattern of activity distributed across millions of photoreceptors, whereas the where is 'implicitly' given as their retinotopic positions. Many computational theories of object recognition rely on such pixel-based representations, but these are insufficient to learn spatial information such as position and size, due to the implicit encoding of the where information. 
Here we try transforming a retinal image of an object into its internal image via interchanging the what with the where, meaning that patterns of intensity in the internal image describe the spatial information rather than the object information. To be concrete, the retinal image of an object is deformed and turned over into a negative image, in which light areas appear dark and vice versa, and the object's spatial information is quantified as levels of intensity on the borders of that image. 
Interestingly, the inner part of the internal image, excluding the borders, shows position and scale invariance. To further understand how the internal image associates the what and the where, we examined the internal image of a face that moves or is scaled on the retina. As a result, we found that the internal images form a linear vector space under object translation and scaling. 
In conclusion, these results show that what-where interchangeability might play an important role in organizing those two kinds of information into an internal representation in the brain.
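As a loose toy illustration of the what-with-where interchange, the sketch below inverts a binary "retinal" grid into a negative image and writes the object's position and extent as intensity levels on the border. The border layout and the normalisation are our own assumptions for the sketch, not the authors' construction.

```python
def internal_image(retina):
    """Toy what/where interchange on a binary grid: the interior becomes a
    negative of the object pattern (the 'what'), while the object's position
    and extent (the 'where') are encoded as intensities on the border."""
    h, w = len(retina), len(retina[0])
    rows = [r for r in range(h) if any(retina[r])]
    cols = [c for c in range(w) if any(retina[r][c] for r in range(h))]
    top, left = min(rows), min(cols)
    height, width = max(rows) - top + 1, max(cols) - left + 1
    # Interior: negative image — light areas appear dark and vice versa.
    internal = [[1 - retina[r][c] for c in range(w)] for r in range(h)]
    # Borders: spatial information quantified as intensity levels in [0, 1].
    for c in range(w):
        internal[0][c] = top / (h - 1)       # top border: vertical position
        internal[h - 1][c] = height / h      # bottom border: vertical extent
    for r in range(h):
        internal[r][0] = left / (w - 1)      # left border: horizontal position
        internal[r][w - 1] = width / w       # right border: horizontal extent
    return internal
```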

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Identification

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. Such a transformation enables speech to be understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)

    Feature-based time-series analysis

    This work presents an introduction to feature-based time-series analysis. The time series as a data type is first described, along with an overview of the interdisciplinary time-series analysis literature. I then summarize the range of feature-based representations for time series that have been developed to aid interpretable insights into time-series structure. Particular emphasis is given to emerging research that facilitates wide comparison of feature-based representations, allowing us to understand which properties of a time-series dataset make it suited to a particular feature-based representation or analysis algorithm. The future of time-series analysis is likely to embrace approaches that exploit machine learning methods to partially automate human learning, to aid understanding of the complex dynamical patterns in the time series we measure from the world. Comment: 28 pages, 9 figures
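A feature-based representation of the kind surveyed here reduces a series to a vector of interpretable summary statistics. A minimal sketch — the particular feature set below is an illustrative choice of ours, not the paper's:

```python
def features(ts):
    """Map a univariate time series to a small dictionary of interpretable
    global features (illustrative selection)."""
    n = len(ts)
    mean = sum(ts) / n
    var = sum((x - mean) ** 2 for x in ts) / n
    # Lag-1 autocorrelation: linear memory of the series.
    ac1 = (sum((ts[i] - mean) * (ts[i + 1] - mean) for i in range(n - 1))
           / (n * var)) if var else 0.0
    # Fraction of positive increments: a crude trend indicator.
    up = sum(1 for i in range(n - 1) if ts[i + 1] > ts[i]) / (n - 1)
    return {"mean": mean, "variance": var, "acf1": ac1, "prop_up": up}
```

Once every series in a dataset is mapped through such a function, standard tabular learning and comparison methods apply directly, which is what makes wide comparison across representations feasible.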

    Diffeomorphic image registration with applications to deformation modelling between multiple data sets

    Over the last years, diffeomorphic image registration algorithms have been successfully introduced into the field of medical image analysis. At the same time, the particular usability of these techniques, largely derived from their solid mathematical background, has only been quantitatively explored for limited applications such as longitudinal studies on treatment quality or disease progression. This thesis considers deformable image registration algorithms, seeking out those that maintain the medical correctness of the estimated dense deformation fields in terms of preserving the topology of the object and its neighbourhood, offer reasonable computational complexity to satisfy the time restrictions of the potential applications, and are able to cope with the low-quality data typically encountered in Adaptive Radiotherapy (ART). The research has led to the main emphasis being placed on diffeomorphic image registration, to achieve a one-to-one mapping between images. This involves the log-domain parameterisation of the deformation field via its approximation by a stationary velocity field. A quantitative and qualitative examination of existing and newly proposed algorithms for pairwise deformable image registration, presented in this thesis, shows that the log-Euclidean parameterisation can be successfully utilised in biomedical applications. Although algorithms utilising the log-domain parameterisation have a theoretical justification for maintaining diffeomorphism, in general the deformation fields they produce have properties similar to those estimated by classical methods. With this in mind, the best compromise in terms of the quality of the deformation fields has been found for the consistent image registration framework. 
The experimental results also suggest that image registration with symmetrical warping of the input images outperforms the classical approaches, and can easily be introduced into most known algorithms. Furthermore, a log-domain implicit group-wise image registration is proposed. By linking the various sets of images related to different subjects, the proposed approach establishes a common subject space and between-subject correspondences therein. Although the correspondences between groups of images can be found by performing classic image registration, the reference image selection (not required in the proposed implementation) may lead to a biased mean image being estimated, and to a corresponding common subject space that is not adequate to represent the general properties of the data sets. The approaches to diffeomorphic image registration have also been utilised as the principal elements of two applications: estimating the movements of organs in the pelvic area, using a dense deformation field prediction system driven by partial information coming from a specific type of measurement parameterised with an implicit surface representation; and recognising facial expressions, where the stationary velocity fields are used as facial expression descriptors. Both applications have been extensively evaluated on representative real data sets of three-dimensional volumes and two-dimensional images, and the obtained results indicate the practical usability of the proposed techniques.
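The stationary-velocity-field parameterisation mentioned in this abstract is commonly exponentiated by scaling and squaring: the field is scaled down, applied as a small displacement, and composed with itself repeatedly. A 1-D toy sketch under our own assumptions (grid layout, linear interpolation, boundary clamping) — not the thesis implementation:

```python
def exp_velocity(v, n_steps=6):
    """Scaling-and-squaring exponentiation of a stationary velocity field
    (1-D toy). v[i] is the velocity at grid point i; the result is the
    displacement field of the diffeomorphism exp(v)."""
    n = len(v)
    # Scale: start from the small displacement v / 2**n_steps.
    phi = [x / (2 ** n_steps) for x in v]

    def compose(a, b):
        # (a o b)(i) = b(i) + a evaluated at i + b(i), linearly interpolated
        # and clamped at the grid boundary.
        out = []
        for i in range(n):
            pos = i + b[i]
            j = min(max(int(pos), 0), n - 2)
            frac = min(max(pos - j, 0.0), 1.0)
            out.append(b[i] + (1 - frac) * a[j] + frac * a[j + 1])
        return out

    # Square: compose the field with itself n_steps times.
    for _ in range(n_steps):
        phi = compose(phi, phi)
    return phi
```

A constant velocity field should exponentiate to a pure translation by the same amount, which gives a quick sanity check on the composition step.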