    Fusion of Multimodal Information in Music Content Analysis

    Music is often processed through its acoustic realization. This is restrictive in the sense that music is clearly a highly multimodal concept, where various types of heterogeneous information can be associated with a given piece of music (a musical score, musicians' gestures, lyrics, user-generated metadata, etc.). This has recently led researchers to apprehend music through its various facets, giving rise to "multimodal music analysis" studies. This article gives a synthetic overview of methods that have been successfully employed in multimodal signal analysis. In particular, their use in music content processing is discussed in more detail through five case studies that highlight different multimodal integration techniques. The case studies include an example of cross-modal correlation for music video analysis, an audiovisual drum transcription system, a description of the concept of informed source separation, a discussion of multimodal dance-scene analysis, and an example of user-interactive music analysis. In light of these case studies, some perspectives on multimodality in music processing are finally suggested.
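
A minimal sketch of cross-modal correlation, under assumptions not taken from the article: two per-frame feature curves have already been extracted (for example, an audio onset-strength curve and a visual motion-energy curve, both hypothetical here), and we look for the frame lag at which their Pearson correlation is highest. The function name `best_lag_correlation` and the toy data are illustrative only.

```python
import numpy as np


def best_lag_correlation(audio_feat, video_feat, max_lag):
    """Return (lag, r): the lag in frames that maximises the Pearson
    correlation between audio_feat[t] and video_feat[t + lag]."""
    a = np.asarray(audio_feat, dtype=float)
    v = np.asarray(video_feat, dtype=float)
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Align the two sequences for this candidate lag.
        if lag >= 0:
            x, y = a[: len(a) - lag], v[lag:]
        else:
            x, y = a[-lag:], v[: len(v) + lag]
        n = min(len(x), len(y))
        x, y = x[:n], y[:n]
        # Skip degenerate overlaps where correlation is undefined.
        if n < 2 or x.std() == 0 or y.std() == 0:
            continue
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Toy usage: the hypothetical "video" curve lags the "audio" curve by 3 frames.
rng = np.random.default_rng(0)
audio = rng.random(200)
video = np.concatenate([np.zeros(3), audio[:-3]])
lag, r = best_lag_correlation(audio, video, max_lag=10)
```

A correlation peak at a consistent lag suggests both curves are driven by the same musical events; this exhaustive lag search is only the simplest form of the idea.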

    A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

    The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. The latter observation has not prevented the design of image representations, which trade off between efficiency and complexity, while achieving accurate rendering of smooth regions as well as reproducing faithful contours and textures. The most recent ones, proposed in the past decade, share a hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain and sometimes invariance to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview, based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding.
    Comment: 65 pages, 33 figures, 303 references
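
The link between redundancy and translation invariance can be seen in the simplest possible case: a one-level 1-D Haar transform, far simpler than the oriented multiscale dictionaries surveyed here and chosen only for illustration. The sketch assumes circular boundaries; the function names are hypothetical.

```python
import numpy as np


def haar_undecimated(x):
    """One level of a redundant (undecimated) Haar transform with circular
    boundaries: each band keeps one coefficient per input sample."""
    x = np.asarray(x, dtype=float)
    nxt = np.roll(x, -1)  # x[n + 1], wrapping around at the end
    approx = (x + nxt) / np.sqrt(2)
    detail = (x - nxt) / np.sqrt(2)
    return approx, detail


def haar_decimated(x):
    """Critically sampled Haar: keep every other coefficient (no redundancy)."""
    approx, detail = haar_undecimated(x)
    return approx[::2], detail[::2]


x = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)
x_shift = np.roll(x, 1)  # the same pattern translated by one sample

# Redundant transform: coefficients of the shifted signal are shifted coefficients.
_, d = haar_undecimated(x)
_, d_shift = haar_undecimated(x_shift)
same_up_to_shift = np.allclose(d_shift, np.roll(d, 1))  # True

# Decimated transform: detail energy depends on where the edges fall.
energy = np.sum(haar_decimated(x)[1] ** 2)              # 0.0
energy_shift = np.sum(haar_decimated(x_shift)[1] ** 2)  # about 1.0
```

Shifting the input by one sample only shifts the redundant coefficients, whereas the critically sampled detail band jumps from zero to nonzero energy, which is the kind of instability redundancy is meant to remove.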

    Deliverable D1.2 Visual, text and audio information analysis for hypervideo, first release

    Enriching videos by offering further, related information via, e.g., audio streams, web pages, or other videos is typically hampered by the massive editorial work it demands. While several automatic and semi-automatic methods exist for analyzing audio/video content, one needs to decide which method offers appropriate information for our intended use-case scenarios. We review the technology options for video analysis that we have access to, and describe which training material we opted for to feed our algorithms. For all methods, we offer extensive qualitative and quantitative results, and give an outlook on the next steps within the project.

    Directional edge and texture representations for image processing

    An efficient representation for natural images is of fundamental importance in image processing and analysis. Commonly used separable transforms such as wavelets are not well suited to images because they cannot exploit directional regularities such as edges and oriented textural patterns, while most recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images that can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Building on a previous MFT-based linear feature model, the work extends the extraction method to the case where the image is corrupted by noise. The problem is tackled by combining a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) is also proposed in order to represent textures. The MPCT can be regarded as a real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the dispersion of the magnitude spectrum, to the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature. 
Further improvements can be made by using the information given by the linear feature extraction process to configure the filter. The denoising results compare favourably with other state-of-the-art directional representations.
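
The Gaussian frequency filtering step can be sketched roughly as follows. This is a simplification under stated assumptions: a plain 2-D FFT of a single patch stands in for the local MFT coefficients, and "matching the dispersion of the magnitude spectrum" is interpreted as matching the second moments of the patch's power spectrum. The function name and the toy texture are hypothetical.

```python
import numpy as np


def gaussian_spectral_filter(patch, eps=1e-12):
    """Attenuate a patch's spectrum with a 2-D Gaussian whose covariance
    matches the second moments of the patch's own power spectrum."""
    h, w = patch.shape
    F = np.fft.fftshift(np.fft.fft2(patch))  # zero frequency at (h//2, w//2)
    ky, kx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2,
                         indexing="ij")
    p = np.abs(F) ** 2
    p = p / (p.sum() + eps)  # power spectrum as a distribution over frequencies
    # Second moments (about zero frequency) of the power spectrum.
    c = np.array([[np.sum(p * ky * ky), np.sum(p * ky * kx)],
                  [np.sum(p * ky * kx), np.sum(p * kx * kx)]])
    ci = np.linalg.inv(c + eps * np.eye(2))
    q = ci[0, 0] * ky**2 + 2 * ci[0, 1] * ky * kx + ci[1, 1] * kx**2
    gain = np.exp(-0.5 * q)  # Gaussian gain, elongated like the spectrum
    # Keep the real part; the small imaginary residue comes from rounding and
    # the unpaired Nyquist row/column of even-sized patches.
    return np.fft.ifft2(np.fft.ifftshift(F * gain)).real


# Toy usage: an oriented sinusoid (a "texture") corrupted by white noise.
n = 32
yy, xx = np.mgrid[0:n, 0:n]
clean = np.cos(2 * np.pi * (3 * yy + 5 * xx) / n)
rng = np.random.default_rng(0)
noisy = clean + 0.5 * rng.standard_normal((n, n))
denoised = gaussian_spectral_filter(noisy)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

Because the Gaussian is stretched along the directions in which spectral energy spreads, it attenuates the texture's dominant frequencies much less than the broadband noise elsewhere in the spectrum.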

    Virtual Synaesthesia: Crossmodal Correspondences and Synesthetic Experiences

    As technology develops to allow the integration of additional senses into interactive experiences, there is a need to bridge the divide between the real and the virtual in a manner that stimulates the five senses consistently and in harmony with the user's sensory expectations. Applying the philosophy of synaesthesia, a neurological condition, and of crossmodal correspondences (both forms of coupling between the senses) can provide numerous cognitive benefits and offers insight into which senses are most likely to be ‘bound’ together. This thesis presents a design paradigm called ‘virtual synaesthesia’, whose goal is to make multisensory experiences more human-orientated by considering how the brain combines senses both in the general population (crossmodal correspondences) and within a select few individuals (natural synaesthesia). Towards this aim, a literature review is conducted covering the related areas of research encompassed by the concept of ‘virtual synaesthesia’: natural synaesthesia, crossmodal correspondences, multisensory experiences, and sensory substitution/augmentation. This thesis examines augmenting interactive and multisensory experiences with strong (natural synaesthesia) and weak (crossmodal correspondences) synaesthesia. It answers the following research questions: Is it possible to replicate the underlying cognitive benefits of odour-vision synaesthesia? Do people have consistent correspondences between olfaction and an aggregate of different sensory modalities? What is the nature and origin of these correspondences? And is it possible to predict the crossmodal correspondences attributed to odours? To answer these questions, the benefits of augmenting a human-machine interface with an artificial form of odour-vision synaesthesia are explored. 
This concept is exemplified by transducing odours with a custom-made electronic nose and transforming each odour's ‘chemical footprint’ into a 2D abstract shape that represents it. Electronic noses transform odours in the vapour phase into a series of electrical signals that represent the current odour source. Weak synaesthesia (crossmodal correspondences) is then investigated to determine whether people have consistent correspondences between odours and the angularity of shapes, the smoothness of textures, perceived pleasantness, pitch, and musical and emotional dimensions. Following on from this research, the nature and origin of these correspondences were explored through their underlying hedonic (values relating to pleasantness), semantic (knowledge of the identity of the odour) and physicochemical (the physical and chemical characteristics of the odour) dependencies. The final research chapter investigates whether the bottleneck of extensive human trials can be removed by developing machine learning models that predict the crossmodal perception of odours from their underlying physicochemical features. The work presented in this thesis provides some insight into, and evidence for, the benefits of incorporating the concept of ‘virtual synaesthesia’ into human-machine interfaces, and of research into the methodology it embodies, namely crossmodal correspondences. Overall, the work shows potential for augmenting multisensory experiences with more refined capabilities, leading to richer experiences, better designs, and a more intuitive way to convey information crossmodally.
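
The abstract does not say which machine learning models are used, so as a stand-in the sketch below fits closed-form ridge regression from a vector of physicochemical descriptors to a single crossmodal rating. The features, ratings, and function names are all hypothetical, and the data are synthetic rather than drawn from any human trial.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centre so the intercept is not shrunk
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict_ridge(X, w, b):
    return np.asarray(X, dtype=float) @ w + b


# Toy usage with synthetic data standing in for real odour descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # hypothetical physicochemical features
true_w = np.array([1.0, -2.0, 0.5, 0.0])
y = X @ true_w + 3.0                     # hypothetical mean rating per odour
w, b = fit_ridge(X, y, lam=1e-6)
pred = predict_ridge(X[:5], w, b)
```

Once trained on rated odours, such a model could score new odours from their descriptors alone, which is the kind of shortcut around extensive human trials the final chapter investigates.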

    Multiresolution image models and estimation techniques


    Supervised and unsupervised segmentation of textured images by efficient multi-level pattern classification

    This thesis proposes new, efficient methodologies for supervised and unsupervised image segmentation based on texture information. For the supervised case, a technique for pixel classification based on a multi-level strategy that iteratively refines the resulting segmentation is proposed. This strategy utilizes pattern recognition methods based on prototypes (determined by clustering algorithms) and support vector machines. In order to obtain the best performance, an algorithm for automatic parameter selection and methods to reduce the computational cost associated with the segmentation process are also included. For the unsupervised case, the previous methodology is adapted by means of an initial pattern discovery stage, which allows transforming the original unsupervised problem into a supervised one. Several sets of experiments considering a wide variety of images are carried out in order to validate the developed techniques.
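
The prototype stage of the supervised method can be sketched as per-class k-means followed by nearest-prototype classification. This is a simplified, single-level illustration under assumptions: the thesis also uses support vector machines and iterative multi-level refinement, which are not shown, and the 2-D synthetic points stand in for per-pixel texture feature vectors. Function names are hypothetical.

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return centers


def fit_prototypes(X, y, k_per_class=3):
    """Cluster each class separately; the centres become its prototypes."""
    protos, proto_labels = [], []
    for c in np.unique(y):
        centers = kmeans(X[y == c], k_per_class)
        protos.append(centers)
        proto_labels += [c] * len(centers)
    return np.vstack(protos), np.array(proto_labels)


def predict_nearest_prototype(X, protos, proto_labels):
    """Assign each feature vector the label of its closest prototype."""
    dist = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return proto_labels[dist.argmin(axis=1)]


# Toy usage: two well-separated synthetic "texture classes".
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                     rng.normal(10.0, 1.0, size=(50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
protos, proto_labels = fit_prototypes(X_train, y_train, k_per_class=3)
y_pred = predict_nearest_prototype(np.array([[0.5, -0.5], [9.0, 10.5]]),
                                   protos, proto_labels)
```

Several prototypes per class let a simple nearest-neighbour rule follow classes whose texture features form more than one cluster, at a fraction of the cost of comparing against every training pixel.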

    Violent urban disturbance in England 1980-81

    This study addresses violent urban disturbances which occurred in England in the early 1980s, with particular reference to the Bristol ‘riots’ of April 1980 and the numerous disorders which followed in July 1981. Revisiting two concepts traditionally utilised to explain the spread of collective violence, namely ‘diffusion’ and ‘contagion’, it argues that the latter offers a more useful model for understanding these events. Diffusion, as used in this context, implies that such disturbances are independent of each other and occur randomly. It is associated with the concept of ‘copycat riots’, which the national media commonly invoked to explain the spread of urban disturbances in July 1981. Contagion, by contrast, holds that urban disturbances are related to one another and involve a variety of communication processes and rational collective decision-making. This implies that such events can only be fully understood if they are studied in terms of their local dynamics. Providing the first comprehensive macro-historical analysis of the disturbances of July 1981, this thesis utilises a range of quantitative techniques to argue that the temporal and spatial spread of the unrest exhibited patterns of contagion. These mini-waves of disorder, located in several conurbations, were precipitated by major disturbances in inner-city multi-ethnic areas. This contradicts more conventional explanations, which credit the national media as the sole driver of riotous behaviour. The thesis then offers a micro analysis of disturbances in Bristol in April 1980, incorporating both qualitative and quantitative techniques. Exploiting previously unexplored primary sources and recently collected oral histories from participants, it establishes detailed narratives of three related disturbances in the city. The anatomy of the individual incidents and local contagious effects are examined using spatial mapping, social network and ethnographic analyses. 
The results suggest that previously ignored educational, sub-cultural and ethnographic intra- and inter-community linkages were important factors in the spread of the disorders in Bristol. The case studies of the Bristol disorders are then used to illuminate our understanding of the processes at work during the July 1981 disturbances. It is argued that the latter events were essentially characterised by anti-police and anti-racist collective violence, which marked a momentary recomposition of working-class youth across ethnic divides.