415 research outputs found

    Learning Structured Dictionaries for Image Representation

    The dictionary approach to signal and image processing has been extensively investigated over the last two decades and has proved very attractive for a wide range of applications. The effectiveness of dictionary-based methods, however, is strongly influenced by the choice of the set of basis functions. Moreover, the structure of the dictionary is of paramount importance for efficient implementation and for practical applications such as image coding. In this work, an overcomplete code for sparse representation of natural images has been learnt from a set of real-world scenes. Experiments have been carried out using images of different sizes in order to check the influence of this parameter on the learnt bases. The functions found have been organized into a hierarchical structure. We take advantage of this representation of the dictionary, adopting a tree-structured greedy algorithm to build sparse approximations of images. Using this procedure, no a-priori constraint is imposed on the structure of the dictionary, allowing great flexibility in its design and lower computational complexity.

    Audiovisual Gestalts

    This paper presents an algorithm to correlate audio and visual data generated by the same physical phenomenon. According to psychophysical experiments, temporal synchrony strongly contributes to the integration of cross-modal information in humans. Thus, we define meaningful audiovisual structures as temporally proximal audio-video events. Audio and video signals are represented as sparse decompositions over redundant dictionaries of functions. In this way, it is possible to define perceptually meaningful audiovisual events. The detection of these cross-modal structures is done using a simple rule called the Helmholtz principle. Experimental results show that, by extracting significant synchronous audiovisual events, we can detect the existing cross-modal correlation between those signals even in the presence of distracting motion and acoustic noise. These results confirm that temporal proximity between audiovisual events is a key ingredient for the integration of information across modalities and that it can be effectively exploited for the design of multi-modal analysis algorithms.

    Image compression with learnt tree-structured dictionaries

    In the present paper we propose a new framework for the construction of meaningful dictionaries for sparse representation of signals. The dictionary approach to coding and compression proves very attractive since decomposing a signal over a redundant set of basis functions allows a parsimonious representation of information. This interest is reflected in the numerous research efforts made in recent years to develop efficient algorithms for the decomposition of signals over redundant sets of functions. However, the effectiveness of such methods strongly depends on the dictionary and on its structure. In this work, we develop a method to learn overcomplete sets of functions from real-world signals. This technique allows the design of dictionaries that can be adapted to a specific class of signals. The functions found are stored in a tree structure. This data structure is used by a Tree-Based Pursuit algorithm to generate sparse approximations of natural signals. Finally, the proposed method is considered in the context of image compression. Results show that the learning Tree-Based approach outperforms state-of-the-art coding technique.

    Blind Audio-Visual Source Separation Using Sparse Redundant Representations

    This report presents a new method to address the Blind Audio Source Separation (BASS) problem by means of audio and visual information. In a given mixture, we are able to locate the video sources first and, subsequently, recover each source signal, using only one microphone and the associated video. The proposed model is based on the Matching Pursuit (MP) [18] decomposition of both audio and video signals into meaningful structures. Frequency components are extracted from the soundtrack, with the consequent information about the energy content in the time-frequency plane of a sound. Moreover, the MP decomposition of the audio is robust to noise, because of its plain characteristic in this plane. Concerning the video, the temporal displacement of geometric features indicates movement in the image. If temporally close to an audio event, such a feature points to the video structure that generated the sound. The method we present links audio and visual structures (atoms) according to their temporal proximity, building audiovisual relationships. Video sources are identified and located in the image by exploiting these connections, using a clustering algorithm that rewards video features most frequently related to audio in the whole sequence. The goal of BASS is also achieved by considering the audiovisual relationships. First, the video structures close to a source are classified as belonging to it. Then, our method assigns the audio atoms according to the source of the related video features. At this point, the separation performed with the audio reconstruction is still limited, with problems when sources are active at exactly the same time. This procedure allows us to discover the temporal periods of activity of each source. However, with a temporal analysis alone it is not possible to separate audio features of different sources that are precisely synchronous. The goal then is to learn each source's frequency behavior when it alone is active, in order to predict the moments when sources overlap. By applying a simple frequency association, results improve considerably, yielding separated soundtracks of better audible quality. In this report, we analyze in depth all the steps of the proposed approach, highlighting the motivation for each of them.
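
The linking of audio and video atoms by temporal proximity, and the rewarding of video features that are most often related to audio, can be illustrated with a minimal sketch. It is not the report's implementation: the function name, the `max_gap` window, the feature labels and the timestamps are all hypothetical, and a simple vote count stands in for the clustering algorithm mentioned in the abstract.

```python
from collections import Counter

def link_by_temporal_proximity(audio_times, video_events, max_gap):
    """Pair each audio atom with every video event within max_gap seconds.

    audio_times  : list of audio atom time positions (hypothetical data)
    video_events : list of (feature_id, time) tuples (hypothetical data)
    max_gap      : assumed proximity window, in seconds
    """
    links = []
    for audio_index, t_audio in enumerate(audio_times):
        for feature_id, t_video in video_events:
            if abs(t_audio - t_video) <= max_gap:
                links.append((audio_index, feature_id))
    return links

# Toy sequence: three audio atoms and four video events.
audio_times = [0.50, 1.20, 2.00]
video_events = [("mouth", 0.52), ("hand", 0.90), ("mouth", 1.18), ("mouth", 2.05)]

links = link_by_temporal_proximity(audio_times, video_events, max_gap=0.1)

# Count how often each video feature is linked to an audio atom;
# the most frequently linked feature is the candidate sound source.
votes = Counter(feature_id for _, feature_id in links)
likely_source = votes.most_common(1)[0][0]
```

In this toy data every audio atom falls close to a "mouth" event and none falls close to the "hand" event, so the vote count singles out the mouth as the likely source.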

    Analysis of Multimodal Sequences Using Geometric Video Representations

    This paper presents a novel method to correlate audio and visual data generated by the same physical phenomenon, based on a sparse geometric representation of video sequences. The video signal is modeled as a sum of geometric primitives evolving through time that jointly describe the geometric and motion content of the scene. The displacement through time of relevant visual features, like the mouth of a speaker, can thus be compared with the evolution of an audio feature to assess the correspondence between acoustic and visual signals. Experiments show that the proposed approach makes it possible to detect and track the speaker's mouth when several persons are present in the scene, in the presence of distracting motion, and without prior face or mouth detection.

    Multimodal Analysis Using Redundant Parametric Decompositions

    In this work we explore the potential of a representational framework based on Matching Pursuit (MP) for the decomposition of audio-visual signals over redundant dictionaries. It is relatively easy for a human to correctly interpret a scene consisting of a combination of acoustic and visual stimuli and to exploit both sources of information to experience a richer perception of the world. In contrast, computer systems have considerable difficulties when dealing with multimodal signals, and the information that each component contains about the others is usually discarded. This is mainly due to the complexity of the dependencies that exist between audio and video signals and to the signal representations that are considered when attempting to mix them in multimodal fusion systems. Redundant decompositions describe audio-visual sequences in an extremely concise fashion, preserving good representational properties thanks to the use of redundant, well-designed dictionaries. This allows us to overcome two typical problems of multimodal fusion algorithms, namely the high dimensionality of the considered signals and the limitations of classical representation techniques, like pixel-based measures (for the video) or Fourier-like transforms (for the audio), that take into account only marginally the physics of the problem. The experimental results we obtain by making use of MP decompositions over redundant codebooks are encouraging and lead us to believe that this research direction could open a new path in multimodal signal representation.
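
For readers unfamiliar with Matching Pursuit, the generic greedy procedure can be sketched as follows. This is a textbook-style illustration under simple assumptions (unit-norm atoms, a fixed number of iterations, a tiny hand-made dictionary in two dimensions), not the audio-visual dictionaries or code used in the work above; all names are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy MP over unit-norm atoms.

    Returns the selected (atom_index, coefficient) pairs and the residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_atoms):
        # Project the residual on every atom and keep the best match.
        correlations = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(correlations[k]))
        coeff = correlations[best]
        if abs(coeff) < 1e-12:
            break  # nothing left to explain
        picks.append((best, coeff))
        # Subtract the contribution of the chosen atom.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return picks, residual

# Toy redundant dictionary: three unit-norm atoms spanning a 2-D space.
dictionary = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
signal = [2.0, 2.0]

picks, residual = matching_pursuit(signal, dictionary, n_atoms=2)
residual_norm = math.sqrt(dot(residual, residual))
```

Because the dictionary is redundant, the diagonal atom (index 2) matches the signal better than either axis-aligned atom and is selected first, which is why a single coefficient is enough to represent this signal.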

    Tracking Atoms with Particles

    We present a general framework and an efficient algorithm for tracking relevant video structures. The structures to be tracked are implicitly defined by a Matching Pursuit procedure that extracts and ranks the most important image contours. Based on the ranking, the contours are automatically selected to initialize a Particle Filtering tracker. The proposed algorithm deals with salient video entities whose behavior has an intuitive meaning, related to the physics of the signal. Moreover, as the interactions between such structures are easily defined, the inference of higher-level signal configurations can be made intuitive. The proposed algorithm improves the performance of existing video structure trackers, while reducing the computational complexity. The algorithm is demonstrated on audiovisual source localization.

    Gene-specific inhibition of breast carcinoma in BALB-neuT mice by active immunization with rat Neu or human ErbB receptors

    Employing the transgenic BALB-neuT mouse tumor model, we explored the in vivo biologic relevance of immunocompetent epitopes shared among the four ErbB receptors. The outcome of neu-mediated tumorigenesis was compared following vaccination with isogeneic normal rat ErbB2/Neu (LTR-Neu) or xenogeneic human ErbB receptors (LTR-EGFR, LTR-ErbB2, LTR-ErbB3 and LTR-ErbB4), each recombinantly expressed in an NIH3T3 murine cell background. Vaccination using rat LTR-Neu at the stage of atypical hyperplasia potently inhibited neu-mediated mammary tumorigenesis. Moreover, all human ErbB receptors specifically interfered with tumor development in BALB-neuT mice. Relative increase in tumor-free survival and reduction in tumor incidence corresponded to structural similarity shared with the etiologic neu oncogene, as rat orthologue LTR-Neu proved most effective followed by the human homologue LTR-ErbB2 and the other three human ErbB receptors. Vaccination resulted in high titer specific serum antibodies, whose tumor-inhibitory effect correlated with cross-reactivity to purified rat Neu extracellular domain in vitro. Furthermore, a T cell response specific for peptide epitopes of rat Neu was elicited in spleen cells of mice immunized with LTR-Neu and was remotely detectable for discrete peptides upon vaccination with LTR-ErbB2 and LTR-EGFR. The most pronounced tumor inhibition by LTR-Neu vaccination was associated with leukocyte infiltrate and tumor necrosis in vivo, while immune sera specifically induced cytotoxicity and apoptosis of BALB-neuT tumor cells in vitro. Our findings indicated that targeted inhibition of neu oncogene-mediated mammary carcinogenesis is conditional upon the immunization schedule and discrete immunogenic epitopes shared to a variable extent by different ErbB receptors