8 research outputs found

    Approach for Retrieving and Mining Video Clips

    Get PDF
    Multimedia (including video, images, audio and text) is characterized by high dimensionality, which makes information retrieval and data mining even more challenging. This research proposes a method for building an index database over a large collection of video clips, making video retrieval and mining more efficient and accurate by considering similarity in both the text of the soundtrack and the features of the frames. The proposed method has the following steps. First, it separates the visual track from the sound in each video clip. Second, it converts the sound to text and indexes the result in a database. Third, it segments the visual track into shots, selects a master frame for each shot, extracts a feature vector from it (colour, texture, shape and others), and indexes the result in a database. Fourth, it combines the two indexed databases (from the second and third steps) into a single database that serves as the basis for both retrieval and mining.
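
    The frame-feature step (the third in the list above) can be sketched with a simple colour-histogram descriptor and a key-frame heuristic. The abstract does not give the exact features or selection rule, so the bin count and the "closest to the shot mean" heuristic below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Quantize each RGB channel into `bins` levels and return a
    normalized joint histogram as a 1-D feature vector."""
    # Map 0-255 pixel values to per-channel bin indices.
    q = (frame // (256 // bins)).reshape(-1, 3).astype(np.int64)
    # Flatten (r, g, b) bin triples into a single joint index.
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()  # normalize so frames of any size compare

def select_master_frame(shot_frames):
    """Pick the frame whose histogram is closest to the shot's mean
    histogram -- one common, simple key-frame heuristic."""
    hists = np.stack([color_histogram(f) for f in shot_frames])
    dists = np.linalg.norm(hists - hists.mean(axis=0), axis=1)
    return int(np.argmin(dists))
```

    The resulting vectors can then be indexed alongside the speech-to-text index, as the fourth step describes.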

    Comparación de métodos para reducir el ruido en señales emitidas por delfines Tursiops truncatus.

    Get PDF
    The ongoing search for methods to filter and process biological signals such as those emitted by Tursiops truncatus dolphins is driven by growing interest in studying endangered species and in understanding their behaviour. Applications such as population censuses, identification of isolated individuals, and pattern recognition in communication are only some of the possible investigations that could build on a pre-processing stage based on filtering these signals. This work presents the implementation and comparison of two methods for extracting the contours generated by the whistles of Tursiops truncatus dolphins in a spectrogram: a particle filter, and threshold-based edge detection. The particle filter used is Sequential Importance Resampling (SIR), which simplifies the choice of model and resamples in some iterations to discard low-probability detections of the signal. The threshold-based edge detector operates on the spectrogram of the signal, treating it as an image and using the Matlab Image Processing Toolbox: Gaussian smoothing and thresholding of the sectioned image are applied to detect the contour generated by the whistles. The results of the two methods are compared on the basis of the Signal-to-Noise Ratio (SNR) obtained in each case.
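
    A minimal version of the threshold-based contour detector can be sketched in Python (the paper used Matlab's Image Processing Toolbox; SciPy stands in here). The FFT size, smoothing sigma, and -20 dB relative threshold are assumed values, not the paper's parameters.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import gaussian_filter

def whistle_contour(signal, fs, sigma=1.0, thresh_db=-20.0):
    """Return, per time frame, the frequency of the strongest
    spectrogram bin above a relative threshold (NaN elsewhere)."""
    f, t, sxx = spectrogram(signal, fs=fs, nperseg=256, noverlap=192)
    # Work in dB relative to the spectrogram's global peak.
    sxx_db = 10 * np.log10(sxx + 1e-12)
    sxx_db -= sxx_db.max()
    smooth = gaussian_filter(sxx_db, sigma=sigma)  # Gaussian smoothing
    contour = np.full(t.shape, np.nan)
    for i in range(len(t)):
        j = np.argmax(smooth[:, i])
        if smooth[j, i] > thresh_db:  # keep only confident frames
            contour[i] = f[j]
    return t, contour
```

    On a clean synthetic whistle this traces the tone's frequency over time; on real recordings the threshold and smoothing would need tuning against the noise floor.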

    Audio-coupled video content understanding of unconstrained video sequences

    Get PDF
    Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of objects, activities and environment in a given video clip using both audio and video information. Traditionally, audio and video information has not been applied together for solving such a complex task, and for the first time we propose, develop, implement and test a new framework of multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework for studying the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content recognition modules. The first one is based on image sequence analysis alone, and uses a range of colour, shape, texture and statistical features from image regions with a trained classifier to recognise the identity of objects, activities and environment present. The second module uses audio information only, and recognises activities and environment. Both of these approaches are preceded by detailed pre-processing to ensure that correct video segments containing both audio and video content are present, and that the developed system can be made robust to changes in camera movement, illumination, random object behaviour, etc. For both audio and video analysis, we use a hierarchical approach of multi-stage classification such that difficult classification tasks can be decomposed into simpler and smaller tasks. When combining both modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines advantages of both feature- and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work. Finally, we propose a decision correction algorithm which shows that further steps towards effectively combining multi-modal classification information with semantic knowledge generate the best possible results.
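
    The decision-level side of the fusion comparison can be illustrated with a plain weighted sum rule over class posteriors. This is a generic textbook rule, not the thesis's novel hybrid algorithm, and the weight `w` is an assumed parameter.

```python
import numpy as np

def fuse_decisions(p_video, p_audio, w=0.5):
    """Weighted sum rule over per-class posteriors from two
    single-modality classifiers; returns fused posteriors and the
    index of the winning class."""
    p_video = np.asarray(p_video, dtype=float)
    p_audio = np.asarray(p_audio, dtype=float)
    fused = w * p_video + (1 - w) * p_audio
    fused /= fused.sum()  # renormalize to a proper distribution
    return fused, int(np.argmax(fused))
```

    Feature-level fusion, by contrast, would concatenate the two modalities' feature vectors before a single classifier; the thesis's algorithm is described as combining advantages of both levels.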

    Music-listening systems

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Architecture, 2000. Includes bibliographical references (p. [235]-248). When human listeners are confronted with musical sounds, they rapidly and automatically orient themselves in the music. Even musically untrained listeners have an exceptional ability to make rapid judgments about music from very short examples, such as determining the music's style, performer, beat, complexity, and emotional impact. However, there are presently no theories of music perception that can explain this behavior, and it has proven very difficult to build computer music-analysis tools with similar capabilities. This dissertation examines the psychoacoustic origins of the early stages of music listening in humans, using both experimental and computer-modeling approaches. The results of this research enable the construction of automatic machine-listening systems that can make human-like judgments about short musical stimuli. New models are presented that explain the perception of musical tempo, the perceived segmentation of sound scenes into multiple auditory images, and the extraction of musical features from complex musical sounds. These models are implemented as signal-processing and pattern-recognition computer programs, using the principle of understanding without separation. Two experiments with human listeners study the rapid assignment of high-level judgments to musical stimuli, and it is demonstrated that many of the experimental results can be explained with a multiple-regression model on the extracted musical features. From a theoretical standpoint, the thesis shows how theories of music perception can be grounded in a principled way upon psychoacoustic models in a computational-auditory-scene-analysis framework. Further, the perceptual theory presented is more relevant to everyday listeners and situations than are previous cognitive-structuralist approaches to music perception and cognition. 
    From a practical standpoint, the various models form a set of computer signal-processing and pattern-recognition tools that can mimic human perceptual abilities on a variety of musical tasks such as tapping along with the beat, parsing music into sections, making semantic judgments about musical examples, and estimating the similarity of two pieces of music. (Eric D. Scheirer, Ph.D.)
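
    The multiple-regression model mentioned above, predicting listeners' judgments from extracted musical features, can be sketched with ordinary least squares. The feature and rating arrays here are placeholders, not the thesis's data or feature set.

```python
import numpy as np

def fit_judgment_model(features, ratings):
    """Ordinary least squares: ratings ~ features @ w + b.
    Returns the weight vector with the intercept as its last entry."""
    X = np.column_stack([features, np.ones(len(features))])  # add bias
    coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return coef

def predict_judgments(features, coef):
    """Apply a fitted model to new feature vectors."""
    X = np.column_stack([features, np.ones(len(features))])
    return X @ coef
```

    In the thesis's setting, each row of `features` would be the musical features extracted from one short stimulus, and `ratings` the corresponding averaged human judgment.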

    Speech/Music Discriminator Based on Multiple Fundamental Frequencies Estimation [Discriminador voz/música baseado na estimação de múltiplas freqüências fundamentais]

    No full text
    This paper introduces a new technique to discriminate between music and speech. The strategy is based on the concept of multiple-fundamental-frequency estimation, which provides the elements for extracting three features from the signal. Discrimination between speech and music is obtained by properly combining these features. The small number of features, together with the fact that no training phase is necessary, makes this strategy very robust across a wide range of practical conditions. The performance of the technique is analysed in terms of the precision of the speech/music separation, robustness in the face of extreme conditions, and computational effort. A comparison with previous works reveals excellent performance from every point of view. © Copyright 2010 IEEE - All Rights Reserved.
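
    The abstract does not spell out the three features, so the following is only a generic sketch in the same spirit: a crude autocorrelation-based F0 estimate whose frame-to-frame stability tends to differ between sustained musical notes and running speech, with no training phase involved. All names, frame sizes and ranges below are assumptions.

```python
import numpy as np

def f0_autocorr(frame, fs, fmin=80, fmax=1000):
    """Crude single-F0 estimate: pick the autocorrelation peak
    within the lag range corresponding to [fmin, fmax] Hz."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

def pitch_stability(signal, fs, frame_len=1024):
    """Std-dev of frame-wise F0 estimates: sustained notes (music)
    tend to give lower values than running speech."""
    n = len(signal) // frame_len
    f0s = [f0_autocorr(signal[i * frame_len:(i + 1) * frame_len], fs)
           for i in range(n)]
    return float(np.std(f0s))
```

    A training-free discriminator in this style would compare a few such feature values against fixed thresholds and combine the votes, rather than fitting a classifier.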