    Feature extraction of musical content for automatic music transcription

    The purpose of this thesis is to develop new methods for automatic transcription of the melody and harmonic parts of real-life music signals. Music transcription is here defined as the act of analyzing a piece of music signal and writing down the parameter representations that indicate the pitch, onset time and duration of each pitch, as well as the loudness and the instruments used in the analyzed music signal. The proposed algorithms and methods aim to resolve two key sub-problems in automatic music transcription: music onset detection and polyphonic pitch estimation. There are three original contributions in this thesis. The first is an original frequency-dependent time-frequency analysis tool called the Resonator Time-Frequency Image (RTFI). By simply defining a parameterized function that maps frequency to the exponential decay factor of a complex resonator filter bank, the RTFI can easily and flexibly implement time-frequency analysis with different time-frequency resolutions, such as ear-like (similar to the frequency analysis of the human ear), constant-Q or uniform (evenly spaced) resolution. A corresponding multi-resolution fast implementation of the RTFI has also been developed. The second original contribution consists of two new music onset detection algorithms: the Energy-based detection algorithm and the Pitch-based detection algorithm. The Energy-based detection algorithm performs well on the detection of hard onsets. The Pitch-based detection algorithm is the first to successfully exploit pitch-change cues for onset detection in real polyphonic music, and it achieves much better performance than other existing detection algorithms on soft onsets. The third contribution is the development of two new polyphonic pitch estimation methods, both based on RTFI analysis. 
The first estimation method mainly exploits harmonic relations and the spectral smoothing principle, and consequently achieves excellent performance on real polyphonic music signals. The second method combines signal processing and machine learning: its basic idea is to recast polyphonic pitch estimation as a pattern recognition problem. It is composed of a signal processing block followed by a learning machine, with multi-resolution fast RTFI analysis as the signal processing component and a support vector machine (SVM) as the learning machine. The experimental results of the first approach show a clear improvement over other state-of-the-art methods.
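
    The RTFI idea described above (a complex resonator filter bank whose exponential decay factor is set per channel by a function of frequency) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the one-pole discretisation y[n] = g·x[n] + p·y[n-1] with unit gain at the centre frequency, the function names (`resonator_tfi`, `decay_constant_q`, `decay_uniform`, `decay_ear_like`) and the specific mappings (bandwidth f/Q, a fixed bandwidth, and an ERB-based ear-like bandwidth) are all assumptions, and the multi-resolution fast implementation is not shown.

```python
import numpy as np


def decay_constant_q(freqs, q=20.0):
    # Hypothetical mapping: r(f) = pi * f / Q, so the one-pole -3 dB bandwidth (about r / pi Hz) is f / Q.
    return np.pi * np.asarray(freqs, dtype=float) / q


def decay_uniform(freqs, bandwidth_hz=20.0):
    # Hypothetical mapping: the same decay factor (and bandwidth) for every channel.
    return np.full(np.shape(freqs), np.pi * bandwidth_hz)


def decay_ear_like(freqs, scale=1.0):
    # Hypothetical mapping: bandwidth follows the Glasberg-Moore ERB scale.
    erb = 24.7 * (4.37 * np.asarray(freqs, dtype=float) / 1000.0 + 1.0)
    return np.pi * scale * erb


def resonator_tfi(x, fs, freqs, decay_fn):
    """Bank of first-order complex resonators, one row per centre frequency."""
    freqs = np.asarray(freqs, dtype=float)
    r = decay_fn(freqs)                              # decay factor per channel (1/s)
    pole = np.exp((-r + 2j * np.pi * freqs) / fs)    # complex pole at each centre frequency
    gain = 1.0 - np.exp(-r / fs)                     # gives unit gain at the centre frequency
    out = np.empty((len(freqs), len(x)), dtype=complex)
    state = np.zeros(len(freqs), dtype=complex)
    for n, sample in enumerate(x):
        state = gain * sample + pole * state         # y[n] = g x[n] + p y[n-1], all channels at once
        out[:, n] = state
    return out
```

    Switching the time-frequency resolution only means passing a different decay function; `np.abs(out) ** 2` then gives a time-frequency energy image from which, for example, an energy-based onset function could be derived.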

    Machine learning paradigms for modeling spatial and temporal information in multimedia data mining

    Multimedia data mining and knowledge discovery is a fast-emerging interdisciplinary applied research area. There is tremendous potential for effective use of multimedia data mining (MDM) through intelligent analysis, and diverse application areas increasingly rely on multimedia understanding systems. Advances in multimedia understanding are directly related to advances in signal processing, computer vision, machine learning, pattern recognition, multimedia databases, and smart sensors. The main mission of this special issue is to identify state-of-the-art machine learning paradigms that are particularly powerful and effective for modeling and combining temporal and spatial media cues, such as audio, visual, and face information, and for accomplishing tasks of multimedia data mining and knowledge discovery. These models should be able to bridge the gap between low-level audiovisual features, which require signal processing, and high-level semantics. A number of papers were submitted to the special issue in the areas of imaging, artificial intelligence, and pattern recognition, and five contributions have been selected, covering state-of-the-art algorithms and advanced related topics. The first contribution, by D. Xiang et al., “Evaluation of data quality and drought monitoring capability of FY-3A MERSI data”, describes some basic parameters and major technical indicators of the FY-3A satellite and evaluates the data quality and drought monitoring capability of the Medium Resolution Spectral Imager (MERSI) onboard the FY-3A. The second contribution, by A. Belatreche et al., “Computing with biologically inspired neural oscillators: application to color image segmentation”, investigates the computing capabilities and potential applications of neural oscillators, a biologically inspired neural model, for grayscale and color image segmentation, an important task in image understanding and object recognition. 
The major contribution of this paper is the ability to use neural oscillators as a learning scheme for solving real-world engineering problems. The third paper, by A. Dargazany et al., entitled “Multibandwidth Kernel-based object tracking”, explores new methods for object tracking using mean shift (MS). A bandwidth-handling MS technique is deployed in which the tracker reaches the global mode of the density function without requiring a specific starting point. Experiments showed that the Gradual Multibandwidth Mean Shift tracking algorithm can converge faster than conventional kernel-based object tracking (known as mean shift tracking). The fourth contribution, by S. Alzu’bi et al., entitled “3D medical volume segmentation using hybrid multi-resolution statistical approaches”, studies new 3D volume segmentation using multiresolution statistical approaches based on the discrete wavelet transform and hidden Markov models. This system typically reduced the percentage error obtained with traditional 2D segmentation techniques by several percent. The fifth contribution, by G. Cabanes et al., entitled “Unsupervised topographic learning for spatiotemporal data mining”, proposes a new unsupervised algorithm suitable for the analysis of noisy spatiotemporal Radio Frequency Identification (RFID) data. This algorithm is an efficient data mining tool for behavioral studies based on RFID technology: it can discover and compare stable patterns in an RFID signal and is suited to continuous learning. Finally, we would like to thank all those who helped make this special issue possible, especially the authors and the reviewers of the articles. Our thanks also go to the Hindawi staff and personnel, and to the journal manager, for bringing about the issue and giving us the opportunity to edit this special issue.
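
    The multi-bandwidth idea in the third contribution (reaching the global mode of the density without depending on the starting point) can be illustrated with a generic coarse-to-fine mean shift. This is a sketch under stated assumptions, not Dargazany et al.'s Gradual Multibandwidth Mean Shift tracker: it works on a plain 2-D point set rather than image histograms, uses a Gaussian kernel, and the bandwidth schedule and function names (`mean_shift`, `coarse_to_fine_mean_shift`) are hypothetical.

```python
import numpy as np


def mean_shift(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Plain mean shift with a Gaussian kernel: climbs to a local mode of the KDE."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        sq_dist = np.sum((points - x) ** 2, axis=1)
        weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        new_x = weights @ points / weights.sum()     # kernel-weighted mean of the data
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def coarse_to_fine_mean_shift(points, start, bandwidths):
    """Run mean shift with decreasing bandwidths, seeding each stage with the previous mode."""
    x = np.asarray(start, dtype=float)
    for h in sorted(bandwidths, reverse=True):
        x = mean_shift(points, x, h)
    return x
```

    In a setting like the one tested here (a large and a small cluster, start placed on the small one), a single small bandwidth leaves mean shift at the small cluster, whereas a large initial bandwidth oversmooths the density towards one dominant mode that the shrinking bandwidths then track into the larger cluster.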

    Signal De-noising method based on particle swarm algorithm and Wavelet transform

    Wavelet analysis is a new time-frequency analysis tool developed on the basis of Fourier analysis, with good time-frequency localization and multi-resolution characteristics, and it is widely applied in signal processing. This paper studies the application of the wavelet transform to signal filtering and, using an improved particle swarm optimization, proposes an intelligent signal de-noising method based on wavelet analysis. The method uses a Center Based Particle Swarm Algorithm (CBPSO) to select the optimal threshold for each sub-band at different scales, learning the type of noise intelligently from the signal itself, so that no prior knowledge of the noise is required. 
The improved particle swarm algorithm is used to optimize the choice of the wavelet-domain thresholds at the different scales; this achieves signal de-noising under different types of background noise, improves the speed of the wavelet transform and wavelet construction, and offers greater flexibility. The experimental results showed that the CBPSO algorithm achieves a better de-noising effect.
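
    The per-sub-band threshold search described above can be sketched as follows. Several pieces are assumptions rather than the paper's method: a Haar wavelet is used so the example is self-contained, the noise level is estimated with the median absolute deviation of the finest band, a standard global-best PSO stands in for CBPSO, and Stein's unbiased risk estimate (SURE) is used as the fitness function because the abstract does not state which criterion the swarm optimizes. All function names are hypothetical.

```python
import numpy as np


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT; len(x) must be divisible by 2**levels."""
    approx = np.asarray(x, dtype=float)
    details = []                                      # finest band first
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_idwt(approx, details):
    for d in reversed(details):                       # rebuild from the coarsest band up
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx


def soft(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def sure(w, t):
    """Stein's unbiased risk estimate for soft thresholding unit-variance coefficients."""
    return len(w) - 2.0 * np.sum(np.abs(w) <= t) + np.sum(np.minimum(np.abs(w), t) ** 2)


def pso_minimize(f, lo, hi, rng, n_particles=15, iters=40, inertia=0.7, c1=1.5, c2=1.5):
    """Standard global-best PSO on a scalar interval (not the paper's CBPSO variant)."""
    pos = rng.uniform(lo, hi, n_particles)
    vel = np.zeros(n_particles)
    pbest, pbest_val = pos.copy(), np.array([f(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)]
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([f(p) for p in pos])
        better = vals < pbest_val                     # update personal bests
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmin(pbest_val)]           # update global best
    return gbest


def denoise(x, levels=4, seed=0):
    rng = np.random.default_rng(seed)
    approx, details = haar_dwt(x, levels)
    sigma = max(np.median(np.abs(details[0])) / 0.6745, 1e-12)   # MAD noise estimate
    cleaned = []
    for d in details:
        w = d / sigma                                 # normalise to unit noise variance
        t_max = np.sqrt(2.0 * np.log(len(w)))         # universal threshold as search bound
        t = pso_minimize(lambda t, w=w: sure(w, t), 0.0, t_max, rng)
        cleaned.append(sigma * soft(w, t))            # one threshold per sub-band
    return haar_idwt(approx, cleaned)
```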

    Time-varying parametric modelling and time-dependent spectral characterisation with applications to EEG signals using multi-wavelets

    A new time-varying autoregressive (TVAR) modelling approach is proposed for nonstationary signal processing and analysis, with application to EEG data modelling and power spectral estimation. In the new parametric modelling framework, the time-dependent coefficients of the TVAR model are represented using a novel multi-wavelet decomposition scheme. The time-varying modelling problem is then reduced to regression selection and parameter estimation, which can be effectively resolved by using a forward orthogonal regression algorithm. Two examples, one for an artificial signal and another for an EEG signal, are given to show the effectiveness and applicability of the new TVAR modelling method
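
    The regression formulation described above can be sketched with a simpler basis and solver. Here each coefficient trajectory is expanded as a_i(n) = Σ_k c_ik P_k(u_n) in Legendre polynomials P_k over normalised time u_n in [-1, 1] (an assumption standing in for the paper's multi-wavelet basis), all c_ik are estimated jointly by ordinary least squares (instead of forward orthogonal regression with term selection), and the time-dependent spectrum σ² / |1 - Σ_i a_i(n) e^(-j2πfi/fs)|² is read off the fitted coefficients, up to the usual PSD scaling convention. Function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import legvander


def fit_tvar(y, order, n_basis):
    """Least-squares TVAR fit with a_i(n) = sum_k c_ik * P_k(u_n)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    phi = legvander(np.linspace(-1.0, 1.0, n), n_basis - 1)    # (n, n_basis) Legendre basis
    rows = np.arange(order, n)
    # One block of n_basis regressors per lag i: P_k(u_n) * y[n - i]
    X = np.hstack([phi[rows] * y[rows - i][:, None] for i in range(1, order + 1)])
    coef, *_ = np.linalg.lstsq(X, y[rows], rcond=None)
    a = coef.reshape(order, n_basis) @ phi.T                    # (order, n) coefficient trajectories
    sigma2 = np.mean((y[rows] - X @ coef) ** 2)                 # innovation variance estimate
    return a, sigma2


def tvar_spectrum(a, sigma2, freqs, fs):
    """Time-dependent AR spectrum, shape (len(freqs), n)."""
    lags = np.arange(1, a.shape[0] + 1)
    E = np.exp(-2j * np.pi * np.outer(freqs, lags) / fs)        # e^{-j 2 pi f i / fs}
    return sigma2 / np.abs(1.0 - E @ a) ** 2
```

    Because the basis expansion turns the time-varying model into a time-invariant linear regression, a term-selection method such as forward orthogonal regression could replace `lstsq` without changing the rest of the pipeline.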

    A Phase Vocoder based on Nonstationary Gabor Frames

    We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder (PV) by incorporating adaptive time-frequency (TF) representations and adaptive phase locking. The adaptive TF representations provide good time resolution for the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate the phase values only at peak channels, and the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep the stretch factor equal to one, and we propose a new strategy for determining which channels are relevant for reinitializing the corresponding phase values. In contrast to previously published algorithms, we use a non-uniform NSGF to obtain a low redundancy of the corresponding TF representation. We show that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared to the classical PV. The proposed algorithm is tested on both synthetic and real-world signals and compared with state-of-the-art algorithms in a reproducible manner. Comment: 10 pages, 6 figures
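
    For comparison with the NSGF-based approach, the classical phase vocoder time stretch that the abstract uses as its baseline can be sketched as below. It assumes a uniform STFT with Hann windows and a fixed analysis hop, estimates each bin's instantaneous frequency from the phase difference between consecutive analysis frames, and advances the synthesis phases by the synthesis hop. It deliberately has no phase locking, transient handling or nonstationary frames, so on real music it is prone to the phasiness and transient smearing the paper sets out to reduce; the function name and default parameters are hypothetical.

```python
import numpy as np


def phase_vocoder(x, stretch, n_fft=1024, hop_a=256):
    """Classical STFT phase vocoder time stretch (uniform Hann frames, no phase locking)."""
    x = np.asarray(x, dtype=float)
    hop_s = int(round(hop_a * stretch))                  # synthesis hop
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop_a
    bins = np.arange(n_fft // 2 + 1)
    expected = 2.0 * np.pi * bins * hop_a / n_fft        # nominal phase advance per analysis hop
    out = np.zeros((n_frames - 1) * hop_s + n_fft)
    norm = np.zeros_like(out)
    prev_phase = syn_phase = None
    for m in range(n_frames):
        spec = np.fft.rfft(x[m * hop_a: m * hop_a + n_fft] * win)
        mag, phase = np.abs(spec), np.angle(spec)
        if prev_phase is None:
            syn_phase = phase.copy()                     # first frame: keep analysis phases
        else:
            dev = phase - prev_phase - expected
            dev = (dev + np.pi) % (2.0 * np.pi) - np.pi  # wrap deviation to [-pi, pi)
            inst_freq = (expected + dev) / hop_a         # instantaneous frequency, rad/sample
            syn_phase = syn_phase + inst_freq * hop_s    # advance phase by the synthesis hop
        prev_phase = phase
        frame = np.fft.irfft(mag * np.exp(1j * syn_phase), n_fft) * win
        seg = slice(m * hop_s, m * hop_s + n_fft)
        out[seg] += frame
        norm[seg] += win ** 2
    # Weighted overlap-add normalisation; samples with negligible window weight are set to 0.
    return np.divide(out, norm, out=np.zeros_like(out), where=norm > 1e-3)
```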

    Advanced Multi-Channel SAR Imaging - Measured Data Demonstration

    Synthetic Aperture Radar (SAR) is a well-established technique for remote sensing of the Earth. However, conventional SAR systems relying on only a single transmit and receive aperture are not capable of imaging a wide swath with high spatial resolution. Multi-channel SAR concepts, such as systems based on multiple receive apertures in azimuth, promise to overcome these restrictions, thus enabling high-resolution wide-swath imaging. Analysis revealed that these systems necessarily require sophisticated digital processing of the received signals in order to guarantee full performance independently of the spatial sample distribution imposed by the applied pulse repetition frequency (PRF). A suitable algorithm to cope with these challenges of multi-channel data is the “multi-channel reconstruction algorithm”, which has demonstrated its potential for high-performance SAR imaging in comprehensive analyses and system design examples. In this context, various optimization strategies were investigated, and aspects of operating multi-channel systems in burst modes such as ScanSAR or TOPS were discussed. Furthermore, a first proof of principle showed the algorithm’s applicability to measured multi-channel X-band data gathered by the German Aerospace Center’s (DLR) airborne F-SAR system. As a next step in the framework of multi-channel azimuth processing, this paper builds on the results recalled above and continues along two paths. First, the focus is turned to further optimization of the processing algorithm by investigating classical Space-Time Adaptive Processing (STAP) applied to SAR. Second, attention is turned to the analysis of the measured multi-channel data by elaborating the impact and compensation of channel mismatch and by verifying the derived theory.
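
    The dependence of the spatial sample distribution on the PRF can be illustrated with the effective phase centre approximation. The sketch assumes one transmitter at the antenna centre and N receive sub-apertures with phase centres spaced d apart in azimuth; each transmit/receive pair is treated as a sample at the midpoint of its two phase centres. Under these assumptions the azimuth samples are uniformly spaced only when the platform moves N·d/2 per pulse, that is at PRF_uni = 2v/(N·d); at any other PRF the samples are non-uniform, which is the situation the multi-channel reconstruction algorithm is meant to handle. The reconstruction itself is not shown, and the function names are hypothetical.

```python
import numpy as np


def uniform_prf(n_rx, d, v):
    """PRF at which N receive sub-apertures (spacing d) give uniformly spaced azimuth samples."""
    return 2.0 * v / (n_rx * d)


def sample_positions(n_rx, d, v, prf, n_pulses):
    """Sorted along-track effective phase centre positions for n_pulses pulses."""
    rx = (np.arange(n_rx) - (n_rx - 1) / 2.0) * d   # receive phase centres; Tx assumed at 0
    epc = rx / 2.0                                   # effective phase centre: Tx/Rx midpoint
    platform = np.arange(n_pulses) * v / prf         # platform position at each pulse
    return np.sort((platform[:, None] + epc[None, :]).ravel())
```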