
    Quick Audio Retrieval Using Multiple Feature Vector

    Information has shifted from text to multimedia data such as speech, images, and video, so search algorithms for such data need to be studied. Keyword-based retrieval is not well suited to multimedia data, and attention has therefore turned to content-based retrieval (e.g., MPEG-7). This thesis concentrates on content-based retrieval and proposes a quick search method. In an Audio Information Retrieval (AIR) system, extracting feature vectors is a key step: feature extraction is the process of computing a numerical representation that characterizes a segment of audio. In this thesis, we use features based on the Short-Time Fourier Transform (STFT) and on zero-crossing rates. STFT-based features are very common and have the advantage of fast computation via the Fast Fourier Transform algorithm; they include the spectral centroid, the spectral roll-off, and the spectral flux. Zero-crossing features have been used in previous work because they reduce computation. This thesis also proposes a new search scheme using preprocessing and code matching. Previous papers propose a time-series search method based on an upper-bound proof: it is assumed that the similarity between the test and reference templates is strongly correlated from one time step to the next, so the algorithm can compute an upper bound on the similarity measure and skip positions that cannot match, making a quick search possible. However, the speed of the time-series search method is still too low for real-time use, so this thesis adds a preprocessing step to make up for this defect, and uses code matching to reduce the number of matching operations. This thesis is organized as follows: Section 2 overviews the previous time-series search algorithm.
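    The features named in the abstract can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation; the frame length, Hann window, and 85% roll-off threshold are assumptions chosen for the example.

    ```python
    import numpy as np

    def zero_crossing_rate(frame):
        # Fraction of consecutive sample pairs whose signs differ.
        signs = np.signbit(frame)
        return np.mean(signs[1:] != signs[:-1])

    def stft_features(frame, sr):
        """Spectral centroid and roll-off (85% of cumulative magnitude)
        for one windowed frame; also returns the magnitude spectrum."""
        mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
        centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
        cumulative = np.cumsum(mag)
        rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]
        return centroid, rolloff, mag

    def spectral_flux(mag_prev, mag_curr):
        # Squared difference between normalized magnitude spectra
        # of two consecutive frames.
        p = mag_prev / (np.sum(mag_prev) + 1e-12)
        c = mag_curr / (np.sum(mag_curr) + 1e-12)
        return np.sum((c - p) ** 2)

    # Sanity check: a 440 Hz tone should place the centroid near 440 Hz
    # and give a ZCR of about 2 * 440 / 8000 = 0.11 crossings per sample.
    sr = 8000
    t = np.arange(1024) / sr
    frame = np.sin(2 * np.pi * 440 * t)
    centroid, rolloff, mag = stft_features(frame, sr)
    zcr = zero_crossing_rate(frame)
    ```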
Section 3 explains the core part of our new algorithm and the new optimal combination of multiple features. Section 4 evaluates the accuracy and speed of the algorithm using multiple features. Finally, Section 5 gives conclusions and future work.

    Table of contents (translated from Korean):
    Chapter 1: Introduction
    Chapter 2: The audio retrieval process
      2.1 Feature vector extraction
        2.1.1 Zero Crossing Rate (ZCR)
        2.1.2 Feature vectors based on the STFT
      2.2 Histogram modeling and similarity measurement
      2.3 Window skipping
      2.4 Drawbacks of time-series audio search
    Chapter 3: Fast audio search using multiple feature vectors
      3.1 Construction of multiple feature vectors
        3.1.1 Accuracy comparison of multiple-feature combinations
        3.1.2 Processing-time comparison of multiple-feature combinations
        3.1.3 Combining multiple feature vectors
      3.2 Similarity measurement
        3.2.1 Several similarity measures
      3.3 Algorithm of the proposed fast audio search
    Chapter 4: Experimental procedure and results
      4.1 Search accuracy
      4.2 Search speed
    Chapter 5: Conclusion
    References
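    The window-skipping idea behind the time-series search described in the abstract can be sketched roughly as follows. This assumes histogram features over quantized codes and an L1 distance; the codebook, window length, and threshold are hypothetical. The skip rule relies on the fact that shifting the window by one frame changes at most two histogram bins by 1/win each, so the distance can change by at most 2/win per shift, which makes the skip safe.

    ```python
    import numpy as np

    def active_search(ref_hist, codes, win, theta):
        """Slide a histogram window over quantized feature codes and
        report every position whose L1 distance to the reference
        histogram is at most theta, skipping positions that a distance
        bound proves cannot match."""
        n_bins = len(ref_hist)
        matches = []
        pos = 0
        while pos + win <= len(codes):
            hist = np.bincount(codes[pos:pos + win], minlength=n_bins) / win
            d = np.abs(hist - ref_hist).sum()  # L1 distance, in [0, 2]
            if d <= theta:
                matches.append(pos)
                pos += 1
            else:
                # d can drop by at most 2/win per one-frame shift, so
                # the next possible match is (d - theta) * win / 2 away.
                pos += max(1, int((d - theta) * win / 2))
        return matches

    # Toy example: hypothetical codebook indices with the query
    # embedded at position 100 of a longer code sequence.
    rng = np.random.default_rng(0)
    query = rng.integers(0, 4, size=32)
    codes = np.concatenate([np.full(100, 5), query, np.full(100, 6)])
    ref_hist = np.bincount(query, minlength=7) / 32
    matches = active_search(ref_hist, codes, win=32, theta=0.1)
    ```

    Far from the embedded query the distance is large, so the search leaps tens of positions at a time instead of testing every frame, which is where the speedup over exhaustive sliding-window matching comes from.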

    Music-listening systems

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Architecture, 2000. Includes bibliographical references (p. [235]-248). When human listeners are confronted with musical sounds, they rapidly and automatically orient themselves in the music. Even musically untrained listeners have an exceptional ability to make rapid judgments about music from very short examples, such as determining the music's style, performer, beat, complexity, and emotional impact. However, there are presently no theories of music perception that can explain this behavior, and it has proven very difficult to build computer music-analysis tools with similar capabilities. This dissertation examines the psychoacoustic origins of the early stages of music listening in humans, using both experimental and computer-modeling approaches. The results of this research enable the construction of automatic machine-listening systems that can make human-like judgments about short musical stimuli. New models are presented that explain the perception of musical tempo, the perceived segmentation of sound scenes into multiple auditory images, and the extraction of musical features from complex musical sounds. These models are implemented as signal-processing and pattern-recognition computer programs, using the principle of understanding without separation. Two experiments with human listeners study the rapid assignment of high-level judgments to musical stimuli, and it is demonstrated that many of the experimental results can be explained with a multiple-regression model on the extracted musical features. From a theoretical standpoint, the thesis shows how theories of music perception can be grounded in a principled way upon psychoacoustic models in a computational-auditory-scene-analysis framework. Further, the perceptual theory presented is more relevant to everyday listeners and situations than are previous cognitive-structuralist approaches to music perception and cognition.
From a practical standpoint, the various models form a set of computer signal-processing and pattern-recognition tools that can mimic human perceptual abilities on a variety of musical tasks such as tapping along with the beat, parsing music into sections, making semantic judgments about musical examples, and estimating the similarity of two pieces of music. Eric D. Scheirer. Ph.D.
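    The multiple-regression step mentioned in the abstract, fitting listener judgments from extracted musical features, can be sketched in a few lines. The feature matrix and judgment scores below are synthetic stand-ins, not the dissertation's data; feature names such as tempo or spectral centroid are only examples of what the columns might hold.

    ```python
    import numpy as np

    # Synthetic stand-in data: each row is one short musical stimulus,
    # columns are extracted features (e.g. tempo, centroid, loudness);
    # y plays the role of a listener judgment such as a complexity rating.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 3))
    true_w = np.array([0.8, -0.5, 0.3])
    y = X @ true_w + 0.1 * rng.normal(size=40)

    # Fit the multiple-regression model y ~ X w + b by least squares.
    A = np.column_stack([X, np.ones(len(X))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w, b = coef[:-1], coef[-1]

    # R^2: fraction of variance in the judgments explained by the features.
    pred = A @ coef
    r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - np.mean(y)) ** 2)
    ```

    In this framing, a high R^2 on held-out stimuli is what it means for the experimental judgments to be "explained" by the extracted features.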