7 research outputs found

    An Efficient Music Information Retrieval System

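    Project number: NSC96-2221-E032-049. Research period: 2007-08 to 2008-07. Research funding: 537,000. Abstract: With the rapid growth of the Internet and the spread of computer use, digitally stored music has increased quickly, and online music query systems are increasingly common. Traditional music query systems are based on text search: users retrieve bibliographic music data over the network, finding music files by entering the composer's name, the title, or related information. Querying by musical content, by contrast, has become the trend in recent years. Academic groups at home and abroad have developed application systems and published many papers in this area, but these query systems still fall short in performance. This project proposes a more efficient and flexible content-based music retrieval system. The most critical part of a music query system is music matching, which directly affects query performance. In related work on geometric matching algorithms [1][9], the distance between the query melody and the reference melody is defined as the minimum area enclosed between the two melodies; the smaller this minimum area, the higher the similarity, and the more likely it is that the query melody is part of the reference melody. During matching, the query melody is translated horizontally and vertically to find the position at which it encloses the minimum area with the reference melody. This method gives good matching results but requires a great deal of computation time. This project proposes a more efficient geometric matching algorithm and builds a high-performance music query system. First, we propose replacing absolute-pitch geometric matching with pitch-interval geometric matching, which removes the need to move the query melody vertically and so improves the time efficiency of matching. Then, in the step that searches for the minimum area, a branch-and-prune mechanism further speeds up matching. Next, we will develop an effective main melody extraction technique. This part of the research has already produced some results, and in this project we propose a new method that extracts the approximately repeated passages of a melody as the main melody set. Each reference melody is split into measures using its beat information; geometric matching then measures the distance between each pair of measures, which yields a measure-level correlative matrix. The matrix is converted into a 256-level grayscale image of the same size, and Otsu [13] thresholding turns the grayscale image into a black-and-white image whose black and white pixels mark a pair of measures as similar and dissimilar, respectively. Line segments parallel to the main diagonal can then be detected in this image to merge similar repeated measure fragments. Using the proposed pitch-interval geometric matching, we build an efficient and flexible music query system that can search either a full-song database or a main-melody database; the former gives more accurate results and the latter faster searches, and the user may choose between them. Preliminary experimental results show that the proposed pitch-interval geometric matching gives satisfactory results on both databases. We also have some new ideas, and the matching method continues to be studied and improved. To broaden the system's applicability, this project plans to port it to mobile devices such as cellular phones and personal digital assistants (PDAs), supporting queries for applications such as audio-on-demand and online karaoke, so that music queries become more convenient. We hope that, with the support of the National Science Council, the project will be completed successfully and our research will contribute to this field. Sponsorship: National Science Council, Executive Yuan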
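
The pitch-interval idea in the abstract above can be illustrated with a minimal Python sketch. It is not the project's implementation: it assumes every note has unit duration, so that the area between two melodies reduces to a sum of absolute differences, and it omits the branch-and-prune step. The function names `intervals`, `interval_area`, and `best_match` are hypothetical.

```python
def intervals(pitches):
    """Convert absolute pitches (e.g. MIDI numbers) to successive pitch intervals."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def interval_area(query_iv, reference_iv, offset):
    """Area between the two interval contours at one horizontal offset.

    Assumption: unit-duration notes, so area = sum of absolute differences.
    """
    window = reference_iv[offset:offset + len(query_iv)]
    return sum(abs(q - r) for q, r in zip(query_iv, window))


def best_match(query_pitches, reference_pitches):
    """Return (minimum area, offset), or None if the query is longer than the reference.

    Only horizontal shifts are tried: transposing the query leaves its
    intervals unchanged, so no vertical search is needed.
    """
    q = intervals(query_pitches)
    r = intervals(reference_pitches)
    best = None
    for offset in range(len(r) - len(q) + 1):
        area = interval_area(q, r, offset)
        if best is None or area < best[0]:
            best = (area, offset)
    return best


reference = [60, 62, 64, 65, 67, 65, 64, 62]
query = [69, 70, 72, 70]  # reference[2:6] transposed up by 5 semitones
print(best_match(query, reference))
```

Because the query is an exact transposition of part of the reference, the minimum area is zero at the matching offset, even though no vertical shift was ever searched.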

    Progressive Filtering Using Multiresolution Histograms for Query by Humming System

    The growing availability of digital music calls for effective categorization and retrieval methods. Real-world scenarios involve very large music collections that contain songs both relevant and irrelevant to the user's input. The primary goal of this work is to offset the harmful impact of non-relevant songs through Progressive Filtering (PF) in a Query by Humming (QBH) system. PF is a problem-solving technique that works on a progressively reduced search space. This paper presents the concept of PF and an efficient design for it based on Multi-Resolution Histograms (MRH), which carries out the search in multiple stages. Initially the entire music database is searched to obtain a high recall rate and a narrowed search space. Later steps perform a slower search within the reduced space and achieve additional accuracy. Experiments on a large music database using recursive programming substantiate the potential of the method. The results indicate that MRH locates the patterns effectively. Distances between MRH at a lower level are lower bounds of the distances at a higher level, which guarantees that no false dismissals occur during PF. The proposed method thus helps strike a balance between efficiency and effectiveness. The system is scalable to large music retrieval systems and, as an added advantage, is data driven for performance optimization. Comment: 12 pages, 6 figures; full version of the paper published at ICMCCA-2012 under the same title. Link: http://link.springer.com/chapter/10.1007/978-81-322-1143-3_2
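
The lower-bound property mentioned in the abstract above can be sketched in a few lines of Python. The abstract does not say which histogram features or distance the paper uses, so this sketch makes two assumptions: the coarse level is built by merging adjacent bin pairs, and distances are L1. Under those assumptions the coarse distance never exceeds the fine distance, so pruning on the coarse level cannot discard a true match. The names `coarsen`, `l1`, and `progressive_filter` are hypothetical.

```python
def coarsen(hist):
    """Halve the resolution by merging adjacent bin pairs (length must be even)."""
    return [hist[i] + hist[i + 1] for i in range(0, len(hist), 2)]


def l1(a, b):
    """L1 (city-block) distance between two equal-length histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def progressive_filter(query, candidates, threshold):
    """Return the names of candidates within `threshold` of the query.

    The cheap coarse distance is checked first. Since
    |a1 + a2 - b1 - b2| <= |a1 - b1| + |a2 - b2|, the coarse distance is a
    lower bound on the fine one, so rejecting on it causes no false dismissals.
    """
    coarse_query = coarsen(query)
    survivors = []
    for name, hist in candidates.items():
        if l1(coarse_query, coarsen(hist)) > threshold:
            continue  # pruned without computing the fine distance
        if l1(query, hist) <= threshold:
            survivors.append(name)
    return survivors


query = [4, 0, 2, 2]
candidates = {
    "a": [3, 1, 2, 2],  # close at both resolutions
    "b": [0, 0, 4, 4],  # rejected at the coarse level
    "c": [0, 4, 2, 2],  # passes the coarse level, rejected at the fine level
}
print(progressive_filter(query, candidates, threshold=3))
```

Candidate "c" shows why the later, slower stage is still needed: the coarse level only narrows the search space, and the fine level supplies the additional accuracy.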

    Content-based music retrieval by acoustic query

    Ph.D., Doctor of Philosophy

    A Geometric Approach to Pattern Matching in Polyphonic Music

    The music pattern matching problem involves finding matches of a small fragment of music called the "pattern" into a larger body of music called the "score". We represent music as a series of horizontal line segments in the plane, and reformulate the problem as finding the best translation of a small set of horizontal line segments into a larger set of horizontal line segments. We present an efficient algorithm that can handle general weight models that measure the musical quality of a match of the pattern into the score, allowing for approximate pattern matching. We give an algorithm with running time O(nm(d + log m)), where n is the size of the score, m is the size of the pattern, and d is the size of the discrete set of musical pitches used. Our algorithm compares favourably to previous approaches to the music pattern matching problem. We also demonstrate that this geometric formulation of the music pattern matching problem is unlikely to have a significantly faster algorithm since it is at least as hard as 3SUM, a basic problem that is conjectured to have no subquadratic algorithm. Lastly, we present experiments to show how our algorithm can find musically sensible variations of a theme, as well as polyphonic musical patterns in a polyphonic score
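
The geometric formulation in the abstract above can be made concrete with a naive Python baseline. This is not the paper's O(nm(d + log m)) algorithm: it is a brute-force sketch that assumes each note is a tuple (onset, pitch, duration), scores a translation by the total length of overlap between translated pattern segments and score segments at the same pitch, and only tries translations that align the start of some pattern note with the start of some score note. All function names are hypothetical.

```python
from itertools import product


def overlap(a_start, a_len, b_start, b_len):
    """Length of the intersection of two intervals on the time axis."""
    return max(0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))


def match_weight(pattern, score, dt, dp):
    """Total overlap after shifting the pattern by dt in time and dp in pitch."""
    total = 0
    for p_on, p_pitch, p_dur in pattern:
        for s_on, s_pitch, s_dur in score:
            if s_pitch == p_pitch + dp:
                total += overlap(p_on + dt, p_dur, s_on, s_dur)
    return total


def best_translation(pattern, score):
    """Return (weight, dt, dp) for the best candidate translation.

    Simplifying assumption: candidates are limited to translations that map
    the start of a pattern note onto the start of a score note.
    """
    best = (-1, 0, 0)
    for (p_on, p_pitch, _), (s_on, s_pitch, _) in product(pattern, score):
        dt, dp = s_on - p_on, s_pitch - p_pitch
        weight = match_weight(pattern, score, dt, dp)
        if weight > best[0]:
            best = (weight, dt, dp)
    return best


score = [(0, 60, 1), (1, 62, 1), (2, 64, 2), (4, 65, 1)]
pattern = [(0, 67, 1), (1, 69, 2)]  # score notes 2 and 3, moved in time and pitch
print(best_translation(pattern, score))
```

With n score notes and m pattern notes this baseline evaluates nm candidate translations at O(nm) cost each, which is the kind of cost the paper's algorithm is designed to avoid.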

    Untersuchungen der rhythmischen Struktur von Sprache unter Alkoholeinfluss

    This thesis is concerned with the rhythmical structure of speech under the influence of alcohol. All analyses presented are based on the Alcohol Language Corpus, a collection of speech uttered by 77 female and 85 male speakers, both sober and intoxicated. Experimental research was carried out to find robust, automatically extractable features of the speech signal that indicate speaker intoxication. These features included rhythm measures, which reflect the durational variability of vocalic and consonantal elements and are normally used to classify languages into different rhythm classes. The durational variability was found to be greater in the speech of intoxicated individuals than in the speech of sober individuals, which suggests that the speech of intoxicated speakers is more irregular than that of sober speakers. Another set of features describes the dynamics of the short-time energy function of speech. For this purpose, different measures are derived from a sequence of energy minima and maxima. These results also reveal a greater irregularity in the speech of intoxicated individuals. A separate investigation of speaking rate included two different measures. One is based on the phonetic segmentation and is an estimate of the number of syllables per second. The other is the mean duration of the time intervals between successive maxima of the short-time energy function of speech. Both measures indicate a decreased speaking rate in the speech of intoxicated speakers compared to speech uttered in sober condition. The results of a perception experiment show that a decrease in speaking rate is also an indicator of intoxication in the perception of speech. The last experiment investigates rhythmical features based on the fundamental frequency and energy contours of speech signals. Contours are compared directly using different distance measures (root mean square error, statistical correlation and the Euclidean distance in the spectral space of the contours). They are also compared by parameterizing the contours using the Discrete Cosine Transform and the first and second moments of the lower DCT spectrum. A Principal Components Analysis on the contour data was also carried out to find fundamental contour forms in the speech of intoxicated and sober individuals. Concerning the distance measures, contours of speech signals uttered by intoxicated speakers differ significantly from contours of speech signals uttered in sober condition. Parameterization of the contours showed that fundamental frequency contours of speech signals uttered by intoxicated speakers consist of faster movements, and energy contours of slower movements, than the respective contours of speech signals uttered in sober condition. Principal Components Analysis did not find any interpretable fundamental contour forms that could help distinguish contours of speech signals of intoxicated speakers from those of speech uttered in sober condition. All analyses indicate that the effects of alcoholic intoxication on different features of speech cannot be generalized but are to a great extent speaker-dependent.