Sound, Mixtures, and Learning: LabROSA Overview
An overview of the work of the Laboratory for Recognition and Organization of Speech and Audio, Department of Electrical Engineering, Columbia University, including a discussion of graphical models for speech separation.
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives for measuring the performance of multimedia search engines.
From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.
Audeosynth: music-driven video montage
We introduce music-driven video montage, a media format that offers a pleasant way to browse or summarize video clips collected from various occasions, such as gatherings and adventures. In a music-driven video montage, the music drives the composition of the video content: following the musical movement and beats, video clips are organized into a montage that visually reflects the experiential properties of the music. However, creating such a montage by hand takes enormous manual work and artistic expertise. In this paper, we develop a framework for automatically generating music-driven video montages. The input is a set of video clips and a piece of background music. By analyzing the music and video content, our system extracts carefully designed temporal features from the input, casts the synthesis problem as an optimization, and solves for the parameters through Markov chain Monte Carlo sampling. The output is a video montage whose visual activities are cut and synchronized with the rhythm of the music, rendering a symphony of audio-visual resonance.
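The idea of solving a montage as an optimization by MCMC sampling can be illustrated with a toy Metropolis-Hastings sketch. Everything here is an assumption for illustration: the only decision variable is the order of the clips, and the cost (the hypothetical `misalignment`) simply sums the distance from each cut to the nearest beat. Audeosynth's actual temporal features and objective are not reproduced.

```python
import math
import random

def cut_times(durations):
    """Cumulative end times of clips played back-to-back."""
    t, cuts = 0.0, []
    for d in durations:
        t += d
        cuts.append(t)
    return cuts

def misalignment(order, durations, beats):
    """Toy cost: sum over cuts of the distance to the nearest beat (lower is better)."""
    cuts = cut_times([durations[i] for i in order])
    return sum(min(abs(c - b) for b in beats) for c in cuts)

def mcmc_montage(durations, beats, iters=2000, temperature=0.1, seed=0):
    """Metropolis-Hastings over clip orderings; each proposal swaps two clips."""
    rng = random.Random(seed)
    order = list(range(len(durations)))
    cost = misalignment(order, durations, beats)
    best_order, best_cost = order[:], cost
    for _ in range(iters):
        i, j = rng.sample(range(len(order)), 2)
        cand = order[:]
        cand[i], cand[j] = cand[j], cand[i]
        cand_cost = misalignment(cand, durations, beats)
        # Swap proposals are symmetric, so the acceptance test only compares costs:
        # always accept improvements, accept worse states with Boltzmann probability.
        if cand_cost <= cost or rng.random() < math.exp(-(cand_cost - cost) / temperature):
            order, cost = cand, cand_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
    return best_order, best_cost
```

Keeping the best state seen, rather than the final one, is a common practical choice when sampling is used purely as an optimizer.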
Clustering by compression
We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows. First, we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is universal in that it is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal but uses the non-computable notion of Kolmogorov complexity. We propose precise notions of similarity metric and normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates universality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (binary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under the choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block-sorting compressors. In genomics, we present new evidence for major questions in mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis.
Comment: LaTeX, 27 pages, 20 figures.
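The NCD is computed as NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length and xy is the concatenation. A minimal Python sketch, assuming zlib at level 9 as the compressor (an arbitrary choice; any compressor can be plugged in), computes the pairwise distance matrix that the clustering step would consume. The quartet-tree construction is not shown.

```python
import zlib

def compressed_len(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed data (level 9)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def distance_matrix(objects):
    """Pairwise NCD matrix over a list of byte strings."""
    return [[ncd(a, b) for b in objects] for a in objects]
```

With a real compressor, NCD(x, x) is small but usually not exactly 0, since the compressor adds some overhead.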
Combining Metadata, Inferred Similarity of Content, and Human Interpretation for Managing and Listening to Music Collections
Music services, media players, and music managers support content classification and access by filtering on metadata values, access statistics, and user ratings. This approach fails to capture characteristics of mood and personal history that are often the deciding factors when creating personal playlists and music collections. This dissertation presents MusicWiz, a music management environment that combines traditional metadata with spatial hypertext-based expression and automatically extracted characteristics of music to generate personalized associations among songs. MusicWiz's similarity inference engine combines the personal expression in the workspace with assessments of similarity based on the artists, other metadata, lyrics, and the audio signal to make suggestions and generate playlists. An evaluation of MusicWiz with and without the workspace and suggestion capabilities showed significant differences for organizing and playlist-creation tasks: the workspace features were more valuable for organizing tasks, while the suggestion features were more valuable for playlist creation.
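One simple way to combine several similarity assessments into a single suggestion ranking is a weighted average of per-facet scores. The sketch below assumes hypothetical facets ("artist", "lyrics", "audio", "workspace"), scores in [0, 1], and hand-picked weights; it is not MusicWiz's actual inference engine.

```python
def combined_similarity(facet_scores, weights):
    """Weighted average of per-facet similarities; a missing facet counts as 0."""
    total = sum(weights.values())
    return sum(w * facet_scores.get(f, 0.0) for f, w in weights.items()) / total

def suggest(candidates, weights, k=2):
    """Return the k candidate songs most similar to the seed song."""
    ranked = sorted(candidates,
                    key=lambda song: combined_similarity(candidates[song], weights),
                    reverse=True)
    return ranked[:k]

# Hypothetical facet weights and per-candidate similarity scores to one seed song.
weights = {"artist": 1.0, "lyrics": 1.0, "audio": 2.0, "workspace": 2.0}
candidates = {
    "song_a": {"artist": 1.0, "audio": 0.2},
    "song_b": {"audio": 0.9, "workspace": 0.8},
    "song_c": {"lyrics": 0.5},
}
```

Here `suggest(candidates, weights)` ranks `song_b` first because its audio and workspace similarity carry the most weight.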
Video Abstracting at a Semantical Level
One of the most common forms of video abstract is the movie trailer. Contemporary movie trailers share a common structure across genres, which allows for automatic generation and also reflects the corresponding movie's composition. In this thesis, a system for the automatic generation of trailers is presented. In addition to action trailers, the system can handle further genres such as horror and comedy trailers, which were first analyzed manually to identify their basic structures. To simplify the modeling of trailers and the abstract generation itself, a new video abstracting application was developed. This application can perform all steps of abstract generation automatically and allows for previews and manual optimizations. Based on this system, new abstracting models for horror and comedy trailers were created, and the corresponding trailers were generated automatically using these models. In an evaluation, the automatically generated trailers were compared with the original trailers and showed a similar structure. However, the automatically generated trailers still do not match the polish of the Hollywood originals, as they lack intentional storylines across shots.
Audio Signal Processing Using Time-Frequency Approaches: Coding, Classification, Fingerprinting, and Watermarking
Audio signals are information-rich, nonstationary signals that play an important role in our day-to-day communication, perception of the environment, and entertainment. Because of their nonstationary nature, time-only or frequency-only approaches are inadequate for analyzing these signals; a joint time-frequency (TF) approach is a better choice for processing them efficiently. In this digital era, compression, intelligent indexing for content-based retrieval, classification, and protection of digital audio content are a few of the areas that encompass the majority of audio signal processing applications. In this paper, we present a comprehensive array of TF methodologies that successfully address applications in all of the above-mentioned areas. A TF-based audio coding scheme with a novel psychoacoustic model, music classification, classification of environmental sounds, audio fingerprinting, and audio watermarking are presented to demonstrate the advantages of time-frequency approaches for analyzing and extracting information from audio signals.
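To illustrate why a joint TF view helps with nonstationary signals, here is a minimal short-time Fourier transform (STFT) sketch using NumPy. The frame length, hop size, and two-tone test signal are arbitrary assumptions, and this is a generic STFT, not the specific TF methods of the paper. A single FFT over the whole signal would show both tones without indicating when each occurs; the per-frame spectra localize them in time.

```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude STFT: Hann-windowed frames, one real FFT per frame.
    Returns an array of shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000                      # sample rate in Hz
t = np.arange(fs) / fs         # one second of samples
# Nonstationary test signal: 500 Hz in the first half, 1500 Hz in the second half.
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

S = stft_magnitude(x)
bin_hz = fs / 256              # frequency resolution: 31.25 Hz per bin
peak_first = S[0].argmax() * bin_hz    # dominant frequency in the first frame
peak_last = S[-1].argmax() * bin_hz    # dominant frequency in the last frame
```

The frame length trades time resolution against frequency resolution: longer frames give finer frequency bins but blur when the tone changes.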
Audio content identification
The development of content-based music information retrieval (MIR) applications in recent years has shown that automatically generating content descriptions that enable the identification or classification of music, or parts of it, is a tractable task. Because of the huge amount of digital music available and the rapid growth of the corresponding databases, research is investigating how to automate the typical management processes for digital music as far as possible.
In this thesis, I give a general introduction to music information retrieval, focusing on the automatic identification of audio material and on the comparison of similarity-based approaches with purely content-based fingerprint technologies. On the one hand, similarity-based systems try to model the human auditory system and the perception of "similarity" in order to classify music into groups of related titles and, from there, to identify it. On the other hand, fingerprints or signatures aim at exact recognition without any statement about similar-sounding alternatives. To clarify the differences and consequences of these approaches, I performed a series of experiments showing how robust, reliable, and adaptable an identification system must be, with the goal of the highest possible rate of correctly recognized pieces. Two algorithms, Rhythm Patterns, a similarity-based feature extraction scheme, and FDMF, a freely available fingerprint extraction algorithm, are compared in 24 test cases to contrast the principles behind them. In the identification task, similarity-based features such as Rhythm Patterns achieve recognition rates of up to 89.53% and thus clearly outperform the investigated fingerprint approach in the tested scenarios. A careful choice of the features used to compute similarity yields very promising results, both for varied excerpts of the tracks and after considerable signal modifications.
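Similarity-based identification, as described above, amounts to returning the database track whose feature vector lies closest to the query, and an identification rate is the fraction of queries for which that track is the correct one. The sketch below assumes random 60-dimensional vectors as stand-ins for real features (they are not Rhythm Patterns) and small Gaussian noise as a stand-in for excerpt variation and signal modification.

```python
import numpy as np

def identify(db: np.ndarray, query: np.ndarray) -> int:
    """Index of the database track whose feature vector is nearest (Euclidean) to the query."""
    return int(np.argmin(np.linalg.norm(db - query, axis=1)))

def identification_rate(db, queries, true_ids) -> float:
    """Fraction of queries whose nearest neighbour is the correct track."""
    hits = sum(identify(db, q) == t for q, t in zip(queries, true_ids))
    return hits / len(true_ids)

# Hypothetical data: 50 tracks with 60-dimensional random feature vectors,
# queried with slightly perturbed copies standing in for modified excerpts.
rng = np.random.default_rng(0)
db = rng.normal(size=(50, 60))
queries = db + rng.normal(scale=0.1, size=db.shape)
rate = identification_rate(db, queries, np.arange(50))
```

With real features, the rate depends on how well the chosen features survive the distortions, which is exactly what the 24 test cases probe.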
Automated synthesis of data extraction and transformation programs
Due to the abundance of data in today's data-rich world, end users increasingly need to perform various data extraction and transformation tasks. While many of these tedious tasks can be performed programmatically, most end users lack the programming expertise required to automate them and end up spending their valuable time performing various data-related tasks manually. The field of program synthesis aims to overcome this problem by automatically generating programs from informal specifications, such as input-output examples or natural language.
This dissertation focuses on the design and implementation of new systems for automating important classes of data transformation and extraction tasks. It introduces solutions for automating data manipulation tasks on fully structured data formats such as relational tables, as well as on semi-structured formats such as XML and JSON documents.
First, we describe a novel algorithm for synthesizing hierarchical data transformations from input-output examples. A key novelty of our approach is that it reduces the synthesis of tree transformations to the simpler problem of synthesizing transformations over the paths of the tree. We also describe a new and effective algorithm for learning path transformations that combines logical SMT-based reasoning with machine learning techniques based on decision trees.
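The reduction from tree transformations to path transformations can be pictured with a minimal Python sketch, assuming trees are nested dicts with a non-empty dict at the root. The helper names (`to_paths`, `from_paths`, `transform_tree`) are hypothetical, and the path function is written by hand here; in the dissertation it would be learned from examples (SMT reasoning plus decision trees), which this sketch does not attempt.

```python
from typing import Any, Callable

Path = tuple  # sequence of keys from the root to a leaf

def to_paths(tree: Any, prefix: Path = ()) -> list:
    """Flatten a nested dict into (root-to-leaf path, leaf value) pairs."""
    if isinstance(tree, dict):
        pairs = []
        for key, child in tree.items():
            pairs.extend(to_paths(child, prefix + (key,)))
        return pairs
    return [(prefix, tree)]

def from_paths(pairs) -> dict:
    """Rebuild a nested dict from (path, value) pairs; every path must be non-empty."""
    root: dict = {}
    for path, value in pairs:
        node = root
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return root

def transform_tree(tree: dict, path_fn: Callable[[Path], Path]) -> dict:
    """Apply a path-level transformation to every leaf path, then rebuild the tree."""
    return from_paths([(path_fn(p), v) for p, v in to_paths(tree)])

src = {"person": {"name": "Ada", "born": 1815}}
# Hand-written path transformation: drop the outer "person" key.
flattened = transform_tree(src, lambda p: p[1:])
```

Working on paths turns one hard tree-to-tree problem into many small path-to-path problems, which is what makes the learning step tractable.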
Next, we present a new methodology for learning, from input-output examples, programs that migrate tree-structured documents to relational table representations. Our approach decomposes the synthesis task into two subproblems: (A) learning the column extraction logic and (B) learning the row extraction logic. We propose a technique for learning column extraction programs using deterministic finite automata, and a new algorithm for predicate learning that combines integer linear programming and logic minimization.
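The column/row decomposition can be illustrated with a small hand-written example. The document, the column programs (`name_column`, `lang_column`), and the row predicate (`same_person`) are all hypothetical; the dissertation learns the column programs with automata and the predicate with integer linear programming, neither of which is shown here. Each column returns candidate nodes tagged with their parent index so that the predicate can relate values across columns, and rows are the tuples from the cross product of the columns that satisfy it.

```python
from itertools import product

# Hypothetical input document.
doc = {"people": [
    {"name": "Ada", "langs": ["en", "fr"]},
    {"name": "Alan", "langs": ["en"]},
]}

# (A) Column extraction: each column program returns (person_index, value) nodes.
def name_column(d):
    return [(i, p["name"]) for i, p in enumerate(d["people"])]

def lang_column(d):
    return [(i, lang) for i, p in enumerate(d["people"]) for lang in p["langs"]]

# (B) Row extraction: keep only tuples whose nodes come from the same person.
def same_person(row):
    return len({idx for idx, _ in row}) == 1

def to_table(d, columns, predicate):
    """Cross product of the column nodes, filtered by the row predicate."""
    rows = product(*(column(d) for column in columns))
    return [tuple(value for _, value in row) for row in rows if predicate(row)]

table = to_table(doc, [name_column, lang_column], same_person)
```

Here `table` holds one row per (name, language) pair, and pairs that mix different people are filtered out by the predicate.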
Finally, we address the problem of automating data extraction tasks from natural language. Specifically, we focus on data retrieval from relational databases and describe a novel approach for learning SQL queries from English descriptions. The method is fully automatic and database-agnostic (i.e., it does not require customization for each database). It combines semantic parsing techniques from the NLP community with novel programming-language ideas involving probabilistic type inhabitation and automated sketch repair.