74 research outputs found

    Music Information Retrieval in Live Coding: A Theoretical Framework

    The work presented in this article has been partly conducted while the first author was at Georgia Tech from 2015–2017 with the support of the School of Music, the Center for Music Technology and Women in Music Tech at Georgia Tech. Another part of this research has been conducted while the first author was at Queen Mary University of London from 2017–2019 with the support of the AudioCommons project, funded by the European Commission through the Horizon 2020 programme, research and innovation grant 688382. The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.
    Music information retrieval (MIR) has great potential in musical live coding because it can help the musician–programmer to make musical decisions based on audio content analysis and explore new sonorities by means of MIR techniques. Real-time MIR techniques can be computationally demanding and thus have rarely been used in live coding; when they have been used, the focus has been on low-level feature extraction. This article surveys and discusses the potential of MIR applied to live coding at a higher musical level. We propose a conceptual framework of three categories: (1) audio repurposing, (2) audio rewiring, and (3) audio remixing. We explored the three categories in live performance through an application programming interface library written in SuperCollider, MIRLC. We found that it is still a technical challenge to use high-level features in real time, yet using rhythmic and tonal properties (midlevel features) in combination with text-based information (e.g., tags) helps to achieve a closer perceptual level centered on pitch and rhythm when using MIR in live coding. We discuss challenges and future directions of utilizing MIR approaches in the computer music field.

    Real-time audiovisual and interactive applications for desktop and mobile platforms

    Integrated master's thesis. Informatics and Computing Engineering. Universidade do Porto. Faculdade de Engenharia. 201

    Soundscape Generation Using Web Audio Archives

    The large and growing archives of audio content on the web have been transforming sound design practice. In this context, sampling, a fundamental sound design tool, has shifted from mechanical recording to the realms of copying and cutting on the computer. Effectively browsing these large archives and retrieving content has become a well-identified problem in Music Information Retrieval, addressed namely through the adoption of audio content-based methodologies. Despite their robustness and effectiveness, current technological solutions rely mostly on (statistical) signal processing methods, whose terminology attains a level of user-centered explanatory adequacy. This dissertation advances a novel semantically oriented strategy for browsing and retrieving audio content, in particular environmental sounds, from large web audio archives. Ultimately, we aim to streamline the retrieval of user-defined queries to foster a fluid generation of soundscapes. In our work, web audio archives are queried by affective dimensions that relate to emotional states (e.g., low arousal and low valence) and by semantic descriptions of the audio sources (e.g., rain). To this end, we map human annotations of affective dimensions to spectral audio-content descriptions extracted from the signal content. New sounds are then retrieved from web archives by specifying a query that combines a point in a two-dimensional affective plane with semantic tags. A prototype application, MScaper, implements the method in the Ableton Live environment. The evaluation of our research assesses the perceptual soundness of the spectral audio-content descriptors in capturing affective dimensions and the usability of MScaper.
    The results show that spectral audio features significantly capture affective dimensions and that MScaper was perceived by expert users as having excellent usability.
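The query described in this abstract, a point in a two-dimensional affective plane combined with semantic tags, can be illustrated with a minimal sketch. It is not MScaper's implementation: the `Sound` record, the `query` function, the toy catalogue, and the [-1, 1] value range are all hypothetical, and the valence and arousal values are assumed to be precomputed rather than predicted from spectral descriptors as in the dissertation.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Sound:
    """Hypothetical archive entry with precomputed affective coordinates."""
    name: str
    tags: frozenset
    valence: float  # assumed range [-1, 1]
    arousal: float  # assumed range [-1, 1]


def query(sounds, tags, valence, arousal, k=3):
    """Keep sounds carrying every requested tag, then return the k sounds
    closest (Euclidean distance) to the requested point in the plane."""
    wanted = frozenset(tags)
    candidates = [s for s in sounds if wanted <= s.tags]
    candidates.sort(key=lambda s: hypot(s.valence - valence, s.arousal - arousal))
    return candidates[:k]


# Toy catalogue with invented values, for illustration only.
CATALOGUE = [
    Sound("light_rain", frozenset({"rain", "nature"}), -0.2, -0.7),
    Sound("thunderstorm", frozenset({"rain", "thunder"}), -0.6, 0.8),
    Sound("birdsong", frozenset({"birds", "nature"}), 0.7, 0.1),
]

if __name__ == "__main__":
    # Low arousal, slightly negative valence, tagged "rain".
    for sound in query(CATALOGUE, ["rain"], valence=-0.3, arousal=-0.6):
        print(sound.name)
```

Filtering by tag before ranking by distance keeps the semantic constraint strict. A real system might instead blend the two criteria into a single score.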

    Workset Creation for Scholarly Analysis: Prototyping Project

    Scholars rely on library collections to support their scholarship. Out of these collections, scholars select, organize, and refine the worksets that will answer to their particular research objectives. The requirements for those worksets are becoming increasingly sophisticated and complex, both as humanities scholarship has become more interdisciplinary and as it has become more digital. The HathiTrust is a repository that centrally collects image and text representations of library holdings digitized by the Google Books project and other mass-digitization efforts. The HathiTrust's computational infrastructure is being built to support large-scale manipulation and preservation of these representations, but it organizes them according to catalog records that were created to enable users to find books in a building or to make high-level generalizations about duplicate holdings across libraries, etc. These catalog records were never meant to support the granularity of sorting and selection of works that scholars now expect, much less page-level or chapter-level sorting and selection out of a corpus of billions of pages. The ability to slice through a massive corpus consisting of many different library collections, and out of that to construct the precise workset required for a particular scholarly investigation, is the “game changing” potential of the HathiTrust; understanding how to do that is a research problem, and one that is of keen interest to the HathiTrust Research Center (HTRC), since we believe that scholarship begins with the selection of appropriate resources. Given the unprecedented size and scope of the HathiTrust corpus, in conjunction with the HTRC’s unique computational access to copyrighted materials, we are proposing a project that will engage scholars in designing tools for exploration, location, and analytic grouping of materials so they can routinely conduct computational scholarship at scale, based on meaningful worksets.
“Workset Creation for Scholarly Analysis: Prototyping Project” (WCSA) seeks to address three sets of tightly intertwined research questions regarding 1) enriching the metadata in the HathiTrust corpus, 2) augmenting string-based metadata with URIs to leverage discovery and sharing through external services, and 3) formalizing the notion of collections and worksets in the context of the HathiTrust Research Center. Building upon the model of the Open Annotation Collaboration, the HTRC proposes to release an open, competitive Request for Proposals with the intent to fund four prototyping projects that will build tools for enriching and augmenting metadata for the HathiTrust corpus. Concurrently, the HTRC will work closely with the Center for Informatics Research in Science and Scholarship (CIRSS) to develop and instantiate a set of formal data models that will be used to capture and integrate the outputs of the funded prototyping projects with the larger HathiTrust corpus. Andrew W. Mellon Foundation, grant no. 21300666Ope

    Music similarity analysis using the big data framework Spark

    A parameterizable recommender system based on the Big Data processing framework Spark is introduced, which takes multiple tonal properties of music into account and is capable of recommending music based on a user's personal preferences. The implemented system is fully scalable: more songs can be added to the dataset, the cluster size can be increased, and different kinds of audio features and more state-of-the-art similarity measurements can be added. This thesis also deals with the extraction of the required audio features in parallel on a computer cluster. The extracted features are then processed by the Spark-based recommender system, and song recommendations for a dataset consisting of approximately 114,000 songs are retrieved in less than 12 seconds on a 16-node Spark cluster, combining eight different audio feature types and similarity measurements.
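One way to picture how several audio feature types and similarity measurements might be combined into a single recommendation score is a weighted sum of per-feature distances. The sketch below is a single-machine illustration in plain Python, not the thesis's Spark implementation. The feature names, the choice of Euclidean distance for every feature, the weighting scheme, and the toy data are all assumptions.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_distance(song_a, song_b, weights):
    """Weighted mean of per-feature distances; `weights` maps a feature
    name to its (hypothetical) user-chosen importance."""
    total = sum(weights.values())
    return sum(w * euclidean(song_a[f], song_b[f]) for f, w in weights.items()) / total


def recommend(query_id, songs, weights, k=2):
    """Return the ids of the k songs closest to the query song."""
    query = songs[query_id]
    scored = [
        (combined_distance(query, feats, weights), song_id)
        for song_id, feats in songs.items()
        if song_id != query_id
    ]
    return [song_id for _, song_id in sorted(scored)[:k]]


# Toy dataset with invented feature values, for illustration only.
SONGS = {
    "a": {"chroma": [1.0, 0.0], "tempo": [0.5]},
    "b": {"chroma": [0.9, 0.1], "tempo": [0.5]},
    "c": {"chroma": [0.0, 1.0], "tempo": [0.9]},
}

if __name__ == "__main__":
    print(recommend("a", SONGS, {"chroma": 1.0, "tempo": 1.0}))
```

In practice each feature's distances would probably need normalising to a common scale before weighting, so that no single feature type dominates the combined score.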

    Visualizing Music Collections Based on Metadata: Concepts, User Studies and Design Implications

    Modern digital music services and applications enable easy access to vast online and local music collections. To differentiate from their competitors, software developers should aim to design novel, interesting, entertaining, and easy-to-use user interfaces (UIs) and interaction methods for accessing the music collections. One potential approach is to replace or complement the textual lists with static, dynamic, adaptive, and/or interactive visualizations of selected musical attributes. A well-designed visualization has the potential to make interaction with a service or an application an entertaining and intuitive experience, and it can also improve the usability and efficiency of the system. This doctoral thesis belongs to the intersection of the fields of human-computer interaction (HCI), music information retrieval (MIR), and information visualization (Infovis). HCI studies the design, implementation and evaluation of interactive computing systems; MIR focuses on the different strategies for helping users seek music or music-related information; and Infovis studies the use of visual representations of abstract data to amplify cognition. The purpose of the thesis is to explore the feasibility of visualizing music collections based on three types of musical metadata: musical genre, tempo, and the release year of the music. More specifically, the research goal is to study which visual variables and structures are best suited for representing the metadata, and how the visualizations can be used in the design of novel UIs for music player applications, including music recommendation systems. The research takes a user-centered and constructive design-science approach, and covers all the different aspects of interaction design: understanding the users, the prototype design, and the evaluation. The performance of the different visualizations from the user perspective was studied in a series of online surveys with 51-104 (mostly Finnish) participants.
In addition to tempo and release year, five different visualization methods (colors, icons, fonts, emoticons and avatars) for representing musical genres were investigated. Based on the results, promising ways to represent tempo include the number of objects, shapes with a varying number of corners, and y-axis location combined with some other visual variable or clear labeling. Promising ways to represent the release year include lightness and the perceived location on the z- or x-axis. In the case of genres, the most successful method was the avatars, which used elements from the other methods and required the most screen real estate. In the second part of the thesis, three interactive prototype applications (avatars, potentiometers and a virtual world) focusing on visualizing musical genres were designed and evaluated with 40-41 Finnish participants. While the concepts had great potential for complementing traditional text-based music applications, they were too simple and restricted to replace them in longer-term use. In particular, the lack of textual search functionality was seen as a major shortcoming. Based on the results of the thesis, it is possible to design recognizable, acceptable, entertaining, and easy-to-use (especially genre) visualizations with certain limitations. Important factors include, e.g., the metadata vocabulary used (e.g., set of musical genres) and visual variables/structures; preferred music discovery mode; available screen real estate; and the target culture of the visualizations.