585 research outputs found

    Sequential decision making in artificial musical intelligence

    Over the past 60 years, artificial intelligence has grown from a largely academic field of research to a ubiquitous array of tools and approaches used in everyday technology. Despite its many recent successes and growing prevalence, certain meaningful facets of computational intelligence have not been as thoroughly explored. Such additional facets cover a wide array of complex mental tasks which humans carry out easily, yet are difficult for computers to mimic. A prime example of a domain in which human intelligence thrives, but machine understanding is still fairly limited, is music. Over the last decade, many researchers have applied computational tools to carry out tasks such as genre identification, music summarization, music database querying, and melodic segmentation. While these are all useful algorithmic solutions, we are still a long way from constructing complete music agents, able to mimic (at least partially) the complexity with which humans approach music. One key aspect which hasn't been sufficiently studied is that of sequential decision making in musical intelligence. This thesis strives to answer the following question: Can a sequential decision making perspective guide us in the creation of better music agents, and social agents in general? And if so, how? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and human-agent (and more generally agent-agent) interaction in the context of music. The key contributions of this thesis are the design of better music playlist recommendation algorithms; the design of algorithms for tracking user preferences over time; new approaches for modeling people's behavior in situations that involve music; and the design of agents capable of meaningful interaction with humans and other agents in a setting where music plays a role (either directly or indirectly).
Though motivated primarily by music-related tasks, and focusing largely on people's musical preferences, this thesis also establishes that insights from music-specific case studies can be applicable in other concrete social domains, such as different types of content recommendation. Showing the generality of insights from musical data in other contexts serves as evidence for the utility of music domains as testbeds for the development of general artificial intelligence techniques. Ultimately, this thesis demonstrates the overall usefulness of taking a sequential decision making approach in settings previously unexplored from this perspective.
    Computer Science

    Analysis of radio programmes using the WordNetAffect emotion lexicon

    The technical progress of recent years, above all the spread of smartphones and mobile internet, has created the technical basis for personalising audio-based content. Accordingly, playlist generation has become an important research area. The aim of the present work is a language-technology study of generating mixed speech-music playlists. According to our preliminary research, of the information contained in the textual transcripts of audio material, it is primarily mood that matters for playlist generation. To examine this question we analysed about 2,500 hours of programming from radio stations. Using an automatic speech recogniser, we recognised the words of the WordNetAffect emotion lexicon in the recordings and then analysed the resulting database. We found characteristic patterns both in the co-occurrence of emotion categories and in the temporal (weekly, daily and hourly) variation of emotions.

    Language-technology aspects of speech-music playlists

    Media consumption over the internet and on smartphones makes the personalisation of content both possible and necessary. For audio-based media, this is the subject of playlist generation. Earlier work in the field dealt exclusively with music-based playlists, and the first studies of speech-music playlists also examined the acoustic side. The present work is the first to address the language-technology side of generating speech-music playlists. Based on preliminary investigations, it proposes an outline for speech-music playlist generation. In language-technology processing, the mood and emotional aspect is particularly significant, together with the extraction of mood from song lyrics, interview transcripts and spoken audio. For this we use mood lexicons and examine the occurrence of mood words in song lyrics and interview transcripts. For audio material containing speech, we also perform automatic speech recognition to produce the text, in two ways: by recognising the entire recording, and by focusing on the mood words. We examine how the limited quality of speech recognition changes the occurrence of mood words. The work was carried out with English-language lexicons and on BBC material.

    Screening TED: A rhetorical analysis of the intersections of rhetoric, digital media, and pedagogy

    The presence of expertise resonates across our daily lives. Experts are called upon to advise us about which candidate is ideal for office, which type of wood is the best choice for a carpentry project, which scientist has optimal data on the effects of air pollution, which speech teacher is the best one to take for proper credit hours, and more. An expert is typically conceived as an individual who knows more about a given topic and can create stronger identification than an average person. The struggle to achieve expert status is one that is fundamentally tied to power and is reliant on the establishment of authenticity and legitimacy from audiences. It is, at its core, a struggle that utilizes rhetoric. Begun in 1984, the TED (Technology, Entertainment, and Design) conference has become a critical player in an architectonic movement to manufacture expertise. Modeled on the Lyceum and Chautauqua movements of the early American 20th century, the TED conferences have spread rapidly into public culture, most notably in the field of education via social media and online video. TED “talks” are classroom artifacts. They are teaching tools and aid in increasing learning for a more digital native student population. Likewise, the TED conferences have become models of community engagement that work rhetorically to demonstrate the attribution and manufacturing of expertise amidst a 21st century digital world. In short, we have acknowledged TED’s growth and expansion as credible and sanctioned their identity as the harbinger of expert and inspirational ideas. The democratization of digital media, particularly video, has made it possible to increase the sharing and collaboration of ideas faster than ever before, and as our world becomes more reliant on digital devices for the receiving and sending of information, the consumption and production of information, and the attribution of expertise, the precise role of technology within pedagogy becomes increasingly complex.
My dissertation posits that TED employs current uses of digital media technologies in order to manufacture its ethos of expertise within public culture.

    Binaural virtual auditory display for music discovery and recommendation

    Emerging patterns in audio consumption present renewed opportunity for searching or navigating music via spatial audio interfaces. This thesis examines the potential benefits and considerations for using binaural audio as the sole or principal output interface in a music browsing system. Three areas of enquiry are addressed. Specific advantages and constraints in spatial display of music tracks are explored in preliminary work. A voice-led binaural music discovery prototype is shown to offer a contrasting interactive experience compared to a mono smartspeaker. Results suggest that touch or gestural interaction may be more conducive input modes in the former case. The limit of three binaurally spatialised streams is identified from separate data as a usability threshold for simultaneous presentation of tracks, with no evident advantages derived from visual prompts to aid source discrimination or localisation. The challenge of implementing personalised binaural rendering for end-users of a mobile system is addressed in detail. A custom framework for assessing head-related transfer function (HRTF) selection is applied to data from an approach using 2D rendering on a personal computer. That HRTF selection method is developed to encompass 3D rendering on a mobile device. Evaluation against the same criteria shows encouraging results in reliability, validity, usability and efficiency. Computational analysis of a novel approach for low-cost, real-time, head-tracked binaural rendering demonstrates measurable advantages compared to first order virtual Ambisonics. Further perceptual evaluation establishes working parameters for interactive auditory display use cases. In summation, the renderer and identified tolerances are deployed with a method for synthesised, parametric 3D reverberation (developed through related research) in a final prototype for mobile immersive playlist editing. 
Task-oriented comparison with a graphical interface reveals high levels of usability and engagement, plus some evidence of enhanced flow state when using the eyes-free binaural system.

    A hybrid approach for item collection recommendations: an application to automatic playlist continuation

    Current recommender systems aim mainly to generate accurate item recommendations, without properly evaluating the multiple dimensions of the recommendation problem. However, in many domains, such as music, where items are rarely consumed in isolation, users would rather need a set of items designed to work well together, with some cognitive properties as a whole that relate to their perception of quality and satisfaction. In this thesis, a hybrid case-based recommendation approach for item collections is proposed. In particular, an application to automatic playlist continuation, addressing similar cognitive concepts rather than similar users, is presented. Playlists, which are sets of music items designed to be consumed as a sequence, with a specific purpose and within a specific context, are treated as cases. The proposed recommender system is based on a meta-level hybridization. First, Latent Dirichlet Allocation is applied to the set of past playlists, described as distributions over music styles, to identify their underlying concepts. Then, for a started playlist, its semantic characteristics, like its latent concept and the styles of the included items, are inferred, and Case-Based Reasoning is applied to the set of past playlists addressing the same concept, to construct and recommend a relevant playlist continuation. A graph-based item model is used to overcome the semantic gap between songs’ signal-based descriptions and users’ high-level preferences, and to efficiently capture the playlists’ structures and the similarity of the music items in them. As the proposed method bases its reasoning on previous playlists, it does not require the construction of complex user profiles to generate accurate recommendations. Furthermore, apart from relevance, support for parameters beyond accuracy, such as increased coherence or diversity of items, is provided to deliver a more complete user experience.
Experiments on real music datasets have revealed improved results, compared to other state of the art techniques, while achieving a “good trade-off” between recommendations’ relevance, diversity and coherence. Finally, although actually focusing on playlist continuations, the designed approach could be easily adapted to serve other recommendation domains with similar characteristics.
    Postprint (published version)

    Parsing consumption preferences of music streaming audiences

    As demands for insights on music streaming listeners continue to grow, scientists and industry analysts face the challenge to comprehend a mutated consumption behavior, which demands a renewed approach to listener typologies. This study aims to determine how audience segmentation can be performed in a time-relevant and replicable manner. Thus, it interrogates which parameters best serve as indicators of preferences to ultimately assist in delimiting listener segments. Accordingly, the primary objective of this research is to develop a revised typology that classifies music streaming listeners in the light of the progressive phenomenology of music listening. The hypothesis assumes that this could be solved by positioning listeners – rather than products – at the center of streaming analysis and supplementing sales- with user-centered metrics. The empirical research of this paper was based on grounded theories, enriched by analytical case studies. For this purpose, behavioral and psychological research results were interconnected with market analysis and streaming platform usage data. Analysis of the results demonstrates that a concatenation of multi-dimensional data streams facilitates the derivation of a typology that is applicable to varying audience pools. The findings indicate that for the delimitation of listener types, the motivation, and listening context are essential key constituents. Since these variables demand insights that reach beyond existing metrics, descriptive data points relating to the listening process are subjoined. Ultimately, parameter indexation results in listener profiles that offer novel access points for investigations, which make imperceptible, interdisciplinary correlations tangible. The framework of the typology can be consulted in analytical and creational processes. 
In this respect, the results of the derived analytical approach contribute to better determine and ultimately satisfy listener preferences.

    Spatial auditory display for acoustics and music collections

    PhD
    This thesis explores how audio can be better incorporated into how people access information and does so by developing approaches for creating three-dimensional audio environments with low processing demands. This is done by investigating three research questions. Mobile applications have processor and memory requirements that restrict the number of concurrent static or moving sound sources that can be rendered with binaural audio. Is there a more efficient approach that is as perceptually accurate as the traditional method? This thesis concludes that virtual Ambisonics is an efficient and accurate means to render a binaural auditory display consisting of noise signals placed on the horizontal plane without head tracking. Virtual Ambisonics is then more efficient than convolution of HRTFs if more than two sound sources are concurrently rendered or if movement of the sources or head tracking is implemented. Complex acoustics models require significant amounts of memory and processing. If the memory and processor loads for a model are too large for a particular device, that model cannot be interactive in real-time. What steps can be taken to allow a complex room model to be interactive by using less memory and decreasing the computational load? This thesis presents a new reverberation model based on hybrid reverberation which uses a collection of B-format IRs. A new metric for determining the mixing time of a room is developed and interpolation between early reflections is investigated. Though hybrid reverberation typically uses a recursive filter such as a FDN for the late reverberation, an average late reverberation tail is instead synthesised for convolution reverberation. Commercial interfaces for music search and discovery use little aural information even though the information being sought is audio. How can audio be used in interfaces for music search and discovery?
This thesis looks at 20 interfaces and determines that several themes emerge from past interfaces. These include using a two or three-dimensional space to explore a music collection, allowing concurrent playback of multiple sources, and tools such as auras to control how much information is presented. A new interface, the amblr, is developed because virtual two-dimensional spaces populated by music have been a common approach, but not yet a perfected one. The amblr is also interpreted as an art installation which was visited by approximately 1000 people over 5 days. The installation maps the virtual space created by the amblr to a physical space.