585 research outputs found
Sequential decision making in artificial musical intelligence
Over the past 60 years, artificial intelligence has grown from a largely academic field of research to a ubiquitous array of tools and approaches used in everyday technology. Despite its many recent successes and growing prevalence, certain meaningful facets of computational intelligence have not been as thoroughly explored. Such additional facets cover a wide array of complex mental tasks which humans carry out easily, yet are difficult for computers to mimic. A prime example of a domain in which human intelligence thrives, but machine understanding is still fairly limited, is music. Over the last decade, many researchers have applied computational tools to carry out tasks such as genre identification, music summarization, music database querying, and melodic segmentation. While these are all useful algorithmic solutions, we are still a long way from constructing complete music agents, able to mimic (at least partially) the complexity with which humans approach music. One key aspect which hasn't been sufficiently studied is that of sequential decision making in musical intelligence. This thesis strives to answer the following question: Can a sequential decision making perspective guide us in the creation of better music agents, and social agents in general? And if so, how? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and human-agent (and more generally agent-agent) interaction in the context of music. The key contributions of this thesis are the design of better music playlist recommendation algorithms; the design of algorithms for tracking user preferences over time; new approaches for modeling people's behavior in situations that involve music; and the design of agents capable of meaningful interaction with humans and other agents in a setting where music plays a role (either directly or indirectly).
Though motivated primarily by music-related tasks, and focusing largely on people's musical preferences, this thesis also establishes that insights from music-specific case studies can be applicable in other concrete social domains, such as different types of content recommendation. Showing the generality of insights from musical data in other contexts serves as evidence for the utility of music domains as testbeds for the development of general artificial intelligence techniques. Ultimately, this thesis demonstrates the overall usefulness of taking a sequential decision making approach in settings previously unexplored from this perspective.
Computer Science
Analysis of radio programmes using the WordNetAffect emotion lexicon
The technical progress of recent years, above all the spread of smartphones and mobile internet, has created the technical foundation for personalising audio-based content. Accordingly, playlist generation has become an important research area. The aim of this work is a language-technology investigation of how mixed speech-music playlists are generated. According to our preliminary research, of the information available in the textual transcripts of audio material, it is primarily mood that matters for playlist generation. To investigate this question, we examined about 2500 hours of programming from radio stations. Using automatic speech recognition, we detected the words of the WordNetAffect emotion lexicon in the recordings and then analysed the resulting database. We found characteristic patterns both in the co-occurrence of emotion categories and in the temporal (weekly, daily, and hourly) variation of emotions.
Language-technology aspects of speech-music playlists
Media consumption over the internet and on smartphones makes the personalisation of content both possible and necessary. For audio-based media, this is the subject of playlist generation. Earlier work in the field dealt exclusively with music-based playlists, and the first studies of speech-music playlists examined only the acoustic side. The present work is the first to address the language-technology side of generating speech-music playlists. Based on preliminary investigations, it proposes a framework for speech-music playlist generation. In the language-technology processing, mood and emotion are particularly important, together with their extraction from song lyrics, interview transcripts, and spoken audio. To this end we use mood lexicons and examine the occurrence of mood words in song lyrics and interview transcripts. For audio material containing speech, we also perform automatic speech recognition to produce the text, in two ways: by recognising the entire audio, and by focusing on the mood words. We examine how the limited quality of speech recognition changes the occurrence of mood words. The work was carried out with English-language lexicons and on BBC material.
Screening TED: A rhetorical analysis of the intersections of rhetoric, digital media, and pedagogy
The presence of expertise resonates across our daily lives. Experts are called upon to consult us about which candidate is ideal for office, which type of wood is the best choice for a carpentry project, which scientist has optimal data on the effects of air pollution, which speech teacher is the best one to take for proper credit hours, and more. An expert is typically conceived as an individual who knows more about a given topic and can create stronger identification than an average person. The struggle to achieve expert status is one that is fundamentally tied to power and is reliant on the establishment of authenticity and legitimacy from audiences. It is, at its core, a struggle that utilizes rhetoric. Begun in 1984, the TED (Technology, Entertainment, and Design) conference has become a critical player in an architectonic movement to manufacture expertise. Modeled on the Lyceum and Chautauqua movements of the early American 20th century, the TED conferences have spread rapidly into public culture, most notably in the field of education via social media and online video. TED “talks” are classroom artifacts. They are teaching tools and aid in increasing learning for a more digitally native student population. Likewise, the TED conferences have become models of community engagement that work rhetorically to demonstrate the attribution and manufacturing of expertise amidst a 21st century digital world. In short, we have acknowledged TED’s growth and expansion as credible and sanctioned their identity as the harbinger of expert and inspirational ideas. The democratization of digital media, particularly video, has made it possible to increase the sharing and collaboration of ideas faster than ever before, and as our world becomes more reliant on digital devices for the receiving and sending of information, the consumption and production of information, and the attribution of expertise, the precise role of technology within pedagogy becomes increasingly complex.
My dissertation posits that TED employs current uses of digital media technologies in order to manufacture its ethos of expertise within public culture.
Binaural virtual auditory display for music discovery and recommendation
Emerging patterns in audio consumption present renewed opportunity for searching or navigating music via spatial audio interfaces. This thesis examines the potential benefits and considerations for using binaural audio as the sole or principal output interface in a music browsing system. Three areas of enquiry are addressed. Specific advantages and constraints in spatial display of music tracks are explored in preliminary work. A voice-led binaural music discovery prototype is shown to offer a contrasting interactive experience compared to a mono smartspeaker. Results suggest that touch or gestural interaction may be more conducive input modes in the former case. The limit of three binaurally spatialised streams is identified from separate data as a usability threshold for simultaneous presentation of tracks, with no evident advantages derived from visual prompts to aid source discrimination or localisation. The challenge of implementing personalised binaural rendering for end-users of a mobile system is addressed in detail. A custom framework for assessing head-related transfer function (HRTF) selection is applied to data from an approach using 2D rendering on a personal computer. That HRTF selection method is developed to encompass 3D rendering on a mobile device. Evaluation against the same criteria shows encouraging results in reliability, validity, usability and efficiency. Computational analysis of a novel approach for low-cost, real-time, head-tracked binaural rendering demonstrates measurable advantages compared to first order virtual Ambisonics. Further perceptual evaluation establishes working parameters for interactive auditory display use cases. In summation, the renderer and identified tolerances are deployed with a method for synthesised, parametric 3D reverberation (developed through related research) in a final prototype for mobile immersive playlist editing. 
Task-oriented comparison with a graphical interface reveals high levels of usability and engagement, plus some evidence of enhanced flow state when using the eyes-free binaural system.
A hybrid approach for item collection recommendations : an application to automatic playlist continuation
Current recommender systems aim mainly to generate accurate item recommendations, without properly evaluating the multiple dimensions of the recommendation problem. However, in many domains, like in music, where items are rarely consumed in isolation, users would rather need a set of items, designed to work well together, while having some cognitive properties as a whole, related to their perception of quality and satisfaction.
In this thesis, a hybrid case-based recommendation approach for item collections is proposed. In particular, an application to automatic playlist continuation, addressing similar cognitive concepts, rather than similar users, is presented. Playlists, which are sets of music items designed to be consumed as a sequence, with a specific purpose and within a specific context, are treated as cases. The proposed recommender system is based on a meta-level hybridization. First, Latent Dirichlet Allocation is applied to the set of past playlists, described as distributions over music styles, to identify their underlying concepts. Then, for a started playlist, its semantic characteristics, like its latent concept and the styles of the included items, are inferred, and Case-Based Reasoning is applied to the set of past playlists addressing the same concept, to construct and recommend a relevant playlist continuation. A graph-based item model is used to overcome the semantic gap between songs’ signal-based descriptions and users’ high-level preferences, and to efficiently capture the playlists’ structures and the similarity of the music items in them. As the proposed method bases its reasoning on previous playlists, it does not require the construction of complex user profiles to generate accurate recommendations. Furthermore, beyond relevance, the approach supports criteria other than accuracy, such as increased coherence and item diversity, to deliver a more complete user experience.
Experiments on real music datasets have revealed improved results, compared to other state of the art techniques, while achieving a “good trade-off” between recommendations’ relevance, diversity and coherence. Finally, although actually focusing on playlist continuations, the designed approach could be easily adapted to serve other recommendation domains with similar characteristics.
Postprint (published version)
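The retrieve-and-reuse step described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the concept labels are assumed to be precomputed (the thesis infers them with Latent Dirichlet Allocation), Jaccard overlap stands in for the graph-based similarity, and the names `continue_playlist`, `k_cases` and `n_tracks` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two collections of track IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def continue_playlist(seed, seed_concept, past_playlists, k_cases=2, n_tracks=3):
    """Suggest tracks that continue `seed`, reusing similar past playlists.

    past_playlists is a list of (concept, tracks) pairs. The concept labels
    are assumed to be given; in the thesis they come from LDA.
    """
    # Retrieve: only past playlists (cases) that address the same concept.
    candidates = [tracks for concept, tracks in past_playlists
                  if concept == seed_concept]
    # Rank the cases by overlap with the started playlist and keep the best.
    ranked = sorted(candidates, key=lambda tracks: jaccard(seed, tracks),
                    reverse=True)
    cases = ranked[:k_cases]
    # Reuse: tracks from the retrieved cases vote, weighted by case similarity.
    votes = Counter()
    for tracks in cases:
        weight = jaccard(seed, tracks)
        for track in tracks:
            if track not in seed:
                votes[track] += weight
    # Highest score first; ties broken by track ID for a stable order.
    ordered = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ordered][:n_tracks]


past = [
    ("chill", ["a", "b", "c", "d"]),
    ("chill", ["a", "b", "e"]),
    ("workout", ["a", "b", "x", "y"]),
    ("chill", ["m", "n", "o"]),
]
suggestions = continue_playlist(["a", "b"], "chill", past)
```

Because the reasoning runs over past playlists rather than over listeners, no user profile appears anywhere in the sketch, which mirrors the claim made in the abstract.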
Parsing consumption preferences of music streaming audiences
As demands for insights on music streaming listeners continue to grow, scientists and industry analysts face the challenge to comprehend a mutated consumption behavior, which demands a renewed approach to listener typologies. This study aims to determine how audience segmentation can be performed in a time-relevant and replicable manner. Thus, it interrogates which parameters best serve as indicators of preferences to ultimately assist in delimiting listener segments.
Accordingly, the primary objective of this research is to develop a revised typology that classifies music streaming listeners in the light of the progressive phenomenology of music listening. The hypothesis assumes that this could be solved by positioning listeners, rather than products, at the center of streaming analysis and supplementing sales-centered metrics with user-centered ones. The empirical research of this paper was based on grounded theories, enriched by analytical case studies. For this purpose, behavioral and psychological research results were interconnected with market analysis and streaming platform usage data.
Analysis of the results demonstrates that a concatenation of multi-dimensional data streams facilitates the derivation of a typology that is applicable to varying audience pools. The findings indicate that for the delimitation of listener types, the motivation, and listening context are essential key constituents. Since these variables demand insights that reach beyond existing metrics, descriptive data points relating to the listening process are subjoined. Ultimately, parameter indexation results in listener profiles that offer novel access points for investigations, which make imperceptible, interdisciplinary correlations tangible. The framework of the typology can be consulted in analytical and creational processes. In this respect, the results of the derived analytical approach contribute to better determine and ultimately satisfy listener preferences.
Spatial auditory display for acoustics and music collections
PhD
This thesis explores how audio can be better incorporated into how people access
information and does so by developing approaches for creating three-dimensional audio
environments with low processing demands. This is done by investigating three research
questions.
Mobile applications have processor and memory requirements that restrict the
number of concurrent static or moving sound sources that can be rendered with binaural
audio. Is there a more efficient approach that is as perceptually accurate as the traditional
method? This thesis concludes that virtual Ambisonics is an efficient and accurate means
to render a binaural auditory display consisting of noise signals placed on the horizontal
plane without head tracking. Virtual Ambisonics is then more efficient than convolution
of HRTFs if more than two sound sources are concurrently rendered or if movement of
the sources or head tracking is implemented.
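As a rough illustration of why the cost of virtual Ambisonics does not grow with the number of sources, the sketch below encodes mono sources on the horizontal plane into first-order B-format and decodes the mix to a fixed ring of virtual loudspeakers. It is a sketch under stated assumptions, not the thesis's renderer: the traditional 1/sqrt(2) weighting on W, the simple projection decoder, and all function names are illustrative choices, and the HRTF convolution of each loudspeaker feed is left out.

```python
import math


def encode_horizontal(sample, azimuth_rad):
    """Encode one mono sample into horizontal first-order B-format (W, X, Y)."""
    w = sample / math.sqrt(2.0)          # omnidirectional channel (assumed weighting)
    x = sample * math.cos(azimuth_rad)   # front-back figure-of-eight
    y = sample * math.sin(azimuth_rad)   # left-right figure-of-eight
    return (w, x, y)


def mix_sources(sources):
    """Sum any number of (sample, azimuth) sources into one B-format frame."""
    w = x = y = 0.0
    for sample, azimuth in sources:
        sw, sx, sy = encode_horizontal(sample, azimuth)
        w += sw
        x += sx
        y += sy
    return (w, x, y)


def decode_to_virtual_speakers(frame, speaker_azimuths):
    """Projection decode of one frame to a ring of virtual loudspeakers.

    Decoder weightings differ between designs; this one is an assumption.
    """
    w, x, y = frame
    n = len(speaker_azimuths)
    return [
        (math.sqrt(2.0) * w + 2.0 * (x * math.cos(a) + y * math.sin(a))) / n
        for a in speaker_azimuths
    ]


# Four virtual loudspeakers at 0, 90, 180 and 270 degrees (hypothetical layout).
ring = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
one_source = mix_sources([(1.0, 0.0)])
five_sources = mix_sources([(0.2, 0.3 * i) for i in range(5)])
feeds = decode_to_virtual_speakers(one_source, ring)
```

However many sources are mixed, the frame keeps three channels and the decoder produces one feed per virtual loudspeaker, so the number of HRTF convolutions that would follow is set by the layout and not by the source count. Where the break-even point falls depends on the implementation.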
Complex acoustics models require significant amounts of memory and processing. If
the memory and processor loads for a model are too large for a particular device, that
model cannot be interactive in real-time. What steps can be taken to allow a complex
room model to be interactive by using less memory and decreasing the computational
load? This thesis presents a new reverberation model based on hybrid reverberation
which uses a collection of B-format IRs. A new metric for determining the mixing
time of a room is developed and interpolation between early reflections is investigated.
Though hybrid reverberation typically uses a recursive filter such as an FDN for the late
reverberation, an average late reverberation tail is instead synthesised for convolution
reverberation.
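The memory saving from sharing one late tail across a collection of impulse responses can be sketched as follows. This is only an illustration under loud assumptions: the impulse responses are mono lists rather than B-format, the mixing time is passed in as a sample index rather than computed with the thesis's metric, and the tail is a plain sample-wise mean used as a placeholder, since the abstract does not describe how the average tail is synthesised. All names are hypothetical.

```python
def split_irs(irs, mixing_index):
    """Split each impulse response at the mixing time and share one late tail.

    Returns the per-position early parts and a single averaged tail.
    """
    earlies = [ir[:mixing_index] for ir in irs]
    tails = [ir[mixing_index:] for ir in irs]
    length = min(len(tail) for tail in tails)
    # Placeholder averaging: sample-wise mean over all positions (assumption).
    avg_tail = [sum(tail[i] for tail in tails) / len(tails) for i in range(length)]
    return earlies, avg_tail


def rebuild_ir(early, avg_tail):
    """Rebuild a full impulse response for convolution from early part and tail."""
    return early + avg_tail


def stored_samples(earlies, avg_tail):
    """Number of samples kept in memory under the shared-tail scheme."""
    return sum(len(early) for early in earlies) + len(avg_tail)


# Two toy impulse responses measured at different positions (made-up values).
irs = [
    [1.0, 0.5, 0.25, 0.125],
    [0.5, 0.5, 0.75, 0.375],
]
earlies, avg_tail = split_irs(irs, mixing_index=2)
rebuilt = rebuild_ir(earlies[0], avg_tail)
full_size = sum(len(ir) for ir in irs)
shared_size = stored_samples(earlies, avg_tail)
```

With many positions and long tails, storing one tail instead of one per position is where most of the saving would come from; the early parts remain position-specific.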
Commercial interfaces for music search and discovery use little aural information
even though the information being sought is audio. How can audio be used in
interfaces for music search and discovery? This thesis looks at 20 interfaces and
determines that several themes emerge from past interfaces. These include using a two
or three-dimensional space to explore a music collection, allowing concurrent playback of
multiple sources, and tools such as auras to control how much information is presented. A
new interface, the amblr, is developed because virtual two-dimensional spaces populated
by music have been a common approach, but not yet a perfected one. The amblr is also
interpreted as an art installation which was visited by approximately 1000 people over 5
days. The installation maps the virtual space created by the amblr to a physical space
- …