2 research outputs found

    Computer audition for emotional wellbeing

    This thesis focuses on the application of computer audition (i.e., machine listening) methodologies for monitoring states of emotional wellbeing. Computer audition is a growing field and has been successfully applied to an array of use cases in recent years. Audio-based computational analysis has several advantages: audio can be recorded non-invasively, stored economically, and can capture rich information on happenings in a given environment, e.g., human behaviour. At the same time, maintaining emotional wellbeing is a challenge for humans, and emotion-altering conditions, including stress and anxiety, have become increasingly common in recent years. Such conditions manifest in the body, inherently changing how we express ourselves. Research shows these alterations are perceivable within vocalisation, suggesting that speech-based audio monitoring may be valuable for developing artificially intelligent systems that target improved wellbeing. Furthermore, computer audition applies machine learning and other computational techniques to audio understanding, so by combining computer audition with applications in the domain of computational paralinguistics and emotional wellbeing, this research concerns the broader field of empathy for Artificial Intelligence (AI). To this end, speech-based audio modelling that incorporates and understands paralinguistic wellbeing-related states may be a vital cornerstone for improving the degree of empathy that an artificial intelligence has. To summarise, this thesis investigates the extent to which speech-based computer audition methodologies can be utilised to understand human emotional wellbeing. A fundamental background on the fields in question as they pertain to emotional wellbeing is first presented, followed by an outline of the applied audio-based methodologies.
Next, details are provided on several machine learning experiments focused on emotional wellbeing applications, including analysis and recognition of under-researched phenomena in speech, e.g., anxiety and markers of stress. Core contributions from this thesis include the collection of several related datasets, hybrid fusion strategies for an emotional gold standard, novel machine learning strategies for data interpretation, and an in-depth acoustic-based computational evaluation of several human states. All of these contributions focus on ascertaining the advantage of audio in the context of modelling emotional wellbeing. Given the sensitive nature of human wellbeing, the ethical implications involved with developing and applying such systems are discussed throughout.

    Flamenco music information retrieval.

    Flamenco is a rich performance-oriented art music genre from Southern Spain, which attracts a growing community of aficionados around the globe. The constantly increasing number of digitally available flamenco recordings in music archives, video sharing platforms and online music services calls for the development of genre-specific description and analysis methods, capable of automatically indexing and examining these collections in a content-driven manner. Music Information Retrieval is a multi-disciplinary research area dedicated to the automatic extraction of musical information from audio recordings and scores. Most existing approaches, however, were developed in the context of popular or classical music and often do not generalise well to non-Western music traditions, in particular when the underlying music theoretical assumptions do not hold for these genres. The specific characteristics and concepts of a music tradition can furthermore imply new computational challenges, for which no suitable methods exist.
This thesis addresses these current shortcomings of Music Information Retrieval by tackling several computational challenges that arise in the context of flamenco music. To this end, a number of contributions to the field are made in the form of novel algorithms, comparative evaluations and data-driven studies, directed at various musical dimensions and encompassing several sub-areas of computer science, computational mathematics, statistics, optimisation and computational musicology. A particularity of flamenco, which immensely shapes the work presented in this thesis, is the absence of written scores. Consequently, computational approaches can rely solely on the direct analysis of raw audio recordings or automatically extracted transcriptions, and this restriction generates a set of new computational challenges. A key aspect of flamenco is the presence of recurring melodic templates, which are subject to heavy variation during performance. From a computational perspective, we identify three tasks related to this characteristic (melody classification, melody retrieval and melodic template extraction), which are addressed in this thesis. We furthermore approach the task of detecting repeated sung phrases in an unsupervised manner and explore the use of deep learning methods for image-based singer identification in flamenco videos and structural segmentation of flamenco recordings. Finally, we demonstrate in a data-driven corpus study how automatic annotations can be mined to discover interesting correlations and gain insights into a largely undocumented genre.