157 research outputs found

    The language of sounds unheard: Exploring sensory semantic knowledge in large language models

    Semantic dimensions of sound have played a central role in understanding the nature of auditory sensory experience as well as the broader relation between perception, language, and meaning. Accordingly, and given the recent proliferation of large language models (LLMs), here we asked whether such models exhibit an organisation of perceptual semantics similar to that observed in humans. Specifically, we prompted ChatGPT, a chatbot based on a state-of-the-art LLM, to rate musical instrument sounds on a set of 20 semantic scales. We elicited multiple responses in separate chats, analogous to having multiple human raters. ChatGPT generated semantic profiles that only partially correlated with human ratings, yet showed robust agreement along well-known psychophysical dimensions of musical sounds such as brightness (bright-dark) and pitch height (deep-high). Exploratory factor analysis suggested the same dimensionality but a different spatial configuration of the latent factor space between the chatbot and human ratings. Unexpectedly, the chatbot showed degrees of internal variability that were comparable in magnitude to those of human ratings. Our work highlights the potential of LLMs to capture salient dimensions of human sensory experience. Comment: 12 pages, 3 figures

    Brightness perception for musical instrument sounds: Relation to timbre dissimilarity and source-cause categories.

    Timbre dissimilarity of orchestral sounds is well known to be multidimensional, with attack time and spectral centroid representing its two most robust acoustical correlates. The centroid dimension is traditionally considered to reflect timbral brightness. However, the question of whether multiple continuous acoustical and/or categorical cues influence brightness perception has not been addressed comprehensively. A triangulation approach was used to examine the dimensionality of timbral brightness, its robustness across different psychoacoustical contexts, and its relation to perception of the sounds' source-cause. Listeners compared 14 acoustic instrument sounds in three distinct tasks that collected general dissimilarity, brightness dissimilarity, and direct multi-stimulus brightness ratings. Results confirmed that brightness is a robust unitary auditory dimension, with direct ratings recovering the centroid dimension of general dissimilarity. When a two-dimensional space of brightness dissimilarity was considered, its second dimension correlated with the attack-time dimension of general dissimilarity, which was interpreted as reflecting a potential infiltration of the latter into brightness dissimilarity. Dissimilarity data were further modeled using partial least-squares regression with audio descriptors as predictors. Adding predictors derived from instrument family and the type of resonator and excitation did not improve the model fit, indicating that brightness perception is underpinned primarily by acoustical rather than source-cause cues.
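
    The spectral centroid named in the abstract above is commonly defined as the magnitude-weighted mean frequency of a spectrum. The following is a minimal sketch under that common definition, not the descriptor pipeline used in the study; the helper names (`spectral_centroid`, `tone`), the sample rate, and the synthetic test tones are all assumptions made for illustration.

```python
import cmath
import math


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of a real signal, via a naive DFT."""
    n = len(signal)
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitude = abs(coeff)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum if magnitude_sum > 0 else 0.0


def tone(partials, sample_rate=8000, n=400):
    """Hypothetical test signal: a sum of sinusoids given as (frequency, amplitude)."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in partials)
        for t in range(n)
    ]


# Two synthetic tones with the same fundamental; the second has a strong high partial.
dark = tone([(200, 1.0), (400, 0.2)])
bright = tone([(200, 1.0), (400, 0.2), (2000, 0.8)])

dark_centroid = spectral_centroid(dark, 8000)
bright_centroid = spectral_centroid(bright, 8000)
```

    In this toy case the tone with more high-frequency energy has the higher centroid, which is the sense in which the centroid is traditionally read as a correlate of brightness.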

    The Responsibility Problem in Neural Networks with Unordered Targets

    We discuss the discontinuities that arise when mapping unordered objects to neural network outputs of fixed permutation, referred to as the responsibility problem. Prior work has proved the existence of the issue by identifying a single discontinuity. Here, we show that discontinuities under such models are uncountably infinite, motivating further research into neural networks for unordered data. Comment: Accepted for TinyPaper archival at ICLR 2023: https://openreview.net/forum?id=jd7Hy1jRiv

    Fast Diffusion GAN Model for Symbolic Music Generation Controlled by Emotions

    Diffusion models have shown promising results for a wide range of generative tasks with continuous data, such as image and audio synthesis. However, little progress has been made on using diffusion models to generate discrete symbolic music, because this new class of generative models is not well suited to discrete data and its iterative sampling process is computationally expensive. In this work, we propose a diffusion model combined with a Generative Adversarial Network, aiming to (i) alleviate one of the remaining challenges in algorithmic music generation, namely the control of generation towards a target emotion, and (ii) mitigate the slow sampling drawback of diffusion models applied to symbolic music generation. We first used a trained Variational Autoencoder to obtain embeddings of a symbolic music dataset with emotion labels and then used those embeddings to train a diffusion model. Our results demonstrate the successful control of our diffusion model to generate symbolic music with a desired emotion. Our model achieves several orders of magnitude improvement in computational cost, requiring merely four time steps to denoise, while the number of steps required by current state-of-the-art diffusion models for symbolic music generation is on the order of thousands.

    Management in the Greek system of higher education

    This article aims to outline the relationship between the Ministry of Education and the institutions of higher education in Greece. The co-ordination of this relationship is an issue vital both for the academic institutions, which require a degree of administrative independence to do their work on behalf of society, and for the State, which wishes to assure itself that the institutions of higher education are adequately serving the needs of society. The article concludes by arguing that the Ministry of Education exercises its control in the higher education sector through laws and regulations and intervenes in the day-to-day administrative work of the academic institutions. The institutions of higher education are entirely subordinate to the State and have a limited voice in the decisions affecting their future development. Therefore, ministerial supervision may be considered a case of 'bureaucratic overcentralisation' rather than 'guidance' by the State. Peer-reviewed

    Interactive Neural Resonators

    In this work, we propose a method for the controllable, real-time synthesis of contact sounds using neural resonators. Previous works have used physically inspired statistical methods and physical modelling for object materials and excitation signals. Our method incorporates differentiable second-order resonators and estimates their coefficients using a neural network that is conditioned on physical parameters. This allows for interactive dynamic control and the generation of novel sounds in an intuitive manner. We demonstrate the practical implementation of our method and explore its potential creative applications.
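
    To make the term "second-order resonator" concrete, here is a minimal sketch of a bank of two-pole resonators excited by an impulse. It is an illustration only: in the work described above the coefficients are estimated by a neural network, whereas here they are set by hand from a frequency and a decay time, and the function names, mode values, and sample rate are assumptions.

```python
import math


def resonator_coefficients(freq_hz, decay_s, sample_rate):
    """Map a resonant frequency and a decay time to two-pole feedback coefficients."""
    # Pole radius: the envelope falls by a factor of 1/e over decay_s seconds.
    r = math.exp(-1.0 / (decay_s * sample_rate))
    theta = 2 * math.pi * freq_hz / sample_rate  # pole angle in radians per sample
    return 2 * r * math.cos(theta), -r * r


def resonate(excitation, a1, a2):
    """Run y[n] = x[n] + a1 * y[n-1] + a2 * y[n-2] over the excitation signal."""
    y1 = y2 = 0.0
    out = []
    for x in excitation:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


sample_rate = 8000
modes = [(440.0, 0.05), (1230.0, 0.02)]  # hypothetical (frequency in Hz, decay in s) pairs
impulse = [1.0] + [0.0] * 799  # a single "strike" as the excitation

# Sum the responses of all modes to obtain the synthesised sound.
sound = [0.0] * len(impulse)
for freq_hz, decay_s in modes:
    a1, a2 = resonator_coefficients(freq_hz, decay_s, sample_rate)
    for i, value in enumerate(resonate(impulse, a1, a2)):
        sound[i] += value
```

    Because each pole radius is below one, every mode decays over time; changing the frequency or decay of a mode changes the perceived pitch and damping of the simulated object.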

    Composer Style-specific Symbolic Music Generation Using Vector Quantized Discrete Diffusion Models

    Denoising Diffusion Probabilistic Models (DDPMs) have become increasingly widely used because of the promising results they have achieved in diverse generative tasks with continuous data, such as image and sound synthesis. Nonetheless, the success of diffusion models has not been fully extended to discrete symbolic music. We propose to combine a vector quantized variational autoencoder (VQ-VAE) and discrete diffusion models for the generation of symbolic music with desired composer styles. The trained VQ-VAE can represent symbolic music as a sequence of indexes that correspond to specific entries in a learned codebook. Subsequently, a discrete diffusion model is used to model the VQ-VAE's discrete latent space. The diffusion model is trained to generate intermediate music sequences consisting of codebook indexes, which are then decoded to symbolic music using the VQ-VAE's decoder. The results demonstrate that our model can generate symbolic music with target composer styles that meet the given conditions with a high accuracy of 72.36%.
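
    The codebook step described above, representing data as a sequence of indexes into a learned codebook, can be sketched as a nearest-neighbour lookup. This is a toy illustration under stated assumptions: the codebook and latent vectors below are hand-written rather than learned, and the function names are hypothetical; the encoder, decoder, and diffusion model are not shown.

```python
def quantize(vectors, codebook):
    """Return, for each vector, the index of its nearest codebook entry."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: squared_distance(v, codebook[i]))
        for v in vectors
    ]


def dequantize(indexes, codebook):
    """Map a sequence of codebook indexes back to their codebook vectors."""
    return [codebook[i] for i in indexes]


# Hypothetical 2-D codebook with four entries and three encoder outputs.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
latents = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]

indexes = quantize(latents, codebook)          # discrete sequence a diffusion model could operate on
reconstructed = dequantize(indexes, codebook)  # vectors that would be passed to a decoder
```

    A discrete generative model then only needs to produce valid index sequences, which sidesteps modelling the continuous latent vectors directly.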

    Multimodal Classification of Stressful Environments in Visually Impaired Mobility Using EEG and Peripheral Biosignals

    In this study, we aim to better understand the cognitive-emotional experience of visually impaired people when navigating unfamiliar urban environments, both outdoor and indoor. We propose a multimodal framework based on random forest classifiers, which predict the actual environment among predefined generic classes of urban settings from real-time, non-invasive, ambulatory monitoring of brain and peripheral biosignals. Model performance reached 93% for the outdoor and 87% for the indoor environments (expressed in weighted AUROC), demonstrating the potential of the approach. Estimating the density distributions of the most predictive biomarkers, we present a series of geographic and temporal visualizations depicting the environmental contexts in which the most intense affective and cognitive reactions take place. A linear mixed model analysis revealed significant differences between categories of vision impairment, but not between normal and impaired vision. Despite the limited size of our cohort, these findings pave the way to emotionally intelligent mobility-enhancing systems, capable of implicit adaptation not only to changing environments but also to shifts in the affective state of the user in relation to different environmental and situational factors.