
    Sculpting Unrealities: Using Machine Learning to Control Audiovisual Compositions in Virtual Reality

    This thesis explores the use of interactive machine learning (IML) techniques to control audiovisual compositions within the emerging medium of virtual reality (VR). Accompanying the text is a portfolio of original compositions and open-source software. These research outputs represent the practical elements of the project that help to shed light on the core research question: how can IML techniques be used to control audiovisual compositions in VR? To answer this question, it was broken down into its constituent elements. To situate the research, an exploration of the contemporary field of audiovisual art locates the practice between the areas of visual music and generative AV, and yields a new method of categorising the constituent practices. The practice of audiovisual composition is then explored, focusing on the concept of equality. Throughout the literature, audiovisual artists aim to treat audio and visual material equally; this is interpreted as a desire for balance between the audio and visual material. This concept is then examined in the context of VR. A feeling of presence is found to be central to this new medium and is identified as an important consideration for the audiovisual composer, in addition to the senses of sight and sound. Several new terms are formulated which provide the means by which the compositions within the portfolio are analysed. A control system based on IML techniques, called the Neural AV Mapper, is developed and used to build a compositional methodology through the creation of several studies. The outcomes from these studies are incorporated into two live performance pieces, Ventriloquy I and Ventriloquy II, which showcase the use of IML techniques to control audiovisual compositions in a live performance context. The lessons learned from these pieces inform the development of the ImmersAV toolkit, an open-source software toolkit built specifically to allow exploration of the IML control paradigm within VR. The toolkit provides the means by which the immersive audiovisual compositions Obj_#3 and Ag Fás Ar Ais Arís are created. Obj_#3 takes the form of an immersive audiovisual sculpture that can be manipulated in real time by the user. The title of the thesis references the physical act of sculpting audiovisual material; it also refers to the ability of VR to create alternate realities that are not bound to the physics of real life. This exploration of unrealities emerges as an important aspect of the medium. The final piece in the portfolio, Ag Fás Ar Ais Arís, takes the knowledge gained from the earlier work and pushes the boundaries to maximise the potential of the medium and the material.
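
    The IML control paradigm described above pairs example inputs with desired audiovisual parameter settings and trains a model to interpolate between them in real time. The following is a minimal sketch of that general idea using scikit-learn; the input features, parameter names and network size are illustrative assumptions, not the thesis's Neural AV Mapper.

```python
# Minimal interactive-machine-learning mapping sketch (illustrative only).
# A performer records (input, output) example pairs, trains a small
# regressor, then uses it to drive synthesis parameters continuously.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Example pairs: 3-D controller position -> [pitch_hz, brightness, hue]
examples_in = np.array([[0.0, 0.0, 0.0],
                        [1.0, 0.0, 0.5],
                        [0.5, 1.0, 1.0]])
examples_out = np.array([[220.0, 0.2, 0.0],
                         [440.0, 0.8, 0.3],
                         [660.0, 0.5, 0.9]])

model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
model.fit(examples_in, examples_out)

# In the render loop, map the live controller position to AV parameters;
# the model interpolates smoothly between the recorded examples.
pitch_hz, brightness, hue = model.predict([[0.7, 0.4, 0.8]])[0]
```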

    How touch and hearing influence visual processing in sensory substitution, synaesthesia and cross-modal correspondences

    Sensory substitution devices (SSDs) systematically turn visual dimensions into patterns of tactile or auditory stimulation. After training, a user of these devices learns to translate these audio or tactile sensations back into a mental visual picture. Most previous SSDs translate greyscale images using intuitive cross-sensory mappings to help users learn the devices. However, more recent SSDs have started to incorporate additional colour dimensions such as saturation and hue. Chapter two examines how previous SSDs have translated the complexities of colour into hearing or touch. The chapter explores whether colour is useful for SSD users, how SSD and veridical colour perception differ, and how optimal cross-sensory mappings might be determined. After long-term training, some blind users of SSDs report visual sensations from tactile or auditory stimulation. A related phenomenon is synaesthesia, a condition where stimulation of one modality (e.g., touch) produces an automatic, consistent and vivid sensation in another modality (e.g., vision). Tactile-visual synaesthesia is an extremely rare variant that can shed light on how the tactile-visual system is altered when touch can elicit visual sensations. Chapter three reports a series of investigations on the tactile discrimination abilities and phenomenology of tactile-vision synaesthetes, alongside questionnaire data from synaesthetes unavailable for testing. Chapter four introduces a new SSD to test whether the presentation of colour information in sensory substitution affects object and colour discrimination. Chapter five presents experiments on intuitive auditory-colour mappings across a wide variety of sounds. These findings are used to predict the reported colour hallucinations resulting from LSD use while listening to these sounds. Chapter six uses a new sensory substitution device designed to test the utility of these intuitive sound-colour links for visual processing. These findings are discussed with reference to how cross-sensory links, LSD and synaesthesia can inform optimal SSD design for visual processing.
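
    Many greyscale SSDs use a variant of the classic column-scan mapping: the image is swept left to right, vertical pixel position maps to frequency and brightness to loudness. The sketch below illustrates that general scheme; the scan rate, frequency range and image size are illustrative assumptions rather than the parameters of any specific device.

```python
# Greyscale image-to-sound sketch in the style of column-scan SSDs:
# scan columns left to right; pixel row -> tone frequency, pixel
# brightness -> tone amplitude. All parameters are illustrative.
import numpy as np

def image_to_audio(img, sr=22050, col_dur=0.05, f_lo=200.0, f_hi=5000.0):
    rows, cols = img.shape
    freqs = np.geomspace(f_hi, f_lo, rows)      # top of image -> high pitch
    t = np.arange(int(sr * col_dur)) / sr
    chunks = []
    for c in range(cols):                       # left-to-right scan
        amp = img[:, c].astype(float) / 255.0   # brightness -> amplitude
        tones = np.sin(2 * np.pi * freqs[:, None] * t[None, :])
        chunks.append((amp[:, None] * tones).sum(axis=0) / rows)
    return np.concatenate(chunks)

demo = np.zeros((64, 64), dtype=np.uint8)
np.fill_diagonal(demo, 255)                     # a bright diagonal line
audio = image_to_audio(demo)                    # heard as a pitch sweep
```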

    Investigating the Cognitive and Neural Mechanisms underlying Multisensory Perceptual Decision-Making in Humans

    On a day-to-day basis, we encounter situations that require decisions based on ambiguous and often incomplete sensory information. Perceptual decision-making is the process by which sensory information is consolidated and accumulated towards one of multiple possible choice alternatives, which inform our behavioural responses. It can be understood both theoretically and neurologically as a process of stochastic sensory evidence accumulation towards some choice threshold; once this threshold is exceeded, a response is facilitated, informing the overt actions undertaken. Considerable progress has been made towards understanding the cognitive and neural mechanisms underlying perceptual decision-making. Analyses of reaction times (RTs, typically measured in milliseconds) and choice accuracy, which reflect decision-making behaviour, can be coupled with neuroimaging methodologies, notably electroencephalography (EEG) and functional magnetic resonance imaging (fMRI), to identify spatiotemporal components representing the neural signatures of such accumulation-to-bound decision formation on a single-trial basis. Taken together, these provide an experimental framework conceptualising the key computations underlying perceptual decision-making. Despite this, relatively little is known about how perceptual decision-making is enhanced or altered by the integration of information across multiple sensory modalities. Consolidating the available sensory evidence requires processing information presented in more than one sensory modality, often near-simultaneously, to exploit the salient percepts, in what we term multisensory (perceptual) decision-making. Specifically, multisensory integration must be considered within the perceptual decision-making framework in order to understand how information becomes stochastically accumulated to inform overt sensory-motor choice behaviours. Recently, substantial progress has been made through the application of behaviourally-informed and/or neurally-informed modelling approaches. These approaches fit model parameters to behavioural and/or neuroimaging datasets in order to (a) dissect the constituent cognitive and neural processes underlying perceptual decision-making with both multisensory and unisensory information, and (b) mechanistically infer how multisensory enhancements arise from the integration of information across sensory modalities to benefit perceptual decision formation. Despite this, the spatiotemporal locus of the neural and cognitive underpinnings of these enhancements remains subject to debate; which brain regions predict such enhancements, where they arise, and how they influence decision-making behaviour all require further exploration. The current thesis outlines empirical findings from three studies aimed at providing a more complete characterisation of multisensory perceptual decision-making, utilising EEG and accumulation-to-bound modelling methodologies that incorporate both behaviourally-informed and neurally-informed approaches, to investigate where, when, and how perceptual improvements arise during multisensory perceptual decision-making.
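
    The accumulation-to-bound account referred to throughout can be made concrete with a short simulation: noisy evidence accumulates until it crosses one of two bounds, the crossed bound gives the choice, and the crossing time plus a non-decision time gives the RT. The sketch below is a generic drift diffusion simulation with illustrative parameter values, not any model fitted in this thesis.

```python
# Generic drift-diffusion simulation: noisy evidence accumulates toward one
# of two bounds; the crossed bound gives the choice, and crossing time plus
# a fixed non-decision time gives the RT. Parameter values are illustrative.
import numpy as np

def simulate_ddm(drift=0.3, bound=1.0, ndt=0.3, dt=0.001, noise=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < bound:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return ("upper" if x > 0 else "lower"), t + ndt  # choice, RT in seconds

rng = np.random.default_rng(0)
trials = [simulate_ddm(rng=rng) for _ in range(1000)]
accuracy = np.mean([c == "upper" for c, _ in trials])
mean_rt = np.mean([rt for _, rt in trials])
print(f"P(upper) = {accuracy:.2f}, mean RT = {mean_rt:.3f} s")
```
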
Specifically, these modelling approaches probed the modulatory influence of three factors on the cognitive and neural mechanisms underlying observable benefits to multisensory decision formation: unisensory formulated cross-modal associations (Chapter 2), natural ageing (Chapter 3), and perceptual learning (Chapter 4). Chapter 2 outlines secondary analyses, utilising a neurally-informed modelling approach, characterising the spatiotemporal dynamics of neural activity underlying auditory pitch-visual size cross-modal associations; in particular, it functionally probed how unisensory auditory pitch-driven associations benefit perceptual decision formation. EEG measurements were recorded from participants performing an Implicit Association Test (IAT), a two-alternative forced-choice (2AFC) paradigm which presents one unisensory stimulus feature per trial for participants to categorise, but manipulates the stimulus feature-response key mappings of auditory pitch-visual size cross-modal associations from unisensory stimuli alone. This overcomes the issue of mixed selectivity in recorded neural activity that is prevalent in previous cross-modal associative research, where multisensory stimuli were presented near-simultaneously. Categorisations were faster (i.e., lower RTs) when stimulus feature-response key mappings were associatively congruent, compared to associatively incongruent, between the two associative counterparts, demonstrating a behavioural benefit to perceptual decision formation. Multivariate linear discriminant analysis (LDA) was used to characterise the spatiotemporal dynamics of the EEG activity underpinning IAT performance, identifying two EEG components that discriminated the neural activity underlying the benefits of associative congruency of stimulus feature-response key mappings. Application of a neurally-informed hierarchical drift diffusion model (HDDM) demonstrated early sensory processing benefits, with incongruent stimulus feature-response key mappings increasing the duration of non-decisional processes, and late post-sensory alterations to decision dynamics, with congruent mappings decreasing the quantity of evidence required to facilitate a decision. Hence, we found that trial-by-trial variability in perceptual decision formation from unisensory facilitated cross-modal associations could be predicted by neural activity within our neurally-informed modelling approach. Next, Chapter 3 outlines cognitive research investigating age-related impacts on the behavioural indices of multisensory perceptual decision-making (i.e., RTs and choice accuracy). Natural ageing has been shown to affect multisensory perceptual decision-making dynamics in diverse ways, but the constituent cognitive processes affected remain unclear; in particular, a mechanistic account of why older adults may exhibit preserved multisensory integrative benefits yet display generalised perceptual deficits, relative to younger adults, remains inconclusive. To address this limitation, 212 participants performed an online variant of a well-established audiovisual object categorisation paradigm, in which age-related differences in RTs and choice accuracy (binary responses) between audiovisual (AV), visual (V), and auditory (A) trial types were assessed between Younger Adults (YAs; mean ± standard deviation = 27.95 ± 5.82 years) and Older Adults (OAs; mean ± standard deviation = 60.96 ± 10.35 years).
Hierarchical drift diffusion modelling (HDDM) was fitted to participants' RTs and binary responses in order to probe age-related impacts on the latent processes underlying multisensory decision formation. Behaviourally, OAs were typically slower (i.e., ↑ RTs) and less accurate (i.e., ↓ choice accuracy) than YAs across all sensory trial types, yet they exhibited greater differences in RTs between AV and V trials (i.e., ↑ AV-V RT difference), with no significant effects on choice accuracy, implicating preserved benefits of multisensory integration towards perceptual decision formation. HDDM provided parsimonious fits characterising these behavioural discrepancies between YAs and OAs. Notably, we found slower rates of sensory evidence accumulation (i.e., ↓ drift rates) for OAs across all sensory trial types, coupled with (1) higher rates of sensory evidence accumulation (i.e., ↑ drift rates) for OAs on AV versus V trial types irrespective of stimulus difficulty, (2) increased response caution (i.e., ↑ decision boundaries) on AV versus V trial types, and (3) decreased non-decisional processing duration (i.e., ↓ non-decision times) on AV versus V trial types for stimuli of increased difficulty. Our findings suggest that older adults trade off multisensory decision-making speed for accuracy to preserve enhancements to perceptual decision formation relative to younger adults. Hence, they display an increased reliance on integrating multimodal information, consistent with the principle of inverse effectiveness, as a compensatory mechanism for generalised cognitive slowing when processing unisensory information. Overall, our findings demonstrate how computational modelling can reconcile contrasting hypotheses of age-related changes in the processes underlying multisensory perceptual decision-making behaviour. Finally, Chapter 4 outlines research probing the influence of perceptual learning on multisensory perceptual decision-making. Views of unisensory perceptual learning imply that improvements in perceptual sensitivity may be due to enhancements in early sensory representations and/or modulations to post-sensory decision dynamics. We sought to assess whether these views could account for improvements in perceptual sensitivity for multisensory stimuli, or even exacerbations of multisensory enhancements towards decision formation, by consolidating the spatiotemporal locus of where and when in the brain they may be observed. We recorded EEG activity from participants who completed the same audiovisual object categorisation paradigm (as outlined in Chapter 3) over three consecutive days. We used single-trial multivariate LDA to characterise the spatiotemporal trajectory of the decision dynamics underlying any observed multisensory benefits both (a) within and (b) between visual, auditory, and audiovisual trial types. While significant decreases in RTs and increases in choice accuracy were found over testing days, we did not find any significant effects of perceptual learning on multisensory or unisensory perceptual decision formation. Similarly, EEG analysis did not reveal any neural components indicative of early or late modulatory effects of perceptual learning on brain activity, which we attribute to (1) the long duration of stimulus presentations (300 ms), and (2) a lack of sufficient statistical power for our LDA classifier to discriminate face-versus-car trial types.
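
    A model of this kind can be specified in a few lines with the HDDM Python package; the sketch below lets drift rate, boundary separation and non-decision time vary by sensory modality. The file and column names are assumptions for illustration, not the thesis's actual dataset or model specification.

```python
# Sketch of fitting a hierarchical drift diffusion model with the HDDM
# package, letting drift rate (v), boundary separation (a) and non-decision
# time (t) vary by sensory modality (AV / V / A). File and column names
# are placeholder assumptions.
import hddm

data = hddm.load_csv('decisions.csv')   # columns: subj_idx, rt, response, modality
model = hddm.HDDM(data, depends_on={'v': 'modality',
                                    'a': 'modality',
                                    't': 'modality'})
model.find_starting_values()
model.sample(2000, burn=500)            # MCMC posterior sampling
model.print_stats()                     # compare v / a / t across modalities
```
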
We end this chapter with considerations for discerning multisensory benefits towards perceptual decision formation, and recommendations for altering our experimental design to observe the effects of perceptual learning as a decision neuromodulator. These findings contribute to a literature justifying the increasing relevance of behaviourally-informed and/or neurally-informed modelling approaches for investigating multisensory perceptual decision-making. In particular, discussion of the cognitive and/or neural mechanisms that can be attributed to the benefits of multisensory integration towards perceptual decision formation, as well as the modulatory impact of the decision modulators in question, supports a theoretical reconciliation in which multisensory integrative benefits are not tied to specific spatiotemporal neural dynamics nor to specific cognitive processes.

    Tangible auditory interfaces: combining auditory displays and tangible interfaces

    Bovermann T. Tangible auditory interfaces: combining auditory displays and tangible interfaces. Bielefeld (Germany): Bielefeld University; 2009. Tangible Auditory Interfaces (TAIs) investigates the capabilities of interconnecting tangible user interfaces and auditory displays. TAIs utilise artificial physical objects as well as soundscapes to represent digital information. The interconnection of the two fields establishes a tight coupling between information and operation that builds on the human's familiarity with the incorporated interrelations. This work gives a formal introduction to TAIs and demonstrates their key features through seven proof-of-concept applications.

    Fostering Social Interaction Through Sound Feedback: Sentire

    Sentire is a body–machine interface that sonifies motor behaviour in real time, and a participatory, interactive performance in which two people use their physical movements to collaboratively create sound while constantly being influenced by the results. Based on our informal observation that basal social behaviours emerge during Sentire performances, the present article investigates our principal hypothesis that Sentire can foster the basic mechanisms underlying non-verbal social interaction. We illustrate how coordination serves as a crucial basic mechanism for social interaction, and consider how it is addressed by various therapeutic approaches, including the therapeutic use of real-time auditory feedback. We then argue that Sentire may be fruitfully deployed in healthcare contexts and in promoting general well-being. We describe how the Sentire system has been developed further within the scope of the research project 'Social interaction through sound feedback – Sentire', which combines human–computer interaction, sound design and real-world research, against the background of the relationship between sound, sociality and therapy. The question of how interaction is facilitated through Sentire is addressed through the first results of behavioural analysis using structured observation, which allows for a quasi-quantitative sequential analysis of interactive behaviour.
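
    As a toy illustration of the kind of movement sonification described here, consider mapping the distance between two tracked bodies to pitch, so that approaching or retreating audibly reshapes the shared sound. Everything in this sketch (tracking input, pitch range, mapping curve) is an assumption for illustration; it is not the Sentire implementation.

```python
# Illustrative distance-to-pitch mapping for a two-person sonification:
# the distance between two tracked bodies controls a shared pitch, so
# moving closer or further apart audibly changes the sound. All names
# and parameter values are placeholder assumptions.
import numpy as np

def distance_to_pitch(pos_a, pos_b, near_hz=880.0, far_hz=110.0, max_dist=5.0):
    d = min(np.linalg.norm(np.subtract(pos_a, pos_b)), max_dist)
    # Closer -> higher pitch, interpolated on a logarithmic (musical) scale.
    frac = d / max_dist
    return near_hz * (far_hz / near_hz) ** frac

print(distance_to_pitch([0.0, 0.0], [1.0, 1.5]))  # pitch for ~1.8 m apart
```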

    Modelling talking human faces

    This thesis investigates a number of new approaches for visual speech synthesis, using data-driven methods to implement a talking face. The main contributions are the following. First, the accuracy of a shared Gaussian process latent variable model (SGPLVM), built using active appearance model (AAM) and relative spectral transform-perceptual linear prediction (RASTA-PLP) features, is improved by employing a more accurate AAM. This is the first study to report that using a more accurate AAM improves the accuracy of the SGPLVM. Objective evaluation via reconstruction error is performed to compare the proposed approach against previously existing methods. In addition, it is shown experimentally that the accuracy of the AAM can be improved by using a larger number of landmarks and/or a larger number of samples in the training data. The second contribution is a new method for visual speech synthesis utilising a fully Bayesian method, namely manifold relevance determination (MRD), for modelling dynamical systems through probabilistic non-linear dimensionality reduction. This is the first time MRD has been used in the context of generating talking faces from an input speech signal. The expressive power of this model lies in its ability to consider non-linear mappings between audio and visual features within a Bayesian approach. An efficient latent space has been learnt using a fully Bayesian latent representation relying on a conditional non-linear independence framework. In the SGPLVM, the structure of the latent space cannot be automatically estimated because a maximum likelihood formulation is used; in contrast, the Bayesian approaches allow automatic determination of the dimensionality of the latent spaces. The proposed method compares favourably against several other state-of-the-art methods for visual speech generation, as shown in quantitative and qualitative evaluation on two different datasets. Finally, the possibility of incremental learning of the AAM, for inclusion in the proposed MRD approach for visual speech generation, is investigated. The quantitative results demonstrate that using MRD in conjunction with incremental AAMs produces only slightly less accurate results than using batch methods. These results support a way of training these kinds of models on computers with limited resources, for example in mobile computing. Overall, this thesis proposes several improvements to the current state of the art in generating talking faces from a speech signal, leading to perceptually more convincing results.
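
    To give a flavour of the MRD modelling step, the sketch below learns a shared latent space over paired audio and visual feature matrices using the MRD implementation in the GPy library. The feature matrices, dimensions and optimiser settings are placeholders, and this is a generic illustration of the technique rather than the thesis's model.

```python
# Sketch of learning a shared latent space for audio and visual features
# with manifold relevance determination (MRD) via GPy. Feature matrices
# and dimensions are random placeholders, not the thesis's data.
import numpy as np
import GPy

audio_feats = np.random.randn(100, 13)    # e.g., per-frame RASTA-PLP features
visual_feats = np.random.randn(100, 30)   # e.g., per-frame AAM parameters

# One latent space, two observation views; ARD kernels let the model decide
# which latent dimensions are shared and which are private to a view.
model = GPy.models.MRD([audio_feats, visual_feats], input_dim=8,
                       num_inducing=30)
model.optimize(messages=False, max_iters=500)

# After optimisation, the per-view ARD lengthscales indicate which latent
# dimensions couple audio to vision, enabling audio-driven face synthesis.
```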

    Framework for proximal personified interfaces


    Shared cross-modal associations and the emergence of the lexicon

    This thesis centres on a sensory theory of protolanguage emergence, or STP. The STP proposes that shared biases to make associations between sensory modalities provided the basis for the emergence of a shared protolinguistic lexicon. Crucially, this lexicon would have been grounded in our perceptual systems, and thus fundamentally non-arbitrary. The foundation of such a lexicon lies in shared cross-modal associations: biases shared among language users to map properties in one modality (e.g., visual size) onto another (e.g., vowel sounds). While there is broad evidence that we make associations between a variety of modalities (Spence, 2011), this thesis focuses specifically on associations involving linguistic sound, arguing that these associations would have been most important in language emergence. Early linguistic utterances, by virtue of their grounding in shared cross-modal associations, could be formed and understood with high mutual intelligibility. The first chapter of the thesis outlines this theory in detail, addressing the nature of the proposed protolanguage system, arguing for the utility of non-arbitrariness at the point of language emergence, and proposing evidence for the likely transition from a non-arbitrary protolanguage to the predominantly arbitrary language systems we observe today. The remainder of the thesis focuses on providing empirical evidence to support this theory in two ways: (i) presenting experimental data showing evidence of shared associations between linguistic sound and other modalities, and (ii) providing evidence that such associations are evident cross-linguistically, despite the predominantly arbitrary nature of modern languages. Chapter two examines well-documented associations between vowel quality and physical size (e.g., /i/ is small, and /a/ is large; Sapir, 1929). This chapter presents a new experimental approach which fails to find robust associations between vowel quality and size in the absence of a forced-choice paradigm. Chapter three turns to associations between linguistic sound and shape angularity, taking a critical perspective on the classic takete/maluma experiment (Köhler, 1929). New empirical evidence shows that the acquisition of visual word forms plays a highly influential role in mediating associations between linguistic sound and angularity, but that associations between linguistic sound and visual form also play a minor role in auditory tasks. Chapter four examines a relatively unexplored modality: taste. A simple survey which asks participants to choose non-words to match representative tastes shows that certain linguistic sounds are preferred for certain food items. In a more detailed study, we use a more direct perceptual matching task with actual tastants and synthesised speech sounds, further showing that people make robust shared associations between linguistic sound and taste. Chapter five returns to the visual modality, considering previously unexamined associations between linguistic sound and motion, specifically the feature of speed. This study demonstrates that people do make robust associations between the two modalities, particularly for vowel quality. Chapter six takes a different empirical approach, considering non-arbitrariness in natural language.
Motivated by the experimental data from the previous chapters, we turn to corpus analyses to assess the presence of non-arbitrariness in natural language that concurs with the behavioural data showing linguistic cross-modal associations. First, a corpus analysis of taste synonyms in English shows small but significant correlations between form and meaning. To address the universality of specific sound-meaning associations, we then examine cross-linguistic corpora of taste and motion terms, showing that particular phonological features tend to connect to certain tastes and types of motion across genetically and geographically distinct languages. Lastly, the thesis concludes by considering the STP in light of the empirical evidence presented, and suggesting possible future empirical directions to explore the theory more broadly.
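
    The logic of such a corpus test can be illustrated in a few lines: score each word for a phonological feature, compare the feature's rate across meaning categories, and ask how often a random shuffle of the category labels produces a difference as large as the observed one. The words, categories and feature below are invented placeholders, not the thesis's corpora or results.

```python
# Hypothetical permutation test for a form-meaning association: do words
# in the "sweet" category contain the vowel "i" more often than chance
# predicts? All data here are invented placeholders for illustration.
import numpy as np

words = ["mili", "tini", "sisi", "gogu", "bombo", "dura"]    # invented non-words
is_sweet = np.array([1, 1, 1, 0, 0, 0])                      # meaning category
has_i = np.array([1.0 if "i" in w else 0.0 for w in words])  # phonological feature

observed = abs(has_i[is_sweet == 1].mean() - has_i[is_sweet == 0].mean())

rng = np.random.default_rng(0)
null = []
for _ in range(10000):                       # shuffle the category labels
    shuffled = rng.permutation(is_sweet)
    null.append(abs(has_i[shuffled == 1].mean() - has_i[shuffled == 0].mean()))

p_value = np.mean(np.array(null) >= observed)
print(f"observed difference = {observed:.2f}, permutation p = {p_value:.3f}")
```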