Automatic Detectors for Underwater Soundscape Measurements
Environmental impact regulations require that marine industrial operators quantify their contribution to underwater noise scenes. Automation of such assessments becomes feasible with the successful categorisation of sounds into broader classes based on source types – biological, anthropogenic and physical. Previous approaches to passive acoustic monitoring have mostly been limited to a few specific sources of interest. In this study, source-independent signal detectors are developed and a framework is presented for the automatic categorisation of underwater sounds into the aforementioned classes.
Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
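As a minimal illustration of the idea (not any specific algorithm from the works above), glottal-synchronous frames can be cut between consecutive GCIs. The sketch below assumes the GCI sample indices are already available; the signal and GCI positions are synthetic:

```python
import numpy as np

def glottal_synchronous_frames(signal, gci_samples, periods_per_frame=2):
    """Cut pitch-synchronous frames spanning `periods_per_frame`
    consecutive glottal cycles, each anchored at a GCI."""
    frames = []
    for i in range(len(gci_samples) - periods_per_frame):
        start = gci_samples[i]
        end = gci_samples[i + periods_per_frame]
        frames.append(signal[start:end])
    return frames

# Toy "voiced" signal: one decaying glottal pulse per 100-sample cycle.
fs = 8000
period = 100
x = np.zeros(fs)
gcis = np.arange(0, fs, period)          # hypothetical GCI positions
for g in gcis:
    n = np.arange(len(x) - g)
    x[g:] += np.exp(-n / 20.0)

frames = glottal_synchronous_frames(x, gcis)
print(len(frames), len(frames[0]))       # → 78 200
```

Unlike fixed-length framing, each frame here tracks the local pitch period exactly, which is the property the passage above describes.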
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the electroglottograph (EGG) signal with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
The GCIs are applied to real-world applications including speech dereverberation,
where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment into
real-world applications. The technique is shown to be applicable in areas of speech coding,
identification and artificial bandwidth extension of telephone speech.
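GCIs commonly serve as pitch marks for time-domain pitch-synchronous overlap-add (TD-PSOLA), one standard route to the prosodic manipulation mentioned above. The sketch below is a deliberately minimal PSOLA pitch shifter on a synthetic impulse train, not the thesis's algorithm:

```python
import numpy as np

def psola_pitch_shift(x, gcis, factor):
    """Minimal TD-PSOLA pitch modification: two-period Hann-windowed
    segments centred on analysis GCIs are overlap-added at synthesis
    marks spaced at (original period / factor)."""
    gcis = np.asarray(gcis)
    period = int(np.median(np.diff(gcis)))
    out = np.zeros(len(x))
    t = float(gcis[1])                           # first synthesis mark
    while t < gcis[-2]:
        a = gcis[np.argmin(np.abs(gcis - t))]    # nearest analysis mark
        seg = x[a - period : a + period] * np.hanning(2 * period)
        lo = int(t) - period
        out[lo : lo + 2 * period] += seg
        t += period / factor                     # new pitch-mark spacing
    return out

# Toy voiced signal: an impulse train with a 100-sample period.
fs = 8000
x = np.zeros(fs)
gcis = np.arange(0, fs, 100)
for g in gcis:
    x[g] = 1.0
y = psola_pitch_shift(x, gcis, factor=1.2)       # raise pitch ~20%
```

The accuracy of the GCI estimates directly sets the quality of the windowing, which is why reliable GCI detection matters for these applications.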
NeuroFlux: memory-efficient CNN training using adaptive local learning
Efficient on-device Convolutional Neural Network (CNN) training in resource-constrained mobile and edge environments is an open challenge. Backpropagation is the standard approach adopted, but it is GPU memory intensive due to its strong inter-layer dependencies, which demand that intermediate activations across the entire CNN model be retained in GPU memory. This necessitates smaller batch sizes to fit training within the available GPU memory budget, but in turn results in substantially higher, often impractical, training times. We introduce NeuroFlux, a novel CNN training system tailored for memory-constrained scenarios. We develop two novel techniques: firstly, adaptive auxiliary networks that employ a variable number of filters to reduce GPU memory usage, and secondly, block-specific adaptive batch sizes, which not only cater to the GPU memory constraints but also accelerate the training process. NeuroFlux segments a CNN into blocks based on GPU memory usage and further attaches an auxiliary network to each layer in these blocks. This disrupts the typical layer dependencies under a new training paradigm - 'adaptive local learning'. Moreover, NeuroFlux adeptly caches intermediate activations, eliminating redundant forward passes over previously trained blocks, further accelerating the training process. The results are twofold when compared to Backpropagation: on various hardware platforms, NeuroFlux demonstrates training speed-ups of 2.3× to 6.1× under stringent GPU memory budgets, and NeuroFlux generates streamlined models that have 10.9× to 29.4× fewer parameters.
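The local-learning idea with auxiliary heads and activation caching can be sketched in plain NumPy, with hypothetical dense "blocks" standing in for CNN blocks. This is an illustrative reduction under assumed toy data, not the NeuroFlux implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_block(acts, labels, dim_out, n_classes, epochs=50, lr=0.1):
    """Train one block plus its own auxiliary head; gradients never
    cross block boundaries, so earlier activations need no retention."""
    W = rng.normal(0.0, 0.1, (acts.shape[1], dim_out))   # block weights
    A = rng.normal(0.0, 0.1, (dim_out, n_classes))       # auxiliary head
    for _ in range(epochs):
        h = np.maximum(acts @ W, 0.0)                    # block forward (ReLU)
        logits = h @ A
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        g = (p - labels) / len(acts)                     # softmax-CE gradient
        gh = (g @ A.T) * (h > 0)                         # local gradient only
        A -= lr * h.T @ g
        W -= lr * acts.T @ gh
    return W, np.maximum(acts @ W, 0.0)                  # weights + cached acts

# Toy data: two Gaussian blobs with one-hot labels.
X = np.vstack([rng.normal(-1, 0.3, (50, 8)), rng.normal(1, 0.3, (50, 8))])
Y = np.zeros((100, 2)); Y[:50, 0] = 1.0; Y[50:, 1] = 1.0

acts = X
for dim in (16, 16):          # two "blocks", trained one after another
    _, acts = train_block(acts, Y, dim, n_classes=2)
    # The returned activations are cached, so earlier blocks are never
    # re-run while later blocks train - mirroring NeuroFlux's caching.
```

Because each block trains against only its own auxiliary loss, peak memory is bounded by one block's activations rather than the whole network's, which is the core of the memory argument above.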
Automatic prosodic analysis for computer aided pronunciation teaching
Correct pronunciation of spoken language requires the appropriate modulation of acoustic characteristics of speech to convey linguistic information at a suprasegmental level. Such prosodic modulation is a key aspect of spoken language and is an important component of foreign language learning, for purposes of both comprehension and intelligibility. Computer aided pronunciation teaching involves automatic analysis of the speech of a non-native talker in order to provide a diagnosis of the learner's performance in comparison with the speech of a native talker. This thesis describes research undertaken to automatically analyse the prosodic aspects of speech for computer aided pronunciation teaching. It is necessary to describe the suprasegmental composition of a learner's speech in order to characterise significant deviations from a native-like prosody, and to offer some kind of corrective diagnosis. Phonological theories of prosody aim to describe the suprasegmental composition of speech.
Study of young infants as social beings
In theories of development, an important but
controversial question is whether or not young infants
are social beings. For example, it is often argued
that, while infants may appear to interact with adults,
this is a mistaken impression until such a time as they
have fulfilled certain theoretically defined criteria
for sociability.
The aims of this study were first, empirically to
evaluate arguments for and against the view that
infants have an innate sensitivity to other persons,
and secondly, if such a sensitivity were found, to
discover how it develops during the first six months of
life.
Both an experiment and detailed naturalistic
observations were made to answer the first question.
The experiment produced preliminary evidence that the
behaviour of two-month-olds is consistently different
with persons and with graspable objects. This finding
was supported by fine-grain analysis of a filmed
interaction between a two-month-old and her mother
which produced conclusive evidence that young infants
are sensitive not only to the form of others' actions
but to the social significance of their actions, insofar
as those actions affect the infant's immediate
interests.
Subsequent observations and experiments were made to
find how social sensitivity or 'intersubjectivity'
develops during the first six months of life. These
involved comparisons between infants' behaviour when
interacting with their mother, with strangers and with
novel and familiar face-masks. Behaviour was recorded
on video-tape for approximately four minutes in each
condition, twice a month, between six and twenty-eight
weeks of age. Findings showed that there is a peak of
social interest between six and ten weeks of age which
is followed by a decline. This decline was due to a
general increase in infants' ability to take active
control of their surroundings - typified by their
increased interest in objects and in playing interpersonal
games (as opposed to participating in 'conversational'
adult-infant exchanges). Associated with
this decline of interest was increased 'negativity'
during interactions with the mother and with other
stimuli (i.e. actions of refusing or shutting out
contact with other entities). Twelve examples of
negativity are described in detail.
The thesis also includes a theoretical contribution
to Lacan's and Winnicott's notion of 'mirroring', based
on the analysis of maternal babytalk. This suggests
that mirroring is not simply a social phenomenon but is
also an ideological phenomenon and constitutes,
therefore, a complex and salient form of social
influence during early infancy.
The thesis concludes with a Spinozan argument that,
notwithstanding their innate sensitivity to other
persons, the development of infants as persons should
be viewed as a more all-embracing process than is
usually connoted by the phrase 'social development';
namely, as just one expression of the essential process
by which humans increase their power of self-determination.
EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals
The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech.
SEGREGATION OF SPEECH SIGNALS IN NOISY ENVIRONMENTS
Automatic segregation of overlapping speech signals from single-channel recordings is a challenging problem in speech processing. Similarly, extracting speech signals from noisy speech has attracted a variety of research for several years but remains unsolved. Speech extraction from noisy speech mixtures where the background interference could be either speech or noise is especially difficult when the task is to preserve perceptually salient properties of the recovered acoustic signals for use in human communication. In this work, we propose a speech segregation algorithm that can simultaneously deal with both background noise and interfering speech. We propose a feature-based, bottom-up algorithm which makes no assumptions about the nature of the interference and does not rely on any prior trained source models for speech extraction. As such, the algorithm should be applicable to a wide variety of problems, and also be useful for human communication, since an aim of the system is to recover the target speech signals in the acoustic domain. The proposed algorithm can be compartmentalized into (1) a multi-pitch detection stage which extracts the pitch of the participating speakers, (2) a segregation stage which teases apart the harmonics of the participating sources, (3) a reliability and add-back stage which scales the estimates based on their reliability and adds back appropriate amounts of aperiodic energy for the unvoiced regions of speech and (4) a speaker assignment stage which assigns the extracted speech signals to their appropriate respective sources. The pitch of two overlapping speakers is extracted using a novel feature, the 2-D Average Magnitude Difference Function, which is also capable of giving a single pitch estimate when the input contains only one speaker.
The segregation algorithm is based on a least squares framework relying on the estimated pitch values to give estimates of each speaker's contributions to the mixture. The reliability block is based on a non-linear function of the energy of the estimates; this non-linear function was learnt from a variety of speech and noise data but is generic in nature and applicable to different databases. With both single- and multiple-pitch extraction and segregation capabilities, the proposed algorithm is amenable to both speech-in-speech and speech-in-noise conditions. The algorithm is evaluated on several objective and subjective tests using both speech and noise interference from different databases. The proposed speech segregation system demonstrates performance comparable to or better than the state-of-the-art on most of the objective tasks. Subjective tests on the speech signals reconstructed by the algorithm, with normal-hearing listeners as well as users of hearing aids, indicate a significant improvement in the perceptual quality of the speech signal after processing by our proposed algorithm, and suggest that the proposed segregation algorithm can be used as a pre-processing block within the signal processing of communication devices. The utility of the algorithm for both perceptual and automatic tasks, based on a single-channel solution, makes it a unique speech extraction tool and a first of its kind in contemporary technology.
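The work above introduces a novel 2-D Average Magnitude Difference Function for two overlapping speakers; as a simplified point of reference, the snippet below sketches the standard single-speaker (1-D) AMDF, whose dips near multiples of the pitch period give the pitch estimate. The test signal and parameter choices are illustrative assumptions:

```python
import numpy as np

def amdf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Single-speaker pitch estimate via the Average Magnitude
    Difference Function: the AMDF dips at the pitch period."""
    lags = np.arange(int(fs / fmax), int(fs / fmin))
    amdf = np.array([np.mean(np.abs(frame[l:] - frame[:-l])) for l in lags])
    return fs / lags[np.argmin(amdf)]

# Toy frame: a slightly decaying 200 Hz harmonic signal.
fs = 8000
t = np.arange(int(0.04 * fs)) / fs
frame = (np.sin(2 * np.pi * 200 * t)
         + 0.5 * np.sin(2 * np.pi * 400 * t)) * np.exp(-5.0 * t)
print(round(amdf_pitch(frame, fs)))   # → 200
```

Extending this to a two-dimensional search over lag pairs, as the abstract describes, allows the periods of two simultaneous speakers to be resolved jointly.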
- …