294 research outputs found

    Random walk attachment graphs

    We consider the random walk attachment graph introduced by Saramäki and Kaski, proposed as a mechanism to explain how behaviour similar to preferential attachment may arise using only local knowledge. We show that if the length of the random walk is fixed, the resulting graphs can have properties significantly different from those of preferential attachment graphs. In particular, when the random walks have length 1 and each new vertex attaches to a single existing vertex, the proportion of vertices of degree 1 tends to 1, in contrast to preferential attachment models.
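    The length-1 dynamics described above can be simulated in a few lines. The sketch below (a minimal illustration, not the paper's code) grows a random walk attachment tree, where each new vertex starts at a uniformly random existing vertex, takes one uniform random-walk step, and attaches to the endpoint, then measures the proportion of degree-1 vertices, which the stated result says should approach 1 as the graph grows.

```python
import random

def random_walk_attachment(n, walk_len=1, seed=0):
    """Grow a graph with random walk attachment: each new vertex picks
    a uniform existing vertex, takes walk_len uniform random-walk steps,
    and attaches a single edge to the endpoint."""
    rng = random.Random(seed)
    adj = {0: [1], 1: [0]}          # seed graph: a single edge
    for v in range(2, n):
        u = rng.randrange(v)        # uniform starting vertex
        for _ in range(walk_len):
            u = rng.choice(adj[u])  # one uniform step along an edge
        adj[u].append(v)
        adj[v] = [u]
    return adj

adj = random_walk_attachment(20000)
frac_deg1 = sum(1 for v in adj if len(adj[v]) == 1) / len(adj)
# per the result above, frac_deg1 should be close to 1 for large n,
# unlike preferential attachment trees (degree-1 fraction 2/3)
```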

    Smart Transcription

    The Intelligent Voice Smart Transcript is an interactive HTML5 document that combines the audio, a speech transcription, and the key topics from an audio recording. It is designed to enable quick and efficient review of audio communications by encapsulating the recording, the speech transcript, and the topics within a single HTML5 file. This paper outlines the rationale for the design of the Smart Transcript user experience. It discusses the difficulties of audio review, the large potential for misinterpretation when transcripts are reviewed in isolation, and how additional diarization and topic-tagging components augment the audio review process.

    An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement

    Recent speech enhancement research has shown that deep learning techniques are very effective at removing background noise. Many deep neural networks have been proposed, showing promising results for improving overall speech perception. The Deep Multilayer Perceptron, Convolutional Neural Networks, and the Denoising Autoencoder are well-established architectures for speech enhancement; however, choosing between deep learning models has been largely empirical. A comparative analysis of these three architecture types is therefore needed to show the factors affecting their performance. In this paper, we present this analysis by comparing seven deep learning models that belong to these three categories. The comparison evaluates performance in terms of the overall quality of the output speech, using five objective evaluation metrics and a subjective evaluation with 23 listeners; the ability to deal with challenging noise conditions; generalization ability; complexity; and processing time. Further analysis is then provided using two different approaches. The first investigates how performance is affected by changing network hyperparameters and the structure of the data, including the Lombard effect. The second interprets the results by visualizing the spectrogram of the output layer of all the investigated models, and the spectrograms of the hidden layers of the convolutional neural network architecture. Finally, a general evaluation of supervised deep learning-based speech enhancement is performed using a SWOC analysis, discussing the technique’s Strengths, Weaknesses, Opportunities, and Challenges. The results of this paper contribute to the understanding of how different deep neural networks perform the speech enhancement task, highlight the strengths and weaknesses of each architecture, and provide recommendations for achieving better performance. This work facilitates the development of better deep neural networks for speech enhancement in the future.
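    As one concrete instance of the multilayer perceptron family compared above, the sketch below shows a one-hidden-layer MLP mapping noisy spectrogram frames to enhanced frames. The layer sizes, random weights, and input dimensions are illustrative assumptions, not the paper's trained configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_enhance(noisy_frames, w1, b1, w2, b2):
    """One-hidden-layer MLP: each row of noisy_frames is a noisy
    spectrogram frame; the output is an enhanced frame estimate."""
    h = np.maximum(0.0, noisy_frames @ w1 + b1)  # ReLU hidden layer
    return h @ w2 + b2                           # linear output layer

n_bins, n_hidden = 257, 512           # assumed FFT-bin / hidden sizes
w1 = rng.standard_normal((n_bins, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
w2 = rng.standard_normal((n_hidden, n_bins)) * 0.01
b2 = np.zeros(n_bins)

noisy = rng.standard_normal((100, n_bins))       # 100 synthetic frames
enhanced = mlp_enhance(noisy, w1, b1, w2, b2)
```

    In practice such a network would be trained on pairs of noisy and clean frames (often with several frames of temporal context stacked into the input); only the forward pass is sketched here.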

    A Mixed Reality Approach for dealing with the Video Fatigue of Online Meetings

    Much of the issue with video meetings is the lack of naturalistic cues, together with the feeling of being observed all the time. Video calls remove most body-language cues, but because the other person is still visible, the brain still tries to process that non-verbal language. This means working harder to achieve the impossible, which impairs data retention and can leave participants feeling unnecessarily tired. This project aims to transform the way online meetings happen by turning off the camera and simplifying the information our brains need to process, thus preventing ‘Zoom fatigue’. The immersive solution we are developing, iVXR, combines cutting-edge augmented reality technology, natural language processing, speech-to-text technology, and sub-real-time hardware acceleration using high-performance computing.

    A Frequency Bin Analysis of Distinctive Ranges Between Human and Deepfake Generated Voices

    Deepfake technology has advanced rapidly in recent years. The widespread availability of deepfake audio technology has raised concerns about its potential misuse for malicious purposes, and the need for more robust countermeasure systems is becoming ever more important. Here we analyse the differences between human and deepfake audio and introduce a novel audio pre-processing approach. Our analysis aims to show the specific locations in the frequency spectrum where artefacts and distinctions between human and deepfake audio can be found. Our approach emphasises specific frequency ranges that we show are transferable across synthetic-speech datasets. In doing so, we explore the use of a bespoke filter bank, derived from a frequency-bin analysis of the WaveFake dataset, to exploit commonalities across generation algorithms. We apply this filter bank to adjust gain and attenuation, improving the effective signal-to-noise ratio by reducing similarities while accentuating differences. We then take a baseline model and experiment with improving its performance using these frequency ranges, to show where the artefacts lie and whether this knowledge is transferable across mel-spectrum algorithms. We show that exploitable commonalities exist between deepfake voice generation methods that generate audio in the mel-spectrum, and that artefacts are left behind in similar frequency regions. Our approach is evaluated on the ASVspoof 2019 Logical Access dataset, whose test set contains unseen generative methods, to test the efficacy and transferability of our filter bank approach. Our experiments show that enhanced classification performance can be gained by utilizing these transferable frequency bands, where there are more artefacts and distinctions. Our highest-performing model provided a 14.75% improvement in Equal Error Rate over our baseline model.
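    The gain/attenuation step described above can be sketched as per-band scaling of a magnitude spectrogram. In the sketch below, the frequency ranges and gain values are illustrative placeholders; the actual artefact-rich bands come from the paper's WaveFake frequency-bin analysis.

```python
import numpy as np

def apply_band_gains(mag_spec, sr, bands):
    """Scale the rows (frequency bins) of a magnitude spectrogram.
    `bands` maps (f_lo, f_hi) in Hz to a linear gain, boosting bands
    believed to carry deepfake artefacts and attenuating the rest."""
    n_bins = mag_spec.shape[0]
    freqs = np.linspace(0.0, sr / 2, n_bins)   # bin centre frequencies
    out = mag_spec.copy()
    for (f_lo, f_hi), gain in bands.items():
        mask = (freqs >= f_lo) & (freqs < f_hi)
        out[mask, :] *= gain                   # scale the whole band
    return out

sr = 16000
spec = np.abs(np.random.default_rng(1).standard_normal((257, 50)))
# hypothetical emphasis/attenuation pattern, not the paper's bands
bands = {(0, 1000): 0.5, (4000, 8000): 2.0}
emphasised = apply_band_gains(spec, sr, bands)
```

    The pre-processed spectrogram would then be fed to the countermeasure classifier in place of the raw one, so the model sees a higher effective signal-to-noise ratio in the emphasised regions.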