11 research outputs found

    Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review

    Stress during public speaking is common and adversely affects performance and self-confidence. Extensive research has been carried out to develop various models to recognize emotional states. However, minimal research has been conducted to detect stress during public speaking in real time using voice analysis. In this context, the current review showed that the application of algorithms was not properly explored and helped identify the main obstacles in creating a suitable testing environment while accounting for current complexities and limitations. In this paper, we present our main idea and propose a computational stress detection model that could be integrated into a Virtual Reality (VR) application to create an intelligent virtual audience for improving public speaking skills. The developed model, when integrated with VR, will be able to detect excessive stress in real time by analysing voice features correlated with physiological parameters indicative of stress, and will help users gradually control excessive stress and improve public speaking performance. Comment: 41 pages, 7 figures, 4 tables

    Secure Automatic Speaker Verification Systems

    A growing number of voice-enabled devices and applications treat automatic speaker verification (ASV) as a fundamental component. However, wide adoption of ASV in critical domains, e.g., financial services and health care, is not possible unless we overcome the security breaches caused by voice cloning and replayed audio, collectively known as spoofing attacks. Audio spoofing attacks on ASV systems strictly limit the usability of voice-enabled applications, and the counterfeiter also remains untraceable. Therefore, to overcome these vulnerabilities, a secure ASV (SASV) system is presented in this dissertation. The proposed SASV system is based on novel sign modified acoustic local ternary pattern (sm-ALTP) features and an asymmetric bagging-based classifier ensemble. The proposed audio representation approach clusters the high- and low-frequency components in audio frames by normally distributing frequency components against a convex function. Neighborhood statistics are then applied to capture user-specific vocal tract information. This information is used by the classifier ensemble, which is based on a weighted normalized voting rule, to detect various spoofing attacks. Contrary to existing ASV systems, the proposed SASV system detects not only the conventional spoofing attacks (i.e., voice cloning and replays), but also new attacks that are still unexplored by the research community and will need to be addressed in the future. In this regard, the concept of cloned replays is presented in this dissertation, where the replayed audio contains the microphone characteristics as well as the voice cloning artifacts. This depicts the scenario in which voice cloning is applied in real time. The voice cloning artifacts suppress the microphone characteristics and thus cause replay detection modules to fail; similarly, the added microphone characteristics deceive voice cloning detection. Furthermore, the proposed scheme can be used to obtain a possible clue about the counterfeiter through a voice cloning algorithm detection module, which is also a novel concept proposed in this dissertation. This module determines the voice cloning algorithm used to generate the fake audio. Overall, the proposed SASV system simultaneously verifies bona fide speakers and detects the voice cloning attack, the cloning algorithm used to synthesize the cloned audio (in the defined settings), and voice-replay attacks on the ASVspoof 2019 dataset. In addition, the proposed method detects voice replay and cloned voice replay attacks on the VSDC dataset. Rigorous experimentation against state-of-the-art approaches also confirms the robustness of the proposed research

    Convolutional Neural Networks and their Application in Cancer Diagnosis based on RNA-Sequencing

    Gene expression analysis is the study of the way genes are transcribed to synthesize functional gene products, functional RNA species, or protein products. Its study can provide insights into cellular processes, such as cellular differentiation and abnormal pathological processes. Cancer is a genetic disease in which genetic variations cause genes to function abnormally and alter their expression. Proteins, being the final products of gene expression, define the phenotypes and biological processes. Therefore, detecting gene expression levels can be used for cancer diagnosis, prognosis, and even treatment selection. This thesis analyzes the theory and applications of Deep Learning (DL). It then applies DL, and in particular a Convolutional Neural Network (CNN), as a means of diagnosing multiple cancer types (pan-cancer classification) using gene expression data, specifically RNA-sequencing. The Cancer Genome Atlas (TCGA) data, which consists of RNA-sequencing, will be preprocessed and then embedded into multiple two-dimensional (2D) images. These images will then be fed into a Convolutional Neural Network, which will classify them into 33 types of cancer, in an attempt to achieve the highest possible diagnostic accuracy
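The embedding step mentioned in this abstract (expression values turned into 2D images for a CNN) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the thesis's method: the function name `expression_to_image`, the `log2(x + 1)` preprocessing, the zero padding, and the square layout are all hypothetical choices.

```python
import math

import numpy as np


def expression_to_image(expression, side=None):
    """Embed a 1-D gene-expression vector into a square 2-D array.

    Assumptions (not taken from the abstract): values are non-negative
    counts, preprocessing is log2(x + 1), and unused pixels are zero.
    """
    values = np.log2(np.asarray(expression, dtype=np.float32) + 1.0)
    if side is None:
        # Smallest integer side such that side * side >= number of genes.
        side = math.isqrt(max(values.size - 1, 0)) + 1
    if side * side < values.size:
        raise ValueError("side is too small to hold every expression value")
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: values.size] = values  # genes fill the image row by row
    return padded.reshape(side, side)


# Five hypothetical expression values become a 3 x 3 image.
image = expression_to_image([0, 1, 3, 7, 15])
```

Genes are placed in input order here; a real pipeline might order them (for example by chromosome position) so that neighbouring pixels carry related signals, but the abstract does not say which ordering is used.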

    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a very few of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages

    Austronesian and other languages of the Pacific and South-east Asia : an annotated catalogue of theses and dissertations

    Speech Signal Processing in Application of Soft Computing Methods

    The aim of this diploma thesis is to investigate the field of Soft Computing methods in Speech Signal Processing, find an appropriate method for noise suppression, and use it in simulation and practical implementation. The chosen method will be described theoretically and mathematically, and its implementation will be used to suppress ambient noise in the speech signal used to control a Smart Home based on the KNX bus system. Voice control of the operational and technical functions of this communication bus system is a prerequisite for simpler household management, eliminating the otherwise necessary manual handling of the control device, especially for seniors or disabled persons. However, households contain a number of disturbing elements, for example the noise of household appliances or weather conditions, that may cause this control to malfunction, so this noise needs to be removed. Nowadays, a number of filters are already available that satisfactorily remove pre-specified noise, but the use of non-linear adaptive methods could shift these results to a completely different level. After successful implementation, it is necessary to evaluate the signal to determine whether the noise suppression method used has been successful. 450 - Katedra kybernetiky a biomedicínského inženýrství (Department of Cybernetics and Biomedical Engineering)

    Developing natural language processing instruments to study sociotechnical systems

    Identifying temporal linguistic patterns and tracing social amplification across communities has always been vital to understanding modern sociotechnical systems. Now, well into the age of information technology, the growing digitization of text archives powered by machine learning systems has enabled an enormous number of interdisciplinary studies to examine the coevolution of language and culture. However, most research in that domain investigates formal textual records, such as books and newspapers. In this work, I argue that the study of conversational text derived from social media is just as important. I present four case studies to identify and investigate societal developments in longitudinal social media streams with high temporal resolution spanning over 100 languages. These case studies show how everyday conversations on social media encode a unique perspective that is often complementary to observations derived from more formal texts. This unique perspective improves our understanding of modern sociotechnical systems and enables future research in computational linguistics, social science, and behavioral science