1,173 research outputs found

    ACOUSTIC SPEECH MARKERS FOR TRACKING CHANGES IN HYPOKINETIC DYSARTHRIA ASSOCIATED WITH PARKINSON’S DISEASE

    Get PDF
    Previous research has identified certain overarching features of hypokinetic dysarthria associated with Parkinson’s Disease and found that it manifests differently across individuals. Acoustic analysis has often been used to find correlates of perceptual features for differential diagnosis. However, acoustic parameters that are robust for differential diagnosis may not be sensitive enough to track speech changes. Previous longitudinal studies have had limited sample sizes or variable intervals between data collection. This study used acoustic correlates of perceptual features to identify acoustic markers able to track speech changes in people with Parkinson’s Disease (PwPD) over six months. The thesis presents how this study addresses the limitations of previous work to make a novel contribution to current knowledge. Speech data were collected from 63 PwPD and 47 control speakers using online podcast software at two time points six months apart (T1 and T2). Recordings of a standard reading passage, minimal pairs, sustained phonation, and spontaneous speech were collected. Perceptual severity ratings were given by two speech and language therapists at T1 and T2, and acoustic parameters of voice, articulation, and prosody were investigated. Two analyses were conducted: a) to identify which acoustic parameters can track perceptual speech changes over time, and b) to identify which acoustic parameters can track changes in speech intelligibility over time. An additional analysis examined whether these parameters showed group differences between PwPD and control speakers at T1 and T2 for differential diagnosis. Results showed that specific acoustic parameters in voice quality, articulation, and prosody could either differentiate between PwPD and controls or detect speech changes between T1 and T2, but not both. However, specific acoustic parameters within articulation could detect both significant group differences and speech changes across T1 and T2. The thesis discusses these results, their implications, and the potential for future studies.
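Prosodic timing measures of the kind this abstract alludes to can be illustrated with a toy acoustic parameter. The sketch below computes a pause ratio from short-time frame energies; the frame length, threshold, and the parameter itself are illustrative assumptions, not the thesis's actual measures.

```python
import numpy as np

def pause_ratio(signal, sr, frame_ms=25, threshold_db=-35):
    """Fraction of frames classified as silence via an energy threshold.

    A crude proxy for one prosodic timing measure; the threshold and
    frame length are illustrative choices, not values from the study.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Frame energy in dB relative to the loudest frame.
    energy = np.sum(frames ** 2, axis=1)
    energy_db = 10 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    return float(np.mean(energy_db < threshold_db))

# Synthetic example: 1 s of "speech" (noise) followed by 1 s of near-silence.
sr = 16000
rng = np.random.default_rng(0)
speech = rng.normal(0, 0.3, sr)
silence = rng.normal(0, 0.001, sr)
ratio = pause_ratio(np.concatenate([speech, silence]), sr)
```

On this synthetic signal, roughly half the frames fall below the threshold, so the ratio lands near 0.5.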

    (b2023 to 2014) The UNBELIEVABLE similarities between the ideas of some people (2006-2016) and my ideas (2002-2008) in physics (quantum mechanics, cosmology), cognitive neuroscience, philosophy of mind, and philosophy (this manuscript would require a REVOLUTION in international academy environment!)

    Get PDF

    Audiovisual speech perception in cochlear implant patients

    Get PDF
    Hearing with a cochlear implant (CI) is very different from normal hearing (NH), as the CI can provide only limited auditory input. Nevertheless, the central auditory system is capable of learning to interpret this limited input and can extract meaningful information within a few months after implant switch-on. The capacity of the auditory cortex to adapt to new auditory stimuli is an example of intra-modal plasticity: changes within a sensory cortical region as a result of altered statistics of the respective sensory input. However, hearing deprivation before implantation and the restoration of hearing after implantation can also induce cross-modal plasticity: changes within a sensory cortical region as a result of altered statistics of a different sensory input. A preserved cortical region can thereby support a deprived one, as in CI users, who have been shown to exhibit cross-modal visual-cortex activation for purely auditory stimuli. Before implantation, during the period of hearing deprivation, CI users typically rely on additional visual cues such as lip movements to understand speech. It has therefore been suggested that CI users show a pronounced binding of the auditory and visual systems, which may allow them to integrate auditory and visual speech information more efficiently. The projects included in this thesis investigate auditory, and particularly audiovisual, speech processing in CI users. Four event-related potential (ERP) studies approach the matter from different perspectives, each with a distinct focus. The first project investigates how audiovisually presented syllables are processed by CI users with bilateral hearing loss compared to NH controls. Previous ERP studies employing non-linguistic stimuli, as well as studies using other neuroimaging techniques, have found distinct audiovisual interactions in CI users.
However, the precise time course of cross-modal visual-cortex recruitment and enhanced audiovisual interaction for speech-related stimuli is unknown. Our ERP study fills this gap, presenting differences between CI users and NH controls in the time course of audiovisual interactions as well as in cortical source configurations. The second study focuses on auditory processing in single-sided deaf (SSD) CI users. SSD CI patients experience a maximally asymmetric hearing condition, with a CI on one ear and a contralateral NH ear. Despite the intact ear, several behavioural studies have demonstrated a variety of benefits of restoring binaural hearing, but only a few ERP studies have investigated auditory processing in SSD CI users. Our study examines whether the side of implantation affects auditory processing and whether auditory processing via the NH ear of SSD CI users works similarly to that in NH controls. Given the distinct hearing conditions of SSD CI users, the question arises whether there are quantifiable differences between CI users with unilateral and bilateral hearing loss. In general, ERP studies on SSD CI users are scarce, there is no study on audiovisual processing in particular, and there are no reports on the lip-reading abilities of SSD CI users. To this end, the third project extends the first study by including SSD CI users as a third experimental group. The study discusses differences and similarities between CI users with bilateral hearing loss, CI users with unilateral hearing loss, and NH controls, and provides the first insights into audiovisual interactions in SSD CI users. The fourth project investigates the influence of background noise on audiovisual interactions in CI users and whether a noise-reduction algorithm can modulate these interactions.
In environments with competing background noise, listeners generally rely more strongly on visual cues for understanding speech, and such situations are particularly difficult for CI users. As shown in previous behavioural studies, the recently introduced noise-reduction algorithm "ForwardFocus" can be a useful aid in such cases. However, whether the algorithm is also beneficial in audiovisual conditions, and whether it has a measurable effect on cortical processing, had not yet been investigated. In this ERP study, we address these questions with an auditory and audiovisual syllable discrimination task. Taken together, the projects in this thesis contribute to a better understanding of auditory, and especially audiovisual, speech processing in CI users, revealing distinct processing strategies employed to overcome the limited input provided by a CI. The results have clinical implications: they suggest that clinical hearing assessments, which are currently purely auditory, should be extended to audiovisual assessments, and that rehabilitation including audiovisual training methods may benefit all CI user groups in quickly achieving the most effective implantation outcome.
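The ERP technique underlying all four projects can be sketched in a few lines: epochs time-locked to stimulus events are extracted, baseline-corrected, and averaged, so that stimulus-locked activity survives while unrelated noise cancels. The signal model and all numbers below are synthetic illustrations, not data from these studies.

```python
import numpy as np

def erp_average(eeg, events, sr, tmin=-0.1, tmax=0.5):
    """Average stimulus-locked epochs from a single-channel EEG trace.

    Toy illustration of the ERP technique; real pipelines add filtering,
    artifact rejection, and multi-channel source analysis.
    """
    pre = int(-tmin * sr)
    post = int(tmax * sr)
    epochs = np.stack([eeg[e - pre:e + post] for e in events])
    # Baseline-correct each epoch using its pre-stimulus interval.
    epochs -= epochs[:, :pre].mean(axis=1, keepdims=True)
    return epochs.mean(axis=0)

# Synthetic trace: noise plus a fixed deflection 100 ms after each event.
sr = 1000
rng = np.random.default_rng(1)
eeg = rng.normal(0, 1.0, 60 * sr)
events = np.arange(1 * sr, 59 * sr, sr)  # one event per second
deflection = np.exp(-0.5 * ((np.arange(100) - 50) / 15) ** 2)
for e in events:
    eeg[e + 100:e + 200] += 5 * deflection
erp = erp_average(eeg, events, sr)
```

Averaging over the 58 synthetic trials shrinks the noise while the deflection at +150 ms remains, which is exactly why ERPs reveal cortical responses invisible in single trials.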

    Perceptual Model-Driven Authoring of Plausible Vibrations from User Expectations for Virtual Environments

    Get PDF
    One of the central goals of design is the creation of experiences that are rated favorably in the intended application context. User expectations play an integral role in tactile product quality and tactile plausibility judgments alike. In the vibrotactile authoring process for virtual environments, vibration is created to match the user’s expectations of the presented situational context. Currently, inefficient trial-and-error approaches attempt to match expectations implicitly. A more efficient, model-driven procedure based explicitly on tactile user expectations would thus be beneficial for authoring vibrations. In everyday life, we are frequently exposed to various whole-body vibrations. Depending on their temporal and spectral properties, we intuitively associate specific perceptual properties such as “tingling”. This suggests a systematic relationship between physical parameters and perceptual properties. To communicate with potential users about such elicited or expected tactile properties, a standardized design language is proposed. It contains a set of sensory tactile perceptual attributes, which are sufficient to characterize the perceptual space of vibration encountered in everyday life. This design language enables the assessment of quantitative tactile perceptual specifications by laypersons, elicited in situational contexts such as auditory-visual-tactile vehicle scenes. However, such specifications can also be assessed by providing only verbal descriptions of the content of these scenes. Quasi-identical ratings observed for both presentation modes suggest that tactile user expectations can be quantified even before any vibration is presented. Such expected perceptual specifications are the prerequisite for a subsequent translation into physical vibration parameters. Plausibility can be understood as a similarity judgment between elicited features and expected features.
Thus, plausible vibration can be synthesized by maximizing the similarity of the elicited perceptual properties to the expected perceptual properties. Based on the observed relationships between vibration parameters and sensory tactile perceptual attributes, a 1-nearest-neighbor model and a regression model were built. The plausibility of the vibrations synthesized by these models in the context of virtual auditory-visual-tactile vehicle scenes was validated in a perceptual study. The results demonstrated that the perceptual specifications obtained with the design language are sufficient to synthesize vibrations that are perceived as equally plausible as recorded vibrations in a given situational context. Overall, the demonstrated design method can be a new, more efficient tool for designers authoring vibrations for virtual environments or creating tactile feedback. The method enables further automation of the design process and thus potential time and cost reductions.
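The 1-nearest-neighbor synthesis idea can be sketched as a lookup: given an expected attribute profile, return the vibration parameters whose previously elicited profile lies closest in attribute space. The attribute dimensions and parameter sets below are placeholders for illustration, not those from the thesis.

```python
import numpy as np

# Hypothetical database pairing sensory attribute profiles (e.g. ratings
# of "tingling", "pulsating", "rough") with the vibration parameters that
# elicited them; all values here are invented placeholders.
attribute_db = np.array([
    [0.9, 0.1, 0.2],   # profile elicited by stimulus 0
    [0.2, 0.8, 0.3],   # profile elicited by stimulus 1
    [0.1, 0.2, 0.9],   # profile elicited by stimulus 2
])
vibration_params = [
    {"pattern": "sinusoidal", "freq_hz": 120, "level_db": 90},
    {"pattern": "amplitude_modulated", "freq_hz": 60, "level_db": 95},
    {"pattern": "impulse_like", "rate_hz": 4, "level_db": 100},
]

def synthesize_params(expected_profile):
    """Return the stored parameters whose elicited attribute profile is
    nearest (Euclidean) to the user's expected attribute profile."""
    dists = np.linalg.norm(attribute_db - expected_profile, axis=1)
    return vibration_params[int(np.argmin(dists))]

params = synthesize_params(np.array([0.15, 0.25, 0.85]))
```

Maximizing similarity between expected and elicited profiles thus reduces, in the 1-NN case, to a nearest-neighbor query over the measured stimulus database.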

    Proceedings of the 8th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2023)

    Get PDF
    This volume gathers the papers presented at the Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023), held in Tampere, Finland, on 21–22 September 2023.

    Perceptually Motivated, Intelligent Audio Mixing Approaches for Hearing Loss

    Get PDF
    The growing population of listeners with hearing loss, along with the limitations of current audio enhancement solutions, has created the need for novel approaches that take the perceptual aspects of hearing loss into consideration while taking advantage of the benefits of intelligent audio mixing. The aim of this thesis is to explore perceptually motivated, intelligent approaches to audio mixing for listeners with hearing loss, through the development of a hearing loss simulation and its use as a referencing tool in automatic audio mixing. To achieve this aim, a real-time hearing loss simulation was designed and tested for accuracy and effectiveness in listening studies with participants with real and simulated hearing loss. The simulation was then used by audio engineering students and professionals during mixing, to gather information on the techniques and practices engineers use to combat the effects of hearing loss while mixing content through the simulation. The extracted practices then informed four automatic mixing approaches: a deep learning approach utilising a differentiable digital signal processing architecture, a knowledge-based approach to gain mixing utilising fuzzy logic, a genetic algorithm approach to equalisation, and a combined system of the fuzzy mixer and genetic equaliser. The outputs of all four systems were analysed, and each approach’s strengths and weaknesses are discussed in the thesis. The results demonstrate the potential of integrating perceptual information into intelligent audio mixing for hearing loss, paving the way for further exploration of this approach’s capabilities.
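A genetic-algorithm approach to equalisation of the kind described can be sketched as follows: candidate band-gain vectors are scored by a fitness function and the fittest are mutated to form the next generation. The five-band target curve and the fitness function below are illustrative assumptions, not the thesis's hearing-loss model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical target: band gains (dB) compensating a simplified
# high-frequency hearing loss; the bands and values are invented.
target_gains = np.array([0.0, 2.0, 5.0, 9.0, 12.0])  # 5 EQ bands

def fitness(individual):
    # Higher is better: negative squared error to the target gain curve.
    return -np.sum((individual - target_gains) ** 2)

def evolve(pop_size=40, generations=60, sigma=1.0):
    """Elitist evolution: keep the best half, refill with mutated copies."""
    pop = rng.uniform(-12, 12, size=(pop_size, len(target_gains)))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        survivors = pop[np.argsort(scores)[-pop_size // 2:]]
        children = survivors + rng.normal(0, sigma, survivors.shape)
        pop = np.vstack([survivors, children])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]

best = evolve()
```

A real system would score candidates perceptually (e.g. against a hearing loss simulation) rather than against a known target curve, but the evolutionary loop has the same shape.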

    Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture

    Full text link
    This paper presents a configurable version of the Extreme Bandwidth Extension Network (EBEN), a Generative Adversarial Network (GAN) designed to improve audio captured with body-conduction microphones. Although these microphones significantly reduce environmental noise, this insensitivity to ambient noise comes at the expense of the bandwidth of the speech signal acquired by the wearer of the device. The captured signals therefore require signal enhancement techniques to recover full-bandwidth speech. EBEN leverages a configurable multiband decomposition of the raw captured signal. This decomposition reduces the time-domain dimensions of the data and allows the full-band signal to be better controlled. The multiband representation of the captured signal is processed through a U-Net-like model, which combines feature and adversarial losses to generate an enhanced speech signal. The proposed configurable discriminator architecture also benefits from this representation. The configurable EBEN approach achieves state-of-the-art enhancement results on synthetic data with a lightweight generator that allows real-time processing. Comment: Accepted in IEEE/ACM Transactions on Audio, Speech and Language Processing on 14/08/202
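The role of the multiband decomposition can be illustrated with a minimal two-band analysis: each band is filtered and decimated, so the stacked half-rate bands halve the time dimension the generator must model. This is a simplified stand-in for illustration, not EBEN's actual filter bank.

```python
import numpy as np

def two_band_split(x, taps=64):
    """Split a signal into low/high bands and decimate each by 2.

    Simplified stand-in for a multiband analysis stage: the two stacked
    half-rate bands carry the same samples in half the time dimension.
    """
    n = np.arange(taps)
    # Windowed-sinc low-pass prototype with cutoff at a quarter of fs.
    h = np.sinc((n - (taps - 1) / 2) / 2) * np.hamming(taps)
    h /= h.sum()
    low = np.convolve(x, h, mode="same")
    high = x - low  # complementary high band
    return np.stack([low[::2], high[::2]])  # shape: (2, len(x) // 2)

x = np.sin(2 * np.pi * 0.02 * np.arange(1024))  # low-frequency test tone
bands = two_band_split(x)
```

For this low-frequency tone, nearly all the energy lands in the low band, and the (2, 512) output shows the halved time axis a band-wise generator would operate on.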