    An improved StarGAN for emotional voice conversion: enhancing voice quality and data augmentation

    Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech signal to a target style while preserving its content and speaker identity. Previous emotional conversion studies do not disentangle emotional information from the emotion-independent information that should be preserved; they therefore transform everything in a monolithic manner and generate low-quality audio with linguistic distortions. To address this distortion problem, we propose a novel StarGAN framework with a two-stage training process that separates emotional features from emotion-independent ones by using an autoencoder with two encoders as the generator of the Generative Adversarial Network (GAN). The proposed model achieves favourable results in both objective and subjective evaluations of distortion, indicating that it effectively reduces distortion. Furthermore, in data augmentation experiments for end-to-end speech emotion recognition, the proposed StarGAN model achieves an increase of 2% in Micro-F1 and 5% in Macro-F1 over the baseline StarGAN model, which indicates that it is more valuable for data augmentation.
    Comment: Accepted by Interspeech 202
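    The abstract reports gains in both Micro-F1 and Macro-F1. As a generic illustration (not the authors' evaluation code; the function name f1_scores is hypothetical), the sketch below computes both averages for single-label emotion predictions. Micro-F1 pools true positives, false positives and false negatives across all classes, while Macro-F1 averages the per-class F1 scores, so every emotion class counts equally regardless of how many samples it has.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class p was wrong
            fn[t] += 1  # the true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1: pool counts over all classes (equals accuracy for single-label data).
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1: unweighted mean of per-class F1, so rare emotions weigh as much as common ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro
```

    For example, with true labels ["neutral", "neutral", "neutral", "angry", "sad"] and predictions ["neutral", "neutral", "neutral", "neutral", "sad"], Micro-F1 is 0.8 while Macro-F1 is 13/21 (about 0.62), because the missed "angry" class scores zero.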

    An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

    Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research. In recent years, machines have managed to master the art of generating speech that is understandable by humans. But the linguistic content of an utterance encompasses only a part of its meaning. Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions, aspects that are essential for engaging and naturalistic interpersonal communication. While the goal of imparting expressivity to synthesised utterances has so far remained elusive, following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion as well. Deep learning, as the technology which underlies most of the recent advances in artificial intelligence, is spearheading these efforts. In the present overview, we outline ongoing trends and summarise state-of-the-art approaches in an attempt to provide a comprehensive picture of this exciting field.
    Comment: Submitted to the Proceedings of IEE

    Personalised depression forecasting using mobile sensor data and ecological momentary assessment

    Introduction: Digital health interventions are an effective way to treat depression, but it is still largely unclear how patients’ individual symptoms evolve dynamically during such treatments. Data-driven forecasts of depressive symptoms would make it possible to greatly improve the personalisation of treatments. Current forecasting approaches often train models on an entire population, resulting in a general model that works overall but does not translate well to each individual in clinically heterogeneous, real-world populations. Model fairness across patient subgroups is also frequently overlooked. Personalised models tailored to the individual patient may therefore be promising.
    Methods: We investigate different personalisation strategies, namely transfer learning, subgroup models, and subject-dependent standardisation, on a newly collected longitudinal dataset of depression patients undergoing treatment with a digital intervention (N=65 patients recruited). Both passive mobile sensor data and ecological momentary assessments were available for modelling. We evaluated the models’ ability to predict symptoms of depression (Patient Health Questionnaire-2; PHQ-2) at the end of each day and to forecast symptoms for the next day.
    Results: In our experiments, subject-dependent standardisation achieves the best mean absolute error (MAE) of 0.801 for predicting PHQ-2 values at the end of the day, a 25% improvement over a non-personalised baseline (MAE=1.062). For one-day-ahead forecasting, a transfer learning approach with shared common layers improves on the baseline MAE of 1.539 by 12%, to 1.349. In addition, personalisation leads to fairer models at group level.
    Discussion: Our results suggest that personalisation using subject-dependent standardisation and transfer learning can improve predictions and forecasts, respectively, of depressive symptoms in participants of a digital depression intervention. We discuss technical and clinical limitations of this approach, avenues for future investigations, and how personalised machine learning architectures may be implemented to improve existing digital interventions for depression.
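    The abstract does not specify which variables were standardised or how. A minimal sketch, assuming that subject-dependent standardisation means z-scoring each patient's values with that patient's own mean and standard deviation, together with the MAE metric and the relative-improvement arithmetic behind the reported percentages (all helper names are hypothetical):

```python
from statistics import mean, pstdev


def standardise_per_subject(values_by_subject):
    """Z-score each subject's values using that subject's own mean and std.

    values_by_subject: dict mapping a (hypothetical) subject ID to a list of numbers.
    """
    out = {}
    for subject, values in values_by_subject.items():
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # guard against a constant series (std = 0)
        out[subject] = [(v - mu) / sigma for v in values]
    return out


def mae(y_true, y_pred):
    """Mean absolute error between paired targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def relative_improvement(baseline, new):
    """Fractional error reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline
```

    relative_improvement(1.062, 0.801) gives about 0.246 and relative_improvement(1.539, 1.349) about 0.123, consistent with the 25% and 12% improvements reported above. In a real forecasting setup, each subject's mean and standard deviation would be estimated on that subject's training data only, so that no information from future days leaks into the model.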

    A robust welding seam identification method

    Khyam, MO ORCiD: 0000-0002-1988-2328
    As an automatic welding process may experience disturbances caused by, e.g., splashes and/or welding fumes, misalignment or poor positioning, thermally induced deformations, strong arc light, and diverse welding joints/grooves, precisely identifying the welding seam has a great influence on the welding quality achieved. In this paper, a robust method for identifying the seam is proposed. First, after a welding image obtained from a structured-light vision sensor is filtered, the extended Kalman filter (EKF) is used to search for the laser stripe within a sufficiently small area in order to avoid possible disturbances. Second, to extract the profile of the welding seam, the least-squares method is used to fit a sequence of centroids determined by column-wise scanning of the tracking window. Third, this profile is qualitatively described and matched using a proposed character string method. Finally, the advantages of this method over other approaches are assessed through repeated experiments.
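    The second step fits a sequence of stripe centroids by least squares. A minimal sketch of that idea, assuming the centroid of each column is the intensity-weighted mean row of the laser stripe inside the tracking window and that the profile is fitted with a single polynomial (a straight line by default); the function names and the threshold parameter are hypothetical, and a real groove profile would typically need piecewise fits:

```python
import numpy as np


def stripe_centroids(window, threshold=0.0):
    """Intensity-weighted row centroid of the laser stripe in each column.

    window: 2-D array (rows x columns) cropped to the tracking window.
    Returns (cols, rows) for the columns whose summed intensity is positive.
    """
    img = np.where(window > threshold, window, 0.0).astype(float)  # suppress dim background
    weights = img.sum(axis=0)                    # total stripe intensity per column
    rows = np.arange(img.shape[0])[:, None]      # row index of every pixel
    valid = weights > 0                          # skip columns with no stripe pixels
    centroids = (rows * img).sum(axis=0)[valid] / weights[valid]
    cols = np.nonzero(valid)[0]
    return cols, centroids


def fit_profile(cols, centroids, degree=1):
    """Least-squares polynomial fit of centroid row versus column index."""
    return np.polyfit(cols, centroids, degree)  # highest-degree coefficient first
```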

    A welding seam identification method based on cross-modal perception

    Khyam, MO ORCiD: 0000-0002-1988-2328
    Purpose: As an automatic welding process may experience disturbances caused by, for example, splashes and/or welding fumes, misalignment or poor positioning, thermally induced deformations, strong arc light and diverse welding joints/grooves, precisely identifying the welding seam has a great influence on the welding quality. This paper aims to propose a robust method for identifying the seam based on cross-modal perception.
    Design/methodology/approach: First, after a welding image obtained from a structured-light vision sensor (here laser and vision are integrated into a cross-modal perception sensor) is filtered, the extended Kalman filter is used to search for the laser stripe within a sufficiently small area, which prevents possible disturbances. Second, to extract the profile of the welding seam, the least-squares method is used to fit a sequence of centroids determined by column-wise scanning of the tracking window. Third, this profile is qualitatively described and matched using a proposed character string method.
    Findings: The method is shown to run in real time and to be clearly superior in accuracy and robustness, although its real-time performance is not the best.
    Originality/value: This paper proposes a robust method for automatically identifying and tracking a welding seam.
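    The paper searches for the laser stripe with an extended Kalman filter restricted to a small area. The sketch below swaps in a plain linear Kalman filter with a constant-velocity model, which suffices when the stripe position evolves linearly; an EKF is needed when the motion or measurement model is nonlinear. The small search area is approximated by an innovation gate that rejects detections far from the prediction, such as spatter. The function name and the parameters q, r and gate are hypothetical:

```python
import numpy as np


def track_stripe(measurements, q=1e-3, r=1.0, gate=3.0):
    """Linear constant-velocity Kalman filter over noisy stripe positions.

    measurements: stripe row detected in each successive column (or frame);
    None marks a column with no detection. Detections whose innovation exceeds
    `gate` standard deviations are rejected as disturbances and the prediction
    is kept instead.
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition for [position, velocity]
    H = np.array([[1.0, 0.0]])              # only the position is observed
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    first = next(m for m in measurements if m is not None)
    x = np.array([first, 0.0])
    P = np.eye(2) * 10.0                    # broad initial uncertainty
    estimates = []
    for z in measurements:
        # Predict the next state and its covariance.
        x = F @ x
        P = F @ P @ F.T + Q
        if z is not None:
            y = z - (H @ x)[0]              # innovation
            S = (H @ P @ H.T + R)[0, 0]     # innovation variance
            if abs(y) <= gate * np.sqrt(S): # accept only detections near the prediction
                K = (P @ H.T / S).ravel()   # Kalman gain
                x = x + K * y
                P = (np.eye(2) - np.outer(K, H[0])) @ P
        estimates.append(x[0])
    return np.array(estimates)
```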

    Catalytic Application and Mechanism Studies of Argentic Chloride Coupled Ag/Au Hollow Heterostructures: Considering the Interface Between Ag/Au Bimetals

    For an economical use of solar energy, photocatalysts that are sufficiently efficient, stable, and capable of harvesting light are required. Composite heterostructures composed of noble metals and semiconductors exhibit excellent catalytic performance. Here, 1D Ag/Au/AgCl hollow heterostructures are synthesized from Ag nanowires (NWs) by a galvanic replacement reaction (GRR). The catalytic properties of the as-obtained Ag/Au/AgCl hollow heterostructures with different ratios are investigated through the reduction of 4-nitrophenol (Nip) to 4-aminophenol (Amp) in the presence of NaBH4, and the influence of the AgCl semiconductor on the catalytic performance of the Ag/Au bimetals is also investigated. These hollow heterostructures show higher catalytic activity than pure Ag NWs; AgCl not only acts as a support material, but excess AgCl also obstructs contact between the Ag/Au bimetals and the reactive species. Moreover, the photocatalytic performance of these hollow heterostructures is evaluated through the degradation of acid orange 7 (AO7) under UV and visible light. The Ag/Au/AgCl hollow heterostructures exhibit higher photocatalytic activity than pure Ag NWs and commercial TiO2 (P25), and the Ag/Au bimetals enhance the photocatalytic activity of the AgCl semiconductor via localized surface plasmon resonance (LSPR) and plasmon resonance energy transfer (PRET) mechanisms. The as-synthesized multifunctional 1D Ag/Au/AgCl hollow heterostructures could be applied to practical environmental remediation through catalysis.
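    The abstract does not state how catalytic activity was quantified. For the reduction of 4-nitrophenol with excess NaBH4, activity is commonly compared via an apparent pseudo-first-order rate constant k_app, obtained from ln(A_t/A_0) = -k_app t, where A_t is the absorbance of the 4-nitrophenolate peak (around 400 nm) at time t. A minimal sketch of that fit, assuming this kinetic model (the function name is hypothetical):

```python
import numpy as np


def apparent_rate_constant(times, absorbances):
    """Estimate k_app from ln(A_t / A_0) = -k_app * t by least squares.

    times: sampling times, starting at t = 0.
    absorbances: absorbance at each time; absorbances[0] is A_0.
    """
    t = np.asarray(times, dtype=float)
    a = np.asarray(absorbances, dtype=float)
    y = np.log(a / a[0])  # ln(A_t / A_0), zero at t = 0
    # Least-squares fit of y = -k t through the origin: k = -sum(t*y) / sum(t*t).
    return -np.dot(t, y) / np.dot(t, t)
```

    A larger k_app means faster conversion, so samples with different Ag/Au/AgCl ratios could be ranked by this single number under identical conditions.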

    Zinc Single Atom Confinement Effects on Catalysis in 1T-Phase Molybdenum Disulfide

    Active sites are atomic sites within catalysts that drive reactions and are essential for catalysis. Spatially confining guest metals within active-site microenvironments has been predicted to improve catalytic activity by altering the electronic states of active sites. Using the hydrogen evolution reaction (HER) as a model reaction, we show that intercalating zinc single atoms between the layers of 1T-MoS2 (Zn SAs/1T-MoS2) enhances HER performance by decreasing the overpotential, charge transfer resistance, and kinetic barrier. The confined Zn atoms coordinate tetrahedrally to basal sulfur (S) atoms and expand the interlayer spacing of 1T-MoS2 by ∼3.4%. Under confinement, the Zn SAs donate electrons to the coordinated S atoms, which lowers the free energy barrier of H* adsorption-desorption and enhances HER kinetics. In this work, HER performance is enhanced by controlling the coordination geometry and electronic states of transition metals confined within active-site microenvironments, an approach applicable to all types of catalytic reactions and layered materials.
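    The abstract attributes the enhanced HER kinetics to a lower free-energy barrier for H* adsorption-desorption but does not state the descriptor used. In computational HER studies this is commonly expressed through the hydrogen adsorption free energy, with an ideal catalyst having ΔG_H* close to zero. The standard approximation below is an assumption here rather than the authors' stated method:

```latex
\Delta G_{\mathrm{H}^*} = \Delta E_{\mathrm{H}^*} + \Delta E_{\mathrm{ZPE}} - T\,\Delta S_{\mathrm{H}}
  \approx \Delta E_{\mathrm{H}^*} + 0.24\ \mathrm{eV},
\qquad
\Delta E_{\mathrm{H}^*} = E(\text{surface} + \mathrm{H}) - E(\text{surface}) - \tfrac{1}{2}\,E(\mathrm{H_2})
```

    Under this reading, electron donation from the confined Zn atoms to the coordinated S atoms would shift ΔG_H* toward zero, making H* neither too weakly nor too strongly bound.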