An improved StarGAN for emotional voice conversion: enhancing voice quality and data augmentation
Emotional Voice Conversion (EVC) aims to convert the emotional style of a
source speech signal to a target style while preserving its content and speaker
identity information. Previous emotional conversion studies do not disentangle
emotional information from the emotion-independent information that should be
preserved; they therefore transform it all in a monolithic manner and generate
low-quality audio with linguistic distortions. To address this distortion
problem, we propose a novel StarGAN framework with a two-stage training process
that separates emotional features from emotion-independent ones by using an
autoencoder with two encoders as the generator of the Generative Adversarial
Network (GAN). The proposed model achieves favourable results in both objective
and subjective evaluations of distortion, indicating that it effectively
reduces distortion. Furthermore, in data augmentation experiments for
end-to-end speech emotion recognition, the proposed StarGAN model achieves an
increase of 2% in Micro-F1 and 5% in Macro-F1 compared to the baseline StarGAN
model, which indicates that it is more valuable for data augmentation. Comment: Accepted by Interspeech 202
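The abstract reports separate gains in Micro-F1 and Macro-F1. The sketch below shows how the two metrics differ for single-label, multi-class predictions. It is not the authors' evaluation code. The function name `f1_scores` and the emotion labels in the example are hypothetical. Micro-F1 pools true-positive, false-positive and false-negative counts across all classes, and for single-label data it equals accuracy. Macro-F1 takes the unweighted mean of per-class F1 scores, so every emotion class counts equally regardless of how often it occurs.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); define 0/0 as 0.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1: pool counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


y_true = ["neutral", "neutral", "neutral", "angry", "sad"]
y_pred = ["neutral", "neutral", "neutral", "neutral", "sad"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.800 macro=0.619
```

In this toy example, missing the single "angry" utterance lowers Macro-F1 to about 0.62, while Micro-F1 stays at 0.8.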
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Speech is the fundamental mode of human communication, and its synthesis has
long been a core priority in human-computer interaction research. In recent
years, machines have managed to master the art of generating speech that is
understandable by humans. But the linguistic content of an utterance
encompasses only a part of its meaning. Affect, or expressivity, has the
capacity to turn speech into a medium capable of conveying intimate thoughts,
feelings, and emotions -- aspects that are essential for engaging and
naturalistic interpersonal communication. While imparting expressivity to
synthesised utterances has so far remained elusive, a paradigm shift is now
well under way in affective speech synthesis and conversion, following recent
advances in text-to-speech synthesis. Deep
learning, as the technology which underlies most of the recent advances in
artificial intelligence, is spearheading these efforts. In the present
overview, we outline ongoing trends and summarise state-of-the-art approaches
in an attempt to provide a comprehensive picture of this exciting field. Comment: Submitted to the Proceedings of IEE
Personalised depression forecasting using mobile sensor data and ecological momentary assessment
Introduction
Digital health interventions are an effective way to treat depression, but it is still largely unclear how patients’ individual symptoms evolve dynamically during such treatments. Data-driven forecasts of depressive symptoms could greatly improve the personalisation of treatments. In current forecasting approaches, models are often trained on an entire population. The result is a general model that works well on average but does not translate well to individuals in clinically heterogeneous, real-world populations. Model fairness across patient subgroups is also frequently overlooked. Personalised models tailored to the individual patient may therefore be promising.
Methods
We investigate three personalisation strategies: transfer learning, subgroup models and subject-dependent standardisation. We apply them to a newly collected longitudinal dataset of depression patients undergoing treatment with a digital intervention (N=65 patients recruited). Both passive mobile sensor data and ecological momentary assessments were available for modelling. We evaluated the models’ ability to predict symptoms of depression (Patient Health Questionnaire-2; PHQ-2) at the end of each day and to forecast next-day symptoms.
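The abstract does not spell out how subject-dependent standardisation is computed. This sketch assumes a per-subject z-score: each value is centred and scaled with that subject's own mean and standard deviation. Between-person differences in baseline level and spread are therefore removed before modelling. The helper names and the `(subject_id, value)` record format are hypothetical.

```python
from statistics import mean, pstdev


def fit_subject_stats(records):
    """Per-subject (mean, std) from (subject_id, value) pairs.

    In a forecasting setup these should come from each subject's training
    days only, to avoid leaking future information.
    """
    by_subject = {}
    for sid, value in records:
        by_subject.setdefault(sid, []).append(value)
    stats = {}
    for sid, values in by_subject.items():
        sd = pstdev(values)
        stats[sid] = (mean(values), sd if sd > 0 else 1.0)  # guard constant series
    return stats


def standardise(records, stats):
    """Map each value to a z-score using its own subject's statistics."""
    return [(sid, (v - stats[sid][0]) / stats[sid][1]) for sid, v in records]


def destandardise(sid, z, stats):
    """Map a prediction made on the z-scale back to the subject's original scale."""
    mu, sd = stats[sid]
    return mu + sd * z


records = [("p1", 2.0), ("p1", 4.0), ("p2", 10.0), ("p2", 20.0)]
stats = fit_subject_stats(records)
print(standardise(records, stats))
# [('p1', -1.0), ('p1', 1.0), ('p2', -1.0), ('p2', 1.0)]
```

The inverse mapping, `destandardise`, returns predictions to each patient's own scale so that they can be scored against raw PHQ-2 values.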
Results
In our experiments, subject-dependent standardisation achieves the best mean absolute error (MAE) for predicting end-of-day PHQ-2 values. Its MAE of 0.801 is a 25% improvement over a non-personalised baseline (MAE=1.062). For one-day-ahead forecasting, a transfer learning approach with shared common layers improves on the baseline MAE of 1.539 by 12%, to 1.349. In addition, personalisation leads to fairer models at group level.
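The percentages in these results are the relative reduction in MAE against the baseline, (baseline − model) / baseline. The short check below uses only the MAE values quoted in the abstract and reproduces the 25% and 12% figures. The helper names are illustrative.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_mae, model_mae):
    """Fraction by which model_mae reduces the baseline error."""
    return (baseline_mae - model_mae) / baseline_mae


print(round(100 * relative_improvement(1.062, 0.801)))  # 25 (end-of-day prediction)
print(round(100 * relative_improvement(1.539, 1.349)))  # 12 (next-day forecast)
```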
Discussion
Our results suggest that personalisation using subject-dependent standardisation and transfer learning can improve predictions and forecasts, respectively, of depressive symptoms in participants of a digital depression intervention. We discuss technical and clinical limitations of this approach, avenues for future investigations, and how personalised machine learning architectures may be implemented to improve existing digital interventions for depression.
A robust welding seam identification method
Khyam, MO ORCiD: 0000-0002-1988-2328
An automatic welding process may experience disturbances caused by, e.g., splashes and/or welding fumes, misalignment or poor positioning, thermally induced deformations, strong arc light and diverse welding joint and groove types. Precisely identifying the welding seam therefore has a great influence on the welding quality achieved. In this paper, a robust method for identifying the seam is proposed. First, a welding image obtained from a structured-light vision sensor is filtered. The extended Kalman filter (EKF) is then used to search for the laser stripe within a sufficiently small area, which avoids interference from possible disturbances. Second, to extract the profile of the welding seam, the least-squares method is used to fit a sequence of centroids determined by scanning the columns displayed in the tracking window. Third, this profile is qualitatively described and matched using a proposed character-string method. Finally, the method is compared with other approaches in repeated experiments to assess its advantages.
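The second step fits a sequence of column centroids by least squares. The sketch below illustrates that step under stated assumptions and is not the authors' implementation:

- Each column's centroid is taken to be the intensity-weighted mean row of the laser stripe inside a rectangular tracking window.
- A single straight line `row = a * col + b` is fitted. A real groove profile usually consists of several segments.
- The image format and function names are hypothetical.

```python
def column_centroids(image, window):
    """Intensity-weighted row centroid of each column inside a tracking window.

    image:  2-D list of grey levels, indexed image[row][col].
    window: (row0, row1, col0, col1) with half-open bounds.
    Returns (col, centroid_row) pairs for columns containing stripe light.
    """
    r0, r1, c0, c1 = window
    points = []
    for c in range(c0, c1):
        weights = [image[r][c] for r in range(r0, r1)]
        total = sum(weights)
        if total > 0:  # skip columns where no stripe light was found
            centroid = sum((r0 + i) * w for i, w in enumerate(weights)) / total
            points.append((c, centroid))
    return points


def fit_line(points):
    """Ordinary least-squares fit of row = a * col + b; returns (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need centroids from at least two distinct columns")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


# Synthetic 6x5 image whose stripe lies on row = col + 1.
image = [[255 if r == c + 1 else 0 for c in range(5)] for r in range(6)]
print(fit_line(column_centroids(image, (0, 6, 0, 5))))  # (1.0, 1.0)
```

To handle a V- or fillet-shaped profile, the same fit would be applied piecewise to each segment of the centroid sequence.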
A welding seam identification method based on cross-modal perception
Khyam, MO ORCiD: 0000-0002-1988-2328
Purpose: An automatic welding process may experience disturbances caused by, for example, splashes and/or welding fumes, misalignment or poor positioning, thermally induced deformations, strong arc light and diverse welding joint and groove types. Precisely identifying the welding seam therefore has a great influence on welding quality. This paper aims to propose a robust method for identifying the seam based on cross-modal perception. Design/methodology/approach: First, a welding image obtained from a structured-light vision sensor is filtered; in this sensor, laser and vision are integrated for cross-modal perception. The extended Kalman filter is then used to search for the laser stripe within a sufficiently small area, which avoids interference from possible disturbances. Second, to extract the profile of the welding seam, the least-squares method is used to fit a sequence of centroids determined by scanning the columns displayed in the tracking window. Third, this profile is qualitatively described and matched using a proposed character-string method. Findings: The method is shown to operate in real time and to be clearly superior in accuracy and robustness, although its real-time performance is not the best. Originality/value: This paper proposes a robust method for automatically identifying and tracking a welding seam.
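The Kalman filter's role in the first step is to keep the stripe search inside a small predicted area. The sketch below illustrates this with a linear constant-velocity Kalman filter. When the motion and measurement models are linear, the extended Kalman filter reduces to exactly this filter. It is a simplified stand-in for the authors' EKF, and the following choices are all assumptions made for illustration:

- the state: the stripe row and its rate of change from one column or frame to the next;
- the noise variances `q` and `r`;
- the 3-sigma search window;
- the class name.

```python
class StripeTracker:
    """Linear Kalman filter for the laser-stripe row, constant-velocity model.

    State x = [row, rate]; only the row is measured (H = [1, 0]).
    """

    def __init__(self, row, q=0.01, r=1.0):
        self.x = [float(row), 0.0]
        self.P = [[10.0, 0.0], [0.0, 10.0]]  # initial state uncertainty
        self.q, self.r = q, r                # process / measurement noise variances

    def predict(self):
        """Propagate with F = [[1, 1], [0, 1]]; P <- F P F^T + q I."""
        (p00, p01), (p10, p11) = self.P
        self.x = [self.x[0] + self.x[1], self.x[1]]
        self.P = [[p00 + p01 + p10 + p11 + self.q, p01 + p11],
                  [p10 + p11, p11 + self.q]]
        return self.x[0]

    def search_window(self, k=3.0):
        """Predicted row +/- k standard deviations of the innovation."""
        half = k * (self.P[0][0] + self.r) ** 0.5
        return self.x[0] - half, self.x[0] + half

    def update(self, z):
        """Correct the state with a measured stripe row z."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r               # innovation variance
        k0, k1 = p00 / s, p10 / s      # Kalman gain
        y = z - self.x[0]              # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


tracker = StripeTracker(row=100.0)
for z in [101.0, 102.0, 103.0, 104.0, 105.0]:
    tracker.predict()
    tracker.update(z)
next_row = tracker.predict()
low, high = tracker.search_window()
print(f"predicted row {next_row:.1f}, search rows {low:.1f}..{high:.1f}")
```

In this example the stripe moves one row per step. After a few measurements the prediction settles near the next row, and `search_window()` returns the narrow band of rows to scan. Bright pixels outside that band, such as spatter, are never examined.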
Catalytic Application and Mechanism Studies of Argentic Chloride Coupled Ag/Au Hollow Heterostructures: Considering the Interface Between Ag/Au Bimetals
Abstract: For an economical use of solar energy, photocatalysts that are sufficiently efficient, stable, and capable of harvesting light are required. Composite heterostructures composed of noble metals and semiconductors have shown excellent catalytic performance. Here, 1D Ag/Au/AgCl hollow heterostructures are synthesized from Ag nanowires (NWs) by a galvanic replacement reaction (GRR). The catalytic properties of the as-obtained Ag/Au/AgCl hollow heterostructures with different ratios are investigated by reducing 4-nitrophenol (Nip) to 4-aminophenol (Amp) in the presence of NaBH4. The influence of the AgCl semiconductor on the catalytic performance of the Ag/Au bimetals is also investigated. These hollow heterostructures show higher catalytic activity than pure Ag NWs. AgCl acts as a supporting material, but excess AgCl also obstructs contact between the Ag/Au bimetals and the reactive species. Moreover, the photocatalytic performance of these hollow heterostructures is evaluated through the degradation of acid orange 7 (AO7) under UV and visible light. The Ag/Au/AgCl hollow heterostructures show higher photocatalytic activity than both pure Ag NWs and commercial TiO2 (P25). The Ag/Au bimetals enhance the photocatalytic activity of the AgCl semiconductor via localized surface plasmon resonance (LSPR) and plasmon resonance energy transfer (PRET) mechanisms. These multifunctional 1D Ag/Au/AgCl hollow heterostructures could be applied to practical environmental remediation through catalysis.
Cu Promoted the Dynamic Evolution of Ni-Based Catalysts for Polyethylene Terephthalate Plastic Upcycling.
Upcycling plastic wastes into value-added chemicals is a promising approach to returning end-of-life plastic wastes to their ecocycle. Polyethylene terephthalate (PET) plastic waste, derived from a polyester in everyday use, is employed here as the model substrate. Herein, a nickel (Ni)-based catalyst was prepared by electrochemically depositing copper (Cu) species on Ni foam (NiCu/NF). Before electrolysis, NiCu/NF formed Cu/CuO and Ni/NiO/Ni(OH)2 core-shell structures. During ethylene glycol (EG) oxidation, these reconstructed into NiOOH and CuOOH/Cu(OH)2 active species. After oxidation, the Cu and Ni species evolved into more reduced species. An indirect mechanism was identified as the main mechanism of the EG oxidation reaction (EGOR). In EGOR, the NiCu60s/NF catalyst exhibited an optimal Faradaic efficiency (FE, 95.8%) and yield rate (0.70 mmol cm⁻² h⁻¹) for formate production. An FE for formate of over 80% was also achieved when a commercial PET plastic powder hydrolysate was used. Furthermore, commercial PET water bottle waste was employed as a substrate for electrocatalytic upcycling, and pure terephthalic acid (TPA) was recovered after only 1 h of electrolysis. Lastly, density functional theory (DFT) calculations revealed that Cu plays three key roles: it significantly reduces the Gibbs free-energy barrier (ΔG) of the rate-determining step (RDS) of EGOR, promotes the catalyst's dynamic evolution, and facilitates C-C bond cleavage.
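The two figures of merit quoted for formate production follow standard definitions. Faradaic efficiency is the share of the passed charge that ends up in the product, FE = z·n·F/Q. The yield rate is the amount of product per unit electrode area per hour. The sketch assumes z = 3 electrons per formate, from the overall oxidation HOCH2CH2OH + 8 OH⁻ → 2 HCOO⁻ + 6 H2O + 6 e⁻. The abstract does not state how the authors quantified formate or charge. The numbers below are illustrative, not data from the paper.

```python
FARADAY = 96485.33212  # Faraday constant, C/mol

# Assumed stoichiometry: HOCH2CH2OH + 8 OH- -> 2 HCOO- + 6 H2O + 6 e-
ELECTRONS_PER_FORMATE = 3


def faradaic_efficiency(n_product_mol, charge_c, electrons=ELECTRONS_PER_FORMATE):
    """FE = z * n * F / Q, as a fraction of the total charge passed."""
    return electrons * n_product_mol * FARADAY / charge_c


def yield_rate_mmol_cm2_h(n_product_mol, area_cm2, hours):
    """Product amount in mmol per cm^2 of electrode per hour of electrolysis."""
    return n_product_mol * 1e3 / (area_cm2 * hours)


# Illustrative numbers only: 0.5 mmol formate on a 1 cm^2 electrode in 1 h
# at 100% FE would need 3 * 0.5e-3 * F, about 144.7 C.
charge = 3 * 0.5e-3 * FARADAY
print(round(faradaic_efficiency(0.5e-3, charge), 3))      # 1.0
print(round(yield_rate_mmol_cm2_h(0.5e-3, 1.0, 1.0), 3))  # 0.5
```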
Zinc Single Atom Confinement Effects on Catalysis in 1T-Phase Molybdenum Disulfide
Active sites are the atomic sites within catalysts that drive reactions and are essential for catalysis. Spatially confining guest metals within active-site microenvironments has been predicted to improve catalytic activity by altering the electronic states of active sites. Using the hydrogen evolution reaction (HER) as a model reaction, we show that intercalating zinc single atoms between the layers of 1T-MoS2 (Zn SAs/1T-MoS2) enhances HER performance. It does so by decreasing the overpotential, charge transfer resistance, and kinetic barrier. The confined Zn atoms coordinate tetrahedrally to basal sulfur (S) atoms and expand the interlayer spacing of 1T-MoS2 by ∼3.4%. Under confinement, the Zn SAs donate electrons to the coordinated S atoms. This lowers the free-energy barrier of H* adsorption-desorption and enhances HER kinetics. In this work, HER performance is enhanced by controlling the coordination geometry and electronic states of transition metals confined within active-site microenvironments. The approach is applicable to all types of catalytic reactions and layered materials.