Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data
Creating synthetic voices with found data is challenging, as real-world
recordings often contain various types of audio degradation. One way to address
this problem is to pre-enhance the speech with an enhancement model and then
use the enhanced data for text-to-speech (TTS) model training. This paper
investigates the use of conditional diffusion models for generalized speech
enhancement, which aims at addressing multiple types of audio degradation
simultaneously. The enhancement is performed in the log Mel-spectrogram domain
to align with the TTS training objective. Text information is introduced as an
additional condition to improve model robustness. Experiments on real-world
recordings demonstrate that the synthetic voice built on data enhanced by the
proposed model produces higher-quality synthetic speech than voices
trained on data enhanced by strong baselines. Code and pre-trained parameters
of the proposed enhancement model are available at
https://github.com/dmse4tts/DMSE4TTS
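The abstract does not spell out its diffusion formulation. As a generic illustration only, the sketch below shows the standard DDPM forward (noising) process, q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) with abar_t the product of (1 - beta_s) for s = 1..t, applied to one flattened log-Mel frame. The linear beta schedule, the 1000 steps, and the frame values are assumptions, and the text and degraded-speech conditioning of the proposed model, which acts in the learned reverse process, is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 1..t (t = 0 means no noise)."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod

def q_sample(x0, t, betas, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) for one flattened log-Mel frame."""
    abar = alpha_bar(betas, t)
    return [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
betas = linear_beta_schedule(1000)
clean_frame = [-4.0, -2.5, -1.0, -3.2]  # hypothetical log-Mel values for 4 Mel bins
noisy_frame = q_sample(clean_frame, t=500, betas=betas, rng=rng)
```

A denoising network trained to invert this process, conditioned on the degraded spectrogram and on text, is what the abstract describes; that network is outside the scope of this sketch.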
Synthesis of 7-dehydrocholesterol through hexacarbonyl molybdenum catalyzed elimination reaction
The efficiency of the hexacarbonyl molybdenum-catalyzed elimination of allylic acetates has been improved by adding O,N-bis(trimethylsilyl)acetamide (BSA) to the reaction medium. The methodology is particularly well suited to the elimination of 7-acetoxycholesterol-3-acetate (cholesterol-3,7-diacetate), for which the only product obtained was the 5,7-homoannular diene (7-dehydrocholesterol-3-acetate). Good yields (up to 70%) are achieved while reducing side-product formation and costs compared with previously used procedures. The hexacarbonyl molybdenum elimination is strongly influenced by reaction temperature: both low and high temperatures give low isolated yields of the homoannular diene, whereas moderate temperatures give high product formation.
KEY WORDS: Hexacarbonyl molybdenum, Elimination, Deacetoxylation, 7-Dehydrocholesterol, BSA
Bull. Chem. Soc. Ethiop. 2011, 25(2), 247-254
Reap success from persistence
The road to success is long and arduous. Almost all Nobel Prize laureates endured tremendous effort and countless failures before making their scientific breakthroughs. Hypothesis-driven, independent and critical thinking, passion, repeated experiments and repeated failures, and cycling again and again through the entire scientific process finally confirmed their hypotheses.
An ASR-free Fluency Scoring Approach with Self-Supervised Learning
A typical fluency scoring system generally relies on an automatic speech
recognition (ASR) system to obtain time stamps in input speech for either the
subsequent calculation of fluency-related features or directly modeling speech
fluency with an end-to-end approach. This paper describes a novel ASR-free
approach for automatic fluency assessment using self-supervised learning (SSL).
Specifically, wav2vec2.0 is used to extract frame-level speech features,
followed by K-means clustering to assign a pseudo label (cluster index) to each
frame. A BLSTM-based model is trained to predict an utterance-level fluency
score from frame-level SSL features and the corresponding cluster indexes.
Neither speech transcription nor time stamp information is required in the
proposed system. It is ASR-free and can potentially avoid the effect of ASR errors
in practice. Experiments on non-native English databases
show that the proposed approach significantly improves the performance in the
"open response" scenario as compared to previous methods and matches the
recently reported performance in the "read aloud" scenario.
Comment: Accepted by ICASSP 202
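The pseudo-labeling step (K-means over frame-level features, then one cluster index per frame) can be sketched with plain Lloyd's K-means. This is a minimal sketch under stated assumptions: the toy 2-D vectors are hypothetical stand-ins for wav2vec2.0 frame features, and the paper's cluster count, feature layer, and clustering settings are not reproduced.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_pseudo_labels(frames, centroids):
    """Give each frame the index of its nearest centroid (its pseudo label)."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(frame, centroids[c]))
        for frame in frames
    ]

def kmeans(frames, k, num_iters=20, seed=0):
    """Plain Lloyd's K-means: alternate centroid-update and assignment steps."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]  # initialize from random frames
    labels = assign_pseudo_labels(frames, centroids)
    for _ in range(num_iters):
        # Update step: move each centroid to the mean of its assigned frames.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        # Assignment step: relabel frames against the updated centroids.
        labels = assign_pseudo_labels(frames, centroids)
    return centroids, labels

# Hypothetical 2-D stand-ins for frame-level SSL features of one utterance.
frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
centroids, labels = kmeans(frames, k=2)
```

At inference time, assign_pseudo_labels with the trained centroids produces the per-frame cluster indexes that, in the described system, accompany the SSL features as input to the BLSTM scorer.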
Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring
Recent studies on pronunciation scoring have explored the effect of
introducing phone embeddings as reference pronunciation, but mostly in an
implicit manner, i.e., addition or concatenation of reference phone embedding
and actual pronunciation of the target phone as the phone-level pronunciation
quality representation. In this paper, we propose to use linguistic-acoustic
similarity to explicitly measure the deviation of non-native production from
its native reference for pronunciation assessment. Specifically, the deviation
is first estimated by the cosine similarity between reference phone embedding
and corresponding acoustic embedding. Next, a phone-level Goodness of
Pronunciation (GOP) pre-training stage is introduced to guide this
similarity-based learning toward a better initialization of the two
embeddings. Finally, a transformer-based hierarchical pronunciation scorer
maps the sequences of phone embeddings and acoustic embeddings, along with
their similarity measures, to the final utterance-level score.
Experimental results on the non-native databases suggest that the proposed
system significantly outperforms the baselines, where the acoustic and phone
embeddings are simply added or concatenated. A further examination shows that
the phone embeddings learned in the proposed approach are able to capture
linguistic-acoustic attributes of native pronunciation as reference.
Comment: Accepted by ICASSP 202
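The phone-level deviation measure described above is a cosine similarity between each reference phone embedding and its acoustic embedding. A minimal sketch, assuming plain Python lists as embeddings; the 2-D vectors are hypothetical, whereas in the described system both embeddings are learned and the resulting similarities are fed, with the embeddings, into the transformer-based scorer (not shown).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def phone_level_similarities(phone_embs, acoustic_embs):
    """One similarity per phone: reference phone embedding vs. its acoustic embedding."""
    if len(phone_embs) != len(acoustic_embs):
        raise ValueError("expected one acoustic embedding per phone")
    return [cosine_similarity(p, a) for p, a in zip(phone_embs, acoustic_embs)]

# Hypothetical 2-D embeddings for a two-phone utterance.
phone_embs = [[1.0, 0.0], [0.0, 1.0]]
acoustic_embs = [[2.0, 0.0], [1.0, 0.0]]
sims = phone_level_similarities(phone_embs, acoustic_embs)
# sims == [1.0, 0.0]: the first phone points in its reference direction,
# the second is orthogonal to its reference (maximal deviation for non-negative inputs).
```

Because cosine similarity ignores vector length, it compares the direction of the production with its reference rather than the embedding magnitude, which is one reason it suits an explicit deviation measure.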
Food protein-stabilized nanoemulsions as potential delivery systems for poorly water-soluble drugs: preparation, in vitro characterization, and pharmacokinetics in rats
Nanoemulsions stabilized by traditional emulsifiers raise toxicological concerns for long-term treatment. The present work investigates the potential of food proteins as safer stabilizers for nanoemulsions that deliver hydrophobic drugs. Nanoemulsions stabilized by food proteins (soybean protein isolate, whey protein isolate, β-lactoglobulin) were prepared by high-pressure homogenization. Their toxicity was tested in Caco-2 cells using the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) viability assay, and in vivo absorption was evaluated in rats. The food protein-stabilized nanoemulsions, with small particle size and a narrow size distribution, exhibited better stability and biocompatibility than nanoemulsions stabilized by traditional emulsifiers. Moreover, β-lactoglobulin showed better emulsifying capacity and biocompatibility than the other two food proteins. Pancreatic degradation of the proteins accelerated drug release. It is concluded that an oil/water nanoemulsion system with good biocompatibility can be prepared using food proteins as emulsifiers, allowing better and more rapid absorption of lipophilic drugs.
A mobile prototype-based localization approach using inertial navigation and acoustic tracking for underwater
During underwater operations, divers must determine their own trajectories using the Inertial Navigation System (INS) they carry in order to work efficiently. However, the INS sensor bias is carried through the double integration that converts acceleration into displacement, so the divers' trajectories drift during prolonged self-guided navigation. Other aids are therefore needed to correct the accumulated INS error. The single-beacon Assisted Inertial Navigation (AIN) method makes inertial error correction more flexible while simplifying the localization equipment, which makes it suitable for correcting the divers' cumulative INS error. However, most traditional single-beacon assisted correction methods neither account for the effect of acoustic ray bending on underwater acoustic ranging nor address the singular or ill-conditioned coefficient matrices introduced by localization deviations between neighboring inertial navigation positions. To address these two shortcomings, this paper uses the sound velocity profile for acoustic ray tracking, combines it with the localization idea of Mobile Primitives (MP), and proposes an MP-based acoustic ray tracking-Assisted Inertial Navigation Localization (AINL) method. The method constructs a sliding time window (STW) that treats the divers' historical positions as virtual primitives and applies nonlinear optimization for an iterative search, improving the accuracy and stability of the divers' self-navigation.
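Why an uncorrected sensor bias causes drift can be shown with a worked example: a constant acceleration bias b, integrated once to velocity and again to position, gives a position error e(t) = b t^2 / 2, so the error grows quadratically with time. The sketch below compares that closed form with step-by-step double integration. It is an illustrative simplification under stated assumptions: a single constant bias on one axis, with a hypothetical 0.01 m/s^2 bias and 100 Hz update rate; it does not model the paper's AINL method.

```python
def drift_closed_form(bias, duration):
    """Position error from a constant acceleration bias: e(t) = bias * t**2 / 2."""
    return 0.5 * bias * duration ** 2

def drift_by_integration(bias, duration, dt):
    """Integrate a constant bias twice with Euler steps; returns the position error."""
    velocity_error = 0.0
    position_error = 0.0
    for _ in range(round(duration / dt)):
        velocity_error += bias * dt            # first integration: acceleration -> velocity
        position_error += velocity_error * dt  # second integration: velocity -> position
    return position_error

# Hypothetical 0.01 m/s^2 bias, 100 Hz updates (dt = 0.01 s), 10 minutes unaided.
numeric = drift_by_integration(0.01, 600.0, 0.01)
exact = drift_closed_form(0.01, 600.0)
```

Under these assumptions both values come out at about 1,800 m, and doubling the unaided duration quadruples the error, which is why periodic external corrections such as single-beacon acoustic ranging are needed.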
Electric field-induced transformations in bismuth sodium titanate-based materials
Electric field-induced transformations occur in a myriad of systems with a variegated phenomenology and have attracted widespread scientific interest due to their importance in many applications. The present review focuses on the electric field-induced transformations occurring in bismuth sodium titanate (BNT)-based materials, which are considered an important family of lead-free perovskites and represent possible alternatives to lead-based compounds for several applications. BNT-based systems are generally classified as relaxor ferroelectrics and are characterized by complex structures undergoing various electric field-driven phenomena. In this review, changes in crystal structure symmetry, domain configuration and macroscopic properties are discussed in relation to composition, temperature and electrical loading characteristics, including amplitude, frequency and DC bias. The coupling of octahedral tilting with polarization and strain, together with other microstructural features, is identified as an important factor mediating the local and overall electric field-induced response. The role of field-induced transformations in electrical fatigue is discussed, highlighting the effects of ergodicity on domain evolution and fatigue resistance under bipolar and unipolar cycling. The relevance of field-induced transformations in key applications, including energy storage capacitors, actuators, electrocaloric systems and photoluminescent devices, is comprehensively discussed to identify materials design criteria. The review concludes with an outlook for future research.