2,195 research outputs found
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Expressive speech synthesis models are trained by adding corpora with diverse
speakers, various emotions, and different speaking styles to the dataset, in
order to control various characteristics of speech and generate the desired
voice. In this paper, we propose a style control (SC) VALL-E model based on the
neural codec language model (called VALL-E), which follows the structure of the
generative pretrained transformer 3 (GPT-3). The proposed SC VALL-E takes input
from text sentences and prompt audio and is designed to generate controllable
speech by not simply mimicking the characteristics of the prompt audio but by
controlling the attributes to produce diverse voices. We identify tokens in the
style embedding matrix of the newly designed style network that represent
attributes such as emotion, speaking rate, pitch, and voice intensity, and
design a model that can control these attributes. To evaluate the performance
of SC VALL-E, we conduct comparative experiments with three representative
expressive speech synthesis models: global style token (GST) Tacotron2,
variational autoencoder (VAE) Tacotron2, and original VALL-E. We measure word
error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as
evaluation metrics to assess the accuracy of generated sentences. For comparing
the quality of synthesized speech, we measure comparative mean opinion score
(CMOS) and similarity mean opinion score (SMOS). To evaluate the style control
ability of the generated speech, we observe the changes in F0 and
mel-spectrogram by modifying the trained tokens. When using prompt audio that
is not present in the training data, SC VALL-E generates a variety of
expressive sounds and demonstrates competitive performance compared to the
existing models. Our implementation, pretrained models, and audio samples are
available on GitHub.
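As a rough illustration of the word error rate (WER) metric named in the abstract above, the following minimal sketch computes WER as the word-level edit distance between a reference transcript and a hypothesis transcript, divided by the reference length. The function name `word_error_rate` is hypothetical, and the abstract does not say which speech recognizer or text normalization the authors used to obtain transcripts, so this covers only the final scoring step under those assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both inputs are already normalized, whitespace-separated text.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

In an evaluation like the one described, the hypothesis would typically come from running a speech recognizer on the synthesized audio, with the input text serving as the reference.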
Dynamical mean-field theory of Hubbard-Holstein model at half-filling: Zero temperature metal-insulator and insulator-insulator transitions
We study the Hubbard-Holstein model, which includes both the
electron-electron and electron-phonon interactions characterized by $U$ and
$g$, respectively, employing the dynamical mean-field theory combined with
Wilson's numerical renormalization group technique. A zero temperature phase
diagram of metal-insulator and insulator-insulator transitions at half-filling
is mapped out which exhibits the interplay between $U$ and $g$. As $U$ ($g$) is
increased, a metal to Mott-Hubbard insulator (bipolaron insulator) transition
occurs, and the two insulating states are distinct and cannot be adiabatically
connected. The nature of the three states and the transitions between them are
discussed.
Comment: 5 pages, 4 figures. Submitted to Physical Review Letters
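For orientation, one commonly used textbook form of the Hubbard-Holstein Hamiltonian is sketched below. The abstract itself does not state the Hamiltonian, so the symbols are assumptions: $U$ is the on-site electron-electron repulsion, $g$ the electron-phonon coupling, $t$ the hopping amplitude, and $\omega_0$ the phonon frequency. The coupling of the phonon displacement to $(n_i - 1)$ rather than $n_i$ is a convention often chosen at half-filling and may differ from the paper's.

```latex
H = -t \sum_{\langle i j \rangle, \sigma}
      \left( c^{\dagger}_{i\sigma} c_{j\sigma} + \mathrm{h.c.} \right)
    + U \sum_{i} n_{i\uparrow} n_{i\downarrow}
    + \omega_0 \sum_{i} b^{\dagger}_{i} b_{i}
    + g \sum_{i} \left( b^{\dagger}_{i} + b_{i} \right) \left( n_{i} - 1 \right),
\qquad n_i = n_{i\uparrow} + n_{i\downarrow}
```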
Pyridoxine induced neuropathy by subcutaneous administration in dogs
To construct a sensory neuropathy model, excess pyridoxine (150 mg/kg s.i.d.) was injected subcutaneously in dogs over a period of 7 days. During the administration period, the dogs experienced body weight reduction and proprioceptive loss involving the hindquarters. After pyridoxine administration was completed, electrophysiological recordings showed that the M wave remained normal, but the H-reflex of the treated dogs disappeared at 7 days. The axons and myelin of the dorsal funiculus of L4 were irregularly disrupted, with vacuolation. The dorsal root ganglia of L4 and the sciatic and tibial nerves showed degenerative changes and vacuolation. However, the lateral and ventral funiculi of L4 showed a normal histopathologic pattern. This subcutaneous administration method did not cause systemic toxicity and effectively induced sensory neuropathy, confirming the possibility of producing a pyridoxine-induced sensory neuropathy model in dogs with short-term administration.
The Characteristics of Metallo-β-Lactamase-Producing Gram-Negative Bacilli Isolated from Sputum and Urine: A Single Center Experience in Korea
Metallo-β-lactamase (MBL) production usually results in high-level resistance to most β-lactams, and the rapid spread of MBL-producing major gram-negative pathogens is a matter of particular concern worldwide. However, clinical data are scarce, and most studies have compared MBL producer (MP) with MBL non-producer (MNP) strains, which included carbapenem-susceptible isolates. Therefore, we collected clinical data of patients in whom imipenem-nonsusceptible Pseudomonas aeruginosa (PA) and Acinetobacter baumannii (AB) were isolated from sputum or urine, and investigated MBL production and the risk factors associated with MBL acquisition. The antimicrobial susceptibility patterns were also compared between MPs and imipenem-nonsusceptible MNPs (INMNPs). Among the 176 imipenem-nonsusceptible isolates, 12 MPs (6.8%) were identified. There was no identifiable risk factor that contributed to the acquisition of MPs when compared to INMNPs, and case-fatality rates did not differ between the two groups. The percentage of susceptible isolates was higher among MPs for piperacillin/tazobactam and fluoroquinolones, while that for ceftazidime was higher among INMNPs (p < 0.05). As regards aztreonam, which is known to be a uniquely stable β-lactam against MBLs, susceptibility was preserved in only two isolates (16.7%) among MPs, and was not higher than that of INMNPs (23.2%). In conclusion, the contribution of MBLs to imipenem non-susceptibility in PA/ABs isolated from sputum and urine was relatively limited, and there was no significant risk factor associated with acquisition of MPs compared with INMNPs. However, limited susceptibility to aztreonam implies that MPs may hold additional resistance mechanisms, such as extended-spectrum β-lactamases, AmpC β-lactamases, or other non-enzymatic mechanisms.
A Hybrid Channel Estimation Scheme for OFDM Systems
Accurate channel information is indispensable for coherent reception of OFDM
signals. Although a Wiener-type channel estimation filter (CEF) is known to be
optimal, it is not easily employable because of its large implementation
complexity. In practice, a moving average (MA)-type CEF is often employed, but
it may not perform robustly as the channel condition varies. In this paper, we
propose a hybrid CEF that takes advantage of both the Wiener and MA CEFs by
alternately employing one or the other according to the channel condition.
Simulation results show that the proposed hybrid CEF scheme provides
near-optimum performance while significantly reducing the implementation
complexity compared to the long-tap Wiener CEF.
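The idea of switching between an MA filter and a Wiener filter can be sketched as follows. This is a minimal illustration, not the scheme from the paper: the abstract does not give the switching rule, the channel model, or the filter lengths. The sketch assumes a channel whose correlation between samples k apart is rho**|k|, white noise of variance noise_var, and a hypothetical correlation threshold that decides which filter to use. All function names are illustrative.

```python
def ma_taps(n):
    """Moving-average CEF: n equal weights."""
    return [1.0 / n] * n


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def wiener_taps(n, rho, noise_var):
    """Wiener CEF estimating the centre sample from n noisy observations.

    Assumption: channel correlation rho**|k| and white noise of variance
    noise_var. Solves (R_hh + noise_var * I) w = p for the tap vector w.
    """
    centre = n // 2
    r_yy = [[rho ** abs(i - j) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    p = [rho ** abs(i - centre) for i in range(n)]
    return solve(r_yy, p)


def hybrid_taps(n, rho, noise_var, threshold=0.99):
    """Hypothetical switching rule: use the cheap MA filter when the channel
    is nearly static (rho >= threshold), otherwise the Wiener filter."""
    if rho >= threshold:
        return ma_taps(n)
    return wiener_taps(n, rho, noise_var)


def apply_filter(taps, noisy_estimates):
    """Weighted sum of noisy per-symbol channel estimates."""
    return sum(w * y for w, y in zip(taps, noisy_estimates))
```

Under this assumed model, the Wiener taps become uniform when rho equals 1, so the MA filter loses little in a nearly static channel, which is one plausible motivation for switching to it there.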