Outstanding supercapacitive properties of Mn-doped TiO2 micro/nanostructure porous film prepared by anodization method.
Mn-doped TiO2 micro/nanostructure porous film was prepared by anodizing a Ti-Mn alloy. The film annealed at 300 °C yields the highest areal capacitance of 1451.3 mF/cm² at a current density of 3 mA/cm² when used as a high-performance supercapacitor electrode. Areal capacitance retention is 63.7% when the current density increases from 3 to 20 mA/cm², and the capacitance retention is 88.1% after 5,000 cycles. The superior areal capacitance of the porous film is derived from the brush-like metal substrate, which could greatly increase the contact area, improve the charge transport ability at the oxide layer/metal substrate interface, and thereby significantly enhance the electrochemical activities toward high performance energy storage. Additionally, the effects of manganese content and specific surface area of the porous film on the supercapacitive performance were also investigated in this work.
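As a quick check of the rate-capability figures above, the capacitance implied at the higher current density and after cycling can be worked out from the stated retention values. This is only arithmetic on the abstract's numbers. The assumption that the 88.1% cycling retention is measured relative to the 3 mA/cm² value is ours, since the abstract does not state the current density used for cycling.

```python
# Figures quoted in the abstract.
C_AT_3_MA = 1451.3        # areal capacitance at 3 mA/cm^2, in mF/cm^2
RATE_RETENTION = 0.637    # retention when going from 3 to 20 mA/cm^2
CYCLE_RETENTION = 0.881   # retention after 5,000 cycles


def retained(capacitance: float, retention: float) -> float:
    """Capacitance left after applying a fractional retention."""
    return capacitance * retention


c_at_20 = retained(C_AT_3_MA, RATE_RETENTION)
# Assumption (not stated in the abstract): cycling retention is taken
# relative to the 3 mA/cm^2 capacitance.
c_after_cycling = retained(C_AT_3_MA, CYCLE_RETENTION)

print(f"Implied capacitance at 20 mA/cm^2: {c_at_20:.1f} mF/cm^2")
print(f"Implied capacitance after 5,000 cycles: {c_after_cycling:.1f} mF/cm^2")
```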
UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization
Dysarthric speech reconstruction (DSR) systems aim to automatically convert
dysarthric speech into normal-sounding speech. The technology eases
communication with speakers affected by the neuromotor disorder and enhances
their social inclusion. NED-based (Neural Encoder-Decoder) systems have
significantly improved the intelligibility of the reconstructed speech as
compared with GAN-based (Generative Adversarial Network) approaches, but the
approach is still limited by training inefficiency caused by the cascaded
pipeline and auxiliary tasks of the content encoder, which may in turn affect
the quality of reconstruction. Inspired by self-supervised speech
representation learning and discrete speech units, we propose a Unit-DSR
system, which harnesses the powerful domain-adaptation capacity of HuBERT for
training efficiency improvement and utilizes speech units to constrain the
dysarthric content restoration in a discrete linguistic space. Compared with
NED approaches, the Unit-DSR system only consists of a speech unit normalizer
and a Unit HiFi-GAN vocoder, which is considerably simpler without cascaded
sub-modules or auxiliary tasks. Results on the UASpeech corpus indicate that
Unit-DSR outperforms competitive baselines in terms of content restoration,
reaching a 28.2% relative average word error rate reduction when compared to
original dysarthric speech, and shows robustness against speed perturbation and
noise.
Comment: Accepted to ICASSP 202
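The word error rate (WER) metric quoted above can be illustrated with a minimal sketch: WER is the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words, and a relative reduction is the drop in WER divided by the baseline WER. The function names and example sentences below are hypothetical and are not taken from the paper or from any evaluation toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction of the baseline WER removed by the system."""
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical transcripts, for illustration only.
wer_before = word_error_rate("turn on the light", "turn of light")
wer_after = word_error_rate("turn on the light", "turn on the lights")
print(wer_before, wer_after, relative_reduction(wer_before, wer_after))
```

With these made-up transcripts the baseline has two errors in four words and the improved output has one, so the relative reduction is one half. The 28.2% figure in the abstract is an average of this kind of quantity over the test set.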
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech
into normal speech by improving the intelligibility and naturalness. This is a
challenging task, especially for patients with severe dysarthria and for those
speaking in complex, noisy acoustic environments. To address these challenges, we propose a
novel multi-modal framework to utilize visual information, e.g., lip movements,
in DSR as extra clues for reconstructing the highly abnormal pronunciations.
The multi-modal framework consists of: (i) a multi-modal encoder to extract
robust phoneme embeddings from dysarthric speech with auxiliary visual
features; (ii) a variance adaptor to infer the normal phoneme duration and
pitch contour from the extracted phoneme embeddings; (iii) a speaker encoder to
encode the speaker's voice characteristics; and (iv) a mel-decoder to generate
the reconstructed mel-spectrogram based on the extracted phoneme embeddings,
prosodic features and speaker embeddings. Both objective and subjective
evaluations conducted on the commonly used UASpeech corpus show that our
proposed approach can achieve significant improvements over baseline systems in
terms of speech intelligibility and naturalness, especially for the speakers
with more severe symptoms. Compared with original dysarthric speech, the
reconstructed speech achieves 42.1% absolute word error rate reduction for
patients with more severe dysarthria levels.
Comment: Accepted by ICASSP 202
Overall PSD and Fractal Characteristics of Tight Oil Reservoirs: A Case Study of Lucaogou Formation in Junggar Basin, China
The Lucaogou tight oil reservoir, located in the Junggar Basin in northwest China, is a typical tight oil reservoir. Its complex lithology leads to a wide pore size distribution (PSD), ranging from several nanometers to hundreds of micrometers. To better understand the PSD and fractal features of the Lucaogou tight oil reservoir, scanning electron microscopy (SEM), rate-controlled mercury injection (RMI) and pressure-controlled mercury injection (PMI) experiments were performed on six samples of different lithology. The results indicate that four types of pores exist in the Lucaogou tight oil reservoir: dissolution pores, clay-dominated pores, microfractures and inter-granular pores. A combination of PMI and RMI was proposed to calculate the overall PSD of tight oil reservoirs; the overall pore radius of the Lucaogou tight oil reservoir ranges from 3.6 nm to 500 µm. The fractal analysis was carried out based on the PMI data. Fractal dimension (Fd) values varied between 2.843 and 2.913, with a mean value of 2.88. Fd increases with decreasing quartz content and increasing clay mineral content. Samples from tight oil reservoirs with a smaller average pore radius have a more complex pore structure. Fractal dimension shows negative correlations with porosity and permeability. In addition, the fractal characteristics of different tight reservoirs were compared and analyzed.
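The abstract does not say which fractal model was fitted to the PMI data. As a hedged illustration, the sketch below uses one widely used capillary-pressure relation, log(1 − S_Hg) = (Fd − 3)·log(Pc) + const, where S_Hg is the cumulative mercury saturation and Pc the injection pressure, and estimates Fd from the least-squares slope. The data are synthetic, generated from that same relation with a hypothetical entry pressure, so the example only shows that the fit recovers the dimension it was given. All names are ours.

```python
import math


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def fractal_dimension(pressures_mpa, mercury_saturations):
    """Estimate Fd from log(1 - S_Hg) = (Fd - 3) * log(Pc) + const."""
    log_pc = [math.log10(p) for p in pressures_mpa]
    log_sw = [math.log10(1.0 - s) for s in mercury_saturations]
    return 3.0 + fit_slope(log_pc, log_sw)


# Synthetic curve (not measured data): built from the same relation with
# Fd = 2.88 and a hypothetical entry pressure of 0.1 MPa.
true_fd = 2.88
entry_pressure = 0.1
pressures = [0.2, 0.5, 1.0, 5.0, 20.0, 100.0]
saturations = [1.0 - (p / entry_pressure) ** (true_fd - 3.0) for p in pressures]

estimated_fd = fractal_dimension(pressures, saturations)
print(round(estimated_fd, 3))
```

On real PMI curves the points rarely fall on a single line, so the fitted range of pressures matters and should be reported alongside Fd.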
StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis
The expressive quality of synthesized speech for audiobooks is limited by
generalized model architecture and unbalanced style distribution in the
training data. To address these issues, in this paper, we propose a
self-supervised style enhancing method with VQ-VAE-based pre-training for
expressive audiobook speech synthesis. Firstly, a text style encoder is
pre-trained with a large amount of unlabeled text-only data. Secondly, a
spectrogram style extractor based on VQ-VAE is pre-trained in a self-supervised
manner, with plenty of audio data that covers complex style variations. Then a
novel architecture with two encoder-decoder paths is specially designed to
model the pronunciation and high-level style expressiveness respectively, with
the guidance of the style extractor. Both objective and subjective evaluations
demonstrate that our proposed method can effectively improve the naturalness
and expressiveness of the synthesized speech in audiobook synthesis, especially
for the role and out-of-domain scenarios.
Comment: Accepted to ICASSP 202
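The quantization step that gives a VQ-VAE its name can be sketched in a few lines: each continuous feature vector is replaced by its nearest entry in a learned codebook. The example below shows only that nearest-neighbour lookup, with a tiny hand-written codebook. It is not the authors' style extractor, and it omits everything needed for training (encoder, decoder, straight-through gradient, commitment loss).

```python
def quantize(vector, codebook):
    """Return (index, entry) of the codebook vector nearest to `vector`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)),
                key=lambda k: squared_distance(vector, codebook[k]))
    return index, codebook[index]


# Hypothetical 2-D codebook and feature frame, for illustration only.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frame = [0.9, 0.2]

index, code = quantize(frame, codebook)
print(index, code)
```

In a style extractor of this kind, the sequence of selected indices acts as a discrete description of style that downstream modules can condition on.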
The Efficacy of Chinese Herbal Medicine as an Adjunctive Therapy for Advanced Non-small Cell Lung Cancer: A Systematic Review and Meta-analysis
Many published studies reflect the growing application of complementary and alternative medicine, particularly the use of Chinese herbal medicine (CHM) in combination with conventional cancer therapy for advanced non-small cell lung cancer (NSCLC), but its efficacy remains largely unexplored. The purpose of this study is to evaluate the efficacy of CHM combined with conventional chemotherapy (CT) in the treatment of advanced NSCLC. Publications in 11 electronic databases were extensively searched, and 24 trials were included for analysis. A total of 2,109 patients were enrolled in these studies, of whom 1,064 received CT combined with CHM and 1,039 received CT alone (six patients dropped out, and their group assignment was not reported). Compared to CT alone, CHM combined with CT significantly increased the one-year survival rate (RR = 1.36, 95% CI = 1.15-1.60, p = 0.0003). In addition, the combined therapy significantly increased immediate tumor response (RR = 1.36, 95% CI = 1.19-1.56, p < 1.0E-5) and improved Karnofsky performance score (KPS) (RR = 2.90, 95% CI = 1.62-5.18, p = 0.0003). Combined therapy markedly reduced nausea and vomiting at toxicity grades III-IV (RR = 0.24, 95% CI = 0.12-0.50, p = 0.0001) and prevented the decline of hemoglobin and platelets in patients under CT at toxicity grades I-IV (RR = 0.64, 95% CI = 0.51-0.80, p < 0.0001). Moreover, the herbs that are frequently used in NSCLC patients were identified. This systematic review suggests that CHM as an adjuvant therapy can reduce CT toxicity, improve the survival rate, enhance immediate tumor response, and improve KPS in advanced NSCLC patients. However, due to the lack of large-scale randomized clinical trials among the included studies, further larger-scale trials are needed. © 2013 Li et al.
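For readers unfamiliar with the RR and 95% CI figures above, the sketch below shows how a risk ratio and its confidence interval are commonly computed for a single two-arm trial, using the standard error of log(RR). The event counts are hypothetical and do not come from the review. The review's pooled estimates combine many trials with study weights, which this example does not attempt to reproduce.

```python
import math


def risk_ratio(events_treat, n_treat, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio with a normal-approximation CI on the log scale."""
    rr = (events_treat / n_treat) / (events_ctrl / n_ctrl)
    se_log_rr = math.sqrt(1 / events_treat - 1 / n_treat
                          + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical trial: 60 of 100 patients respond on CT plus CHM,
# 40 of 100 respond on CT alone.
rr, lower, upper = risk_ratio(60, 100, 40, 100)
print(f"RR = {rr:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1, as in this made-up case, corresponds to a statistically significant difference between the arms at the 5% level.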
TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment
Recently, AIGC image quality assessment (AIGCIQA), which aims to assess the
quality of AI-generated images (AIGIs) from a human perception perspective, has
emerged as a new topic in computer vision. Unlike common image quality
assessment tasks where images are derived from original ones distorted by
noise, blur, and compression, etc., in AIGCIQA tasks, images are
typically generated by generative models using text prompts. Considerable
efforts have been made in the past years to advance AIGCIQA. However, most
existing AIGCIQA methods regress predicted scores directly from individual
generated images, overlooking the information contained in the text prompts of
these images. This oversight partially limits the performance of these AIGCIQA
methods. To address this issue, we propose a text-image encoder-based
regression (TIER) framework. Specifically, we process the generated images and
their corresponding text prompts as inputs, utilizing a text encoder and an
image encoder to extract features from these text prompts and generated images,
respectively. To demonstrate the effectiveness of our proposed TIER method, we
conduct extensive experiments on several mainstream AIGCIQA databases,
including AGIQA-1K, AGIQA-3K, and AIGCIQA2023. The experimental results
indicate that our proposed TIER method demonstrates superior
performance compared to the baseline in most cases.
Comment: 12 pages, 8 figures. arXiv admin note: text overlap with
arXiv:2312.0589