Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
Automatic speech recognition (ASR) based on transducers is widely used. In
training, a transducer maximizes the summed posteriors of all paths. The path
with the highest posterior is commonly defined as the predicted alignment
between the speech and the transcription. While the vanilla transducer does not
have a prior preference for any of the valid paths, this work intends to
enforce the preferred paths and achieve controllable alignment prediction.
Specifically, this work proposes Bayes Risk Transducer (BRT), which uses a
Bayes risk function to set lower risk values to the preferred paths so that the
predicted alignment is more likely to satisfy specific desired properties. We
further demonstrate that these predicted alignments with intentionally designed
properties can provide practical advantages over the vanilla transducer.
Experimentally, the proposed BRT saves inference cost by up to 46% for
non-streaming ASR and reduces overall system latency by 41% for streaming ASR.
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Recently, the utilization of extensive open-sourced text data has
significantly advanced the performance of text-based large language models
(LLMs). However, the use of in-the-wild large-scale speech data in the speech
technology community remains constrained. One reason for this limitation is
that a considerable amount of the publicly available speech data is compromised
by background noise, speech overlapping, lack of speech segmentation
information, missing speaker labels, and incomplete transcriptions, which can
substantially hinder its usefulness. On the other hand, human annotation of speech
data is both time-consuming and costly. To address this issue, we introduce an
automatic in-the-wild speech data preprocessing framework (AutoPrep) in this
paper, which is designed to enhance speech quality, generate speaker labels,
and produce transcriptions automatically. The proposed AutoPrep framework
comprises six components: speech enhancement, speech segmentation, speaker
clustering, target speech extraction, quality filtering and automatic speech
recognition. Experiments conducted on the open-sourced WenetSpeech and our
self-collected AutoPrepWild corpora demonstrate that the proposed AutoPrep
framework can generate preprocessed data with DNSMOS and PDNSMOS scores similar
to those of several open-sourced TTS datasets. The corresponding TTS system can
achieve up to 0.68 in-domain speaker similarity.
Lamellar structure change of waxy corn starch during gelatinization by time-resolved synchrotron SAXS
In situ synchrotron small- and wide-angle X-ray scattering (SAXS/WAXS) was used to study the change in the lamellar structure of starch during gelatinization. Waxy corn starch was used as a model material to exclude the effect of amylose. The thicknesses of the crystalline (dc) and amorphous (da) regions of the lamella and the long period distance (dac) were obtained from a 1D linear correlation function. The SAXS and WAXS results reveal that gelatinization proceeds in multiple stages. First, the thickness of the crystalline lamellae preferentially increases because water penetrates the crystalline region. Then, the thickness of the amorphous lamellae increases significantly while that of the crystalline lamellae decreases. Next, the thickness of the amorphous lamellae starts to decrease, probably due to the out-phasing of starch molecules from the lamellae. Finally, the thickness of the amorphous lamellae decreases rapidly, with the formation of a fractal gel on a larger scale than that of the lamellae; this gel gradually decreases as the temperature increases further and is related to the concentration of starch molecular chains. This work systematically reveals the gelatinization mechanism of waxy corn starch and would be useful in the processing of starch-based amorphous materials.
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Pre-training speech models on large volumes of data has achieved remarkable
success. OpenAI Whisper is a multilingual multitask model trained on 680k hours
of supervised speech data. It generalizes well to various speech recognition
and translation benchmarks even in a zero-shot setup. However, the full
pipeline for developing such models (from data collection to training) is not
publicly accessible, which makes it difficult for researchers to further
improve its performance and address training-related issues such as efficiency,
robustness, fairness, and bias. This work presents an Open Whisper-style Speech
Model (OWSM), which reproduces Whisper-style training using an open-source
toolkit and publicly available data. OWSM even supports more translation
directions and can be more efficient to train. We will publicly release all
scripts used for data preparation, training, inference, and scoring as well as
pre-trained models and training logs to promote open science. Comment: Accepted at ASRU 202
Analysis and design of a hydraulic gripper based on compliant mechanisms
This thesis deals with the design of a gripper based on a compliant mechanism and driven by a hydrostatic actuator. The actuator is a rolling diaphragm type hydraulic cylinder used in robotics and is designed to solve the problems of hydrostatic friction and leakage found in conventional fluid power systems. In addition, the use of compliant mechanisms instead of conventional hinge structures reduces the complexity of assembly/maintenance and costs, and increases transmission efficiency.
The first part of the report describes the background and motivation for the project and reviews the relevant literature on mechanical grippers. This part also presents the actuator used in the mechanical gripper and the compliant mechanism. The second part analyzes the compliant mechanism, providing a theoretical basis for the design and experiments that follow. The third part introduces the kinematic design of the gripper, modeled as a rigid linkage system. The fourth part focuses on the design of a compliant-mechanism gripper based on the previously studied kinematics, followed by the realization of a prototype. The fifth part presents a set of experiments that demonstrate and characterize the prototype. The sixth part gives conclusions and discusses possible improvements and future development of the gripper.
A Hybrid Traffic Scheduling Strategy for Time-Sensitive Networking
The traffic scheduling mechanism in Time-Sensitive Networking (TSN) is the key to guaranteeing the deterministic transmission of traffic. However, when time-sensitive and non-time-sensitive traffic are transmitted together, scheduling conflicts are likely to occur in TSN. As a result, the deterministic transmission of time-sensitive traffic is disrupted, and non-time-sensitive traffic may be preempted for a long time. To optimize the performance of multi-type hybrid traffic scheduling in TSN, we first establish a collaborative scheduling framework that incorporates the Time Aware Shaping (TAS) and Cyclic Queuing and Forwarding (CQF) mechanisms. We then design a traffic shaping method in this framework based on Least Laxity First (LLF), which considers traffic characteristics to dynamically arrange the time slot injection sequence for different types of traffic. Finally, traffic schedulability is evaluated based on the scheduling constraints of the different traffic types. Compared with existing scheduling strategies, the proposed hybrid traffic scheduling strategy can schedule more non-time-sensitive traffic and achieve better delay performance for rate-constrained traffic in different hybrid traffic scenarios. When the number of flows is 100, the time slot injection ratio is increased by 24.3% compared with the LLF_TAS method.
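As a rough illustration of the Least Laxity First idea mentioned in this abstract, the sketch below orders pending frames by laxity, taken here as the deadline minus the current time minus the remaining transmission time. It is a generic, non-preemptive LLF ordering and not the paper's shaping method: the `Frame` fields, the `laxity` and `llf_order` helpers, the microsecond units, and the example values are all assumptions made for illustration, and TAS gate control and CQF cycles are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical pending frame; field names are illustrative only."""
    name: str
    deadline_us: int   # absolute deadline, in microseconds
    tx_time_us: int    # time needed to transmit the frame, in microseconds


def laxity(frame: Frame, now_us: int) -> int:
    """Slack left before the frame must start transmitting to meet its deadline."""
    return frame.deadline_us - now_us - frame.tx_time_us


def llf_order(frames, now_us: int = 0):
    """Return frame names in the order a non-preemptive LLF policy would send them."""
    pending = list(frames)
    order = []
    t = now_us
    while pending:
        # Pick the frame with the least laxity; break ties by name for determinism.
        nxt = min(pending, key=lambda f: (laxity(f, t), f.name))
        pending.remove(nxt)
        order.append(nxt.name)
        t += nxt.tx_time_us  # the link is busy while this frame is sent
    return order


# Made-up example values, not taken from the paper.
frames = [
    Frame("video", deadline_us=900, tx_time_us=300),
    Frame("control", deadline_us=400, tx_time_us=100),
    Frame("best_effort", deadline_us=2000, tx_time_us=500),
]
order = llf_order(frames)
print(order)
```

With these assumed values the control frame has the smallest laxity and is sent first, while the best-effort frame, which has the most slack, is sent last.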
Thermal-independent properties of PIN-PMN-PT single-crystal linear-array ultrasonic transducers
In this paper, low-frequency 32-element linear-array ultrasonic transducers were designed and fabricated using both ternary Pb(In1/2Nb1/2)-Pb(Mg1/3Nb2/3)-PbTiO3 (PIN-PMN-PT) and binary Pb(Mg1/3Nb2/3)-PbTiO3 (PMN-PT) single crystals. Performance of the array transducers was characterized as a function of temperature ranging from room temperature to 160°C. It was found that the array transducers fabricated using the PIN-PMN-PT single crystal were capable of satisfactory performance at 160°C, having a -6-dB bandwidth of 66% and an insertion loss of 37 dB. The results suggest that PIN-PMN-PT linear-array ultrasonic transducers are promising for high-temperature ultrasonic transducer applications.