27 research outputs found

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Author: Chen Hangting
Tian Jinchuan
Watanabe Shinji
Weng Chao
Yan Brian
Yu Dong
Yu Jianwei
Publication date: 19/08/2023

Automatic speech recognition (ASR) based on transducers is widely used. In training, a transducer maximizes the summed posteriors of all paths. The path with the highest posterior is commonly defined as the predicted alignment between the speech and the transcription. While the vanilla transducer does not have a prior preference for any of the valid paths, this work intends to enforce the preferred paths and achieve controllable alignment prediction. Specifically, this work proposes the Bayes Risk Transducer (BRT), which uses a Bayes risk function to assign lower risk values to the preferred paths so that the predicted alignment is more likely to satisfy specific desired properties. We further demonstrate that these predicted alignments with intentionally designed properties can provide practical advantages over the vanilla transducer. Experimentally, the proposed BRT saves inference cost by up to 46% for non-streaming ASR and reduces overall system latency by 41% for streaming ASR.

arXiv.org e-Print Archive

AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

Author: Bian Yanyao
Chen Hangting
Jiang Jiayi
Li Xiang
Liu Mengyang
Luo Yi
Tian Jinchuan
Wang Shuai
Yu Jianwei
Publication date: 25/09/2023

Recently, the use of extensive open-sourced text data has significantly advanced the performance of text-based large language models (LLMs). However, the use of in-the-wild large-scale speech data in the speech technology community remains constrained. One reason for this limitation is that a considerable amount of the publicly available speech data is compromised by background noise, overlapping speech, lack of speech segmentation information, missing speaker labels, and incomplete transcriptions, which can largely hinder its usefulness. On the other hand, human annotation of speech data is both time-consuming and costly. To address this issue, we introduce an automatic in-the-wild speech data preprocessing framework (AutoPrep) in this paper, which is designed to enhance speech quality, generate speaker labels, and produce transcriptions automatically. The proposed AutoPrep framework comprises six components: speech enhancement, speech segmentation, speaker clustering, target speech extraction, quality filtering, and automatic speech recognition. Experiments conducted on the open-sourced WenetSpeech and our self-collected AutoPrepWild corpora demonstrate that the proposed AutoPrep framework can generate preprocessed data with DNSMOS and PDNSMOS scores similar to those of several open-sourced TTS datasets. The corresponding TTS system can achieve up to 0.68 in-domain speaker similarity.

arXiv.org e-Print Archive

Lamellar structure change of waxy corn starch during gelatinization by time-resolved synchrotron SAXS

Author: Kuang Qirong
Liang Yongri
Liu Xingxun
Tian Feng
Xie Fengwei
Xu Jinchuan
Zhou Sumei
Publication venue: Elsevier BV
Publication date: 01/01/2017

An in situ synchrotron small- and wide-angle X-ray scattering (SAXS/WAXS) experiment was used to study the lamellar structure change of starch during gelatinization. Waxy corn starch was used as a model material to exclude the effect of amylose. The thicknesses of crystalline (d), amorphous (d) regions of the lamella and the long period distance (d) were obtained based on a 1D linear correlation function. The SAXS and WAXS results reveal the multi-stage nature of gelatinization. Firstly, a preferable increase in the thickness of crystalline lamellae occurs because of the water penetration into the crystalline region. Then, the thickness of amorphous lamellae increases significantly while that of crystalline lamellae decreases. Next, the thickness of amorphous lamellae starts to decrease, probably due to the out-phasing of starch molecules from the lamellae. Finally, the thickness of amorphous lamellae decreases rapidly, with the formation of fractal gel on a larger scale (than that of the lamellae), which gradually decreases as the temperature further increases and is related to the concentration of starch molecular chains. This work systematically reveals the gelatinization mechanism of waxy corn starch and would be useful in the processing of starch amorphous materials.

Crossref

Warwick Research Archives Portal Repository

University of Queensland eSpace

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Author: Arora Siddhant
Berrebbi Dan
Chang Xuankai
Chen William
Jung Jee-weon
Li Xinjian
Maiti Soumi
Peng Yifan
Shakeel Muhammad
Sharma Roshan
Shi Jiatong
Sudo Yui
Tian Jinchuan
Watanabe Shinji
Yan Brian
Zhang Wangyou
Publication date: 24/10/2023

Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation benchmarks even in a zero-shot setup. However, the full pipeline for developing such models (from data collection to training) is not publicly accessible, which makes it difficult for researchers to further improve its performance and address training-related issues such as efficiency, robustness, fairness, and bias. This work presents an Open Whisper-style Speech Model (OWSM), which reproduces Whisper-style training using an open-source toolkit and publicly available data. OWSM even supports more translation directions and can be more efficient to train. We will publicly release all scripts used for data preparation, training, inference, and scoring as well as pre-trained models and training logs to promote open science. Comment: Accepted at ASRU 202

arXiv.org e-Print Archive

Genetic diversity in India and the inference of Eurasian population expansion

Crossref

Springer - Publisher Connector

PubMed Central

Analysis and design of a hydraulic gripper based on compliant mechanisms

Author: Tian Jinchuan
Publication venue: Pisa University Press
Publication date: 28/05/2094

This thesis deals with the design of a gripper based on a compliant mechanism and driven by a hydrostatic actuator. The actuator is a rolling diaphragm type hydraulic cylinder used in robotics and is designed to solve the problems of hydrostatic friction and leakage found in conventional fluid power systems. In addition, the use of compliant mechanisms instead of conventional hinge structures reduces the complexity of assembly/maintenance and costs, and increases transmission efficiency. The first part of the report describes the background and motivation for the project, and reviews the relevant literature on mechanical grippers. Additionally, this section illustrates the actuator used in the mechanical gripper and the compliant mechanism. The second part mainly analyzes the compliant mechanism, providing a theoretical basis for the design and experiments that follow. The third part introduces the kinematic design of the gripper modeled as a rigid linkage system. The fourth part focuses on the design of a compliant-mechanism gripper based on the previously studied kinematics, followed by the realization of a prototype of the gripper. The fifth part presents a set of experiments that aim to demonstrate and characterize the prototype of the gripper. In the sixth part, conclusions are given, and possible improvements and future developments of the gripper are discussed.

Electronic Thesis and Dissertation Archive - Università di Pisa

A novel local differential privacy federated learning under multi-privacy regimes

Author: Chen Gaojie
Dang Shuping
Liu Chun
Tang Jinchuan
Tian Youliang
Publication date: 01/10/2023

Explore Bristol Research

A Hybrid Traffic Scheduling Strategy for Time-Sensitive Networking

Author: Jinchuan Pei
Le Tian
Menglong Li
Yuxiang Hu
Ziyong Li
Publication venue: MDPI AG
Publication date: 01/11/2022

The traffic scheduling mechanism in Time-Sensitive Networking (TSN) is the key to guaranteeing the deterministic transmission of traffic. However, when time-sensitive traffic and non-time-sensitive traffic are transmitted together, traffic scheduling conflicts occur easily in TSN. As a result, the deterministic transmission of time-sensitive traffic is disrupted, and non-time-sensitive traffic may be preempted for a long time. To optimize the performance of multi-type hybrid traffic scheduling in TSN, we first establish a collaborative scheduling framework that incorporates the Time Aware Shaping (TAS) and Cyclic Queuing and Forwarding (CQF) mechanisms. We then design a traffic shaping method in this framework based on Least Laxity First (LLF), which considers traffic characteristics to dynamically arrange the time slot injection sequence for different types of traffic. Finally, the traffic schedulability is evaluated based on the scheduling constraints of different types of traffic. Compared with the existing scheduling strategies, the proposed hybrid traffic scheduling strategy can schedule more non-time-sensitive traffic and achieve better delay performance for rate-constrained traffic in different hybrid traffic scenarios. When the number of flows is 100, the time slot injection ratio is increased by 24.3% compared with the LLF_TAS method.

Directory of Open Access Journals

Thermal-independent properties of PIN-PMN-PT single-crystal linear-array ultrasonic transducers

Author: Chen Ruimin
Han Pengdi
Lam Kwok Ho
Shung K. Kirk
Tian Jian
Wu Jinchuan
Yao Liheng
Zhou Qifa
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/12/2012

In this paper, low-frequency 32-element linear-array ultrasonic transducers were designed and fabricated using both ternary Pb(In1/2Nb1/2)-Pb(Mg1/3Nb2/3)-PbTiO3 (PIN-PMN-PT) and binary Pb(Mg1/3Nb2/3)-PbTiO3 (PMN-PT) single crystals. Performance of the array transducers was characterized as a function of temperature ranging from room temperature to 160°C. It was found that the array transducers fabricated using the PIN-PMN-PT single crystal were capable of satisfactory performance at 160°C, having a -6-dB bandwidth of 66% and an insertion loss of 37 dB. The results suggest that PIN-PMN-PT linear-array ultrasonic transducers are promising for high-temperature ultrasonic transducer applications.

PubMed Central

Enlighten