SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
In the wake of the surging tide of deep learning over the past decade,
Automatic Speech Recognition (ASR) has garnered substantial attention, leading
to the emergence of numerous publicly accessible ASR systems that are actively
being integrated into our daily lives. Nonetheless, the impartial and
replicable evaluation of these ASR systems encounters challenges due to various
crucial subtleties. In this paper we introduce the SpeechColab Leaderboard, a
general-purpose, open-source platform designed for ASR evaluation. With this
platform: (i) We report a comprehensive benchmark, unveiling the current
state-of-the-art panorama for ASR systems, covering both open-source models and
industrial commercial services. (ii) We quantify how distinct nuances in the
scoring pipeline influence the final benchmark outcomes. These include nuances
related to capitalization, punctuation, interjection, contraction, synonym
usage, compound words, etc. These issues have gained prominence in the context
of the transition towards an End-to-End future. (iii) We propose a practical
modification to the conventional Token-Error-Rate (TER) evaluation metric, with
inspirations from Kolmogorov complexity and Normalized Information Distance
(NID). This adaptation, called modified-TER (mTER), achieves proper
normalization and symmetrical treatment of reference and hypothesis. By
leveraging this platform as a large-scale testing ground, this study
demonstrates the robustness and backward compatibility of mTER when compared to
TER. The SpeechColab Leaderboard is accessible at
https://github.com/SpeechColab/Leaderboar
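To make the normalization point in (iii) concrete, here is a minimal Python sketch comparing conventional TER with one symmetric, bounded variant. Dividing by the longer of the two sequences is an assumption chosen for illustration because it has the two properties the abstract names (proper normalization and symmetric treatment of reference and hypothesis). The abstract does not give the mTER formula, and the function names `edit_distance`, `ter`, and `symmetric_ter` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences with unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # delete a reference token
                curr[j - 1] + 1,          # insert a hypothesis token
                prev[j - 1] + (r != h),   # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def ter(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Conventional TER: errors divided by reference length (can exceed 1)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def symmetric_ter(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Illustrative symmetric variant (an assumption, not the paper's formula):
    errors divided by the longer sequence, so the value stays within [0, 1]."""
    longest = max(len(ref), len(hyp))
    return edit_distance(ref, hyp) / longest if longest else 0.0


ref = "a b".split()
hyp = "a b c d e".split()
print(ter(ref, hyp), ter(hyp, ref))                      # differs when roles swap
print(symmetric_ter(ref, hyp), symmetric_ter(hyp, ref))  # same either way
```

With a short reference and a long hypothesis, conventional TER exceeds 1 and changes when the two sequences swap roles, while the symmetric variant does neither.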
Librispeech: An ASR corpus based on public domain audio books
This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems
Quantifying the value of pronunciation lexicons for keyword search in low resource languages
ABSTRACT This paper quantifies the value of pronunciation lexicons in large vocabulary continuous speech recognition (LVCSR) systems that support keyword search (KWS) in low resource languages. State-of-the-art LVCSR and KWS systems are developed for conversational telephone speech in Tagalog, and the baseline lexicon is augmented via three different grapheme-to-phoneme models that yield increasing coverage of a large Tagalog word-list. It is demonstrated that while the increased lexical coverage, or reduced out-of-vocabulary (OOV) rate, leads to only modest (ca. 1%-4%) improvements in word error rate, the concomitant improvements in actual term weighted value are as much as 60%. It is also shown that incorporating the augmented lexicons into the LVCSR system before indexing speech is superior to using them post facto, e.g., for approximate phonetic matching of OOV keywords in pre-indexed lattices. These results underscore the disproportionate importance of automatic lexicon augmentation for KWS in morphologically rich languages, and advocate for using them early in the LVCSR stage.
Index Terms: Speech Recognition, Keyword Search, Information Retrieval, Morphology, Speech Synthesis
LOW-RESOURCE KEYWORD SEARCH
Thanks in part to the falling costs of storage and transmission, large volumes of speech such as oral history archives [1, 2] and on-line lectures
We are interested in improving KWS performance in a low resource setting, i.e. where some resources are available to develop an LVCSR system (such as 10 hours of transcribed speech corresponding to about 100K words of transcribed text, and a pronunciation lexicon that covers the words in the training data) but accuracy is sufficiently low that considerable improvement in KWS performance is necessary before the system is usable for searching a speech collection. A fair amount of past research has been devoted to improving the acoustic models from un-transcribed speech
The importance of pronunciation lexicons for LVCSR is not entirely underestimated. Several papers have addressed the problem of automatically generating pronunciations for out of vocabulary (OOV) words
Two notable exceptions to this conventional wisdom are (i) accuracy on infrequent, content-bearing words, which are more likely to be OOV, and (ii) accuracy in morphologically rich languages, e.g. Czech and Turkish. These exceptions come together in a detrimental fashion when developing KWS systems for a morphologically rich, low resource language such as Tagalog. This is the setting in which we will quantify the impact of increasing lexical coverage on the performance of a KWS system. We assume a transcribed corpus of 10 hours of Tagalog conversational telephone speech
We first develop state-of-the-art LVCSR and KWS systems based on the given resources. We process and index a 10 hour search collection using the KWS system, and measure KWS performance using a set of 355 Tagalog queries. We then explore three different methods for augmenting the 5.7K word lexicon to include additional words seen in the larger LM training corpus. The augmented lexicons are used to improve the KWS system in two different ways: reprocessing the speech with the larger lexicon, or using it during keyword search. The efficacy of the augmented lexicons is measured in terms of
The authors, listed here in alphabetical order, were supported by DARPA BOLT contract No. HR0011-12-C-0015, and IARPA BABEL contract No. W911NF-12-C-0015. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, IARPA, DoD/ARL or the U.S. Government.
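The keyword-search entry above turns on lexical coverage, i.e. the out-of-vocabulary (OOV) rate. As a minimal Python sketch, the token-level OOV rate of a text against a lexicon can be computed as below. The function name `oov_rate` and the toy word lists are hypothetical illustrations, not data or code from the paper.

```python
from typing import Iterable, Set


def oov_rate(tokens: Iterable[str], lexicon: Set[str]) -> float:
    """Fraction of running tokens that are missing from the lexicon."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in lexicon)
    return missing / len(tokens)


# Toy data, assumed for illustration only.
text = ["ang", "bahay", "ay", "maganda", "nagbabasa"]
base_lexicon = {"ang", "bahay", "ay"}
augmented_lexicon = base_lexicon | {"maganda"}

print(oov_rate(text, base_lexicon))       # two of five tokens are OOV
print(oov_rate(text, augmented_lexicon))  # one of five tokens is OOV
```

Augmenting the lexicon lowers the OOV rate. The abstract reports that such coverage gains improve word error rate only modestly but improve keyword search substantially.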
scSTAR reveals hidden heterogeneity with a real-virtual cell pair structure across conditions in single-cell RNA sequencing data
Cell-state transition can reveal additional information from single-cell ribonucleic acid (RNA)-sequencing data in time-resolved biological phenomena. However, most of the current methods are based on the time derivative of the gene expression state, which restricts them to the short-term evolution of cell states. Here, we present single-cell State Transition Across-samples of RNA-seq data (scSTAR), which overcomes this limitation by constructing a paired-cell projection between biological conditions with an arbitrary time span by maximizing the covariance between two feature spaces using partial least squares and minimum squared error methods. In mouse ageing data, the response to stress in CD4+ memory T cell subtypes was found to be associated with ageing. A novel Treg subtype characterized by mTORC activation was identified to be associated with antitumour immune suppression, which was confirmed by immunofluorescence microscopy and survival analysis in 11 cancers from The Cancer Genome Atlas Program. On melanoma data, scSTAR improved immunotherapy-response prediction accuracy from 0.8 to 0.96
Exploring the shared molecular mechanism of microvascular and macrovascular complications in diabetes: Seeking the hub of circulatory system injury
Background: Microvascular complications, such as diabetic retinopathy (DR) and diabetic nephropathy (DN), and macrovascular complications, referring to atherosclerosis (AS), are the main complications of diabetes. Blindness or fatal microvascular diseases are considered to be identified earlier than fatal macrovascular complications. Exploring the intrinsic relationship between microvascular and macrovascular complications and the hub of pathogenesis is of vital importance for prolonging the life span of patients with diabetes and improving the quality of life.
Materials and methods: The expression profiles of GSE28829, GSE30529, GSE146615 and GSE134998 were downloaded from the Gene Expression Omnibus database, which contained 29 atherosclerotic plaque samples, including 16 AS samples and 13 normal controls; 22 renal glomeruli and tubules samples from diabetic nephropathy, including 12 DN samples and 10 normal controls; 73 lymphoblastoid cell line samples, including 52 DR samples and 21 normal controls. The microarray datasets were consolidated and DEGs were acquired and further analyzed by bioinformatics techniques including GSEA analysis, GO-KEGG functional clustering by R (version 4.0.5), PPI analysis by Cytoscape (version 3.8.2) and String database, miRNA analysis by Diana database, and hub genes analysis by Metascape database. The drug sensitivity of characteristic DEGs was analyzed.
Results: A total of 3709, 4185 and 8086 DEGs were recognized in AS, DN and DR, respectively, with 1820, 1666 and 888 upregulated and 1889, 2519 and 7198 downregulated. GO and KEGG pathway analyses of DEGs and GSEA analysis of common differential genes demonstrated that these significant sites focused primarily on inflammation-oxidative stress and immune regulation pathways. PPI networks show the connection and regulation on top-250 significant sites of AS, DN and DR. MiRNA analysis explored the non-coding RNA upstream regulation network and significant pathway in AS, DN and DR. The joint analysis of multiple diseases shows the common influenced pathways of AS, DN and DR and explored the interaction between top-1000 DEGs at the same time.
Conclusion: In the microvascular and macrovascular complications of diabetes, immune-mediated inflammatory response, chronic inflammation caused by endothelial cell activation and oxidative stress are the three links connecting atherosclerosis, diabetic retinopathy and diabetic nephropathy. Our study has clarified the intrinsic relationship and common tissue damage mechanism of microcirculation and circulatory system complications in diabetes, and explored the mechanism center of these two vascular complications. It has far-reaching clinical and social value for reducing the incidence of fatal events and for early control of the progression of disabling and fatal circulatory complications in diabetes
The invasion of tobacco mosaic virus RNA induces endoplasmic reticulum stress-related autophagy in HeLa cells
The ability of human cells to defend against viruses originating from distant species has long been ignored. Owing to the pressure of natural evolution and human exploration, some of these viruses may be able to invade human beings. If their ‘fresh’ host had no defences, the viruses could cause a serious pandemic, as seen with HIV, SARS (severe acute respiratory syndrome) and avian influenza virus that originated from chimpanzees, the common palm civet and birds, respectively. It is unknown whether the human immune system could tolerate invasion with a plant virus. To model such an alien virus invasion, we chose TMV (tobacco mosaic virus) and used human epithelial carcinoma cells (HeLa cells) as its ‘fresh’ host. We established a reliable system for transfecting TMV-RNA into HeLa cells and found that TMV-RNA triggered autophagy in HeLa cells as shown by the appearance of autophagic vacuoles, the conversion of LC3-I (light chain protein 3-I) to LC3-II, the up-regulated expression of Beclin1 and the accumulation of TMV protein on autophagosomal membranes. We observed suspected TMV virions in HeLa cells by TEM (transmission electron microscopy). Furthermore, we found that TMV-RNA was translated into CP (coat protein) in the ER (endoplasmic reticulum) and that TMV-positive RNA translocated from the cytoplasm to the nucleolus. Finally, we detected greatly increased expression of GRP78 (78 kDa glucose-regulated protein), a typical marker of ERS (ER stress) and found that the formation of autophagosomes was closely related to the expanded ER membrane. Taken together, our data indicate that HeLa cells used ERS and ERS-related autophagy to defend against TMV-RNA
Measuring ionization time lag of polar molecules with a calibrated attoclock
Electrons in atoms and molecules cannot respond immediately to the action of an intense laser field. There is a time lag (about 100 attoseconds) between the instants of the field maximum and the ionization-rate maximum. This lag characterizes the response time of the electronic wave function to a strong-field ionization event and has important effects on the dynamics of the ionized electron. For polar molecules with a large permanent dipole, the direct measurement or calculation of the absolute time lag is difficult. Here, a calibrated attoclock procedure, which is related to a simple Coulomb-induced temporal correction to electron trajectories, is proposed to measure the relative time lag of two different ionization events. Using this procedure, the relative lag of polar molecules in two consecutive half laser cycles can be probed with high time resolution