Advances in All-Neural Speech Recognition
This paper advances the design of CTC-based all-neural (or end-to-end) speech
recognizers. We propose a novel symbol inventory, and a novel iterated-CTC
method in which a second system is used to transform a noisy initial output
into a cleaner version. We present a number of stabilization and initialization
methods we have found useful in training these networks. We evaluate our system
on the commonly used NIST 2000 conversational telephony test set, and
significantly exceed the previously published performance of similar systems,
both with and without the use of an external language model and decoding
technology.
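The abstract does not describe how CTC frame outputs become symbol strings. For reference, here is a minimal sketch of the standard CTC collapse rule (merge consecutive repeats, then drop blanks) combined with best-path (greedy) decoding. The blank symbol `_` and the function names are hypothetical. This is not the paper's symbol inventory or its iterated-CTC method.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = "_"  # hypothetical blank symbol chosen for this sketch


def ctc_collapse(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Map a frame-level label sequence to an output sequence using the
    CTC rule: merge consecutive repeats, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def greedy_decode(
    posteriors: Sequence[Sequence[float]],
    symbols: Sequence[str],
    blank: str = BLANK,
) -> List[str]:
    """Best-path decoding: pick the most probable symbol in each frame,
    then apply the CTC collapse rule."""
    best_path = [
        symbols[max(range(len(frame)), key=frame.__getitem__)]
        for frame in posteriors
    ]
    return ctc_collapse(best_path, blank)


# A blank between the two "l" runs is what preserves the double letter.
print("".join(ctc_collapse(list("hh_e_ll_l_oo"))))  # hello
```

Greedy decoding ignores any language model. The abstract reports results both with and without an external language model and decoder, and those would replace this simple best-path step.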
The Microsoft 2017 Conversational Speech Recognition System
We describe the 2017 version of Microsoft's conversational speech recognition
system, in which we update our 2016 system with recent developments in
neural-network-based acoustic and language modeling to further advance the
state of the art on the Switchboard speech recognition task. The system adds a
CNN-BLSTM acoustic model to the set of model architectures we combined
previously, and includes character-based and dialog-session-aware LSTM language
models in rescoring. For system combination we adopt a two-stage approach,
whereby subsets of acoustic models are first combined at the senone/frame
level, followed by word-level voting via confusion networks. We also add a
confusion network rescoring step after system combination. The resulting system
yields a 5.1% word error rate on the 2000 Switchboard evaluation set.
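The word-level voting step can be illustrated with a simplified sketch. It assumes the systems' confusion networks have already been aligned into the same sequence of slots (the alignment itself is omitted), with each slot mapping candidate words to posteriors. The per-system weights, the empty-string deletion arc, and all names here are hypothetical and are not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical representation: a confusion network is a list of slots, and each
# slot maps a candidate word to its posterior. "" stands for the deletion arc.
ConfusionNetwork = List[Dict[str, float]]
DELETION = ""


def vote(networks: Sequence[ConfusionNetwork], weights: Sequence[float]) -> List[str]:
    """Combine pre-aligned confusion networks by weighted posterior voting
    and return the winning word in each slot, dropping deletions."""
    if not networks or len(networks) != len(weights):
        raise ValueError("need at least one network and one weight per system")
    n_slots = len(networks[0])
    if any(len(net) != n_slots for net in networks):
        raise ValueError("networks must be aligned to the same slots")

    output: List[str] = []
    for slot in range(n_slots):
        scores: Dict[str, float] = defaultdict(float)
        for net, weight in zip(networks, weights):
            for word, posterior in net[slot].items():
                scores[word] += weight * posterior
        best = max(scores, key=scores.__getitem__)
        if best != DELETION:
            output.append(best)
    return output


system_a = [{"the": 0.9, "a": 0.1}, {"cat": 0.6, "cap": 0.4}, {"": 0.7, "uh": 0.3}]
system_b = [{"the": 0.8, "a": 0.2}, {"cap": 0.55, "cat": 0.45}, {"": 0.6, "uh": 0.4}]
print(vote([system_a, system_b], [0.5, 0.5]))  # ['the', 'cat']
```

In the second slot the two systems disagree on their top word, but the summed weighted posteriors (0.525 for "cat" versus 0.475 for "cap") settle the vote.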
The Microsoft 2016 Conversational Speech Recognition System
We describe Microsoft's conversational speech recognition system, in which we
combine recent developments in neural-network-based acoustic and language
modeling to advance the state of the art on the Switchboard recognition task.
Inspired by machine learning ensemble techniques, the system uses a range of
convolutional and recurrent neural networks. I-vector modeling and lattice-free
MMI training provide significant gains for all acoustic model architectures.
Language model rescoring with multiple forward and backward running RNNLMs, and
word posterior-based system combination provide a 20% boost. The best single
system uses a ResNet architecture acoustic model with RNNLM rescoring, and
achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The
combined system has an error rate of 6.2%, representing an improvement over
previously reported results on this benchmark task.
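Word error rate, the metric quoted in both system descriptions, is the number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. Here is a minimal sketch using a word-level Levenshtein distance, assuming pre-tokenized input. Official benchmark scoring applies its own text normalization, which this sketch omits.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a word-level Levenshtein distance, one DP row at a time."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(reference)


ref = "the cat sat on the mat".split()
print(word_error_rate(ref, "the cat sat on mat".split()))  # 0.166... (one deletion / six words)
```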
Analysis of Radioactive Releases During Proposed Demolition Activities for the 224-U and 224-UA Buildings - Addendum
A post-demolition modeling analysis is conducted that compares during-demolition atmospheric concentration monitoring results with modeling results based on the actual meteorological conditions during the demolition activities. The 224-U and 224-UA Buildings that were located in the U-Plant UO3 complex in the 200 West Area of the Hanford Site were demolished during the summer of 2010. These facilities converted uranyl nitrate hexahydrate (UNH), a product of Hanford’s Plutonium-Uranium Extraction (PUREX) Plant, into uranium trioxide (UO3). This report is an addendum to a pre-demolition emission analysis and air dispersion modeling effort that was conducted for proposed demolition activities for these structures
Adversarial Reweighting for Speaker Verification Fairness
We address performance fairness for speaker verification using the
adversarial reweighting (ARW) method. ARW is reformulated for speaker
verification with metric learning, and shown to improve results across
different subgroups of gender and nationality, without requiring annotation of
subgroups in the training data. An adversarial network learns a weight for each
training sample in the batch so that the main learner is forced to focus on
poorly performing instances. Using a min-max optimization algorithm, this
method improves overall speaker verification fairness. We present three
different ARW formulations: accumulated pairwise similarity, pseudo-labeling,
and pairwise weighting, and measure their performance in terms of equal error
rate (EER) on the VoxCeleb corpus. Results show that the pairwise weighting
method can achieve 1.08% overall EER, 1.25% for male and 0.67% for female
speakers, with relative EER reductions of 7.7%, 10.1% and 3.0%, respectively.
For nationality subgroups, the proposed algorithm achieved 1.04% EER for US
speakers, 0.76% for UK speakers, and 1.22% for all others. The absolute EER gap
between gender groups was reduced from 0.70% to 0.58%, while the standard
deviation over nationality groups decreased from 0.21 to 0.19.
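The equal error rate (EER) is the operating point at which the false-acceptance rate (FAR) and the false-rejection rate (FRR) coincide. The abstract does not say how it was computed. The sketch below approximates the EER by trying each observed trial score as a threshold, assuming that higher scores mean "same speaker". The label convention and names are hypothetical.

```python
from typing import Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate the EER by trying every observed score as a threshold.

    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    A trial is accepted when its score >= threshold.
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    n_target = sum(1 for y in labels if y == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap, best_eer, best_threshold = float("inf"), 1.0, 0.0
    for threshold in sorted(set(scores)):
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_threshold = abs(far - frr), (far + frr) / 2, threshold
    return best_eer, best_threshold


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))  # (0.25, 0.6)
```

This brute-force sweep is quadratic in the number of trials. For a corpus the size of VoxCeleb, a single pass over sorted scores, or an ROC curve such as scikit-learn's `roc_curve` with interpolation between points, would be the more practical choice. Computing the EER separately on each gender or nationality subset gives per-group figures like those reported above.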
Sensitivity Analysis of Hardwired Parameters in GALE Codes
The U.S. Nuclear Regulatory Commission asked Pacific Northwest National Laboratory to provide a data-gathering plan for updating the hardwired data tables and parameters of the Gaseous and Liquid Effluents (GALE) codes to reflect current nuclear reactor performance. This would enable the GALE codes to make more accurate predictions about the normal radioactive release source term applicable to currently operating reactors and to the cohort of reactors planned for construction in the next few years. A sensitivity analysis was conducted to define the importance of hardwired parameters in terms of each parameter's effect on the emission rate of the nuclides that are most important in computing potential exposures. The results of this study were used to compile a list of parameters that should be updated, based on the sensitivity of the outputs of interest to these parameters.
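The abstract does not name the sensitivity measure that was used. One common choice is the one-at-a-time normalized sensitivity coefficient, S_i = (ΔY/Y) / (ΔX_i/X_i), which is estimated by perturbing each parameter in turn while holding the others at their base values. The sketch below applies this measure to a hypothetical toy model that stands in for an emission-rate calculation. It does not reproduce the GALE codes or their parameters.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def normalized_sensitivities(
    model: Callable[[Params], float], base: Params, rel_step: float = 0.01
) -> Dict[str, float]:
    """One-at-a-time normalized sensitivity S_i = (dY/Y) / (dX_i/X_i),
    estimated with a forward difference around the base-case parameters."""
    y0 = model(base)
    if y0 == 0:
        raise ValueError("base-case output must be non-zero")
    result: Dict[str, float] = {}
    for name, x0 in base.items():
        if x0 == 0:
            raise ValueError(f"parameter {name!r} has a zero base value")
        perturbed = dict(base)
        perturbed[name] = x0 * (1 + rel_step)
        result[name] = ((model(perturbed) - y0) / y0) / rel_step
    return result


# Hypothetical toy model standing in for an emission-rate calculation.
def toy_emission_rate(p: Params) -> float:
    return p["a"] * p["b"] ** 2 / p["c"]


coeffs = normalized_sensitivities(toy_emission_rate, {"a": 2.0, "b": 3.0, "c": 4.0})
ranking = sorted(coeffs, key=lambda k: abs(coeffs[k]), reverse=True)
print(ranking)  # ['b', 'a', 'c']
```

For this toy model the exact coefficients are 2, 1 and -1, and the forward difference gives approximately 2.01, 1.0 and -0.99. A list of parameters worth updating could then be drawn from those with the largest |S_i|. Where to set the cutoff would be a judgment call.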
Development of novel 2D and 3D correlative microscopy to characterise the composition and multiscale structure of suspended sediment aggregates
Suspended cohesive sediments form aggregates or 'flocs' and are often closely associated with carbon, nutrients, pathogens and pollutants, which makes understanding their composition, transport and fate highly desirable. Accurate prediction of floc behaviour requires the quantification of three-dimensional (3D) properties (size, shape and internal structure) that span several scales (i.e. nanometre [nm] to millimetre [mm] scale). Traditional techniques (optical cameras and electron microscopy [EM]), however, can only provide two-dimensional (2D) simplifications of 3D floc geometries. Additionally, a resolution gap between conventional optical microscopy (COM) and transmission EM (TEM) prevents an understanding of how nm-scale floc constituents and internal structure influence mm-scale floc properties. Here, we develop a novel correlative imaging workflow combining 3D X-ray micro-computed tomography (μCT), 3D focused ion beam nanotomography (FIB-nt), and 2D scanning EM (SEM) and scanning TEM (STEM), which allows us to stabilise, visualise and quantify the composition and multiscale structure of sediment flocs for the first time. This new technique allowed the quantification of 3D floc geometries, the identification of individual floc components (e.g., clays, non-clay minerals and bacteria), and the characterisation of particle-particle and structural associations across scales. This novel dataset demonstrates the truly complex structure of natural flocs at multiple scales. The integration of multiscale, state-of-the-art instrumentation and techniques offers the potential to generate fundamental new understanding of floc composition, structure and behaviour.
Ozone field studies adjacent to an HVDC transmission test line
Field studies of atmospheric ozone concentrations adjacent to high-voltage direct current (HVDC) transmission test lines were conducted at the Bonneville Power Administration's (BPA) HVDC Test Facility at The Dalles, Oregon. The transmission lines were operating at voltages from ±400 to ±600 kV during the field studies. The downwind ozone plumes were studied using a roving vertical profiling system. Ambient meteorological and test line parameters were also recorded to allow comparison of predicted and observed ozone plumes. In fair-weather conditions no ozone plumes were evident, and the absence of any identifiable change in ozone concentration attributable to the energized lines shows that their ozone contribution is trivial in fair weather. Corona loss from power lines, and hence ozone production, is largest during precipitation. At the same time, natural background atmospheric ozone concentrations are depressed by the scavenging effect of the precipitation. Therefore ozone production rates can best be measured during precipitation periods. With the exception of precipitation cases, the vertical profiles of ozone concentration showed no discernible evidence of ozone plumes from the energized conductors. Ozone plumes, if any, were masked by natural background ozone variability.
Analysis of health impact inputs to the US Department of Energy's Risk Information System
The US Department of Energy (DOE) is in the process of completing a survey of environmental problems, referred to as the Environmental Survey, at its facilities across the country. The DOE Risk Information System (RIS) is being used to prioritize the environmental problems identified in the Environmental Survey's findings. This report discusses site-specific public health risk parameters and the rationale for their inclusion in the RIS. These parameters are based on computed potential impacts obtained with the Multimedia Environmental Pollutant Assessment System (MEPAS). MEPAS is a computer-based methodology for evaluating the potential exposures resulting from multimedia environmental transport of hazardous materials. This report has three related objectives: to document the role of MEPAS in the RIS framework, to report the results of the analysis of alternative risk parameters that led to the current RIS risk parameters, and to describe the analysis of uncertainties in the risk-related parameters. 20 refs., 17 figs., 10 tabs.
- …