
416 research outputs found

New Soft Biscuit Wheat for the Northern Region

Author: Dines John
Ellison Frank
Shah Shakir
Publication venue: Value Added Wheat CRC (Australia)
Publication date: 01/10/2001

Established and supported under the Australian Government’s Cooperative Research Centre Progra

Sydney eScholarship

Measuring the gap between HMM-based ASR and TTS

Author: Dines John
King Simon
Yamagishi Junichi
Publication date: 01/01/2009

The EMIME European project is conducting research in the development of technologies for mobile, personalised speech-to-speech translation systems. The hidden Markov model is being used as the underlying technology in both automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components; thus, the investigation of unified statistical modelling approaches has become an implicit goal of our research. As one of the first steps towards this goal, we have been investigating commonalities and differences between HMM-based ASR and TTS. In this paper we present results and analysis of a series of experiments that have been conducted on English ASR and TTS systems, measuring their performance with respect to phone set and lexicon, acoustic feature type and dimensionality, and HMM topology. Our results show that, although the fundamental statistical model may be essentially the same, optimal ASR and TTS performance often demands diametrically opposed system designs. This represents a major challenge to be addressed in the investigation of such unified modelling approaches.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

Direct optimisation of a multilayer perceptron for the estimation of cepstral mean and variance statistics

Author: Dines John
Vepa Jithendra
Publication venue: IDIAP
Publication date: 11/02/2010

We propose an alternative means of training a multilayer perceptron for the task of speech activity detection, based on a criterion to minimise the error in the estimation of mean and variance statistics for speech cepstrum-based features using the Kullback-Leibler divergence. We present our baseline and proposed speech activity detection approaches for multi-channel meeting room recordings and demonstrate the effectiveness of the new criterion by comparing the two approaches when used to carry out cepstrum mean and variance normalisation of features used in our meeting ASR system.

Infoscience - École polytechnique fédérale de Lausanne
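
As an illustration of the normalisation step mentioned in the abstract above, the following is a minimal sketch of cepstral mean and variance normalisation. It assumes the mean and variance are estimated only from frames that a speech activity detector has labelled as speech; the function name `cmvn` and the `is_speech` mask are hypothetical, and the sketch does not reproduce the multilayer perceptron training criterion the paper proposes.

```python
import math


def cmvn(frames, is_speech):
    """Normalise feature frames to zero mean and unit variance.

    Assumption: statistics are estimated from speech-labelled frames only,
    then applied to every frame. `frames` is a list of equal-length lists.
    """
    speech = [f for f, s in zip(frames, is_speech) if s]
    if not speech:
        raise ValueError("no speech frames to estimate statistics from")
    dim = len(frames[0])
    n = len(speech)
    # Per-dimension mean and (biased) variance over the speech frames.
    mean = [sum(f[d] for f in speech) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in speech) / n for d in range(dim)]
    # Fall back to a scale of 1.0 when a dimension has zero variance.
    std = [math.sqrt(v) or 1.0 for v in var]
    return [[(f[d] - mean[d]) / std[d] for d in range(dim)] for f in frames]


# Toy one-dimensional example: the third frame is labelled non-speech,
# so it does not influence the estimated statistics.
normalised = cmvn([[1.0], [3.0], [100.0]], [True, True, False])
```

Errors in the speech/non-speech decision change the estimated mean and variance, which is presumably why the abstract ties the detector's training criterion to those statistics.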

Phonological Knowledge Guided HMM State Mapping for Cross-Lingual Speaker Adaptation

Author: Dines John
Liang Hui
Publication date: 06/07/2011

Within the HMM state mapping-based cross-lingual speaker adaptation framework, the minimum Kullback-Leibler divergence criterion has been typically employed to measure the similarity of two average voice state distributions from two respective languages for state mapping construction. Considering that this simple criterion doesn't take any language-specific information into account, we propose a data-driven, phonological knowledge guided approach to strengthen the mapping construction: state distributions from the two languages are clustered according to broad phonetic categories using decision trees, and mapping rules are constructed only within each of the clusters. Objective evaluation of our proposed approach demonstrates reduction of mel-cepstral distortion and that mapping rules derived from a single training speaker generalize to other speakers, with subtle improvement being detected during subjective listening tests.

Infoscience - École polytechnique fédérale de Lausanne
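
The cluster-constrained state mapping summarised in the abstract above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: states are modelled as diagonal-covariance Gaussians, the cluster labels are given directly rather than derived from decision trees, and the KL direction (source to target) is chosen arbitrarily. The names `kl_diag_gauss` and `map_states` are hypothetical.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def map_states(src_states, tgt_states):
    """Map each source state to its minimum-KL target state.

    Each state is (cluster_label, mean, var). Candidates are restricted to
    target states with the same cluster label (assumed to stand in for a
    broad phonetic category).
    """
    mapping = {}
    for s_name, (s_cluster, s_mean, s_var) in src_states.items():
        candidates = [
            (kl_diag_gauss(s_mean, s_var, t_mean, t_var), t_name)
            for t_name, (t_cluster, t_mean, t_var) in tgt_states.items()
            if t_cluster == s_cluster
        ]
        if candidates:
            mapping[s_name] = min(candidates)[1]
    return mapping


# Toy example with made-up one-dimensional states.
src = {"a1": ("vowel", [0.0], [1.0]), "s1": ("fricative", [5.0], [1.0])}
tgt = {
    "A": ("vowel", [0.5], [1.0]),
    "S": ("fricative", [4.0], [1.0]),
    "Z": ("vowel", [5.0], [1.0]),
}
mapping = map_states(src, tgt)
```

In this toy data, the unconstrained minimum-KL match for `s1` would be `Z`, which has an identical distribution, but the cluster restriction excludes it because it belongs to a different category.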

Decision tree clustering for KL-HMM

Author: Dines John
Imseng David
Publication venue: Idiap
Publication date: 19/12/2013

Infoscience - École polytechnique fédérale de Lausanne

An Analysis of Language Mismatch in HMM State Mapping-Based Cross-Lingual Speaker Adaptation

Author: Dines John
Liang Hui
Publication venue: Idiap
Publication date: 26/08/2010

This paper provides an in-depth analysis of the impact of language mismatch on the performance of cross-lingual speaker adaptation. Our work confirms the influence of language mismatch between average voice distributions for synthesis and for transform estimation and the necessity of eliminating this mismatch in order to effectively utilize multiple transforms for cross-lingual speaker adaptation. Specifically, we show that language mismatch introduces unwanted language-specific information when estimating multiple transforms, thus making these transforms detrimental to adaptation performance. Our analysis demonstrates that speaker characteristics should be separated from language characteristics in order to improve cross-lingual adaptation performance.

Infoscience - École polytechnique fédérale de Lausanne

Epidemiology and Impact of Abdominal Oblique Injuries in Major and Minor League Baseball.

Author: Camp Christopher L
Cohen Steven B.
Conte Stan
D'Angelo John
Dines Joshua S
Nguyen Joseph T
Thompson Matthew
Publication venue: Jefferson Digital Commons
Publication date: 01/03/2017

BACKGROUND: Oblique injuries are known to be a common cause of time out of play for professional baseball players, and prior work has suggested that injury rates may be on the rise in Major League Baseball (MLB). PURPOSE: To better understand the current incidence of oblique injuries, determine their impact based on time out of play, and identify common injury patterns that may guide future injury prevention programs. STUDY DESIGN: Descriptive epidemiological study. METHODS: Using the MLB Health and Injury Tracking System, all oblique injuries that resulted in time out of play in MLB and Minor League Baseball (MiLB) during the 2011 to 2015 seasons were identified. Player demographics such as age, position/role, and handedness were included. Injury-specific factors analyzed included the following: date of injury, timing during season, days missed, mechanism, side, treatment, and reinjury status. RESULTS: A total of 996 oblique injuries occurred in 259 (26%) MLB and 737 (74%) MiLB players. Although the injury rate was steady in MiLB, the MLB injury rate declined (P = .037). A total of 22,064 days were missed at a mean rate of 4413 days per season and 22.2 days per injury. The majority of these occurred during batting (n = 455, 46%) or pitching (n = 348, 35%), with pitchers losing 5 days more per injury than batters (P < .001). The leading side was injured in 77% of cases and took 5 days longer to recover from than trailing side injuries (P = .009). Seventy-nine (7.9%) players received either a corticosteroid or platelet-rich plasma injection, and the mean recovery time was 11 days longer compared with those who did not receive an injection (P < .001). CONCLUSION: Although the rate of abdominal oblique injuries is on the decline in MLB, this is not the case for MiLB, and these injuries continue to represent a significant source of time out of play in professional baseball. The vast majority of injuries occur on the lead side, and these injuries result in the greatest amount of time out of play. The benefit of injections for the treatment of oblique injuries remains unknown.

Jefferson Digital Commons

The 2005 AMI system for the transcription of speech in meetings

Author: Burget Lukas
Dines John
Garau Giulia
Hain Thomas
Karafiat Martin
Lincoln Mike
McCowan Iain
Moore Darren
Ordelman Roeland
Renals Steve
Wan Vincent
Publication venue: Springer
Publication date: 01/01/2005

In this paper we describe the 2005 AMI system for the transcription of speech in meetings used for participation in the 2005 NIST RT evaluations. The system was designed for participation in the speech to text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state of the art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression and minimum word error rate decoding. In this paper we describe the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve very competitive performance.

CiteSeerX

Edinburgh Research Explorer

University of Twente Research Information

Tracter: A Lightweight Dataflow Framework

Author: Dines John
Garner Philip N.
Publication venue: Idiap
Publication date: 26/08/2010

Tracter is introduced as a dataflow framework particularly useful for speech recognition. It is designed to work on-line in real-time as well as off-line, and is the feature extraction means for the Juicer transducer-based decoder. This paper places Tracter in context amongst the dataflow literature and other commercial and open source packages. Some design aspects and capabilities are discussed. Finally, a fairly large processing graph incorporating voice activity detection and feature extraction is presented as an example of Tracter's capabilities.

Infoscience - École polytechnique fédérale de Lausanne