Search CORE

871 research outputs found

Bayesian Analysis of Phoneme Confusion Matrices

Author: Dahlquist Martin
Henter Gustav Eje
Leijon Arne
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/12/2015
Field of study

Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers

Author: Ateniese Giuseppe
Felici Giovanni
Mancini Luigi V.
Spognardi Angelo
Villani Antonio
Vitali Domenico
Publication venue
Publication date: 19/06/2013
Field of study

Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed about the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights

arXiv.org e-Print Archive

CiteSeerX

Visual units and confusion modelling for automatic lip-reading

Author: Cox Stephen
Howell Dominic
Theobald Barry
Publication venue: 'Elsevier BV'
Publication date: 01/07/2016
Field of study

Automatic lip-reading (ALR) is a challenging task because the visual speech signal is known to be missing some important information, such as voicing. We propose an approach to ALR that acknowledges that this information is missing but assumes that it is substituted or deleted in a systematic way that can be modelled. We describe a system that learns such a model and then incorporates it into decoding, which is realised as a cascade of weighted finite-state transducers. Our results show a small but statistically significant improvement in recognition accuracy. We also investigate the issue of suitable visual units for ALR, and show that visemes are sub-optimal, not but because they introduce lexical ambiguity, but because the reduction in modelling units entailed by their use reduces accuracy

University of East Anglia digital repository

Time-Domain Isolated Phoneme Classification Using Reconstructed Phase Spaces

Author: Indrebo Kevin M
Johnson Michael T
Lindgren Andrew C.
Liu Xiaolin
Povinelli Richard J.
Ye Jinjin
Publication venue: e-Publications@Marquette
Publication date: 01/07/2005
Field of study

This paper introduces a novel time-domain approach to modeling and classifying speech phoneme waveforms. The approach is based on statistical models of reconstructed phase spaces, which offer significant theoretical benefits as representations that are known to be topologically equivalent to the state dynamics of the underlying production system. The lag and dimension parameters of the reconstruction process for speech are examined in detail, comparing common estimation heuristics for these parameters with corresponding maximum likelihood recognition accuracy over the TIMIT data set. Overall accuracies are compared with a Mel-frequency cepstral baseline system across five different phonetic classes within TIMIT, and a composite classifier using both cepstral and phase space features is developed. Results indicate that although the accuracy of the phase space approach by itself is still currently below that of baseline cepstral methods, a combined approach is capable of increasing speaker independent phoneme accuracy

epublications@Marquette

Video augmentation for improving audio speech recognition under noise

Author: British Machine Vision Conference (BMVC)
Cavallaro A
Gong S
Pachoud S
Publication venue
Publication date: 23/02/2015
Field of study

Queen Mary Research Online

A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition

Author: Bennett Erin
Borschinger Benjamin
Chiu Justin
Church Kenneth
Clark Pascal
Dunbar Ewan
Dupoux Emmanuel
Feldman Naomi
Fourtassi Abdallah
Goldwater Sharon
Harwath David
Hermansky Hynek
Jansen Aren
Johnson Mark
Khudanpur Sanjeev
Lee Chia-ying
Levin Keith
McGraw Ian
Metze Florian
Norouzian Atta
Peddinti Vijay
Richardson Rachel
Rose Richard
Schatz Thomas
Seltzer Mike
Thomas Samuel
Varadarajan Balakrishnan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.5 page(s

Edinburgh Research Explorer

Macquarie University ResearchOnline