
    Towards Fast-Convergence, Low-Delay and Low-Complexity Network Optimization

    Distributed network optimization has been studied for well over a decade. However, we still do not have a good idea of how to design schemes that can simultaneously provide good performance across the dimensions of utility optimality, convergence speed, and delay. To address these challenges, in this paper we propose a new algorithmic framework in which all of these metrics approach optimality. The salient features of our new algorithm are three-fold: (i) fast convergence: it converges in only O(log(1/ε)) iterations, the fastest rate among all existing algorithms; (ii) low delay: it guarantees optimal utility with finite queue length; (iii) simple implementation: the control variables of this algorithm are based on virtual queues that do not require maintaining per-flow information. The new technique builds on an inexact Uzawa method for the Alternating Direction Method of Multipliers (ADMM), and provides a new theoretical path to prove the global and linear convergence rate of such a method without requiring a full-rank assumption on the constraint matrix
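    The virtual-queue idea mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's ADMM/inexact-Uzawa algorithm; it is a plain dual-subgradient scheme for n log-utility flows sharing one link of capacity c, in which the virtual queue Q plays the role of the congestion price (the step size and all defaults are illustrative assumptions):

```python
# Toy virtual-queue (dual-subgradient) sketch for network utility maximization.
# NOT the paper's algorithm: just the queue-as-price idea, and it converges
# far more slowly than the O(log(1/eps)) rate the paper achieves.
def num_virtual_queue(c=1.0, n=4, steps=2000, eps=0.01):
    Q = 1.0            # virtual queue length, acting as the congestion price
    x = [0.0] * n
    for _ in range(steps):
        x = [1.0 / max(Q, eps)] * n           # argmax of log(x_i) - Q * x_i
        Q = max(0.0, Q + eps * (sum(x) - c))  # virtual-queue update
    return x, Q
```

At the fixed point Q = n/c and each rate x_i = c/n, so the shared link is exactly fully utilized.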

    On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR

    It is often acknowledged that speech signals contain short-term and long-term temporal properties that are difficult to capture and model by using the usual fixed scale (typically 20ms) short time spectral analysis used in hidden Markov models (HMMs), based on piecewise stationarity and state conditional independence assumptions of acoustic vectors. For example, vowels are typically quasi-stationary over 40-80ms segments, while plosives typically require analysis below 20ms segments. Thus, fixed scale analysis is clearly sub-optimal for "optimal" time-frequency resolution and modeling of different stationary phones found in the speech signal. In the present paper, we investigate the potential advantages of using variable size analysis windows towards improving state-of-the-art speech recognition systems. Based on the usual assumption that the speech signal can be modeled by a time-varying autoregressive (AR) Gaussian process, we estimate the largest piecewise quasi-stationary speech segments, based on the likelihood that a segment was generated by the same AR process. This likelihood is estimated from the Linear Prediction (LP) residual error. Each of these quasi-stationary segments is then used as an analysis window from which spectral features are extracted. Such an approach thus results in a variable-scale time spectral analysis, adaptively estimating the largest possible analysis window size such that the signal remains quasi-stationary, thus achieving the best temporal/frequency resolution tradeoff. Speech recognition experiments on the OGI Numbers95 database show that the proposed variable-scale piecewise stationary spectral analysis based features indeed yield improved recognition accuracy in clean conditions, compared to features based on minimum cross entropy spectrum as well as those based on fixed scale spectral analysis
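    As a rough illustration of the segmentation idea (not the authors' exact likelihood test), one can grow an analysis window while the per-sample LP residual energy stays close to that of the base window. The window sizes, LP order, and threshold below are illustrative assumptions:

```python
import numpy as np

def lp_residual_energy(frame, order=10):
    """Per-sample LP residual energy of a frame (autocorrelation method)."""
    frame = frame * np.hamming(len(frame))
    n = len(frame)
    r = np.array([frame[:n - k] @ frame[k:] for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-8 * np.eye(order), r[1:])  # LP coefficients
    return (r[0] - a @ r[1:]) / n

def grow_quasi_stationary(x, start, base=160, step=80, max_len=640, tol=1.5):
    """Grow the analysis window from `base` samples while the per-sample LP
    residual energy stays within `tol` times that of the base window."""
    e0 = lp_residual_energy(x[start:start + base])
    length = base
    while length + step <= max_len and start + length + step <= len(x):
        if lp_residual_energy(x[start:start + length + step]) > tol * e0:
            break  # the extended window is no longer quasi-stationary
        length += step
    return length
```

With 8 kHz speech, the defaults correspond to growing a 20ms base window in 10ms steps up to 80ms.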

    Semismooth Newton Coordinate Descent Algorithm for Elastic-Net Penalized Huber Loss Regression and Quantile Regression

    We propose an algorithm, semismooth Newton coordinate descent (SNCD), for the elastic-net penalized Huber loss regression and quantile regression in high dimensional settings. Unlike existing coordinate descent type algorithms, the SNCD updates each regression coefficient and its corresponding subgradient simultaneously in each iteration. It combines the strengths of the coordinate descent and the semismooth Newton algorithm, and effectively solves the computational challenges posed by dimensionality and nonsmoothness. We establish the convergence properties of the algorithm. In addition, we present an adaptive version of the "strong rule" for screening predictors to gain extra efficiency. Through numerical experiments, we demonstrate that the proposed algorithm is very efficient and scalable to ultra-high dimensions. We illustrate the application via a real data example
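    The objective itself can be sketched with a plain proximal-gradient loop. This is not the SNCD algorithm (no semismooth Newton step, no screening rule), just a minimal solver for the same elastic-net penalized Huber objective, with illustrative defaults:

```python
import numpy as np

def huber_grad(r, delta):
    # derivative of the Huber loss with respect to the residual r
    return np.clip(r, -delta, delta)

def enet_huber_ista(X, y, lam, alpha=0.5, delta=1.345, iters=500):
    """Proximal-gradient (ISTA) sketch for
       (1/n) * sum huber(y - X b) + lam * (alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    NOT the SNCD algorithm; a simple baseline for the same objective."""
    n, p = X.shape
    # step size from a Lipschitz bound on the smooth part
    lr = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + lam * (1 - alpha))
    b = np.zeros(p)
    for _ in range(iters):
        r = y - X @ b
        grad = -X.T @ huber_grad(r, delta) / n + lam * (1 - alpha) * b
        z = b - lr * grad
        b = np.sign(z) * np.maximum(np.abs(z) - lr * lam * alpha, 0.0)  # soft-threshold
    return b
```

The soft-threshold step is the proximal operator of the l1 part; the ridge part is handled inside the gradient.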

    Studies on noise robust automatic speech recognition

    Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK

    Solving Inverse Problems with Piecewise Linear Estimators: From Gaussian Mixture Models to Structured Sparsity

    A general framework for solving image inverse problems is introduced in this paper. The approach is based on Gaussian mixture models, estimated via a computationally efficient MAP-EM algorithm. A dual mathematical interpretation of the proposed framework with structured sparse estimation is described, which shows that the resulting piecewise linear estimate stabilizes the estimation when compared to traditional sparse inverse problem techniques. This interpretation also suggests an effective dictionary-motivated initialization for the MAP-EM algorithm. We demonstrate that in a number of image inverse problems, including inpainting, zooming, and deblurring, the same algorithm produces results that match, often significantly exceed, and at worst fall only marginally short of the best published ones, at a lower computational cost
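    The piecewise linear estimate has a simple form once the mixture is known: each Gaussian component induces a linear (Wiener) estimator for the linear model y = Ax + n, and the component with the highest evidence is selected. A minimal sketch for a known mixture and degradation operator A (the MAP-EM estimation of the mixture itself, the heart of the paper, is omitted here):

```python
import numpy as np

def piecewise_linear_map(y, A, mus, Sigmas, sigma_n2):
    """For each Gaussian component N(mu, S), compute the linear (Wiener) MAP
    estimate of x from y = A x + noise, then keep the component with the
    highest evidence -- this selection makes the estimator piecewise linear."""
    best, best_ll = None, -np.inf
    d = len(y)
    for mu, S in zip(mus, Sigmas):
        # evidence of y under this component: y ~ N(A mu, A S A^T + sigma_n2 I)
        C = A @ S @ A.T + sigma_n2 * np.eye(d)
        r = y - A @ mu
        ll = -0.5 * (r @ np.linalg.solve(C, r) + np.linalg.slogdet(C)[1])
        if ll > best_ll:
            K = S @ A.T @ np.linalg.inv(C)   # Wiener gain for this component
            best, best_ll = mu + K @ r, ll
    return best
```

For inpainting, A would be a row-selection (masking) matrix; for deblurring, a convolution operator.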

    Stochastic resonance in Chua's circuit driven by alpha-stable noise

    Thesis (Master)--Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2012. Includes bibliographical references (leaves: 75-80). Text in English; abstract in Turkish and English. x, 80 leaves. The main aim of this thesis is to investigate stochastic resonance (SR) in Chua's circuit driven by alpha-stable noise, which approximates real-world signals better than a Gaussian distribution. SR is a phenomenon in which the response of a nonlinear system to a sub-threshold (weak) input signal is enhanced by the addition of an optimal amount of noise. There has been an increasing number of applications based on SR in various fields. Almost all studies of SR in chaotic systems assume that the noise is Gaussian, which motivates investigating cases in which the noise is non-Gaussian and hence has infinite variance. In this thesis, the spectral power amplification used to quantify SR has been evaluated through the fractional lower-order Wigner-Ville distribution of the system response and analyzed for various parameters of alpha-stable noise. The results show a visible SR effect in Chua's circuit driven by symmetric and skewed-symmetric alpha-stable noise distributions. Furthermore, a series of simulations reveal that the mean residence time, i.e. the average time spent by the trajectory in an attractor, can vary depending on the alpha-stable noise parameters
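    Alpha-stable noise of the kind used here can be generated with the Chambers-Mallows-Stuck transformation. The sketch below covers only the standard symmetric case (beta = 0, alpha != 1), not the skewed distributions also studied in the thesis:

```python
import numpy as np

def sym_alpha_stable(alpha, size, rng=None):
    """Standard symmetric alpha-stable samples (beta = 0, alpha != 1) via the
    Chambers-Mallows-Stuck transformation. For alpha < 2 the variance is
    infinite, which is what distinguishes this noise from the Gaussian case."""
    rng = np.random.default_rng() if rng is None else rng
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform phase
    W = rng.exponential(1.0, size)                # unit exponential
    return (np.sin(alpha * V) / np.cos(V) ** (1.0 / alpha)
            * (np.cos(V - alpha * V) / W) ** ((1.0 - alpha) / alpha))
```

For alpha = 2 the formula reduces to a Gaussian; smaller alpha gives heavier tails and more frequent large spikes, which is what drives the hopping between attractors in the SR experiments.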

    ARTICULATORY INFORMATION FOR ROBUST SPEECH RECOGNITION

    Current Automatic Speech Recognition (ASR) systems fall well short of human speech recognition performance due to their lack of robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and their variation occurs due to a combination of factors including speech style, speaking rate, etc.; a phenomenon commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof of concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the gesture recognition task. Presently no natural speech database contains articulatory gesture annotation; hence an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs. 
Two natural speech databases, X-ray microbeam and Aurora-2, were annotated; the former was used to train a TV estimator and the latter to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs estimated from the acoustic speech signal. In this setup the articulatory gestures were modeled as hidden random variables, eliminating the necessity for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only help to account for coarticulatory variations but also significantly improve the noise robustness of ASR systems

    The application of continuous state HMMs to an automatic speech recognition task

    Hidden Markov Models (HMMs) have been a popular choice for automatic speech recognition (ASR) for several decades due to their mathematical formulation and computational efficiency, which consistently yielded better performance than other methods during this period. However, HMMs are based on the assumption of statistical independence among speech frames, which conflicts with the physiological basis of speech production. Consequently, researchers have produced a substantial amount of literature extending the HMM assumptions to incorporate dynamic properties of speech into the underlying model. One such approach involves segmental models, which address the frame-wise independence assumption. However, the computational inefficiencies associated with segmental models have limited their practical application. In recent years, there has been a shift from HMM-based systems to neural networks (NNs) and deep learning approaches, which offer superior performance compared to conventional statistical models. However, as the complexity of neural models increases, so does the number of parameters involved, requiring a greater dependency on training data to optimise model parameters. The present study extends prior research on segmental HMMs by introducing a Segmental Continuous-State Hidden Markov Model (CSHMM), examining a resolution to the issue of inter-segmental continuity. This is an alternative to contemporary speech modelling methods that rely on data-centric NN techniques, with the goal of establishing a statistical model that more accurately reflects the speech production process. The continuous-state segmental model offers a flexible mathematical framework which can impose a continuity constraint between adjoining segments, addressing a fundamental drawback of conventional HMMs, namely the frame-wise independence assumption. 
Additionally, the CSHMM benefits from a practical training and decoding algorithm which overcomes the computational inefficiency inherent in conventional decoding algorithms for traditional segmental HMMs. This study has formulated four trajectory-based segmental models using a CSHMM framework. CSHMMs have not been extensively studied for ASR tasks due to the absence of open-source standardised speech toolkits that enable convenient exploration of CSHMMs. As a result, to perform sufficient experiments in this study, training and decoding software has been developed, which can be accessed in (Seivwright, 2015). The experiments in this study report baseline phone recognition results for the four distinct segmental CSHMM systems using the TIMIT database. These baseline results are compared against a simple Hidden Markov Model-Gaussian Mixture Model (HMM-GMM) system. In all experiments, a compact acoustic feature representation in the form of bottleneck features (BNFs) is employed, motivated by an investigation into BNFs and their relationship to articulatory properties. Although the proposed CSHMM systems do not surpass discrete-state HMMs in performance, this research has demonstrated a strong association between inter-segmental continuity and the corresponding phonetic categories being modelled. Furthermore, this thesis presents a method for achieving finer control over continuity between segments, which can be expanded to investigate co-articulation in the context of CSHMMs

    On the identification and parametric modelling of offshore dynamic systems

    This thesis describes an investigation into the analysis methods arising from identification aspects of the theory of dynamic systems, with application to full-scale offshore monitoring and marine environmental data including target spectra. Based on the input and output of the dynamic system, System Identification (SI) techniques are used first to identify the model type and then to estimate the model parameters. This work also gives an understanding of how to obtain a meaningful match between the target (power spectra or time series data sets) and SI models with minimal loss of information. The SI techniques, namely the Autoregressive (AR), Moving Average (MA) and Autoregressive Moving Average (ARMA) algorithms, are formulated in the frequency domain and also in the time domain. These models are economical only when the model order is low, in the sense that a low-order model is computationally efficient and can still appropriately represent the offshore time series records or the target spectra. For this purpose, the orders of the SI models are optimally selected by Least Squares Error, Akaike Information Criterion and Minimum Description Length methods. A novel model order reduction technique is established to obtain the reduced-order ARMA model. First, estimates of higher-order AR coefficients are determined using modified Yule-Walker equations, and then the first- and second-order real modes and their energies are determined. Considering only the higher-energy modes, the AR part of the reduced-order ARMA model is obtained. The MA part of the reduced-order ARMA model is determined based on partial fraction and recursive methods. This model order reduction technique can remove the spurious noise modes which are present in the time series data. 
Therefore, first using an initial optimal AR model and then a model order reduction technique, the time series data or target spectrum can be reduced to a few parameters, namely the coefficients of the reduced-order ARMA model. The above univariate SI models and model order reduction techniques are successfully applied to marine environmental and structural monitoring data, including ocean waves, semi-submersible heave motions, monohull crane vessel motions and theoretical (Pierson-Moskowitz and JONSWAP) spectra. Univariate SI models are developed based on the assumption that the offshore dynamic systems are stationary random processes. For nonstationary processes, such as measurements of combined sea waves and swells, or coupled responses of offshore structures with short-period and long-period motions, the time series are modelled by Autoregressive Integrated Moving Average algorithms. The multivariate autoregressive (MAR) algorithm is developed to reduce time series wave data sets into MAR model parameters. The MAR algorithms are described by feedback weighting coefficient matrices and the driving noise vector, obtained from estimates of the partial correlation of the time series data sets. Here the appropriate model order is selected based on auto- and cross-correlations and the multivariate Akaike information criterion. These algorithms are applied to estimate MAR power spectral density spectra, and then phase and coherence spectra, of two time series wave data sets collected at a North Sea location. The estimated MAR power spectral densities are compared with spectral estimates computed from a two-variable fast Fourier transform, which show good agreement
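    The basic Yule-Walker step that such AR modelling starts from can be sketched as follows. This is the plain AR case only, not the thesis's modified Yule-Walker/ARMA reduction, and the model order and data are illustrative:

```python
import numpy as np

def yule_walker(x, order):
    """Estimate AR(p) coefficients a and driving-noise variance sigma^2
    from the biased sample autocovariance, for x_t = sum_k a_k x_{t-k} + e_t."""
    x = x - x.mean()
    n = len(x)
    r = np.array([x[:n - k] @ x[k:] for k in range(order + 1)]) / n
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])     # solve the Yule-Walker equations
    sigma2 = r[0] - a @ r[1:]         # driving-noise variance
    return a, sigma2

def ar_psd(a, sigma2, freqs):
    """Power spectral density of the fitted AR model at normalized frequencies."""
    p = len(a)
    e = np.exp(-2j * np.pi * np.outer(freqs, np.arange(1, p + 1)))
    return sigma2 / np.abs(1 - e @ a) ** 2
```

Fitting an AR model to a wave record and evaluating ar_psd on a frequency grid gives the parametric spectral estimate that is compared against FFT-based estimates in the thesis.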