
    Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

    We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error. Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed
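    The abstract treats the dialogue-act sequence as the hidden state sequence of an HMM, with a dialogue-act n-gram as the transition model. A minimal sketch of the decoding step follows, assuming a DA bigram and per-utterance likelihoods log P(evidence | act) that are already given. In the paper these likelihoods come from word n-grams, decision trees and neural networks, which this sketch does not reproduce. The labels and probabilities below are made up for illustration.

```python
import math

def viterbi_dialogue_acts(acts, start_logp, trans_logp, obs_loglik):
    """Most likely dialogue-act sequence under a DA bigram "dialogue grammar".

    acts:       dialogue-act labels (the hidden HMM states)
    start_logp: {act: log P(act at the first utterance)}
    trans_logp: {(prev, act): log P(act | prev)}, i.e. the DA bigram
    obs_loglik: one dict per utterance, {act: log P(evidence | act)}
    """
    # best[a]: log-probability of the best DA path ending in act a so far
    best = {a: start_logp[a] + obs_loglik[0][a] for a in acts}
    backptrs = []
    for frame in obs_loglik[1:]:
        new_best, ptr = {}, {}
        for a in acts:
            prev = max(acts, key=lambda p: best[p] + trans_logp[(p, a)])
            new_best[a] = best[prev] + trans_logp[(prev, a)] + frame[a]
            ptr[a] = prev
        best = new_best
        backptrs.append(ptr)
    # Trace back from the best final act.
    path = [max(acts, key=best.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical toy numbers: the second utterance looks slightly more like a
# Backchannel, but after a Question the DA grammar favours a Statement.
ACTS = ["Statement", "Question", "Backchannel"]
bigram = {
    "Statement":   {"Statement": 0.4, "Question": 0.2, "Backchannel": 0.4},
    "Question":    {"Statement": 0.8, "Question": 0.1, "Backchannel": 0.1},
    "Backchannel": {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1},
}
trans = {(p, a): math.log(q) for p, row in bigram.items() for a, q in row.items()}
start = {a: math.log(1 / 3) for a in ACTS}
likelihoods = [
    {"Statement": 0.1, "Question": 0.8, "Backchannel": 0.1},
    {"Statement": 0.4, "Question": 0.1, "Backchannel": 0.5},
]
obs = [{a: math.log(p) for a, p in row.items()} for row in likelihoods]
tags = viterbi_dialogue_acts(ACTS, start, trans, obs)  # -> ["Question", "Statement"]
```

    Taken on its own, the second utterance would be tagged Backchannel. The transition constraints change the decision, and this is the kind of discourse-coherence effect the abstract describes.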

    A statistical simulation technique to develop and evaluate conversational agents

    In this paper, we present a technique for developing user simulators that are able to interact with and evaluate conversational agents. Our technique is based on a statistical model that is automatically learned from a dialog corpus. The user simulator uses this model to provide the next answer, taking into account the complete history of the interaction. The main objective of our proposal is not only to evaluate the conversational agent but also to improve it by using the simulated dialogs to learn a better dialog model. We have applied this technique to design and evaluate a conversational agent that provides academic information in a multi-agent system. The results of the evaluation show that the proposed user simulation methodology can be used not only to evaluate conversational agents but also to explore new enhanced dialog strategies. This allows the conversational agent to reduce the time needed to complete the dialogs and to automatically detect new valid paths to achieve each of the objectives defined for the task. This work was supported in part by Projects MINECO TEC2012-37832-C02-01, CICYT TEC 2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485).
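    The abstract does not specify the form of the learned model, so the following is only a toy illustration of the idea: a corpus-trained simulator that chooses the next user answer conditioned on the complete dialog history. It is a deliberately simple count-based stand-in. It samples answers in proportion to how often they followed the same full history in the corpus. For unseen histories it backs off to the last system turn, which is an added assumption. The dialog-act names are hypothetical.

```python
import random
from collections import Counter, defaultdict

class CorpusUserSimulator:
    """Toy count-based user simulator (illustrative, not the paper's model)."""

    def __init__(self, dialogs, seed=0):
        # Each dialog alternates turns: system, user, system, user, ...
        self.by_history = defaultdict(Counter)    # full history -> user-answer counts
        self.by_last_turn = defaultdict(Counter)  # backoff: last system turn -> counts
        for turns in dialogs:
            for i in range(1, len(turns), 2):     # user turns sit at odd positions
                self.by_history[tuple(turns[:i])][turns[i]] += 1
                self.by_last_turn[turns[i - 1]][turns[i]] += 1
        self.rng = random.Random(seed)

    def next_answer(self, history):
        """Sample the next user answer given the dialog history so far."""
        counts = self.by_history.get(tuple(history)) or self.by_last_turn.get(history[-1])
        if not counts:
            return None  # nothing learned for this context
        answers, weights = zip(*counts.items())
        return self.rng.choices(answers, weights=weights)[0]

# Hypothetical two-dialog corpus for an academic-information agent.
corpus = [
    ["Greet", "AskTimetable", "ConfirmSubject", "Yes", "GiveTimetable", "Bye"],
    ["Greet", "AskExams", "ConfirmSubject", "No", "AskSubject", "GiveSubject"],
]
sim = CorpusUserSimulator(corpus, seed=1)
```

    Exact-history lookup becomes sparse quickly on real corpora, which is why the sketch includes a backoff. A learned classifier over an encoding of the history is the more scalable design.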

    An application of deep learning for exchange rate forecasting

    This paper examines the performance of several state-of-the-art deep learning techniques for exchange rate forecasting: a deep feedforward network, a convolutional network and a long short-term memory network. On the one hand, the configuration of the different architectures is described in detail, together with the parameter tuning and the regularisation techniques used to avoid overfitting. On the other hand, we design an out-of-sample forecasting experiment and evaluate how accurately the three deep neural networks predict the US/UK foreign exchange rate in the days after Brexit took effect. Of the three configurations, the deep feedforward architecture gives the best results. Compared with the time-series models used as a benchmark, all three architectures generate more accurate predictions, but the results vary considerably depending on the specific topology used in each case. These results hint at the potential of deep learning techniques, but they also highlight the importance of properly configuring, implementing and selecting the different topologies.
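    The following is a minimal sketch of a chronological out-of-sample experiment like the one described. Each test point is forecast one step ahead using only data available before it, and the errors are summarised as RMSE. The random-walk (no-change) forecast stands in as a benchmark, and a three-point moving average stands in for a trained network. Both are assumptions, since the abstract does not name the benchmark models, and the toy series is invented.

```python
import math

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def out_of_sample_rmse(series, n_test, predict):
    """RMSE of one-step-ahead forecasts over the last n_test points.

    predict(history) must return a forecast of the next value using only
    `history`, so no future information leaks into any forecast.
    """
    errors = [series[t] - predict(series[:t])
              for t in range(len(series) - n_test, len(series))]
    return rmse(errors)

def random_walk(history):
    """Benchmark: tomorrow's rate equals today's."""
    return history[-1]

def moving_average(history, window=3):
    """Hypothetical stand-in for a trained deep network's forecast."""
    recent = history[-window:]
    return sum(recent) / len(recent)

toy = [1.30, 1.31, 1.29, 1.32, 1.33, 1.31, 1.30, 1.34]  # invented values, not real rates
print(out_of_sample_rmse(toy, 4, random_walk), out_of_sample_rmse(toy, 4, moving_average))
```

    Keeping the split chronological, rather than shuffling, is what makes the evaluation genuinely out-of-sample for a time series.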

    Viseme-based Lip-Reading using Deep Learning

    Research in automated lip reading is a rich discipline with many facets that have been investigated, including audio-visual data, feature extraction, classification networks and classification schemas. The most advanced and up-to-date lip-reading systems can predict entire sentences with thousands of different words, and most of them use ASCII characters as the classification schema. However, the classification performance of such systems has been insufficient, and covering an ever-expanding vocabulary with as few classes as possible remains a challenge. This thesis contributes to the area of classification schemas by proposing an automated lip-reading model that predicts sentences using visemes as the classification schema, an alternative to ASCII characters, which are the conventional class system for predicting sentences. The thesis reviews current trends in deep-learning-based automated lip reading and addresses a gap in research on classification schemas. It opens a new line of research that explores an alternative way to do lip reading. In doing so, it attains results for predicting sentences from a benchmark dataset that improve upon the current state of the art. The proposed neural-network-based lip-reading system is lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to recognise, the system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training. The system predicts sentences in two stages: visemes are recognised in the first stage and words are classified in the second.
The second stage must therefore overcome both the one-to-many mapping problem of lip reading, where one set of visemes can map to several words, and the problem of visemes being confused or misclassified in the first place. Developing the proposed system involved several tasks. These include the classification of continuous sequences of visemes and the design of viseme-to-word conversion models that both predict words effectively and are robust to viseme confusion or misclassification. The initial system was tested on the challenging BBC Lip Reading Sentences 2 (LRS2) benchmark dataset and attained a word accuracy rate of 64.6%, a significant improvement over the state-of-the-art sentence-level lip-reading systems reported at the time. The system is further improved by a language model that is effective at discriminating between homopheme words and robust to incorrectly classified visemes. This yields a word accuracy rate of 79.6% on LRS2. That is better than the 77.4% attained by another lip-reading system trained and evaluated on the same dataset, which to the best of our knowledge is the next best result reported on LRS2.
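    To make the one-to-many viseme-to-word problem concrete, here is a toy illustration. One viseme string can correspond to several homophemes, for example "bat", "mat" and "pat", because /b/, /m/ and /p/ look alike on the lips. A bigram language model then picks the most plausible word sequence by dynamic programming. The viseme labels, homopheme table and probabilities are all hypothetical. The thesis uses learned conversion models, and this sketch does not model misclassified visemes or out-of-vocabulary words.

```python
import math

# Hypothetical toy data: viseme string -> homophemes (words identical on the lips).
HOMOPHEMES = {
    "DH-AH": ["the"],
    "K-AE-T": ["cat"],
    "S-AE-T": ["sat"],
    "AA-N": ["on"],
    "B-AE-T": ["bat", "mat", "pat"],  # /b/, /m/, /p/ share one bilabial viseme
}
# Hypothetical bigram log-probabilities; unseen bigrams get a floor value.
BIGRAM = {("the", "mat"): math.log(0.5), ("the", "bat"): math.log(0.2),
          ("the", "pat"): math.log(0.01), ("on", "the"): math.log(0.6)}
FLOOR = math.log(1e-4)

def visemes_to_words(viseme_words):
    """Choose the homopheme sequence with the highest bigram score (Viterbi over words)."""
    # beams: last word -> (best score so far, best word sequence ending in that word)
    beams = {w: (0.0, [w]) for w in HOMOPHEMES[viseme_words[0]]}
    for vis in viseme_words[1:]:
        new_beams = {}
        for w in HOMOPHEMES[vis]:
            score, words = max(
                (s + BIGRAM.get((prev, w), FLOOR), ws) for prev, (s, ws) in beams.items()
            )
            new_beams[w] = (score, words + [w])
        beams = new_beams
    return max(beams.values())[1]

print(visemes_to_words(["K-AE-T", "S-AE-T", "AA-N", "DH-AH", "B-AE-T"]))
```

    In this example the final viseme string is resolved to "mat" only because of the preceding word "the". This shows how a language model can separate homophemes that the visual stage cannot.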

    Evaluation of different chrominance models in the detection and reconstruction of faces and hands using the growing neural gas network

    Physical traits such as the shape of the hand and face can be used for human recognition and identification in video surveillance systems, in biometric authentication smart card systems, and in personal health care. However, the accuracy of such systems suffers from illumination changes, unpredictability, and variability in appearance (e.g. occluded faces or hands, cluttered backgrounds, etc.). This work evaluates different statistical and chrominance models in environments with increasingly cluttered backgrounds, where lighting changes are common and no occlusions are applied. The aim is a reliable neural network reconstruction of faces and hands that does not take into account the structural and temporal kinematics of the hands. First, a statistical model is used for skin colour segmentation to roughly locate hands and faces. Then a neural network is used to reconstruct the hands and faces in 3D. For both the filtering and the reconstruction we use the growing neural gas algorithm, which can preserve the topology of an object without restarting the learning process. Experiments were conducted on our own database, on four benchmark databases (Stirling’s, Alicante, Essex, and Stegmann’s), and on normal 2D videos of deaf individuals that are freely available in the BSL signbank dataset. The results demonstrate that our system can address face and hand segmentation and reconstruction under different environmental conditions.
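    The growing neural gas (GNG) algorithm mentioned above learns a graph whose nodes spread over the input points and whose edges capture their topology. It adds nodes incrementally instead of restarting training. The following is a compact sketch of the standard GNG update loop, not the authors' implementation. The parameter defaults are illustrative assumptions, and the input is a plain list of coordinate tuples, with a 2D grid standing in for segmented skin pixels. The same code accepts 3D points.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def growing_neural_gas(data, max_nodes=30, steps=5000, eps_b=0.05, eps_n=0.006,
                       max_age=50, lam=100, alpha=0.5, decay=0.995, seed=0):
    """Fit a GNG graph to `data`. Returns (node id -> position, frozenset edge -> age).

    Parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    w = {0: list(rng.choice(data)), 1: list(rng.choice(data))}
    err = {0: 0.0, 1: 0.0}
    edges = {frozenset((0, 1)): 0}
    next_id = 2
    for step in range(1, steps + 1):
        x = rng.choice(data)
        # 1. Winner s1 and runner-up s2.
        s1, s2 = sorted(w, key=lambda i: sq_dist(w[i], x))[:2]
        # 2. Accumulate the winner's error, then move it and its graph neighbours toward x.
        err[s1] += sq_dist(w[s1], x)
        w[s1] = [wi + eps_b * (xi - wi) for wi, xi in zip(w[s1], x)]
        for e in list(edges):
            if s1 in e:
                edges[e] += 1  # age every edge leaving the winner
                (n,) = e - {s1}
                w[n] = [wi + eps_n * (xi - wi) for wi, xi in zip(w[n], x)]
        # 3. Create or refresh the s1-s2 edge; drop stale edges and isolated nodes.
        edges[frozenset((s1, s2))] = 0
        edges = {e: age for e, age in edges.items() if age <= max_age}
        linked = set().union(*edges)
        for i in [i for i in w if i not in linked]:
            del w[i], err[i]
        # 4. Every `lam` inputs, insert a node between the worst node and its worst neighbour.
        if step % lam == 0 and len(w) < max_nodes:
            q = max(err, key=err.get)
            f = max((next(iter(e - {q})) for e in edges if q in e), key=err.get)
            r, next_id = next_id, next_id + 1
            w[r] = [(a + b) / 2 for a, b in zip(w[q], w[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, r))] = edges[frozenset((r, f))] = 0
            err[q] *= alpha
            err[f] *= alpha
            err[r] = err[q]
        # 5. Decay all accumulated errors.
        for i in err:
            err[i] *= decay
    return w, edges

grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]  # toy stand-in for skin pixels
nodes, links = growing_neural_gas(grid)
```

    Because insertions extend the existing graph, new data can be fed into the same loop without retraining from scratch. This is the property the abstract highlights.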

    Bernoulli HMMs for Handwritten Text Recognition

    In recent years, Hidden Markov Models (HMMs) have received significant attention in the task of off-line handwritten text recognition (HTR). As in automatic speech recognition (ASR), HMMs are used to model the probability of an observation sequence given its corresponding text transcription. However, in contrast to ASR, HTR has no standard set of local features used by most of the proposed systems. In this thesis we propose the use of raw binary pixels as features, in conjunction with models that deal more directly with binary data. In particular, we propose the use of Bernoulli HMMs (BHMMs), that is, conventional HMMs in which Gaussian (mixture) distributions have been replaced by Bernoulli (mixture) probability functions. The objective is twofold. On the one hand, BHMMs allow us to better model the binary nature of text images (foreground/background). On the other hand, this guarantees that no discriminative information is filtered out during feature extraction (most available HTR datasets can be easily binarized without a relevant loss of information). In this thesis, all the HMM theory required to develop an HMM-based HTR toolkit is reviewed and adapted to the case of BHMMs. Specifically, we begin by defining a simple classifier based on BHMMs with Bernoulli probability functions at the states, and we end with an embedded Bernoulli mixture HMM recognizer for continuous HTR. Regarding the binary features, we propose a simple binary feature extraction process without significant loss of information. All input images are scaled and binarized so that they can easily be reinterpreted as sequences of binary feature vectors. Two extensions are proposed to this basic feature extraction method: a sliding window to better capture the context, and a repositioning method to better deal with vertical distortions.
Competitive results were obtained when BHMMs and the proposed methods were applied to well-known HTR databases. In particular, we ranked first at the Arabic Handwriting Recognition Competition organized during the 12th International Conference on Frontiers in Handwriting Recognition (ICFHR 2010), and at the Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text organized during the 11th International Conference on Document Analysis and Recognition (ICDAR 2011).
In the last part of this thesis we propose a method for training BHMM classifiers using discriminative training criteria instead of conventional Maximum Likelihood Estimation (MLE). Specifically, we propose a log-linear classifier for binary data based on the BHMM classifier. Parameters of this model can be estimated using discriminative training criteria for log-linear models. In particular, we give the formulae for several MMI-based criteria. Finally, we prove the equivalence between both classifiers. Hence, a BHMM classifier can be trained discriminatively by obtaining its equivalent log-linear classifier. Reported results show that discriminative BHMMs clearly outperform conventional generative BHMMs. Giménez Pastor, A. (2014). Bernoulli HMMs for Handwritten Text Recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37978
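    In a BHMM, each state emits a D-dimensional binary vector, for example one binarized image column, through a mixture of Bernoulli prototypes. The following is a minimal sketch of the resulting likelihood computation, the standard forward algorithm in the log domain with Bernoulli-mixture emissions. The parameter layout and toy numbers are assumptions for illustration. Prototype probabilities must lie strictly between 0 and 1, as they would after smoothing.

```python
import math

def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def safe_log(p):
    return math.log(p) if p > 0 else -math.inf

def bernoulli_mixture_loglik(x, weights, protos):
    """log sum_k pi_k prod_d p_kd^x_d (1 - p_kd)^(1 - x_d) for a binary vector x."""
    comps = []
    for pi, p in zip(weights, protos):
        ll = math.log(pi)
        for xd, pd in zip(x, p):
            ll += math.log(pd if xd else 1.0 - pd)
        comps.append(ll)
    return logsumexp(comps)

def bhmm_forward(obs, init, trans, emissions):
    """log P(obs) for a Bernoulli-mixture HMM via the forward algorithm.

    obs:       list of binary feature vectors
    init:      init[i] = P(first state is i)
    trans:     trans[i][j] = P(next state j | state i)
    emissions: emissions[i] = (mixture weights, list of Bernoulli prototypes)
    """
    n = len(init)
    alpha = [safe_log(init[i]) + bernoulli_mixture_loglik(obs[0], *emissions[i])
             for i in range(n)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
                 + bernoulli_mixture_loglik(x, *emissions[j])
                 for j in range(n)]
    return logsumexp(alpha)

# Hypothetical left-to-right model: 2 states, 1-pixel columns, 1 component per state.
init = [0.3, 0.7]
trans = [[0.5, 0.5], [0.0, 1.0]]
emissions = [([1.0], [[0.9]]), ([1.0], [[0.2]])]
print(bhmm_forward([[1], [0]], init, trans, emissions))
```

    The forward code itself does not depend on topology. The left-to-right structure typical of HTR models is expressed only through the zero entries of `trans`.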

    On the use of high-level information in speaker and language recognition

    Proceedings of the IV Jornadas de Tecnología del Habla (JTH 2006). Automatic speaker recognition systems have been largely dominated by acoustic-spectral systems, which rely on proper modelling of the short-term vocal tract of speakers. However, there is scientific and intuitive evidence that speaker-specific information is embedded in the speech signal in multiple short- and long-term characteristics. In this work, a multilevel speaker recognition system combining acoustic, phonotactic and prosodic subsystems is presented and assessed using NIST 2005 Speaker Recognition Evaluation data. For language recognition, the NIST 2005 Language Recognition Evaluation was selected to measure the performance of high-level language recognition systems.
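    The abstract says the subsystems are combined but does not say how. Score-level fusion is a common way to do this. The following is a minimal sketch of one such scheme, offered as an assumption rather than the paper's method. Each subsystem's scores are z-normalised with statistics from a hypothetical development set and then combined as a weighted sum. The weights and scores are invented. A subsystem whose development scores are constant would have zero spread and cannot be normalised this way.

```python
from statistics import mean, pstdev

def znorm_params(dev_scores):
    """Per-subsystem (mean, standard deviation) estimated on development trials."""
    return {name: (mean(s), pstdev(s)) for name, s in dev_scores.items()}

def fuse(trial_scores, params, weights):
    """Weighted sum of z-normalised subsystem scores for one trial."""
    return sum(
        weights[name] * (score - params[name][0]) / params[name][1]
        for name, score in trial_scores.items()
    )

# Hypothetical development scores and fusion weights.
dev = {"acoustic": [-2.0, 0.0, 2.0],
       "phonotactic": [10.0, 20.0, 30.0],
       "prosodic": [0.1, 0.2, 0.3]}
params = znorm_params(dev)
weights = {"acoustic": 0.5, "phonotactic": 0.3, "prosodic": 0.2}
print(fuse({"acoustic": 1.5, "phonotactic": 25.0, "prosodic": 0.25}, params, weights))
```

    Normalising first keeps a subsystem with a large score range, here the phonotactic one, from dominating the sum. In practice the weights would be trained on held-out data rather than set by hand.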

    A New Surrogating Algorithm by the Complex Graph Fourier Transform (CGFT)

    The essential step of surrogating algorithms is to randomize the phase of the Fourier transform while preserving the original spectrum amplitude, before computing the inverse Fourier transform. In this paper, we propose a new method based on the graph Fourier transform. This gives much more flexibility in defining which properties of the original graph signal are preserved in the surrogates. We consider the complex case to allow unconstrained phase randomization in the transformed domain. Hence, we define a Hermitian Laplacian matrix that models the graph topology and whose eigenvectors form the basis of a complex graph Fourier transform. We show that the Hermitian Laplacian matrix may have negative eigenvalues. We also show that preserving the graph spectrum amplitude implies several invariances that can be controlled through the choice of Hermitian Laplacian matrix. The usefulness of graph-signal surrogates is illustrated in classifier training when instances are scarce. This research was funded by the Spanish Administration and the European Union under grant TEC2017-84743-P. Belda, J.; Vergara Domínguez, L.; Safont Armero, G.; Salazar Afanador, A.; Parcheta, Z. (2019). A New Surrogating Algorithm by the Complex Graph Fourier Transform (CGFT). Entropy. 21(8):1-18. https://doi.org/10.3390/e21080759
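    The surrogating step described above works as follows. Transform the signal with the eigenvectors of a Hermitian Laplacian, keep the magnitudes, randomize the phases freely, and transform back. The following is a minimal NumPy sketch of that step. The Laplacian construction here, L = D - A with a Hermitian adjacency A and degrees taken from |A|, is a hypothetical choice for illustration. The paper defines its own Hermitian Laplacian, which may differ, for instance in admitting negative eigenvalues.

```python
import numpy as np

def hermitian_laplacian(adj):
    """Hypothetical construction: L = D - A, with D_ii = sum_j |A_ij| and A Hermitian."""
    adj = np.asarray(adj, dtype=complex)
    if not np.allclose(adj, adj.conj().T):
        raise ValueError("adjacency must be Hermitian")
    return np.diag(np.abs(adj).sum(axis=1)) - adj

def cgft_surrogate(x, laplacian, rng):
    """Phase-randomized surrogate of graph signal x in the complex graph Fourier domain."""
    _, U = np.linalg.eigh(laplacian)             # columns of U: orthonormal eigenvectors
    x_hat = U.conj().T @ x                        # complex graph Fourier transform
    phases = rng.uniform(0.0, 2 * np.pi, size=x_hat.shape)
    s_hat = np.abs(x_hat) * np.exp(1j * phases)   # keep |x_hat|, randomize phases freely
    return U @ s_hat                              # inverse transform (U is unitary)

rng = np.random.default_rng(0)
A = np.array([[0, 1j, 0.5], [-1j, 0, 1], [0.5, 1, 0]])  # toy Hermitian adjacency
L = hermitian_laplacian(A)
x = np.array([1.0, -2.0, 0.5])
s = cgft_surrogate(x, L, rng)
```

    Because the phases are unconstrained, the surrogate is generally complex-valued. This matches the abstract's motivation for working in the complex case. When eigenvalues repeat, the eigenvector basis is not unique, so the preserved spectrum amplitude depends on which basis `eigh` returns.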

    Gaussian tree constraints applied to acoustic linguistic functional data

    Get PDF
    Evolutionary models of languages are usually considered to take the form of trees. With the development of so-called tree constraints, the plausibility of the tree model assumptions can be assessed by checking whether the moments of observed variables lie within regions consistent with Gaussian latent tree models. In our linguistic application, the data set comprises acoustic samples (audio recordings) from speakers of five Romance languages or dialects. The aim is to assess these functional data for compatibility with a hereditary tree model at the language level. A novel combination of canonical function analysis (CFA) with a separable covariance structure produces a representative basis for the data. The separable-CFA basis is formed of components that emphasize language differences while maintaining the integrity of the observational language groupings. A previously unexploited Gaussian tree constraint is then applied to component-by-component projections of the data to investigate adherence to an evolutionary tree. The results highlight some aspects of Romance language speech that appear compatible with an evolutionary tree model but indicate that it would be inappropriate to model all features as such.
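    The abstract does not state which "previously unexploited" constraint is used. To show what a Gaussian tree constraint looks like, the following sketch checks the classic conditions for three observed variables instead. In any Gaussian latent tree, three leaves meet at a single node m, and each correlation factorises as rho_ij = c_i * c_j, where |c_i| <= 1 is the product of edge correlations from leaf i to m. Two conditions follow: the product of the three correlations is non-negative, and |rho_ij * rho_ik| <= |rho_jk| for every ordering. This is an illustration of the idea, not the paper's constraint.

```python
from itertools import permutations

def tripod_compatible(r, i, j, k, tol=1e-12):
    """Check necessary Gaussian-latent-tree constraints for variables i, j, k.

    r is a correlation matrix (nested lists). Under a latent tree,
    rho_ab = c_a * c_b with |c_a| <= 1, which implies the two checks below.
    """
    if r[i][j] * r[i][k] * r[j][k] < -tol:
        return False  # product of the three correlations must be >= 0
    for a, b, c in permutations((i, j, k)):
        if abs(r[a][b] * r[a][c]) > abs(r[b][c]) + tol:
            return False  # |rho_ab * rho_ac| = c_a^2 |rho_bc| cannot exceed |rho_bc|
    return True

# Toy correlations generated from a tree with leaf-to-hub correlations 0.9, 0.8, 0.7.
c = [0.9, 0.8, 0.7]
r_tree = [[1.0 if p == q else c[p] * c[q] for q in range(3)] for p in range(3)]
print(tripod_compatible(r_tree, 0, 1, 2))
```

    With sample correlations from real data, a violation would need to be judged against sampling uncertainty rather than by an exact inequality.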