63 research outputs found
Model-Based Clustering and Classification of Functional Data
The problem of complex data analysis is a central topic of modern statistical
science and learning systems and is becoming of broader interest with the
increasing prevalence of high-dimensional data. The challenge is to develop
statistical models and autonomous algorithms that are able to acquire knowledge
from raw data for exploratory analysis, which can be achieved through
clustering techniques or to make predictions of future data via classification
(i.e., discriminant analysis) techniques. Latent data models, including mixture
model-based approaches are one of the most popular and successful approaches in
both the unsupervised context (i.e., clustering) and the supervised one (i.e,
classification or discrimination). Although traditionally tools of multivariate
analysis, they are growing in popularity when considered in the framework of
functional data analysis (FDA). FDA is the data analysis paradigm in which the
individual data units are functions (e.g., curves, surfaces), rather than
simple vectors. In many areas of application, the analyzed data are indeed
often available in the form of discretized values of functions or curves (e.g.,
time series, waveforms) and surfaces (e.g., 2d-images, spatio-temporal data).
This functional aspect of the data adds additional difficulties compared to the
case of a classical multivariate (non-functional) data analysis. We review and
present approaches for model-based clustering and classification of functional
data. We derive well-established statistical models along with efficient
algorithmic tools to address problems regarding the clustering and the
classification of these high-dimensional data, including their heterogeneity,
missing information, and dynamical hidden structure. The presented models and
algorithms are illustrated on real-world functional data analysis problems from
several application area
Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of
methods they have inspired in the machine learning literature, namely kernel
methods. We first discuss some properties of positive definite kernels as well
as reproducing kernel Hibert spaces, the natural extension of the set of
functions associated with a kernel defined
on a space . We discuss at length the construction of kernel
functions that take advantage of well-known statistical models. We provide an
overview of numerous data-analysis methods which take advantage of reproducing
kernel Hilbert spaces and discuss the idea of combining several kernels to
improve the performance on certain tasks. We also provide a short cookbook of
different kernels which are particularly useful for certain data-types such as
images, graphs or speech segments.Comment: draft. corrected a typo in figure
Powerpoint Controller Using Speech Recognition
During presentation, it is hard to maintain the slide because we need to stand in front of the room and often not able to touch the computer. The presenters need to take attention at both their voice, and body language such eye contact, facial expression, posture, gesture, and body orientation. Microsoft PowerPoint is a simple but very useful tool to create digital presentation. Even though it is simple to use, but this application required the presenter to take control while using it, such as to star the slide show or moving it to the next slide. The purpose of this research is to minimize physical contact between user and the computer during the presentation by controlling the move of the slide using voice. This research will implement the Hidden Markov Model algorithm and Sphinx-4 library
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes
Discriminative, generative, and imitative learning
Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2002.Includes bibliographical references (leaves 201-212).I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars. Conversely, discriminative algorithms adjust a possibly non-distributional model to data optimizing for a specific task, such as classification or prediction. This typically leads to superior performance yet compromises the flexibility of generative modeling. I present Maximum Entropy Discrimination (MED) as a framework to combine both discriminative estimation and generative probability densities. Calculations involve distributions over parameters, margins, and priors and are provably and uniquely solvable for the exponential family. Extensions include regression, feature selection, and transduction. SVMs are also naturally subsumed and can be augmented with, for example, feature selection, to obtain substantial improvements. To extend to mixtures of exponential families, I derive a discriminative variant of the Expectation-Maximization (EM) algorithm for latent discriminative learning (or latent MED).(cont.) While EM and Jensen lower bound log-likelihood, a dual upper bound is made possible via a novel reverse-Jensen inequality. The variational upper bound on latent log-likelihood has the same form as EM bounds, is computable efficiently and is globally guaranteed. It permits powerful discriminative learning with the wide range of contemporary probabilistic mixture models (mixtures of Gaussians, mixtures of multinomials and hidden Markov models). We provide empirical results on standardized data sets that demonstrate the viability of the hybrid discriminative-generative approaches of MED and reverse-Jensen bounds over state of the art discriminative techniques or generative approaches. Subsequently, imitative learning is presented as another variation on generative modeling which also learns from exemplars from an observed data source. However, the distinction is that the generative model is an agent that is interacting in a much more complex surrounding external world. It is not efficient to model the aggregate space in a generative setting. I demonstrate that imitative learning (under appropriate conditions) can be adequately addressed as a discriminative prediction task which outperforms the usual generative approach. This discriminative-imitative learning approach is applied with a generative perceptual system to synthesize a real-time agent that learns to engage in social interactive behavior.by Tony Jebara.Ph.D
Proceedings of the 35th International Workshop on Statistical Modelling : July 20- 24, 2020 Bilbao, Basque Country, Spain
466 p.The InternationalWorkshop on Statistical Modelling (IWSM) is a reference workshop in promoting statistical modelling, applications of Statistics for researchers, academics and industrialist in a broad sense. Unfortunately, the global COVID-19 pandemic has not allowed holding the 35th edition of the IWSM in Bilbao in July 2020. Despite the situation and following the spirit of the Workshop and the Statistical Modelling Society, we are delighted to bring you the proceedings book of extended abstracts
- …