15,749 research outputs found
A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.5 page(s
Multivariate time series classification with temporal abstractions
The increase in the number of complex temporal datasets collected today has prompted the development of methods that extend classical machine learning and data mining methods to time-series data. This work focuses on methods for multivariate time-series classification. Time series classification is a challenging problem mostly because the number of temporal features that describe the data and are potentially useful for classification is enormous. We study and develop a temporal abstraction framework for generating multivariate time series features suitable for classification tasks. We propose the STF-Mine algorithm that automatically mines discriminative temporal abstraction patterns from the time series data and uses them to learn a classification model. Our experimental evaluations, carried out on both synthetic and real world medical data, demonstrate the benefit of our approach in learning accurate classifiers for time-series datasets. Copyright © 2009, Assocation for the Advancement of ArtdicaI Intelligence (www.aaai.org). All rights reserved
Analysis of Three-Dimensional Protein Images
A fundamental goal of research in molecular biology is to understand protein
structure. Protein crystallography is currently the most successful method for
determining the three-dimensional (3D) conformation of a protein, yet it
remains labor intensive and relies on an expert's ability to derive and
evaluate a protein scene model. In this paper, the problem of protein structure
determination is formulated as an exercise in scene analysis. A computational
methodology is presented in which a 3D image of a protein is segmented into a
graph of critical points. Bayesian and certainty factor approaches are
described and used to analyze critical point graphs and identify meaningful
substructures, such as alpha-helices and beta-sheets. Results of applying the
methodologies to protein images at low and medium resolution are reported. The
research is related to approaches to representation, segmentation and
classification in vision, as well as to top-down approaches to protein
structure prediction.Comment: See http://www.jair.org/ for any accompanying file
Bayesian Models for Unit Discovery on a Very Low Resource Language
Developing speech technologies for low-resource languages has become a very
active research field over the last decade. Among others, Bayesian models have
shown some promising results on artificial examples but still lack of in situ
experiments. Our work applies state-of-the-art Bayesian models to unsupervised
Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also
show that Bayesian models can naturally integrate information from other
resourceful languages by means of informative prior leading to more consistent
discovered units. Finally, discovered acoustic units are used, either as the
1-best sequence or as a lattice, to perform word segmentation. Word
segmentation results show that this Bayesian approach clearly outperforms a
Segmental-DTW baseline on the same corpus.Comment: Accepted to ICASSP 201
Recommended from our members
Covariate-assisted ranking and screening for large-scale two-sample inference
Two-sample multiple testing has a wide range of applications. The conventionalpractice first reduces the original observations to a vector of p-values and then chooses a cutoffto adjust for multiplicity. However, this data reduction step could cause significant loss ofinformation and thus lead to suboptimal testing procedures.We introduce a new framework fortwo-sample multiple testing by incorporating a carefully constructed auxiliary variable in inferenceto improve the power. A data-driven multiple-testing procedure is developed by employinga covariate-assisted ranking and screening (CARS) approach that optimally combines the informationfrom both the primary and the auxiliary variables. The proposed CARS procedureis shown to be asymptotically valid and optimal for false discovery rate control. The procedureis implemented in the R package CARS. Numerical results confirm the effectiveness of CARSin false discovery rate control and show that it achieves substantial power gain over existingmethods. CARS is also illustrated through an application to the analysis of a satellite imagingdata set for supernova detection
- …