Handwriting Recognition of Historical Documents with few labeled data
Historical documents present many challenges for offline handwriting
recognition systems, notably the segmentation and labeling steps. Carefully
annotated text lines are needed to train a handwritten text recognition (HTR)
system, yet in some scenarios transcripts are available only at the paragraph
level, with no text-line information. In this work, we demonstrate how to train
an HTR system with few labeled data. Specifically, we train a deep convolutional
recurrent neural network (CRNN) on manually labeled text lines covering only 10%
of a dataset, and propose an incremental training procedure that covers the rest
of the data. Performance is further improved by augmenting the training set with
specially crafted multiscale data. We also propose a model-based normalization
scheme that accounts for the variability in writing scale during the recognition
phase. We apply this approach to the publicly available READ dataset. Our
system achieved the second-best result in the ICDAR2017 competition.
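The abstract does not detail the incremental training procedure. One common reading is a self-training loop: train on the labeled subset, pseudo-label the unlabeled lines the model is confident about, add them to the training set and retrain. The sketch below assumes that reading; `incremental_training`, the `train` callable and the confidence `threshold` are hypothetical names, not the authors' code.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")  # input type, e.g. a text-line image (hypothetical)
Y = TypeVar("Y")  # label type, e.g. a transcription (hypothetical)


def incremental_training(
    labeled: List[Tuple[X, Y]],
    unlabeled: Sequence[X],
    train: Callable[[List[Tuple[X, Y]]], Callable[[X], Tuple[Y, float]]],
    threshold: float = 0.9,
    max_rounds: int = 5,
) -> Tuple[Callable[[X], Tuple[Y, float]], List[Tuple[X, Y]]]:
    """Hypothetical self-training loop: train, pseudo-label confident samples, retrain.

    `train` returns a model mapping an input to (predicted label, confidence).
    """
    train_set = list(labeled)
    remaining = list(unlabeled)
    model = train(train_set)  # initial model from the small labeled subset
    for _ in range(max_rounds):
        newly_labeled, still_unlabeled = [], []
        for x in remaining:
            label, confidence = model(x)
            if confidence >= threshold:
                newly_labeled.append((x, label))  # accept as pseudo-label
            else:
                still_unlabeled.append(x)  # retry in a later round
        if not newly_labeled:  # nothing confident left to add
            break
        train_set.extend(newly_labeled)
        remaining = still_unlabeled
        model = train(train_set)  # retrain on the enlarged set
    return model, train_set
```

The threshold trades coverage against label noise: a lower value absorbs more of the unlabeled data per round but risks reinforcing recognition errors.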
Synchronous Alignment
In speaker verification, the likelihood ratio criterion is generally used to verify the claimed identity. This is done using two independent models, i.e. a Client model and a World model. It may be interesting to make both models share the same topology, which represents the underlying phonetic structure, and then to consider two different output distributions corresponding to the Client and World hypotheses. Based on this idea, a decoding algorithm and the corresponding training algorithm were derived. First experiments on a substantial telephone database show a small improvement with respect to the reference system; we can conclude that synchronous alignment provides results at least equivalent to those of the reference system, with a less complex decoding algorithm. Other important perspectives can be derived from this work.
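For reference, the baseline decision with two independent models can be sketched as an average per-frame log-likelihood ratio compared with a threshold. This minimal illustration assumes a single diagonal-covariance Gaussian per model (practical systems use GMMs or HMMs), and `llr_score`, `verify` and the zero threshold are hypothetical choices.

```python
import math
from typing import Sequence, Tuple

# A model is (mean vector, variance vector) of one diagonal Gaussian (simplifying assumption).
Gaussian = Tuple[Sequence[float], Sequence[float]]


def diag_gaussian_logpdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def llr_score(frames: Sequence[Sequence[float]], client: Gaussian, world: Gaussian) -> float:
    """Average per-frame log p(x | client) - log p(x | world)."""
    total = sum(
        diag_gaussian_logpdf(x, *client) - diag_gaussian_logpdf(x, *world)
        for x in frames
    )
    return total / len(frames)


def verify(frames: Sequence[Sequence[float]], client: Gaussian, world: Gaussian,
           threshold: float = 0.0) -> bool:
    """Accept the claimed identity when the ratio score reaches the threshold."""
    return llr_score(frames, client, world) >= threshold
```

Here each model is scored on its own, so with HMMs each would need its own alignment; the synchronous approach replaces those two alignments with one shared path.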
Latent Semantic Indexing by Self-Organizing Map
An important problem in information retrieval from spoken documents is how to retrieve relevant documents that are poorly decoded by the speech recognizer. In this paper we propose a stochastic index for the documents based on Latent Semantic Analysis (LSA) of the decoded document contents. The original LSA approach uses Singular Value Decomposition (SVD) to reduce the dimensionality of the documents. As an alternative, we propose a computationally cheaper solution using Random Mapping (RM) and Self-Organizing Maps (SOM). The motivation for clustering the documents with a SOM is to reduce the effect of recognition errors and to extract new characteristic index terms. Experimental indexing results are presented using relevance judgments for the retrieval results of test queries, and using a document perplexity, defined in this paper, to measure the power of the index models.
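The Random Mapping and SOM steps can be sketched as a random Gaussian projection of term-count vectors followed by a plain online SOM with a Gaussian neighborhood. This is a minimal sketch under stated assumptions: the grid size, linear learning-rate schedule and function names are illustrative, and the paper's index construction and document perplexity are not reproduced.

```python
import numpy as np


def random_mapping(doc_term: np.ndarray, dim: int, seed: int = 0) -> np.ndarray:
    """Project document vectors (docs x terms) to `dim` dimensions with a random Gaussian matrix."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(doc_term.shape[1], dim)) / np.sqrt(dim)
    return doc_term @ proj


def train_som(data: np.ndarray, grid=(3, 3), epochs: int = 50,
              lr0: float = 0.5, sigma0: float = 1.0, seed: int = 0) -> np.ndarray:
    """Online SOM training; returns one weight vector per map unit (units x dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each unit, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.normal(size=(rows * cols, data.shape[1]))
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighborhood radius
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))  # best-matching unit
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian neighborhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign(data: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Map each document to the index of its best-matching SOM unit."""
    return np.array([np.argmin(np.sum((weights - x) ** 2, axis=1)) for x in data])
```

Random Mapping avoids the SVD entirely, which is where the computational saving over classical LSA comes from; documents landing on the same or neighboring units form the clusters.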
Direction of Arrival Estimation using EM-ESPRIT with nonuniform arrays
This paper deals with the problem of Direction Of Arrival (DOA) estimation with nonuniform linear arrays. The proposed method is based on the Expectation Maximization (EM) method, where ESPRIT is used in the maximization step. The key idea is to iteratively interpolate the data to a virtual uniform linear array in order to apply ESPRIT to estimate the DOA. The iterative approach improves the interpolation using the previously estimated DOA. One of the novelties of this method lies in its ability to handle any nonuniform array geometry. This technique offers significant performance and computational advantages over previous algorithms such as Spectral MUSIC, EM-IQML and the method based on the manifold separation technique. EM-ESPRIT is also shown to be more robust to additive noise. Furthermore, EM-ESPRIT fully exploits the advantages of using a nonuniform array over a uniform array: simulations show that, for the same aperture and with fewer sensors, the nonuniform array performs almost identically to the equivalent uniform array.
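The ESPRIT step applied to the virtual uniform linear array can be sketched with the standard least-squares ESPRIT estimator. This sketch assumes half-wavelength spacing, steering vectors with elements exp(j·2π·d·m·sinθ), and a known number of sources; the EM interpolation from the nonuniform array is not shown, and `esprit_ula` is a hypothetical name.

```python
import numpy as np


def esprit_ula(X: np.ndarray, n_sources: int, d: float = 0.5) -> np.ndarray:
    """Estimate DOAs in degrees from ULA snapshots X (sensors x snapshots).

    d is the sensor spacing in wavelengths.
    """
    R = X @ X.conj().T / X.shape[1]      # sample covariance matrix
    _, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    Es = eigvecs[:, -n_sources:]         # signal subspace (largest eigenvalues)
    # Rotational invariance between the first M-1 and the last M-1 sensors.
    Phi = np.linalg.pinv(Es[:-1]) @ Es[1:]
    phases = np.angle(np.linalg.eigvals(Phi))
    return np.sort(np.degrees(np.arcsin(phases / (2 * np.pi * d))))
```

Usage on simulated data:

```python
rng = np.random.default_rng(0)
M, N = 8, 200
m = np.arange(M)[:, None]
A = np.exp(1j * np.pi * m * np.sin(np.radians([-20.0, 30.0]))[None, :])
S = (rng.normal(size=(2, N)) + 1j * rng.normal(size=(2, N))) / np.sqrt(2)
X = A @ S + 0.05 * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))
print(esprit_ula(X, 2))  # estimates close to -20 and 30 degrees
```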
Combining Wavelet-domain Hidden Markov Trees with Hidden Markov Models
In this paper, the concept of Wavelet-domain Hidden Markov Trees (WHMT) is introduced into Automatic Speech Recognition. WHMT are a convenient means of modeling the structure of wavelet feature vectors, as wavelet coefficients can be interpreted as nodes in a binary tree. By introducing hidden states at each node, the non-Gaussian statistics inherent in wavelet features can be modeled. At the same time, correlations between neighboring coefficients in the time-frequency plane are accommodated. Phoneme probabilities obtained using the WHMT and wavelet features are then combined at the state level with those obtained from Gaussian distributions over MFCCs, and fed into conventional Hidden Markov Models. Preliminary experiments show the potential advantages of this novel approach.
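To illustrate how hidden states at each node capture non-Gaussian wavelet statistics, here is a minimal upward-pass likelihood for a hidden Markov tree. Assumptions: coefficients are stored in heap order (node i has children 2i+1 and 2i+2), parameters are tied across all nodes, and each state emits a zero-mean Gaussian. No scaling is applied, so this is only suitable for small trees, and the state-level combination with the MFCC Gaussians is not shown. `hmt_likelihood` is a hypothetical name.

```python
import math
from typing import List, Sequence


def gauss(x: float, var: float) -> float:
    """Zero-mean Gaussian density with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)


def hmt_likelihood(coeffs: Sequence[float], prior: Sequence[float],
                   trans: Sequence[Sequence[float]], variances: Sequence[float]) -> float:
    """p(coeffs) under a hidden Markov tree via the upward (beta) recursion.

    prior[m]    : P(root state = m)
    trans[m][k] : P(child state = k | parent state = m), tied across nodes
    variances[m]: output variance of state m (e.g. small/large for 2 states)
    """
    n, K = len(coeffs), len(prior)

    def beta(i: int) -> List[float]:
        # Likelihood of the subtree rooted at node i, given each state of node i.
        out = [gauss(coeffs[i], variances[m]) for m in range(K)]
        for c in (2 * i + 1, 2 * i + 2):
            if c < n:
                bc = beta(c)
                out = [out[m] * sum(trans[m][k] * bc[k] for k in range(K)) for m in range(K)]
        return out

    b_root = beta(0)
    return sum(prior[m] * b_root[m] for m in range(K))
```

With two states of different variance, each coefficient is modeled by a Gaussian mixture, which captures the peaky, heavy-tailed shape of wavelet coefficients, while the parent-to-child transitions encode dependencies across scales.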
CLIENT / WORLD MODEL SYNCHRONOUS ALIGNMENT FOR SPEAKER VERIFICATION
In speaker verification, two independent stochastic models, i.e. a client model and a non-client (world) model, are generally used to verify the claimed identity using a likelihood ratio score. This paper investigates a variant of this approach based on a common hidden process for both models. In this framework, both models share the same topology, which is conditioned by the underlying phonetic structure of the utterance. Two different output distributions are then defined, corresponding to the client and world hypotheses. Based on this idea, a synchronous decoding algorithm and the corresponding training algorithm are derived. Our first experiments on the SESP telephone database indicate a slight improvement with respect to a baseline system using independent alignments. Moreover, synchronous alignment reduces the complexity of the decoding process. Interesting perspectives can be expected. Keywords: Stochastic Modeling, HMM, Synchronous Alignment, EM algorithm
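Scoring along a shared alignment can be sketched as follows: a single Viterbi pass over the common topology yields one state path, and the client and world output distributions are both evaluated along it. Choosing that path with the world-model emissions is an assumption made here for illustration, not necessarily the paper's decoding criterion; `viterbi` and `synchronous_llr` are hypothetical names, and inputs are per-frame, per-state emission log-likelihoods.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = -math.inf


def viterbi(log_emit: Sequence[Sequence[float]], log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Best state path for a T x S matrix of emission log-likelihoods."""
    T, S = len(log_emit), len(log_init)
    delta = [log_init[s] + log_emit[0][s] for s in range(S)]
    back = []
    for t in range(1, T):
        new, ptr = [], []
        for s in range(S):
            best_prev = max(range(S), key=lambda p: delta[p] + log_trans[p][s])
            new.append(delta[best_prev] + log_trans[best_prev][s] + log_emit[t][s])
            ptr.append(best_prev)
        delta = new
        back.append(ptr)
    state = max(range(S), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):  # backtrack from the last frame
        state = ptr[state]
        path.append(state)
    return path[::-1]


def synchronous_llr(client_emit, world_emit, log_trans, log_init) -> Tuple[float, List[int]]:
    """Average client-vs-world log-likelihood ratio along one shared state path."""
    # Assumption: the shared path is decoded with the world-model emissions.
    path = viterbi(world_emit, log_trans, log_init)
    llr = sum(client_emit[t][s] - world_emit[t][s] for t, s in enumerate(path))
    return llr / len(path), path
```

Only one Viterbi pass is needed, instead of one per model with independent alignments, which is where the saving in decoding cost comes from.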
Behavior of a Bayesian adaptation method for incremental enrollment in speaker verification
Classical adaptation approaches are generally used for speaker or environment adaptation of speech recognition systems. In this paper, we use such techniques for the incremental training of client models in a speaker verification system. The initial model is trained on a very limited amount of data and then progressively updated with access data, using a segmental-EM procedure. In supervised mode (i.e. when access utterances are certified), the incremental approach yields performance equivalent to that of batch training. We also investigate the impact of various impostor-attack scenarios during the incremental enrollment phase. All results are obtained with the Picassoft platform, the state-of-the-art speaker verification system developed in the PICASSO project.
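The abstract does not state the exact update rule. A minimal sketch, assuming relevance-factor MAP adaptation of a Gaussian mean: sufficient statistics are accumulated across access sessions, so for fixed occupation probabilities the incremental estimate equals the batch one. The class name and the default relevance value are assumptions, and the segmental-EM realignment that would recompute the occupation probabilities is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class IncrementalMAPMean:
    """Hypothetical MAP (relevance-factor) adaptation of one Gaussian mean,
    updated session by session from accumulated sufficient statistics."""
    prior_mean: List[float]
    relevance: float = 16.0  # assumed relevance factor r
    occupancy: float = 0.0   # running sum of occupation probabilities
    first_order: List[float] = field(default_factory=list)  # running sum of gamma * x

    def __post_init__(self) -> None:
        if not self.first_order:
            self.first_order = [0.0] * len(self.prior_mean)

    def update(self, frames: Sequence[Sequence[float]], posteriors: Sequence[float]) -> None:
        """Add one access session: frames (T x D) and their occupation probabilities (T)."""
        for x, gamma in zip(frames, posteriors):
            self.occupancy += gamma
            for d, xd in enumerate(x):
                self.first_order[d] += gamma * xd

    @property
    def mean(self) -> List[float]:
        """MAP mean: (r * prior + sum gamma x) / (r + sum gamma)."""
        n, r = self.occupancy, self.relevance
        return [(r * m + f) / (r + n) for m, f in zip(self.prior_mean, self.first_order)]
```

With little data the mean stays close to the prior (world) model; as access sessions accumulate, the data term dominates.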