49 research outputs found

    Handwriting Recognition of Historical Documents with few labeled data

    Full text link
    Historical documents present many challenges for offline handwriting recognition systems, among them the segmentation and labeling steps. Carefully annotated text lines are needed to train an HTR system, but in some scenarios transcripts are only available at the paragraph level, with no text-line information. In this work, we demonstrate how to train an HTR system with only a small amount of labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) on only 10% of manually labeled text-line data from a dataset and propose an incremental training procedure that covers the rest of the data. Performance is further increased by augmenting the training set with specially crafted multiscale data. We also propose a model-based normalization scheme which accounts for variability in writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second-best result in the ICDAR 2017 competition.
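
    The abstract does not give the exact network configuration; the following is a minimal sketch of a convolutional recurrent network trained with CTC, in PyTorch, where all layer sizes and names are illustrative assumptions rather than the paper's model.

```python
# Minimal CRNN sketch for text-line HTR (illustrative sizes, not the paper's exact model).
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes, img_height=64):
        super().__init__()
        # Convolutional feature extractor: collapses height, keeps width as the time axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        feat_height = img_height // 8
        # Bidirectional recurrent layers over the width (time) dimension.
        self.rnn = nn.LSTM(128 * feat_height, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)   # num_classes includes the CTC blank

    def forward(self, x):                            # x: (batch, 1, H, W)
        f = self.cnn(x)                              # (batch, C, H', W')
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # (batch, time, features)
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)          # per-frame class log-probabilities

# CTC loss ties unsegmented line images to their transcripts without frame-level labels.
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
```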

    Synchronous Alignment

    Get PDF
    In speaker verification, a likelihood ratio criterion is generally used to verify the claimed identity. This is done using two independent models, i.e. a client model and a world model. It may be interesting to make both models share the same topology, which represents the underlying phonetic structure, and then to consider two different output distributions corresponding to the client/world hypotheses. Based on this idea, a decoding algorithm and the corresponding training algorithm were derived. First experiments on a significant telephone database show a small improvement with respect to the reference system; we can conclude that synchronous alignment provides at least equivalent results to the reference system with a reduced-complexity decoding algorithm. Other important perspectives can be derived from this work.
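
    For context, a minimal sketch of the conventional two-independent-model decision that the abstract starts from: a client model and a world model scored separately, with the log-likelihood ratio thresholded. Scikit-learn GMMs stand in for the stochastic models, and the component count and threshold are illustrative assumptions.

```python
# Baseline two-model verification: independent client and world models,
# decision taken on the average per-frame log-likelihood ratio.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(client_feats, world_feats, n_components=16):
    client = GaussianMixture(n_components).fit(client_feats)
    world = GaussianMixture(n_components).fit(world_feats)
    return client, world

def verify(test_feats, client, world, threshold=0.0):
    # score_samples returns per-frame log-likelihoods; average over the utterance.
    llr = client.score_samples(test_feats).mean() - world.score_samples(test_feats).mean()
    return llr, llr > threshold
```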

    Latent Semantic Indexing by Self-Organizing Map

    Get PDF
    An important problem in information retrieval from spoken documents is how to extract relevant documents that are poorly decoded by the speech recognizer. In this paper we propose a stochastic index for the documents based on Latent Semantic Analysis (LSA) of the decoded document contents. The original LSA approach uses Singular Value Decomposition to reduce the dimensionality of the documents. As an alternative, we propose a computationally more feasible solution using Random Mapping (RM) and Self-Organizing Maps (SOM). The motivation for clustering the documents with a SOM is to reduce the effect of recognition errors and to extract new characteristic index terms. Experimental indexing results are presented using relevance judgments for the retrieval results of test queries, and using a document perplexity, defined in this paper, to measure the power of the index models.
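
    A minimal sketch of the two stages named in the abstract, random mapping followed by SOM clustering, written in plain NumPy. The projection dimension, map size, learning schedule and document-vector construction are all illustrative assumptions, not the paper's settings.

```python
# Sketch of the indexing pipeline: random mapping of document term vectors to a
# lower dimension, then a small self-organizing map to cluster decoded documents.
import numpy as np

def random_mapping(doc_term_matrix, target_dim=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random projection matrix; scaling keeps vector norms roughly comparable.
    R = rng.normal(size=(doc_term_matrix.shape[1], target_dim)) / np.sqrt(target_dim)
    return doc_term_matrix @ R

def train_som(X, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    n_units, dim = grid[0] * grid[1], X.shape[1]
    W = rng.normal(scale=0.1, size=(n_units, dim))
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], float)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)
        sigma = sigma0 * (1 - t / epochs) + 1e-3
        for x in rng.permutation(X):
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))              # best-matching unit
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)                           # neighborhood update
    return W, coords

# Each document is then indexed by its best-matching map unit, so recognition
# errors on individual terms are smoothed by the cluster the document falls into.
```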

    Direction of Arrival Estimation using EM-ESPRIT with nonuniform arrays

    Full text link
    This paper deals with the problem of Direction Of Arrival (DOA) estimation with nonuniform linear arrays. The proposed method is based on the Expectation-Maximization (EM) method, where ESPRIT is used in the maximization step. The key idea is to iteratively interpolate the data onto a virtual uniform linear array so that ESPRIT can be applied to estimate the DOAs, the iterations improving the interpolation by using the previously estimated DOAs. One of this method's novelties lies in its ability to deal with any nonuniform array geometry. The technique shows significant performance and computational advantages over previous algorithms such as spectral MUSIC, EM-IQML and the method based on the manifold separation technique, and EM-ESPRIT is shown to be more robust to additive noise. Furthermore, EM-ESPRIT fully exploits the advantages of a nonuniform array over a uniform one: simulations show that, for the same aperture and with fewer sensors, the nonuniform array presents almost identical performance to the equivalent uniform array.
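
    A minimal NumPy sketch of the iteration described above: interpolate the nonuniform-array snapshots onto a virtual uniform array using steering matrices built from the current DOA estimates, then re-estimate the DOAs with ESPRIT. The geometries, the least-squares interpolation rule and the initializer are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of the EM-ESPRIT loop for a nonuniform linear array.
import numpy as np

def steering(positions, doas_deg, wavelength=1.0):
    # Sensor positions in wavelengths along the array axis; DOAs in degrees.
    theta = np.deg2rad(doas_deg)
    return np.exp(2j * np.pi * np.outer(positions, np.sin(theta)) / wavelength)

def esprit(X_ula, n_sources, spacing=0.5):
    # Standard ESPRIT on half-wavelength-spaced uniform-array data.
    R = X_ula @ X_ula.conj().T / X_ula.shape[1]
    eigval, eigvec = np.linalg.eigh(R)
    Es = eigvec[:, -n_sources:]                      # signal subspace
    Phi = np.linalg.pinv(Es[:-1]) @ Es[1:]           # rotation between the two subarrays
    phases = np.angle(np.linalg.eigvals(Phi))
    return np.rad2deg(np.arcsin(phases / (2 * np.pi * spacing)))

def em_esprit(X, positions, virtual_positions, n_sources, n_iter=10, doas0=None):
    doas = np.linspace(-30, 30, n_sources) if doas0 is None else np.asarray(doas0, float)
    for _ in range(n_iter):
        # E-step: interpolate snapshots onto the virtual uniform array using the
        # steering matrices built from the current DOA estimates.
        A_real = steering(positions, doas)
        A_virt = steering(virtual_positions, doas)
        X_virt = A_virt @ np.linalg.pinv(A_real) @ X
        # M-step: re-estimate the DOAs with ESPRIT on the interpolated ULA data.
        doas = np.sort(esprit(X_virt, n_sources))
    return doas
```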

    Combining Wavelet-domain Hidden Markov Trees with Hidden Markov Models

    Get PDF
    In this paper, the concept of Wavelet-domain Hidden Markov Trees (WHMT) is introduced to Automatic Speech Recognition. WHMTs are a convenient means to model the structure of wavelet feature vectors, as wavelet coefficients can be interpreted as nodes in a binary tree. By introducing hidden states in each node, the non-Gaussian statistics inherent in wavelet features can be modeled, while correlations between neighboring coefficients in the time-frequency plane are also accommodated. Phoneme probabilities obtained using the WHMT and wavelet features are then combined at the state level with those obtained from Gaussian distributions in conjunction with MFCCs, and fed into conventional Hidden Markov Models. Preliminary experiments show the potential advantages of this novel approach.
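
    A small sketch of the two ingredients the abstract mentions: the binary-tree organization of wavelet coefficients, and the state-level combination of the two acoustic streams before HMM decoding. The wavelet choice, number of levels and the log-linear weighting are illustrative assumptions, not the paper's exact recipe.

```python
# (1) Wavelet coefficients as a binary tree: each detail coefficient k at one scale
#     has children 2k and 2k+1 at the next finer scale.
# (2) State-level combination of WHMT-based and MFCC/GMM-based log-likelihoods.
import numpy as np
import pywt

def wavelet_tree(frame, wavelet="haar", levels=3):
    # wavedec returns [approx, detail_level_L, ..., detail_level_1];
    # the detail subbands (coarse to fine) form the levels of the binary tree.
    coeffs = pywt.wavedec(frame, wavelet, level=levels)
    return coeffs[1:]

def combine_state_scores(logp_whmt, logp_gmm, weight=0.5):
    # Log-linear combination of the two streams per HMM state and frame,
    # before the usual Viterbi decoding in the conventional HMM.
    return weight * logp_whmt + (1.0 - weight) * logp_gmm
```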

    CLIENT / WORLD MODEL SYNCHRONOUS ALIGNMENT FOR SPEAKER VERIFICATION

    Get PDF
    In speaker verification, two independent stochastic models, i.e. a client model and a non-client (world) model, are generally used to verify the claimed identity via a likelihood ratio score. This paper investigates a variant of this approach based on a common hidden process for both models. In this framework, both models share the same topology, which is conditioned by the underlying phonetic structure of the utterance, and two different output distributions are defined corresponding to the client vs. world hypotheses. Based on this idea, a synchronous decoding algorithm and the corresponding training algorithm are derived. Our first experiments on the SESP telephone database indicate a slight improvement with respect to a baseline system using independent alignments. Moreover, synchronous alignment offers reduced complexity during the decoding process, and further interesting perspectives can be expected. Keywords: Stochastic Modeling, HMM, Synchronous Alignment, EM algorithm
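
    A minimal sketch of synchronous-alignment scoring as described above: a single Viterbi pass over the shared topology produces one state sequence, and both output distributions are accumulated along that same path to form the ratio. The transition matrix and the choice of which stream drives the alignment (here, the world stream) are illustrative assumptions.

```python
# Synchronous-alignment scoring with a shared topology and two output streams.
import numpy as np

def synchronous_score(logb_client, logb_world, log_trans, log_init):
    # logb_*: (n_frames, n_states) per-state output log-probabilities for each stream.
    n_frames, n_states = logb_world.shape
    delta = log_init + logb_world[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans             # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logb_world[t]
    # Backtrack the single shared state sequence.
    path = np.empty(n_frames, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    # Accumulate both streams along the same alignment; shared transitions cancel
    # in the ratio, so only the output log-probabilities remain.
    ll_client = logb_client[np.arange(n_frames), path].sum()
    ll_world = logb_world[np.arange(n_frames), path].sum()
    return ll_client - ll_world
```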

    Behavior of a Bayesian adaptation method for incremental enrollment in speaker verification

    Get PDF
    Classical adaptation approaches are generally used for speaker or environment adaptation of speech recognition systems. In this paper, we use such techniques for the incremental training of client models in a speaker verification system. The initial model is trained on a very limited amount of data and then progressively updated with access data, using a segmental-EM procedure. In supervised mode (i.e. when access utterances are certified), the incremental approach yields performance equivalent to the batch one. We also investigate the impact of various scenarios of impostor attacks during the incremental enrollment phase. All results are obtained with the Picassoft platform, the state-of-the-art speaker verification system developed in the PICASSO project.
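
    A minimal sketch of the kind of Bayesian update used for incremental enrollment: MAP-style adaptation of Gaussian mean parameters with a relevance factor, applied after each certified access. This is a generic GMM mean-adaptation recipe under assumed diagonal covariances, not the paper's exact segmental-EM procedure.

```python
# Incremental Bayesian (MAP-style) adaptation of client-model means.
import numpy as np

def map_adapt_means(means, weights, covars, frames, relevance=16.0):
    # Posterior responsibilities of each mixture component for each access frame
    # (diagonal-covariance Gaussians).
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    log_gauss = -0.5 * ((diff ** 2) / covars).sum(-1) \
                - 0.5 * np.log(2 * np.pi * covars).sum(-1)
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                          # (T, M)
    # Sufficient statistics, then interpolation between new data and prior means.
    n_k = post.sum(axis=0)                                           # (M,)
    first = post.T @ frames                                          # (M, D)
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * (first / np.maximum(n_k, 1e-8)[:, None]) + (1 - alpha) * means

# After each certified access, the client means are replaced by the adapted ones,
# so the model improves progressively as enrollment data accumulate.
```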
