Chinese Spoken Document Summarization Using Probabilistic Latent Topical Information
The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In this paper, we propose the use of probabilistic latent topical information for extractive summarization of spoken documents. Various kinds of modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with the conventional vector space model and latent semantic indexing model, as well as the HMM. The experiments were performed on Chinese broadcast news collected in Taiwan. Noticeable performance gains were obtained.
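The select-then-sequence procedure described in this abstract can be illustrated with a minimal sketch of the conventional vector space model baseline it mentions. This is not the paper's latent topical method: the function names, the raw term-frequency weighting, and the rounding of the summarization ratio are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, ratio):
    """Pick the sentences most similar to the whole document.

    sentences: list of token lists (hypothetical pre-tokenized input).
    ratio: target summarization ratio in (0, 1].
    Returns the selected sentences in their original document order.
    """
    # Document vector: term frequencies over all sentences (assumed weighting).
    doc_vec = Counter(tok for sent in sentences for tok in sent)
    n_keep = max(1, round(len(sentences) * ratio))
    # Rank sentence indices by similarity to the document vector.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(Counter(sentences[i]), doc_vec),
        reverse=True,
    )
    # Re-sequence the chosen sentences in their original order.
    chosen = sorted(ranked[:n_keep])
    return [sentences[i] for i in chosen]
```

A probabilistic latent topic model would replace the `cosine` score with a sentence-given-document likelihood; the selection and re-sequencing steps would stay the same.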
Geometric Learning of Hidden Markov Models via a Method of Moments Algorithm
We present a novel algorithm for learning the parameters of hidden Markov
models (HMMs) in a geometric setting where the observations take values in
Riemannian manifolds. In particular, we elevate a recent second-order method of
moments algorithm that incorporates non-consecutive correlations to a more
general setting where observations take place in a Riemannian symmetric space
of non-positive curvature and the observation likelihoods are Riemannian
Gaussians. The resulting algorithm decouples into a Riemannian Gaussian mixture
model estimation algorithm followed by a sequence of convex optimization
procedures. We demonstrate through examples that the learner can result in
significantly improved speed and numerical accuracy compared to existing
learners.
Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition
In recent years, end-to-end speech recognition has emerged as a technology
that integrates the acoustic, pronunciation dictionary, and language model
components of the traditional Automatic Speech Recognition model. It is
possible to achieve human-like recognition without the need to build a
pronunciation dictionary in advance. However, due to the relative scarcity of
training data on code-switching, the performance of ASR models tends to degrade
drastically when encountering this phenomenon. Most past studies have
simplified the learning complexity of the model by splitting the code-switching
task into multiple tasks dealing with a single language and then learning the
domain-specific knowledge of each language separately. Therefore, in this
paper, we attempt to introduce language identification information into the
middle layer of the ASR model's encoder. We aim to generate acoustic features
that imply language distinctions in a more implicit way, reducing the model's
confusion when dealing with language switching.
Comment: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language
A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents
In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and comparisons to the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. Fusion of information via indexing word- and syllable-level features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.
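One common reading of an HMM/N-gram retrieval model is a query-likelihood score in which each query term is generated by a mixture of a document model and a corpus model. The sketch below assumes the unigram case with a single fixed mixture weight `m_doc`; the abstract instead describes weights trained with EM and MCE, which is not shown here. All names are hypothetical.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, corpus_counts, corpus_len, m_doc=0.5):
    """log P(query | doc) under a two-component unigram mixture.

    Each term is scored as m_doc * P(term | doc) + (1 - m_doc) * P(term | corpus).
    m_doc is a fixed, assumed weight (not a trained one).
    """
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_corpus = corpus_counts[term] / corpus_len
        p = m_doc * p_doc + (1.0 - m_doc) * p_corpus
        if p == 0.0:
            # Term unseen in both the document and the corpus.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, m_doc=0.5):
    """Return document indices sorted from most to least likely."""
    corpus_counts = Counter(tok for doc in docs for tok in doc)
    corpus_len = sum(corpus_counts.values())
    scores = [
        query_log_likelihood(query, doc, corpus_counts, corpus_len, m_doc)
        for doc in docs
    ]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The corpus component acts as smoothing, so a document that lacks one query term is penalized rather than excluded. The same scoring could be applied to syllable-level tokens instead of words by changing only the tokenization.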