6,151 research outputs found
Context-Dependent Acoustic Modeling without Explicit Phone Clustering
Phoneme-based acoustic modeling of large vocabulary automatic speech
recognition takes advantage of phoneme context. The large number of
context-dependent (CD) phonemes and their highly varying statistics require
tying or smoothing to enable robust training. Usually, Classification and
Regression Trees are used for phonetic clustering, which is standard in Hidden
Markov Model (HMM)-based systems. However, this solution introduces a secondary
training objective and does not allow for end-to-end training. In this work, we
address a direct phonetic context modeling for the hybrid Deep Neural Network
(DNN)/HMM, that does not build on any phone clustering algorithm for the
determination of the HMM state inventory. By performing different
decompositions of the joint probability of the center phoneme state and its
left and right contexts, we obtain a factorized network consisting of different
components, trained jointly. Moreover, the representation of the phonetic
context for the network relies on phoneme embeddings. The recognition accuracy
of our proposed models on the Switchboard task is comparable and outperforms
slightly the hybrid model using the standard state-tying decision trees.Comment: Submitted to Interspeech 202
Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling
The bottom-up saliency, an early stage of humans' visual attention, can be
considered as a binary classification problem between centre and surround
classes. Discriminant power of features for the classification is measured as
mutual information between distributions of image features and corresponding
classes . As the estimated discrepancy very much depends on considered scale
level, multi-scale structure and discriminant power are integrated by employing
discrete wavelet features and Hidden Markov Tree (HMT). With wavelet
coefficients and Hidden Markov Tree parameters, quad-tree like label structures
are constructed and utilized in maximum a posterior probability (MAP) of hidden
class variables at corresponding dyadic sub-squares. Then, a saliency value for
each square block at each scale level is computed with discriminant power
principle. Finally, across multiple scales is integrated the final saliency map
by an information maximization rule. Both standard quantitative tools such as
NSS, LCC, AUC and qualitative assessments are used for evaluating the proposed
multi-scale discriminant saliency (MDIS) method against the well-know
information based approach AIM on its released image collection with
eye-tracking data. Simulation results are presented and analysed to verify the
validity of MDIS as well as point out its limitation for further research
direction.Comment: arXiv admin note: substantial text overlap with arXiv:1301.396
- …