Historical Document Image Segmentation with LDA-Initialized Deep Neural Networks
In this paper, we present a novel approach to layer-wise weight initialization
of deep neural networks using Linear Discriminant Analysis (LDA).
Typically, the weights of a deep neural network are initialized with random
values, by greedy layer-wise pre-training (usually as a Deep Belief Network or as
an auto-encoder), or by re-using the layers from another network (transfer
learning). Hence, many training epochs are needed before meaningful weights are
learned, or a rather similar dataset is required for seeding the fine-tuning in
transfer learning. In this paper, we describe how to turn an LDA into either a
neural layer or a classification layer. We analyze the initialization technique
on historical documents. First, we show that an LDA-based initialization is
quick and leads to a very stable initialization. Furthermore, for the task of
layout analysis at pixel level, we investigate the effectiveness of LDA-based
initialization and show that it outperforms state-of-the-art random weight
initialization methods.
Comment: 5 pages
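As a rough illustration of what "turning an LDA into a classification layer" can mean, the sketch below uses the textbook shared-covariance Gaussian LDA. For class k with mean mu_k, prior pi_k and pooled covariance Sigma, the discriminant is linear: w_k = Sigma^{-1} mu_k and b_k = -1/2 mu_k^T Sigma^{-1} mu_k + log pi_k, so the whole classifier is exactly a fully connected layer. The function name `lda_linear_layer`, the small ridge term `reg`, and the NumPy implementation are assumptions for illustration; the paper's own construction (including how an LDA becomes a hidden neural layer) may differ.

```python
import numpy as np


def lda_linear_layer(X, y, reg=1e-6):
    """Fit Gaussian LDA and return it as a linear layer (W, b, classes).

    Scores are computed as X @ W.T + b; the argmax over columns is the
    LDA class prediction under a shared-covariance Gaussian model.
    Hypothetical helper, not taken from the paper.
    """
    classes = np.unique(y)                                        # sorted class labels
    n, d = X.shape
    k = len(classes)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])   # (k, d)
    priors = np.array([np.mean(y == c) for c in classes])         # (k,)

    # Pooled within-class covariance, lightly regularized so it is invertible.
    centered = X - means[np.searchsorted(classes, y)]
    sigma = centered.T @ centered / (n - k) + reg * np.eye(d)

    # Row i of W is Sigma^{-1} mu_i (solve instead of forming an explicit inverse).
    W = np.linalg.solve(sigma, means.T).T                         # (k, d)
    # b_i = -1/2 mu_i^T Sigma^{-1} mu_i + log pi_i
    b = -0.5 * np.sum(means * W, axis=1) + np.log(priors)
    return W, b, classes


# Toy usage: two 2-D Gaussian blobs with a fixed seed.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(4.0, 1.0, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

W, b, classes = lda_linear_layer(X, y)
pred = classes[np.argmax(X @ W.T + b, axis=1)]
accuracy = np.mean(pred == y)
print(f"Training-set accuracy of the LDA-initialized layer (no gradient steps): {accuracy:.3f}")
```

The returned `W` (one row per class) and `b` have the shapes of a fully connected layer's weight matrix and bias, so they could be copied into such a layer as a starting point for fine-tuning instead of random values.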
Improving Source Separation via Multi-Speaker Representations
Recently, there have been novel developments in deep learning towards solving
the cocktail party problem. Initial results are very promising and encourage
further research in the domain.
more research in the domain. One technique that has not yet been explored in
the neural network approach to this task is speaker adaptation. Intuitively,
information on the speakers that we are trying to separate seems fundamentally
important for the speaker separation task. However, retrieving this speaker
information is challenging since the speaker identities are not known a priori
and multiple speakers are simultaneously active. There is thus a kind of
chicken-and-egg problem. To tackle this, source signals and i-vectors are
estimated alternately. We show that blind multi-speaker adaptation improves the
results of the network and that (in our case) the network is not capable of
adequately retrieving this useful speaker information by itself.
- …