340 research outputs found
Weakly-Supervised Temporal Localization via Occurrence Count Learning
We propose a novel model for temporal detection and localization which allows
the training of deep neural networks using only counts of event occurrences as
training labels. This powerful weakly-supervised framework alleviates the
burden of the imprecise and time-consuming process of annotating event
locations in temporal data. Unlike existing methods, in which localization is
explicitly achieved by design, our model learns localization implicitly as a
byproduct of learning to count instances. This unique feature is a direct
consequence of the model's theoretical properties. We validate the
effectiveness of our approach in a number of experiments (drum hit and piano
onset detection in audio, digit detection in images) and demonstrate
performance comparable to that of fully-supervised state-of-the-art methods,
despite much weaker training requirements.Comment: Accepted at ICML 201
Recommended from our members
Uncovering temporal structure in hippocampal output patterns.
Place cell activity of hippocampal pyramidal cells has been described as the cognitive substrate of spatial memory. Replay is observed during hippocampal sharp-wave-ripple-associated population burst events (PBEs) and is critical for consolidation and recall-guided behaviors. PBE activity has historically been analyzed as a phenomenon subordinate to the place code. Here, we use hidden Markov models to study PBEs observed in rats during exploration of both linear mazes and open fields. We demonstrate that estimated models are consistent with a spatial map of the environment, and can even decode animals' positions during behavior. Moreover, we demonstrate the model can be used to identify hippocampal replay without recourse to the place code, using only PBE model congruence. These results suggest that downstream regions may rely on PBEs to provide a substrate for memory. Additionally, by forming models independent of animal behavior, we lay the groundwork for studies of non-spatial memory
The benefits of acoustic perceptual information for speech processing systems
The frame-synchronized framework has dominated many speech processing systems, such as ASR and AED targeting human speech activities. These systems have little consideration for the science behind speech and treat the task as a simple statistical classification. The framework also assumes each feature vector to be equally important to the task. However, through some preliminary experiments, this study has found evidence that some concepts defined in speech perception theories such as auditory roughness and acoustic landmarks can act as heuristics to these systems and benefit them in multiple ways. Findings of acoustic landmarks hint that the idea of treating each frame equally might not be optimal. In some cases, landmark information can improve system accuracy through highlighting the more significant frames, or improve the acoustic model accuracy by training through MTL. Further investigation into the topic found experimental evidence suggesting that acoustic landmark information can also benefit end-to-end acoustic models trained through CTC loss. With the help of acoustic landmarks, CTC models can converge with less training data and achieve lower error rate. For the first time, positive results were collected on a mid-size ASR corpus (WSJ) for acoustic landmarks. The results indicate that audio perception information can benefit a broad range of audio processing systems
- …