Speech Communication
Contains reports on eight research projects. C.J. LeBel Fellowship; Systems Development Foundation; National Institutes of Health (Grant 5 T32 NS07040); National Institutes of Health (Grant 5 R01 NS04332); National Science Foundation (Grant 1ST 80-17599); U.S. Navy - Office of Naval Research (Contract N00014-82-K-0727)
Computer-Based Data Processing and Management for Blackfoot Phonetics and Phonology
More than half of the world's 6,000 languages have never been adequately described. We propose to create a database system that automatically captures and manages sound clips of interest in Blackfoot (an endangered language spoken in Alberta, Canada, and Montana) for phonetic and phonological analysis. Taking Blackfoot speech recordings as input, the system generates a list of audio clips containing a given sequence of sounds or certain accent patterns, depending on the research interest. Existing computational linguistic techniques such as information processing and artificial intelligence are extended to tackle issues specific to Blackfoot linguistics, and database techniques are adopted to support better data management and linguistic queries. This project is innovative because the application of technology to Native American phonetics and phonology is underdeveloped. It provides a digital framework for documenting and analyzing endangered languages and can also benefit research on other languages.
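The clip-listing step described above can be illustrated with a minimal sketch. It assumes the recordings have already been segmented into time-aligned sound labels; the `Segment` layout, the function name `find_clips`, and the labels used are hypothetical and are not taken from the project itself.

```python
from typing import List, Tuple

# Hypothetical layout: (sound label, start time in seconds, end time in seconds)
Segment = Tuple[str, float, float]

def find_clips(segments: List[Segment], pattern: List[str]) -> List[Tuple[float, float]]:
    """Return (start, end) times of every run of segments whose labels match pattern."""
    n = len(pattern)
    if n == 0:
        return []
    clips = []
    for i in range(len(segments) - n + 1):
        window = segments[i:i + n]
        # Compare only the labels; times are used to delimit the clip.
        if [label for label, _, _ in window] == pattern:
            clips.append((window[0][1], window[-1][2]))
    return clips

# Toy, made-up segmentation used purely for illustration.
segments = [("n", 0.0, 0.1), ("i", 0.1, 0.2), ("t", 0.2, 0.3),
            ("i", 0.3, 0.4), ("t", 0.4, 0.5)]
clips = find_clips(segments, ["i", "t"])
```

Each returned pair could then be used to cut the matching audio clip from the source recording.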
Approximated and User Steerable tSNE for Progressive Visual Analytics
Progressive Visual Analytics aims at improving the interactivity in existing
analytics techniques by means of visualization as well as interaction with
intermediate results. One key method for data analysis is dimensionality
reduction, for example, to produce 2D embeddings that can be visualized and
analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a
well-suited technique for the visualization of high-dimensional data.
tSNE can create meaningful intermediate results but suffers from a slow
initialization that constrains its application in Progressive Visual Analytics.
We introduce a controllable tSNE approximation (A-tSNE), which trades off speed
and accuracy, to enable interactive data exploration. We offer real-time
visualization techniques, including a density-based solution and a Magic Lens
to inspect the degree of approximation. With this feedback, the user can decide
on local refinements and steer the approximation level during the analysis. We
demonstrate our technique with several datasets, in a real-world research
scenario and for the real-time analysis of high-dimensional streams to
illustrate its effectiveness for interactive data analysis.
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
We propose a weakly-supervised framework for action labeling in video, where
only the order of occurring actions is required during training time. The key
challenge is that the per-frame alignments between the input (video) and label
(action) sequences are unknown during training. We address this by introducing
the Extended Connectionist Temporal Classification (ECTC) framework to
efficiently evaluate all possible alignments via dynamic programming and
explicitly enforce their consistency with frame-to-frame visual similarities.
This protects the model from distractions of visually inconsistent or
degenerate alignments without the need for temporal supervision. We further
extend our framework to the semi-supervised case when a few frames are sparsely
annotated in a video. With less than 1% of labeled frames per video, our method
is able to outperform existing semi-supervised approaches and achieve
comparable performance to that of fully supervised approaches. Comment: To appear in ECCV 201
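The idea of evaluating all possible alignments via dynamic programming can be sketched with the standard CTC forward recursion. This is plain CTC with a blank symbol, not the ECTC extension from the abstract (it omits the visual-similarity consistency term). The function name and the input layout are assumptions made for illustration.

```python
import math

def ctc_log_likelihood(log_probs, labels, blank=0):
    """Log-probability of `labels` summed over all frame alignments.

    log_probs: list of T frames, each a list of K per-class log-probabilities.
    labels:    ordered label ids (no blanks), e.g. the order of actions.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)
    NEG = float("-inf")

    def logsumexp(xs):
        m = max(xs)
        if m == NEG:
            return NEG
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[s]: log-probability of all partial alignments ending in ext[s]
    alpha = [NEG] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG] * S
        for s in range(S):
            cands = [alpha[s]]                # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])    # advance by one symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])    # skip the blank between distinct labels
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the trailing blank.
    return logsumexp(alpha[-2:]) if S > 1 else alpha[-1]

# Two frames, two classes (0 = blank, 1 = one action), uniform predictions.
uniform = [[math.log(0.5)] * 2 for _ in range(2)]
likelihood = math.exp(ctc_log_likelihood(uniform, [1]))
```

In this toy case three of the four possible frame labelings collapse to the single action, so the summed probability is 0.75. The recursion costs O(T·S) rather than enumerating every alignment.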