Personalized Acoustic Modeling by Weakly Supervised Multi-Task Deep Learning using Acoustic Tokens Discovered from Unlabeled Data
It is well known that recognizers personalized to each user are much more
effective than user-independent recognizers. With the popularity of smartphones
today, it is not difficult to collect a large set of audio data for each user,
but it is difficult to transcribe it. However, it is now possible to
automatically discover acoustic tokens from unlabeled personal data in an
unsupervised way. We therefore propose a multi-task deep learning framework
called a phoneme-token deep neural network (PTDNN), jointly trained from
unsupervised acoustic tokens discovered from unlabeled data and very limited
transcribed data for personalized acoustic modeling. We term this scenario
"weakly supervised". The underlying intuition is that the high degree of
similarity between the HMM states of acoustic token models and phoneme models
may help them learn from each other in this multi-task learning framework.
Initial experiments performed over a personalized audio data set recorded from
Facebook posts demonstrated that very good improvements can be achieved in both
frame accuracy and word accuracy over popularly-considered baselines such as
fDLR, speaker code and lightly supervised adaptation. This approach complements
existing speaker adaptation approaches and can be used jointly with such
techniques to yield improved results.
Comment: 5 pages, 5 figures, published in IEEE ICASSP 201
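The multi-task idea described above can be sketched as a shared trunk feeding two softmax heads, one over phonemes (from the limited transcribed data) and one over discovered acoustic tokens, with a weighted sum of the two cross-entropies as the training objective. All layer sizes, the weighting factor, and the random weights below are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of a phoneme-token multi-task network: a shared trunk
# feeds two softmax heads, and the loss is a weighted sum of the two
# cross-entropies. Dimensions and weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared trunk and two task-specific output heads (random weights).
n_in, n_hid, n_phones, n_tokens = 40, 64, 48, 30
W_shared = rng.standard_normal((n_in, n_hid)) * 0.1
W_phone = rng.standard_normal((n_hid, n_phones)) * 0.1
W_token = rng.standard_normal((n_hid, n_tokens)) * 0.1

def forward(x):
    h = np.tanh(x @ W_shared)                  # shared representation
    return softmax(h @ W_phone), softmax(h @ W_token)

def multitask_loss(x, y_phone, y_token, alpha=0.5):
    p_phone, p_token = forward(x)
    # Cross-entropy per task; alpha balances the two objectives.
    ce_phone = -np.log(p_phone[np.arange(len(x)), y_phone]).mean()
    ce_token = -np.log(p_token[np.arange(len(x)), y_token]).mean()
    return alpha * ce_phone + (1 - alpha) * ce_token

x = rng.standard_normal((8, n_in))             # a batch of acoustic frames
y_phone = rng.integers(0, n_phones, size=8)    # transcribed phoneme labels
y_token = rng.integers(0, n_tokens, size=8)    # discovered token labels
print(multitask_loss(x, y_phone, y_token))
```

Gradients from both heads flow into the shared trunk, which is how the similar HMM-state structure of token and phoneme models can let the two tasks inform each other.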
Defining adaptation in a generic multi-layer model: CAM: the GRAPPLE conceptual adaptation model
Authoring of Adaptive Hypermedia is a difficult and time-consuming task. Reference models like LAOS and AHAM separate adaptation and content into different layers. Systems like AHA! offer graphical tools based on these models to allow authors to define adaptation without knowing any adaptation language. The adaptation that can be defined using such tools is still limited. Authoring systems like MOT are more flexible, but the usability of adaptation specification is low. This paper proposes a more generic model which allows the adaptation to be defined in an arbitrary number of layers, where adaptation is expressed in terms of relationships between concepts. This model allows the creation of more powerful yet easier-to-use graphical authoring tools. This paper presents the structure of the Conceptual Adaptation Models used in adaptive applications created within the GRAPPLE adaptive learning environment, and their representation in a graphical authoring tool.
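Adaptation "expressed in terms of relationships between concepts" can be pictured with a tiny, hypothetical example: a prerequisite relationship between two concepts suppresses one until the other is sufficiently known. The concept names, knowledge scores, and threshold below are illustrative assumptions, not CAM's actual notation.

```python
# Hypothetical sketch of concept-relationship adaptation: a concept is
# recommended only once its prerequisite concept is sufficiently known.
# Names, scores, and the 0.5 threshold are illustrative assumptions.

prerequisites = {"Recursion": "Functions", "Functions": "Variables"}
knowledge = {"Variables": 0.9, "Functions": 0.3, "Recursion": 0.0}

def is_recommended(concept, threshold=0.5):
    """Recommend a concept once its prerequisite (if any) is known."""
    prereq = prerequisites.get(concept)
    return prereq is None or knowledge[prereq] >= threshold

print(is_recommended("Functions"))   # Variables known -> True
print(is_recommended("Recursion"))   # Functions not yet known -> False
```

A graphical authoring tool can expose exactly such relationships as drawable links between concepts, so authors never touch an adaptation language.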
Minimizing the impact of delay on live SVC-based HTTP adaptive streaming services
HTTP Adaptive Streaming (HAS) is becoming the de facto standard for Over-The-Top video streaming services. Video content is temporally split into segments which are offered at multiple qualities to the clients. These clients autonomously select the quality layer matching the current state of the network through a quality selection heuristic. Recently, academia and industry have begun evaluating the feasibility of adopting layered video coding for HAS. Instead of downloading one file for a certain quality level, scalable video streaming requires downloading several interdependent layers to obtain the same quality. This implies that the base layer is always downloaded and is available for playout, even when throughput fluctuates and enhancement layers cannot be downloaded in time. This layered video approach can help in providing better service quality assurance for video streaming. However, adopting scalable video coding for HAS also leads to other issues, since requesting multiple files over HTTP increases the impact of the end-to-end delay, and thus degrades the service provided to the client. This is even worse in a Live TV scenario where the drift on the live signal should be minimized, requiring smaller segment and buffer sizes. In this paper, we characterize the impact of delay on several measurement-based heuristics. Furthermore, we propose several ways to overcome the end-to-end delay issues, such as parallel and pipelined downloading of segment layers, to provide a higher quality for the video service.
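A measurement-based heuristic for SVC-based HAS of the kind discussed above can be sketched as follows: always fetch the base layer, then add enhancement layers only while the estimated download time of the segment still fits within the playout buffer. The bitrates, segment duration, and decision rule are illustrative assumptions, not the paper's actual heuristics.

```python
# Hypothetical quality heuristic for SVC-based HAS: the base layer is
# always fetched; enhancement layers are added while the estimated
# download time still fits the buffer. All numbers are assumptions.

LAYER_BITRATES = [500, 1000, 2000, 4000]  # kbps: base + 3 enhancement layers
SEGMENT_DURATION = 2.0                    # seconds of video per segment

def select_layers(throughput_kbps, buffer_s):
    """Return how many layers (>= 1) to download for the next segment."""
    chosen, cumulative = 1, LAYER_BITRATES[0]
    for rate in LAYER_BITRATES[1:]:
        cumulative += rate
        # Estimated time to download all chosen layers of one segment.
        download_time = cumulative * SEGMENT_DURATION / throughput_kbps
        # Add the layer only if the buffer can absorb the download time.
        if download_time < buffer_s:
            chosen += 1
        else:
            break
    return chosen

print(select_layers(throughput_kbps=3000, buffer_s=4.0))  # ample bandwidth
print(select_layers(throughput_kbps=800, buffer_s=1.5))   # base layer only
```

The guaranteed base layer is what makes the layered approach robust to throughput fluctuation, while the extra per-layer HTTP requests are exactly where the end-to-end delay penalty the abstract describes comes from.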
Embedding-Based Speaker Adaptive Training of Deep Neural Networks
An embedding-based speaker adaptive training (SAT) approach is proposed and
investigated in this paper for deep neural network acoustic modeling. In this
approach, speaker embedding vectors, which are a constant given a particular
speaker, are mapped through a control network to layer-dependent element-wise
affine transformations to canonicalize the internal feature representations at
the output of hidden layers of a main network. The control network for
generating the speaker-dependent mappings is jointly estimated with the main
network for the overall speaker adaptive acoustic modeling. Experiments on
large vocabulary continuous speech recognition (LVCSR) tasks show that the
proposed SAT scheme can yield superior performance over the widely-used
speaker-aware training using i-vectors with speaker-adapted input features.
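The mapping described above can be sketched as a small control network that turns a fixed per-speaker embedding into element-wise scale and shift parameters applied to a hidden layer of the main network. The dimensions, random weights, and near-identity initialization below are illustrative assumptions.

```python
# Sketch of embedding-based SAT: a control network maps a constant
# speaker embedding to an element-wise affine transform (scale, shift)
# that canonicalizes hidden activations of the main network.
# All dimensions and weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
emb_dim, hid_dim = 100, 64          # speaker-embedding and hidden sizes

# Control network: embedding -> (scale, shift) for one hidden layer.
W_scale = rng.standard_normal((emb_dim, hid_dim)) * 0.01
W_shift = rng.standard_normal((emb_dim, hid_dim)) * 0.01

def speaker_affine(embedding):
    scale = 1.0 + embedding @ W_scale   # near-identity scaling
    shift = embedding @ W_shift
    return scale, shift

def adapted_hidden(h, embedding):
    # Element-wise affine transform of hidden activations per speaker.
    scale, shift = speaker_affine(embedding)
    return scale * h + shift

spk = rng.standard_normal(emb_dim)        # constant for a given speaker
h = rng.standard_normal((8, hid_dim))     # hidden activations, 8 frames
print(adapted_hidden(h, spk).shape)       # (8, 64)
```

Because the transform is generated from the embedding rather than stored per speaker, control and main networks can be trained jointly, which is the core of the SAT scheme the abstract describes.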
IDEALIST control and service management solutions for dynamic and adaptive flexi-grid DWDM networks
Wavelength Switched Optical Networks (WSON) were designed with the premise that all channels in a network have the same spectrum needs, based on the ITU-T DWDM grid. However, this rigid grid-based approach is not adapted to the spectrum requirements of the signals that are the best candidates for long-reach transmission and high-speed data rates of 400 Gbps and beyond. An innovative approach is to evolve the fixed DWDM grid into a flexible grid, in which the optical spectrum is partitioned into fixed-sized spectrum slices. This allows the amount of optical bandwidth and spectrum required by an elastic optical connection to be dynamically and adaptively allocated by assigning the necessary number of spectrum slices. The ICT IDEALIST project will provide the architectural design, protocol specification, implementation, evaluation and standardization of a control plane and a network and service management system. This architecture and these tools are necessary to introduce dynamicity, elasticity and adaptation in flexi-grid DWDM networks. This paper provides an overview of the objectives, framework, functional requirements and use cases of the elastic control plane and the adaptive network and service management system targeted in the ICT IDEALIST project.
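The slice-assignment step described above can be illustrated with a minimal first-fit allocator: an elastic connection requests N contiguous spectrum slices and the allocator returns the first free contiguous run. The slice width, grid size, and first-fit policy are illustrative assumptions, not part of the IDEALIST architecture.

```python
# Illustrative first-fit allocator for flexi-grid spectrum slices: an
# elastic connection asks for N contiguous slices, and the allocator
# finds the first free contiguous run. Slice width (12.5 GHz), grid
# size and the first-fit policy are assumptions for this sketch.

SLICE_GHZ = 12.5
NUM_SLICES = 32

def first_fit(occupied, needed):
    """Return the start index of the first run of `needed` free slices,
    or None if the spectrum is too fragmented to fit the request."""
    run_start, run_len = None, 0
    for i in range(NUM_SLICES):
        if i not in occupied:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == needed:
                return run_start
        else:
            run_len = 0
    return None

occupied = {0, 1, 2, 5, 6}              # slices already in use
start = first_fit(occupied, needed=4)   # e.g. request 4 slices = 50 GHz
print(start)
```

The contiguity constraint is what distinguishes elastic spectrum allocation from simple channel counting: fragmentation can leave enough free slices in total while no single run is wide enough, which is one reason a dynamic, adaptive control plane is needed.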