Personalized Acoustic Modeling by Weakly Supervised Multi-Task Deep Learning using Acoustic Tokens Discovered from Unlabeled Data
It is well known that recognizers personalized to each user are much more
effective than user-independent recognizers. With the popularity of smartphones
today, it is not difficult to collect a large set of audio data for each user,
but it is difficult to transcribe it. However, it is now possible to
automatically discover acoustic tokens from unlabeled personal data in an
unsupervised way. We therefore propose a multi-task deep learning framework
called a phoneme-token deep neural network (PTDNN), jointly trained from
unsupervised acoustic tokens discovered from unlabeled data and very limited
transcribed data for personalized acoustic modeling. We term this scenario
"weakly supervised". The underlying intuition is that the high degree of
similarity between the HMM states of acoustic token models and phoneme models
may help them learn from each other in this multi-task learning framework.
Initial experiments performed over a personalized audio data set recorded from
Facebook posts demonstrated that very good improvements can be achieved in both
frame accuracy and word accuracy over popularly-considered baselines such as
fDLR, speaker code and lightly supervised adaptation. This approach complements
existing speaker adaptation approaches and can be used jointly with such
techniques to yield improved results.
Comment: 5 pages, 5 figures, published in IEEE ICASSP 201
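The multi-task idea described above can be sketched as a shared trunk feeding two softmax heads, one over phonemes (from the limited transcribed data) and one over discovered acoustic tokens, with a weighted sum of the two cross-entropies as the training objective. All layer sizes, the weighting factor, and the random weights below are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of a phoneme-token multi-task network: a shared trunk
# feeds two softmax heads, and the loss is a weighted sum of the two
# cross-entropies. Dimensions and weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared trunk and two task-specific output heads (random weights).
n_in, n_hid, n_phones, n_tokens = 40, 64, 48, 30
W_shared = rng.standard_normal((n_in, n_hid)) * 0.1
W_phone = rng.standard_normal((n_hid, n_phones)) * 0.1
W_token = rng.standard_normal((n_hid, n_tokens)) * 0.1

def forward(x):
    h = np.tanh(x @ W_shared)                  # shared representation
    return softmax(h @ W_phone), softmax(h @ W_token)

def multitask_loss(x, y_phone, y_token, alpha=0.5):
    p_phone, p_token = forward(x)
    # Cross-entropy per task; alpha balances the two objectives.
    ce_phone = -np.log(p_phone[np.arange(len(x)), y_phone]).mean()
    ce_token = -np.log(p_token[np.arange(len(x)), y_token]).mean()
    return alpha * ce_phone + (1 - alpha) * ce_token

x = rng.standard_normal((8, n_in))             # a batch of acoustic frames
y_phone = rng.integers(0, n_phones, size=8)    # transcribed phoneme labels
y_token = rng.integers(0, n_tokens, size=8)    # discovered token labels
print(multitask_loss(x, y_phone, y_token))
```

Gradients from both heads flow into the shared trunk, which is how the similar HMM-state structure of token and phoneme models can let the two tasks inform each other.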
Defining adaptation in a generic multi-layer model: CAM: the GRAPPLE conceptual adaptation model
Authoring of Adaptive Hypermedia is a difficult and time-consuming task. Reference models like LAOS and AHAM separate adaptation and content into different layers. Systems like AHA! offer graphical tools based on these models to allow authors to define adaptation without knowing any adaptation language. The adaptation that can be defined using such tools is still limited. Authoring systems like MOT are more flexible, but the usability of adaptation specification is low. This paper proposes a more generic model which allows the adaptation to be defined in an arbitrary number of layers, where adaptation is expressed in terms of relationships between concepts. This model allows the creation of more powerful yet easier-to-use graphical authoring tools. This paper presents the structure of the Conceptual Adaptation Models used in adaptive applications created within the GRAPPLE adaptive learning environment, and their representation in a graphical authoring tool.
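Adaptation "expressed in terms of relationships between concepts" can be pictured with a tiny, hypothetical example: a prerequisite relationship between two concepts suppresses one until the other is sufficiently known. The concept names, knowledge scores, and threshold below are illustrative assumptions, not CAM's actual notation.

```python
# Hypothetical sketch of concept-relationship adaptation: a concept is
# recommended only once its prerequisite concept is sufficiently known.
# Names, scores, and the 0.5 threshold are illustrative assumptions.

prerequisites = {"Recursion": "Functions", "Functions": "Variables"}
knowledge = {"Variables": 0.9, "Functions": 0.3, "Recursion": 0.0}

def is_recommended(concept, threshold=0.5):
    """Recommend a concept once its prerequisite (if any) is known."""
    prereq = prerequisites.get(concept)
    return prereq is None or knowledge[prereq] >= threshold

print(is_recommended("Functions"))   # Variables known -> True
print(is_recommended("Recursion"))   # Functions not yet known -> False
```

A graphical authoring tool can expose exactly such relationships as drawable links between concepts, so authors never touch an adaptation language.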
Minimizing the impact of delay on live SVC-based HTTP adaptive streaming services
HTTP Adaptive Streaming (HAS) is becoming the de facto standard for Over-The-Top video streaming services. Video content is temporally split into segments which are offered at multiple qualities to the clients. These clients autonomously select the quality layer matching the current state of the network through a quality selection heuristic. Recently, academia and industry have begun evaluating the feasibility of adopting layered video coding for HAS. Instead of downloading one file for a certain quality level, scalable video streaming requires downloading several interdependent layers to obtain the same quality. This implies that the base layer is always downloaded and is available for playout, even when throughput fluctuates and enhancement layers cannot be downloaded in time. This layered video approach can help in providing better service quality assurance for video streaming. However, adopting scalable video coding for HAS also leads to other issues, since requesting multiple files over HTTP increases the impact of the end-to-end delay, and thus degrades the service provided to the client. This is even worse in a Live TV scenario where the drift on the live signal should be minimized, requiring smaller segment and buffer sizes. In this paper, we characterize the impact of delay on several measurement-based heuristics. Furthermore, we propose several ways to overcome the end-to-end delay issues, such as parallel and pipelined downloading of segment layers, to provide a higher quality for the video service.
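A measurement-based heuristic for SVC-based HAS of the kind discussed above can be sketched as follows: always fetch the base layer, then add enhancement layers only while the estimated download time of the segment still fits within the playout buffer. The bitrates, segment duration, and decision rule are illustrative assumptions, not the paper's actual heuristics.

```python
# Hypothetical quality heuristic for SVC-based HAS: the base layer is
# always fetched; enhancement layers are added while the estimated
# download time still fits the buffer. All numbers are assumptions.

LAYER_BITRATES = [500, 1000, 2000, 4000]  # kbps: base + 3 enhancement layers
SEGMENT_DURATION = 2.0                    # seconds of video per segment

def select_layers(throughput_kbps, buffer_s):
    """Return how many layers (>= 1) to download for the next segment."""
    chosen, cumulative = 1, LAYER_BITRATES[0]
    for rate in LAYER_BITRATES[1:]:
        cumulative += rate
        # Estimated time to download all chosen layers of one segment.
        download_time = cumulative * SEGMENT_DURATION / throughput_kbps
        # Add the layer only if the buffer can absorb the download time.
        if download_time < buffer_s:
            chosen += 1
        else:
            break
    return chosen

print(select_layers(throughput_kbps=3000, buffer_s=4.0))  # ample bandwidth
print(select_layers(throughput_kbps=800, buffer_s=1.5))   # base layer only
```

The guaranteed base layer is what makes the layered approach robust to throughput fluctuation, while the extra per-layer HTTP requests are exactly where the end-to-end delay penalty the abstract describes comes from.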
Embedding-Based Speaker Adaptive Training of Deep Neural Networks
An embedding-based speaker adaptive training (SAT) approach is proposed and
investigated in this paper for deep neural network acoustic modeling. In this
approach, speaker embedding vectors, which are a constant given a particular
speaker, are mapped through a control network to layer-dependent element-wise
affine transformations to canonicalize the internal feature representations at
the output of hidden layers of a main network. The control network for
generating the speaker-dependent mappings is jointly estimated with the main
network for the overall speaker adaptive acoustic modeling. Experiments on
large vocabulary continuous speech recognition (LVCSR) tasks show that the
proposed SAT scheme can yield superior performance over the widely-used
speaker-aware training using i-vectors with speaker-adapted input features.
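The mapping described above can be sketched as a small control network that turns a fixed per-speaker embedding into element-wise scale and shift parameters applied to a hidden layer of the main network. The dimensions, random weights, and near-identity initialization below are illustrative assumptions.

```python
# Sketch of embedding-based SAT: a control network maps a constant
# speaker embedding to an element-wise affine transform (scale, shift)
# that canonicalizes hidden activations of the main network.
# All dimensions and weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
emb_dim, hid_dim = 100, 64          # speaker-embedding and hidden sizes

# Control network: embedding -> (scale, shift) for one hidden layer.
W_scale = rng.standard_normal((emb_dim, hid_dim)) * 0.01
W_shift = rng.standard_normal((emb_dim, hid_dim)) * 0.01

def speaker_affine(embedding):
    scale = 1.0 + embedding @ W_scale   # near-identity scaling
    shift = embedding @ W_shift
    return scale, shift

def adapted_hidden(h, embedding):
    # Element-wise affine transform of hidden activations per speaker.
    scale, shift = speaker_affine(embedding)
    return scale * h + shift

spk = rng.standard_normal(emb_dim)        # constant for a given speaker
h = rng.standard_normal((8, hid_dim))     # hidden activations, 8 frames
print(adapted_hidden(h, spk).shape)       # (8, 64)
```

Because the transform is generated from the embedding rather than stored per speaker, control and main networks can be trained jointly, which is the core of the SAT scheme the abstract describes.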
IDEALIST control and service management solutions for dynamic and adaptive flexi-grid DWDM networks
Wavelength Switched Optical Networks (WSON) were designed with the premise that all channels in a network have the same spectrum needs, based on the ITU-T DWDM grid. However, this rigid grid-based approach is not adapted to the spectrum requirements of the signals that are the best candidates for long-reach transmission and high-speed data rates of 400 Gbps and beyond. An innovative approach is to evolve the fixed DWDM grid into a flexible grid, in which the optical spectrum is partitioned into fixed-sized spectrum slices. This allows the amount of optical bandwidth and spectrum required by an elastic optical connection to be dynamically and adaptively allocated by assigning the necessary number of spectrum slices. The ICT IDEALIST project will provide the architectural design, protocol specification, implementation, evaluation and standardization of a control plane and a network and service management system. This architecture and these tools are necessary to introduce dynamicity, elasticity and adaptation in flexi-grid DWDM networks. This paper provides an overview of the objectives, framework, functional requirements and use cases of the elastic control plane and the adaptive network and service management system targeted in the ICT IDEALIST project.
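The slice-assignment step described above can be illustrated with a minimal first-fit allocator: an elastic connection requests N contiguous spectrum slices and the allocator returns the first free contiguous run. The slice width, grid size, and first-fit policy are illustrative assumptions, not part of the IDEALIST architecture.

```python
# Illustrative first-fit allocator for flexi-grid spectrum slices: an
# elastic connection asks for N contiguous slices, and the allocator
# finds the first free contiguous run. Slice width (12.5 GHz), grid
# size and the first-fit policy are assumptions for this sketch.

SLICE_GHZ = 12.5
NUM_SLICES = 32

def first_fit(occupied, needed):
    """Return the start index of the first run of `needed` free slices,
    or None if the spectrum is too fragmented to fit the request."""
    run_start, run_len = None, 0
    for i in range(NUM_SLICES):
        if i not in occupied:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == needed:
                return run_start
        else:
            run_len = 0
    return None

occupied = {0, 1, 2, 5, 6}              # slices already in use
start = first_fit(occupied, needed=4)   # e.g. request 4 slices = 50 GHz
print(start)
```

The contiguity constraint is what distinguishes elastic spectrum allocation from simple channel counting: fragmentation can leave enough free slices in total while no single run is wide enough, which is one reason a dynamic, adaptive control plane is needed.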