18,637 research outputs found

    Personalized Acoustic Modeling by Weakly Supervised Multi-Task Deep Learning using Acoustic Tokens Discovered from Unlabeled Data

    It is well known that recognizers personalized to each user are much more effective than user-independent recognizers. With the popularity of smartphones today, although it is not difficult to collect a large set of audio data for each user, it is difficult to transcribe it. However, it is now possible to automatically discover acoustic tokens from unlabeled personal data in an unsupervised way. We therefore propose a multi-task deep learning framework called a phoneme-token deep neural network (PTDNN), jointly trained from unsupervised acoustic tokens discovered from unlabeled data and very limited transcribed data for personalized acoustic modeling. We term this scenario "weakly supervised". The underlying intuition is that the high degree of similarity between the HMM states of acoustic token models and phoneme models may help them learn from each other in this multi-task learning framework. Initial experiments performed over a personalized audio data set recorded from Facebook posts demonstrated that very good improvements can be achieved in both frame accuracy and word accuracy over popularly-considered baselines such as fDLR, speaker code and lightly supervised adaptation. This approach complements existing speaker adaptation approaches and can be used jointly with such techniques to yield improved results. Comment: 5 pages, 5 figures, published in IEEE ICASSP 201

    Defining adaptation in a generic multi layer model : CAM: the GRAPPLE conceptual adaptation model

    Authoring of Adaptive Hypermedia is a difficult and time-consuming task. Reference models like LAOS and AHAM separate adaptation and content into different layers. Systems like AHA! offer graphical tools based on these models to allow authors to define adaptation without knowing any adaptation language. The adaptation that can be defined using such tools is still limited. Authoring systems like MOT are more flexible, but the usability of their adaptation specification is low. This paper proposes a more generic model which allows the adaptation to be defined in an arbitrary number of layers, where adaptation is expressed in terms of relationships between concepts. This model allows the creation of more powerful yet easier-to-use graphical authoring tools. This paper presents the structure of the Conceptual Adaptation Models used in adaptive applications created within the GRAPPLE adaptive learning environment, and their representation in a graphical authoring tool.

    Minimizing the impact of delay on live SVC-based HTTP adaptive streaming services

    HTTP Adaptive Streaming (HAS) is becoming the de facto standard for Over-The-Top video streaming services. Video content is temporally split into segments which are offered at multiple qualities to the clients. These clients autonomously select the quality layer matching the current state of the network through a quality selection heuristic. Recently, academia and industry have begun evaluating the feasibility of adopting layered video coding for HAS. Instead of downloading one file for a certain quality level, scalable video streaming requires downloading several interdependent layers to obtain the same quality. This implies that the base layer is always downloaded and is available for playout, even when throughput fluctuates and enhancement layers cannot be downloaded in time. This layered video approach can help in providing better service quality assurance for video streaming. However, adopting scalable video coding for HAS also leads to other issues, since requesting multiple files over HTTP increases the impact of the end-to-end delay and thus affects the service provided to the client. This is even worse in a Live TV scenario where the drift on the live signal should be minimized, requiring smaller segment and buffer sizes. In this paper, we characterize the impact of delay on several measurement-based heuristics. Furthermore, we propose several ways to overcome the end-to-end delay issues, such as parallel and pipelined downloading of segment layers, to provide a higher quality for the video service.
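
The delay effect described in this abstract can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the paper: it assumes each HTTP request costs one round-trip time before data flows, that all layers share a single bottleneck bandwidth, and that pipelining lets all requests be issued back to back so only one round trip is paid. The function names and the example numbers are hypothetical.

```python
def sequential_time(layer_bits, bandwidth_bps, rtt_s):
    """Download layers one after another: every request pays a full round trip."""
    return sum(rtt_s + bits / bandwidth_bps for bits in layer_bits)


def pipelined_time(layer_bits, bandwidth_bps, rtt_s):
    """Issue all requests back to back: one round trip, then continuous transfer.

    Simplifying assumption: the link stays fully utilised after the first response.
    """
    return rtt_s + sum(layer_bits) / bandwidth_bps


if __name__ == "__main__":
    # Hypothetical segment: one base layer and two enhancement layers (sizes in bits).
    layers = [1_000_000, 500_000, 500_000]
    bandwidth = 4_000_000  # bits per second (assumed)
    rtt = 0.1              # seconds (assumed)

    seq = sequential_time(layers, bandwidth, rtt)
    pipe = pipelined_time(layers, bandwidth, rtt)
    print(f"sequential: {seq:.3f} s, pipelined: {pipe:.3f} s")
```

Under this model the gap between the two strategies grows with the number of layers and with the round-trip time, which is consistent with the abstract's point that short live segments are hit hardest.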

    Embedding-Based Speaker Adaptive Training of Deep Neural Networks

    An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are constant for a given speaker, are mapped through a control network to layer-dependent element-wise affine transformations to canonicalize the internal feature representations at the output of hidden layers of a main network. The control network for generating the speaker-dependent mappings is jointly estimated with the main network for the overall speaker adaptive acoustic modeling. Experiments on large vocabulary continuous speech recognition (LVCSR) tasks show that the proposed SAT scheme can yield superior performance over the widely-used speaker-aware training using i-vectors with speaker-adapted input features.
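
The mechanism in this abstract (a speaker embedding mapped to an element-wise scale and shift applied to a hidden layer's output) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the control network is a single linear map per hidden layer, and every name and weight value below is hypothetical. In the approach described, these control weights would be trained jointly with the main network.

```python
def linear(weights, bias, x):
    """Plain matrix-vector product plus bias, using lists of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def control_network(embedding, w_scale, b_scale, w_shift, b_shift):
    """Map a speaker embedding to one (scale, shift) pair per hidden unit.

    Assumption: one linear layer per output; the real control network may be deeper.
    """
    scale = linear(w_scale, b_scale, embedding)
    shift = linear(w_shift, b_shift, embedding)
    return scale, shift


def adapt_hidden(hidden, scale, shift):
    """Element-wise affine transformation of a hidden layer's output."""
    return [s * h + t for h, s, t in zip(hidden, scale, shift)]


if __name__ == "__main__":
    embedding = [1.0, 2.0]                       # fixed for a given speaker
    w_scale = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]]
    b_scale = [1.0, 1.0, 1.0]
    w_shift = [[0.0, 0.0], [1.0, 0.0], [0.0, -1.0]]
    b_shift = [0.0, 0.0, 0.0]

    scale, shift = control_network(embedding, w_scale, b_scale, w_shift, b_shift)
    hidden = [2.0, 1.0, 4.0]                     # output of one hidden layer
    print(adapt_hidden(hidden, scale, shift))
```

Because the transformation is element-wise, it adds only two values per hidden unit for each speaker, and a separate pair of control weights per layer gives the layer-dependent behaviour the abstract mentions.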

    IDEALIST control and service management solutions for dynamic and adaptive flexi-grid DWDM networks

    Wavelength Switched Optical Networks (WSON) were designed with the premise that all channels in a network have the same spectrum needs, based on the ITU-T DWDM grid. However, this rigid grid-based approach is not adapted to the spectrum requirements of the signals that are the best candidates for long-reach transmission and high-speed data rates of 400 Gbps and beyond. An innovative approach is to evolve the fixed DWDM grid to a flexible grid, in which the optical spectrum is partitioned into fixed-sized spectrum slices. This allows the required amount of optical bandwidth and spectrum for an elastic optical connection to be allocated dynamically and adaptively by assigning the necessary number of spectrum slices. The ICT IDEALIST project will provide the architectural design, protocol specification, implementation, evaluation and standardization of a control plane and a network and service management system. This architecture and these tools are necessary to introduce dynamicity, elasticity and adaptation in flexi-grid DWDM networks. This paper provides an overview of the objectives, framework, functional requirements and use cases of the elastic control plane and the adaptive network and service management system targeted in the ICT IDEALIST project.
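
The slice-based allocation this abstract describes comes down to simple arithmetic: a connection needs as many fixed-size slices as cover its spectrum demand, and those slices have to be found as a contiguous free run. The sketch below is illustrative only. The 12.5 GHz slice width is an assumed default, not a value stated in the abstract, and first-fit is a generic placement rule chosen here for brevity rather than the method used by the IDEALIST project.

```python
import math


def slices_needed(demand_ghz, slice_ghz=12.5):
    """Number of fixed-size slices covering the demand (slice width is an assumption)."""
    return math.ceil(demand_ghz / slice_ghz)


def first_fit(occupied, n_slices):
    """Return the start index of the first run of n_slices free slices, or None.

    `occupied` is a list of booleans, one per slice on the link (True = in use).
    """
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    run = 0
    for i, busy in enumerate(occupied):
        run = 0 if busy else run + 1
        if run == n_slices:
            return i - n_slices + 1
    return None


if __name__ == "__main__":
    link = [True, False, False, True, False, False, False, False]
    n = slices_needed(37.5)      # hypothetical demand in GHz
    start = first_fit(link, n)
    print(f"need {n} slices, first free block starts at index {start}")
```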