    English Conversational Telephone Speech Recognition by Humans and Machines

    One of the most difficult speech recognition tasks is accurate recognition of human-to-human communication. Advances in deep learning over the last few years have produced major speech recognition improvements on the representative Switchboard conversational corpus. Word error rates that just a few years ago were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now believed to be within striking range of human performance. This raises two questions: what is human performance, and how far down can we still drive speech recognition error rates? A recent paper by Microsoft suggests that we have already achieved human performance. In trying to verify this statement, we performed an independent set of human performance measurements on two conversational tasks and found that human performance may be considerably better than what was earlier reported, giving the community a significantly harder goal to achieve. We also report on our own efforts in this area, presenting a set of acoustic and language modeling techniques that lowered the word error rate of our own English conversational telephone LVCSR system to 5.5%/10.3% on the Switchboard/CallHome subsets of the Hub5 2000 evaluation, which, at least at the time of writing, is a new performance milestone (albeit not at what we measure to be human performance). On the acoustic side, we use a score fusion of three models: one LSTM with multiple feature inputs, a second LSTM trained with speaker-adversarial multi-task learning, and a third residual net (ResNet) with 25 convolutional layers and time-dilated convolutions. On the language modeling side, we use word and character LSTMs and convolutional WaveNet-style language models.
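
    The abstract above reports results as word error rates. As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), and not the scoring tool used in the paper, consider the following. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; official evaluations apply text normalization rules that are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only: tokens are split on
    whitespace and compared exactly, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

    Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.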

    Software systems for modeling articulated figures

    Research in computer animation and simulation of human task performance requires sophisticated geometric modeling and user interface tools. The software for a research environment should present the programmer with a powerful but flexible substrate of facilities for displaying and manipulating geometric objects, yet ensure that future tools have a consistent and friendly user interface. Jack is a system which provides a flexible and extensible programmer and user interface for displaying and manipulating complex geometric figures, particularly human figures in a 3D working environment. It is a basic software framework for high-performance Silicon Graphics IRIS workstations for modeling and manipulating geometric objects in a general but powerful way. It provides a consistent and user-friendly interface across various applications in computer animation and simulation of human task performance. Currently, Jack provides input and control for applications including lighting specification and image rendering, anthropometric modeling, figure positioning, inverse kinematics, dynamic simulation, and keyframe animation.

    Market forces, strategic management, HRM practices and organizational performance, a model based in European sample

    This study uses structural equation modeling to test a model of the impact of human resources management practices on perceived organizational performance, on a large sample of European companies. The influences of competitive intensity, industry attractiveness and strategic management are considered in the model, and their direct and indirect influence on organizational performance is assessed. The model produced an adequate fit and results show that strategic management does influence human resource practices. Human resource flexibility practices and performance management have a positive impact on organizational performance, while training was not found to have a significant impact. A direct positive impact of competitive intensity and industry attractiveness on strategic management was supported by the data, as well as a direct positive effect of industry attractiveness on perceived organizational performance.

    Advanced interdisciplinary technologies

    The following topics are presented in view graph form: (1) breakthrough trust (space research and technology assessment); (2) bionics (technology derivatives from biological systems); (3) biodynamics (modeling of human biomechanical performance based on anatomical data); and (4) tethered atmospheric research probes.

    A review of contemporary techniques for measuring ergonomic wear comfort of protective and sport clothing

    Protective and sport clothing is governed by protection requirements, performance, and comfort of the user. The comfort and impact performance of protective and sport clothing are typically subjectively measured, and this is a multifactorial and dynamic process. The aim of this review paper is to review the contemporary methodologies and approaches for measuring ergonomic wear comfort, including objective and subjective techniques. Special emphasis is given to the discussion of different methods, such as objective techniques, subjective techniques, and a combination of techniques, as well as a new biomechanical approach called modeling of skin. Literature indicates that there are four main techniques to measure wear comfort: subjective evaluation, objective measurements, a combination of subjective and objective techniques, and computer modeling of human–textile interaction. In objective measurement methods, the repeatability of results is excellent, and quantified results are obtained, but in some cases, such quantified results are quite different from the real perception of human comfort. Studies indicate that subjective analysis of comfort is less reliable than objective analysis because human subjects vary among themselves. Therefore, it can be concluded that a combination of objective and subjective measuring techniques could be a valid approach to model the comfort of textile materials.

    Reliability-Informed Beat Tracking of Musical Signals

    A new probabilistic framework for beat tracking of musical audio is presented. The method estimates the time between consecutive beat events and exploits both beat and non-beat information by explicitly modeling non-beat states. In addition to the beat times, a measure of the expected accuracy of the estimated beats is provided. The quality of the observations used for beat tracking is measured and the reliability of the beats is automatically calculated. A k-nearest neighbor regression algorithm is proposed to predict the accuracy of the beat estimates. The performance of the beat tracking system is statistically evaluated using a database of 222 musical signals of various genres. We show that modeling non-beat states leads to a significant increase in performance. In addition, a large experiment where the parameters of the model are automatically learned has been completed. Results show that simple approximations for the parameters of the model can be used. Furthermore, the performance of the system is compared with existing algorithms. Finally, a new perspective for beat tracking evaluation is presented. We show how reliability information can be successfully used to increase the mean performance of the proposed algorithm and discuss how far automatic beat tracking is from human tapping. Index terms: beat-tracking, beat quality, beat-tracking reliability, k-nearest neighbor (k-NN) regression, music signal processing.
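
    The abstract above mentions k-nearest neighbor regression for predicting the accuracy of beat estimates. The sketch below shows generic k-NN regression only, averaging the targets of the k closest training points. The abstract does not state the features, distance metric, or value of k, so the Euclidean distance, the name `knn_regress`, and the toy quality/accuracy numbers are all assumptions made for illustration.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest
    training points under Euclidean distance.

    Hypothetical helper: the feature vectors stand in for observation
    quality measures and the targets for beat-tracking accuracy.
    """
    if len(train_x) != len(train_y):
        raise ValueError("train_x and train_y must have the same length")
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")

    # Pair each training target with its distance to the query, then
    # keep the k closest pairs.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


if __name__ == "__main__":
    # Invented one-dimensional "quality" features and "accuracy" targets.
    quality = [(0.0,), (1.0,), (2.0,), (10.0,)]
    accuracy = [0.0, 1.0, 2.0, 10.0]
    print(knn_regress(quality, accuracy, (1.0,), k=3))
```

    An unweighted mean is the simplest choice; distance-weighted averaging is a common variant when neighbors lie at very different distances.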

    Probabilistic Modeling Paradigms for Audio Source Separation

    This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007. Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.