
    Vocal technology: A normalization approach

    From the 1990s onwards, the use of digital technology for voice and image transmission (GSM mobile telephones, satellite transmissions, and Frame Relay and ATM networks) has brought about the convergence of information technology and telecommunications, leading to the birth of the ICT (Information and Communication Technologies) sector. Today, most organizations of a certain size are unifying their internal telephone networks, LANs, Internet connections, and geographical data transmission networks.

    The case for automatic higher-level features in forensic speaker recognition

    Approaches from standard automatic speaker recognition, which rely on cepstral features, suffer from a lack of interpretability in forensic applications. However, the growing practice of using "higher-level" features in automatic systems offers promise in this regard. We provide an overview of automatic higher-level systems and discuss their potential advantages, as well as the issues raised by their use in the forensic context.

    Eighty Challenges Facing Speech Input/Output Technologies

    During the past three decades, we have witnessed remarkable progress in the development of speech input/output technologies. Despite these successes, we are far from matching the human ability to recognize, nearly perfectly, speech spoken by many speakers, in varying acoustic environments, with an essentially unrestricted vocabulary. Synthetic speech still sounds stilted and robotic, lacking real personality and emotion. Many challenges will remain unmet unless we can advance our fundamental understanding of human communication: how speech is produced and perceived, drawing on our innate linguistic competence. This paper outlines some of these challenges, ranging from signal presentation and lexical access to language understanding and multimodal integration, and speculates on how they could be met.

    Exploration of small enrollment speaker verification on handheld devices

    Thesis (M.Eng. and S.B.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. By Ram H. Woo. Includes bibliographical references (pp. 77-78).
    This thesis explores the problem of robust speaker verification on handheld devices when training data is extremely limited. Although speaker verification is a promising technology for security applications, implementing such a system on handheld devices presents unique challenges arising from the highly mobile nature of the devices. This work first analyzes, independently, the impact of several key factors on speaker verification accuracy: speech features, basic modeling techniques, and highly variable environmental and microphone conditions. We then present and evaluate methods for improving speaker verification robustness, focusing on normalization techniques such as handset normalization (H-norm) and zero normalization (Z-norm), as well as on a model training methodology (multistyle training), to minimize the detrimental impact of variable environment and microphone conditions.
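
    The abstract names Z-norm and H-norm but gives no formulas. As a rough illustration, assuming the common formulation in which a raw verification score is shifted and scaled by the mean and standard deviation of impostor scores (estimated per target model for Z-norm, per handset type for H-norm), the normalization can be sketched as below. The function names and handset labels are hypothetical; this is not the thesis's implementation. It needs Python 3.8+ for statistics.fmean.

```python
import statistics


def znorm_params(impostor_scores):
    """Estimate Z-norm statistics for one target model.

    The model is scored against a set of impostor utterances; the mean and
    population standard deviation of those scores are returned.
    """
    mu = statistics.fmean(impostor_scores)
    sigma = statistics.pstdev(impostor_scores)
    if sigma == 0:
        raise ValueError("impostor scores have zero spread; cannot normalize")
    return mu, sigma


def znorm(score, mu, sigma):
    """Z-norm: express a raw score in units of impostor standard deviations."""
    return (score - mu) / sigma


def hnorm(score, handset, handset_params):
    """H-norm (sketch): the same shift-and-scale, but the impostor statistics
    are looked up by the detected handset type. The handset labels used as
    dictionary keys are hypothetical."""
    mu, sigma = handset_params[handset]
    return (score - mu) / sigma


if __name__ == "__main__":
    mu, sigma = znorm_params([-1.0, 1.0])
    print(znorm(2.0, mu, sigma))  # 2.0
    # Hypothetical per-handset impostor statistics: (mean, std).
    params = {"electret": (0.5, 2.0), "carbon": (-0.2, 0.5)}
    print(hnorm(1.5, "electret", params))  # 0.5
```

    Normalizing this way puts scores from different target models or handsets on a common scale, so a single decision threshold can be shared across them.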

    Robust audio-visual person verification using Web-camera video

    Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. By Daniel Schultz. Includes bibliographical references (pp. 61-62).
    This thesis examines the challenge of robust audio-visual person verification using data recorded in multiple environments with varying lighting conditions, irregular visual backgrounds, and diverse background noise. Audio-visual person verification could prove very useful in both physical and logical access control applications, but only if it performs well across a variety of environments. This thesis first examines the factors that affect video-only person verification performance, including the recording environment, the amount of training data, and the type of facial feature used. We then combine scores from the audio and video verification systems into a multimodal verification system and compare its accuracy with that of each single-mode system.
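
    The abstract says only that audio and video scores are combined. One common way to do this, assumed here rather than taken from the thesis, is to z-normalize each modality's score with statistics estimated on held-out data, take a weighted sum, and compare the result with a decision threshold. The class and function names, the weight, and the threshold below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoreStats:
    """Mean and standard deviation of one modality's scores
    (assumed to be estimated on held-out development data)."""
    mean: float
    std: float

    def normalize(self, score: float) -> float:
        return (score - self.mean) / self.std


def fuse_scores(audio_score: float, video_score: float,
                audio_stats: ScoreStats, video_stats: ScoreStats,
                audio_weight: float = 0.5) -> float:
    """Weighted-sum fusion of z-normalized audio and video scores."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    a = audio_stats.normalize(audio_score)
    v = video_stats.normalize(video_score)
    return audio_weight * a + (1.0 - audio_weight) * v


def accept(fused_score: float, threshold: float = 0.0) -> bool:
    """Accept the identity claim if the fused score reaches the threshold."""
    return fused_score >= threshold


if __name__ == "__main__":
    audio = ScoreStats(mean=0.0, std=1.0)
    video = ScoreStats(mean=10.0, std=5.0)
    fused = fuse_scores(1.0, 0.0, audio, video, audio_weight=0.75)
    print(fused, accept(fused))  # 0.25 True
```

    Setting audio_weight to 1.0 or 0.0 reduces the fused system to the audio-only or video-only scorer, which is one simple way to set up the single-mode comparison the abstract describes.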