874 research outputs found

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science in which the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph (EGG) signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment in real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
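As a sketch of the framing idea described above, the following hypothetical helper cuts frames that each span a whole number of glottal cycles, anchored at GCIs. The function name, the two-cycle default, and the idealized GCI positions are illustrative assumptions, not part of the cited work:

```python
import numpy as np

def glottal_synchronous_frames(speech, gcis, periods=2):
    """Cut pitch-synchronous frames spanning `periods` glottal cycles,
    each anchored at a glottal closure instant (GCI)."""
    frames = []
    for i in range(len(gcis) - periods):
        start, end = gcis[i], gcis[i + periods]
        frames.append(speech[start:end])
    return frames

# Toy example: a 100 Hz voiced segment sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 100 * t)
gcis = np.arange(0, fs, fs // 100)   # idealized GCIs, one per pitch cycle
frames = glottal_synchronous_frames(speech, gcis)
print(len(frames), len(frames[0]))   # 98 frames of 160 samples each
```

Because each frame spans exactly two pitch periods, its length tracks the local fundamental, which is the property fixed-length framing cannot provide.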

    Wireless recording of the calls of Rousettus aegyptiacus and their reproduction using electrostatic transducers

    Bats are capable of imaging their surroundings in great detail using echolocation. To apply similar methods to human engineering systems requires the capability to measure and recreate the signals used, and to understand the processing applied to returning echoes. In this work, the emitted and reflected echolocation signals of Rousettus aegyptiacus are recorded while the bat is in flight, using a wireless sensor mounted on the bat. The sensor is designed to replicate the acoustic gain control which bats are known to use, applying a gain to returning echoes that is dependent on the incurred time delay. Employing this technique allows emitted and reflected echolocation calls, which have a wide dynamic range, to be recorded. The recorded echoes demonstrate the complexity of environment reconstruction using echolocation. The sensor is also used to make accurate recordings of the emitted calls, and these calls are recreated in the laboratory using custom-built wideband electrostatic transducers, allied with a spectral equalization technique. This technique is further demonstrated by recreating multi-harmonic bioinspired FM chirps. The ability to record and accurately synthesize echolocation calls enables the exploitation of biological signals in human engineering systems for sonar, materials characterization and imaging.
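The delay-dependent gain described above can be sketched as follows. The gain slope, its cap, and the function name are illustrative assumptions rather than the sensor's actual parameters:

```python
import numpy as np

def delay_dependent_gain(echo, fs, db_per_ms=0.8, max_db=40.0):
    """Apply a gain that grows with time since the emitted call,
    compensating the spreading loss of later (more distant) echoes."""
    t_ms = np.arange(len(echo)) / fs * 1e3
    gain_db = np.minimum(db_per_ms * t_ms, max_db)   # ramp, then saturate
    return echo * 10 ** (gain_db / 20)

fs = 250_000                        # ultrasonic sampling rate (illustrative)
echo = np.ones(fs // 100)           # 10 ms of unit-amplitude test signal
out = delay_dependent_gain(echo, fs)
print(out[0], round(out[-1], 2))    # early samples unchanged, late ones boosted
```

Ramping the gain with elapsed time compresses the dynamic range between the loud emitted call and the faint returning echoes, which is what makes both recordable on one channel.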

    Spectrotemporal Modulation Sensitivity in Hearing-Impaired Listeners

    Speech is characterized by temporal and spectral modulations. Hearing-impaired (HI) listeners may have reduced spectrotemporal modulation (STM) sensitivity, which could affect their speech understanding. This study examined the effects of hearing loss and absolute frequency on STM sensitivity and their relationship to speech intelligibility, frequency selectivity and temporal fine-structure (TFS) sensitivity. Sensitivity to STM applied to four-octave or one-octave noise carriers was measured for normal-hearing and HI listeners as a function of spectral modulation, temporal modulation and absolute frequency. Across-frequency variation in STM sensitivity suggests that broadband measurements do not sufficiently characterize performance. Results were simulated with a cortical STM-sensitivity model. No correlation was found between the reduced frequency selectivity required in the model to explain the HI STM data and more direct notched-noise estimates. Correlations between low-frequency and broadband STM performance, speech intelligibility and frequency-modulation sensitivity suggest that speech and STM processing may depend on the ability to use TFS.
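STM stimuli of this kind are commonly realized as a "moving ripple": a sinusoidal envelope in log-frequency that drifts over time. The sketch below is a generic illustration; the carrier band, modulation rates, and function name are assumptions, not the study's exact stimuli:

```python
import numpy as np

def ripple_envelope(freqs, times, spec_mod=2.0, temp_mod=4.0, depth=1.0):
    """Spectrotemporal ripple: sinusoidal modulation across log-frequency
    (cycles/octave) drifting over time (Hz)."""
    octaves = np.log2(freqs / freqs[0])              # spectral axis in octaves
    phase = 2 * np.pi * (spec_mod * octaves[:, None]
                         - temp_mod * times[None, :])
    return 1 + depth * np.sin(phase)                 # nonnegative modulator

freqs = np.geomspace(500, 2000, 64)                  # a two-octave carrier band
times = np.linspace(0, 0.5, 100)
env = ripple_envelope(freqs, times)                  # apply to a noise carrier
print(env.shape, env.min() >= 0)
```

Varying `spec_mod` and `temp_mod` independently is what lets such experiments map sensitivity as a function of spectral and temporal modulation, as described above.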

    Earth resources technology satellite spacecraft system design studies. Volume 2, book 1 - Subsystems studies Final report

    Developing structure, payload, communication and data handling subsystems for ERTS.

    Treatise on Hearing: The Temporal Auditory Imaging Theory Inspired by Optics and Communication

    A new theory of mammalian hearing is presented, which accounts for the auditory image in the midbrain (inferior colliculus) of objects in the acoustical environment of the listener. It is shown that the ear is a temporal imaging system that comprises three transformations of the envelope functions: cochlear group-delay dispersion, cochlear time lensing, and neural group-delay dispersion. These elements are analogous to the optical transformations in vision of diffraction between the object and the eye, spatial lensing by the lens, and second diffraction between the lens and the retina. Unlike the eye, it is established that the human auditory system is naturally defocused, so that coherent stimuli do not react to the defocus, whereas completely incoherent stimuli are impacted by it and may be blurred by design. It is argued that the auditory system can use this differential focusing to enhance or degrade the images of real-world acoustical objects that are partially coherent. The theory is founded on coherence and temporal imaging theories that were adopted from optics. In addition to the imaging transformations, the corresponding inverse-domain modulation transfer functions are derived and interpreted with consideration to the nonuniform neural sampling operation of the auditory nerve. These ideas are used to rigorously introduce the concepts of sharpness and blur in auditory imaging, auditory aberrations, and auditory depth of field. In parallel, ideas from communication theory are used to show that the organ of Corti functions as a multichannel phase-locked loop (PLL) that constitutes the point of entry for auditory phase locking and hence conserves the signal coherence. It provides an anchor for a dual coherent and noncoherent auditory detection in the auditory brain that culminates in auditory accommodation. Implications for hearing impairments are discussed as well. Comment: 603 pages, 131 figures, 13 tables, 1570 references.
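The chain of envelope transformations above mirrors the space-time duality of optical temporal imaging. In that standard optics notation (adopted here as an assumption about how the dispersion terms compose, not as a quotation of this work), the input group-delay dispersion $\phi_1''$, the time-lens focal dispersion $\phi_f''$, and the output dispersion $\phi_2''$ satisfy a temporal analogue of the thin-lens law:

```latex
% Temporal imaging condition and magnification (space-time duality):
\frac{1}{\phi_1''} + \frac{1}{\phi_2''} = \frac{1}{\phi_f''},
\qquad
M = -\frac{\phi_2''}{\phi_1''}
```

Here $M$ scales the output envelope in time, just as a spatial lens magnifies an image; a mismatch between the three dispersions produces exactly the kind of defocus that the theory attributes to normal hearing.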

    Engineering evaluations and studies. Volume 3: Exhibit C

    High-rate multiplexer asymmetry and jitter, data-dependent amplitude variations, and transition density are discussed.

    Toward an interpretive framework of two-dimensional speech-signal processing

    Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 177-179). Traditional representations of speech are derived from short-time segments of the signal and result in time-frequency distributions of energy such as the short-time Fourier transform and spectrogram. Speech-signal models of such representations have had utility in a variety of applications such as speech analysis, recognition, and synthesis. Nonetheless, they do not capture spectral, temporal, and joint spectrotemporal energy fluctuations (or "modulations") present in local time-frequency regions of the time-frequency distribution. Inspired by principles from image processing and evidence from auditory neurophysiological models, a variety of two-dimensional (2-D) processing techniques have been explored in the literature as alternative representations of speech; however, speech-based models are lacking in this framework. This thesis develops speech-signal models for a particular 2-D processing approach in which 2-D Fourier transforms are computed on local time-frequency regions of the canonical narrowband or wideband spectrogram; we refer to the resulting transformed space as the Grating Compression Transform (GCT). We argue for a 2-D sinusoidal-series amplitude modulation model of speech content in the spectrogram domain that relates to speech production characteristics such as pitch/noise of the source, pitch dynamics, formant structure and dynamics, and offset/onset content. Narrowband- and wideband-based models are shown to exhibit important distinctions in interpretation and oftentimes "dual" behavior. In the transformed GCT space, the modeling results in a novel taxonomy of signal behavior based on the distribution of formant and onset/offset content in the transformed space via source characteristics. 
Our formulation provides a speech-specific interpretation of the concept of "modulation" in 2-D processing in contrast to existing approaches that have done so either phenomenologically through qualitative analyses and/or implicitly through data-driven machine learning approaches. One implication of the proposed taxonomy is its potential for interpreting transformations of other time-frequency distributions such as the auditory spectrogram, which is generally viewed as being "narrowband"/"wideband" in its low/high-frequency regions. The proposed signal model is evaluated in several ways. First, we perform analysis of synthetic speech signals to characterize its properties and limitations. Next, we develop an algorithm for analysis/synthesis of spectrograms using the model and demonstrate its ability to accurately represent real speech content. As an example application, we further apply the models in cochannel speaker separation, exploiting the GCT's ability to distribute speaker-specific content and often recover overlapping information through demodulation and interpolation in the 2-D GCT space. Specifically, in multi-pitch estimation, we demonstrate the GCT's ability to accurately estimate separate and crossing pitch tracks under certain conditions. Finally, we demonstrate the model's ability to separate mixtures of speech signals using both prior and estimated pitch information. Generalization to other speech-signal processing applications is proposed. by Tianyu Tom Wang. Ph.D.
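The core GCT computation described above — a 2-D Fourier transform of a local, windowed spectrogram patch — can be sketched as follows. The patch size, window choice, and toy spectrogram are illustrative assumptions, not the thesis's exact settings:

```python
import numpy as np

def gct(spectrogram, t0, f0, patch=(32, 32)):
    """Grating Compression Transform of one local time-frequency region:
    a windowed 2-D Fourier transform of a spectrogram patch."""
    nf, nt = patch
    region = spectrogram[f0:f0 + nf, t0:t0 + nt]
    window = np.outer(np.hanning(nf), np.hanning(nt))  # taper the patch edges
    return np.fft.fftshift(np.fft.fft2(region * window))

# Toy narrowband spectrogram: harmonic striations across the frequency axis
spec = np.cos(2 * np.pi * 0.25 * np.arange(64))[:, None] * np.ones((64, 64))
G = np.abs(gct(spec, 0, 0))
print(G.shape)
```

In the transformed space, periodic striations (harmonics) collapse to localized peaks, which is what makes the GCT useful for separating pitch content from overlapping speakers.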

    Detecting and locating electronic devices using their unintended electromagnetic emissions

    Electronically-initiated explosives can have unintended electromagnetic emissions which propagate through walls and sealed containers. These emissions, if properly characterized, enable the prompt and accurate detection of explosive threats. The following dissertation develops and evaluates techniques for detecting and locating common electronic initiators. The unintended emissions of radio receivers and microcontrollers are analyzed. These emissions are low-power radio signals that result from the device's normal operation. In the first section, it is demonstrated that arbitrary signals can be injected into a radio receiver's unintended emissions using a relatively weak stimulation signal. This effect is called stimulated emissions. The performance of stimulated emissions is compared to passive detection techniques. The novel technique offers a 5 to 10 dB sensitivity improvement over passive methods for detecting radio receivers. The second section develops a radar-like technique for accurately locating radio receivers. The radar utilizes the stimulated emissions technique with wideband signals. A radar-like system is designed and implemented in hardware. Its accuracy is tested in a noisy, multipath-rich, indoor environment. The proposed radar can locate superheterodyne radio receivers with a root mean square position error of less than 5 meters when the SNR is 15 dB or above. In the third section, an analytic model is developed for the unintended emissions of microcontrollers. It is demonstrated that these emissions consist of a periodic train of impulses. Measurements of an 8051 microcontroller validate this model. The model is used to evaluate the noise performance of several existing algorithms. Results indicate that pitch estimation techniques have a 4 dB sensitivity improvement over epoch folding algorithms.
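The periodic-impulse-train model of a microcontroller's emission, and a pitch-style estimate of the underlying clock rate, can be sketched as below. The sampling rate, clock rate, jitter parameter, and the simple autocorrelation-peak estimator are illustrative stand-ins, not the dissertation's actual algorithms:

```python
import numpy as np

def impulse_train_emission(fs, f_clock, duration, jitter=0.0, rng=None):
    """Model an unintended emission as a (possibly jittered) periodic
    train of impulses at the device's clock rate."""
    rng = rng or np.random.default_rng(0)
    n = int(fs * duration)
    x = np.zeros(n)
    period = fs / f_clock
    ticks = np.arange(0, n, period)
    ticks = ticks + jitter * period * rng.standard_normal(ticks.size)
    idx = np.clip(np.round(ticks).astype(int), 0, n - 1)
    x[idx] = 1.0
    return x

fs, f_clock = 1_000_000, 12_500        # illustrative rates only
x = impulse_train_emission(fs, f_clock, 0.005)
# Pitch-style estimate: the first strong autocorrelation peak gives the period
r = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..n-1
lag = np.argmax(r[40:]) + 40                       # skip the lag-0 region
print(round(fs / lag))                             # recovers 12500
```

On a clean train the first autocorrelation peak sits exactly one clock period away, which is the structure that pitch-estimation techniques exploit more robustly than epoch folding under noise.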

    Shuttle Ku-band and S-band communications implementations study

    The interfaces between the Ku-band system and the TDRSS, between the S-band system and the TDRSS, GSTDN and SGLS networks, and between the S-band payload communication equipment and the other Orbiter avionic equipment were investigated. The principal activities reported are: (1) performance analysis of the payload narrowband bent-pipe through the Ku-band communication system; (2) performance evaluation of the TDRSS user constraints placed on the S-band and Ku-band communication systems; (3) assessment of the shuttle-unique S-band TDRSS ground station false lock susceptibility; (4) development of procedure to make S-band antenna measurements during orbital flight; (5) development of procedure to make RFI measurements during orbital flight to assess the performance degradation to the TDRSS S-band communication link; and (6) analysis of the payload interface integration problem areas

    Study to investigate and evaluate means of optimizing the Ku-band combined radar/communication functions for the space shuttle

    The performance of the space shuttle orbiter's Ku-Band integrated radar and communications equipment is analyzed for the radar mode of operation. The block diagram of the rendezvous radar subsystem is described. Power budgets for passive target detection are calculated, based on the estimated values of system losses. Requirements for processing of radar signals in the search and track modes are examined. Time multiplexed, single-channel, angle tracking of passive scintillating targets is analyzed. Radar performance in the presence of main lobe ground clutter is considered and candidate techniques for clutter suppression are discussed. Principal system parameter drivers are examined for the case of stationkeeping at ranges comparable to target dimension. Candidate ranging waveforms for short range operation are analyzed and compared. The logarithmic error discriminant utilized for range, range rate and angle tracking is formulated and applied to the quantitative analysis of radar subsystem tracking loops
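Power budgets for passive target detection of the kind mentioned above follow the monostatic radar equation. The sketch below evaluates a single-pulse SNR with purely illustrative parameter values, not the orbiter's actual figures:

```python
import math

def radar_snr_db(pt_w, gain_db, wavelength_m, rcs_m2, range_m,
                 bandwidth_hz, noise_fig_db, losses_db=0.0):
    """Monostatic radar equation: single-pulse SNR for a passive target."""
    k, t0 = 1.380649e-23, 290.0                    # Boltzmann const., ref. temp.
    g = 10 ** (gain_db / 10)
    # Received power: Pt * G^2 * lambda^2 * sigma / ((4 pi)^3 * R^4)
    pr = (pt_w * g**2 * wavelength_m**2 * rcs_m2) / ((4 * math.pi)**3 * range_m**4)
    noise = k * t0 * bandwidth_hz * 10 ** (noise_fig_db / 10)
    return 10 * math.log10(pr / noise) - losses_db

# Hypothetical Ku-band-like numbers (assumed, for illustration only)
snr = radar_snr_db(pt_w=60, gain_db=38.5, wavelength_m=0.022,
                   rcs_m2=1.0, range_m=10_000, bandwidth_hz=1e5,
                   noise_fig_db=5, losses_db=6)
print(round(snr, 1))
```

Because received power falls as the fourth power of range, such budgets are dominated by the maximum detection range, which is why estimated system losses matter so much in the analysis described above.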