
    Toward an interpretive framework of two-dimensional speech-signal processing

    Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 177-179). Traditional representations of speech are derived from short-time segments of the signal and result in time-frequency distributions of energy such as the short-time Fourier transform and spectrogram. Speech-signal models of such representations have had utility in a variety of applications such as speech analysis, recognition, and synthesis. Nonetheless, they do not capture spectral, temporal, and joint spectrotemporal energy fluctuations (or "modulations") present in local time-frequency regions of the time-frequency distribution. Inspired by principles from image processing and evidence from auditory neurophysiological models, a variety of two-dimensional (2-D) processing techniques have been explored in the literature as alternative representations of speech; however, speech-based models are lacking in this framework. This thesis develops speech-signal models for a particular 2-D processing approach in which 2-D Fourier transforms are computed on local time-frequency regions of the canonical narrowband or wideband spectrogram; we refer to the resulting transformed space as the Grating Compression Transform (GCT). We argue for a 2-D sinusoidal-series amplitude modulation model of speech content in the spectrogram domain that relates to speech production characteristics such as pitch/noise of the source, pitch dynamics, formant structure and dynamics, and offset/onset content. Narrowband- and wideband-based models are shown to exhibit important distinctions in interpretation and oftentimes "dual" behavior. In the transformed GCT space, the modeling results in a novel taxonomy of signal behavior based on the distribution of formant and onset/offset content in the transformed space via source characteristics.
Our formulation provides a speech-specific interpretation of the concept of "modulation" in 2-D processing, in contrast to existing approaches that have done so either phenomenologically through qualitative analyses or implicitly through data-driven machine learning approaches. One implication of the proposed taxonomy is its potential for interpreting transformations of other time-frequency distributions such as the auditory spectrogram, which is generally viewed as being "narrowband"/"wideband" in its low/high-frequency regions. The proposed signal model is evaluated in several ways. First, we perform analysis of synthetic speech signals to characterize its properties and limitations. Next, we develop an algorithm for analysis/synthesis of spectrograms using the model and demonstrate its ability to accurately represent real speech content. As an example application, we further apply the models in cochannel speaker separation, exploiting the GCT's ability to distribute speaker-specific content and often recover overlapping information through demodulation and interpolation in the 2-D GCT space. Specifically, in multi-pitch estimation, we demonstrate the GCT's ability to accurately estimate separate and crossing pitch tracks under certain conditions. Finally, we demonstrate the model's ability to separate mixtures of speech signals using both prior and estimated pitch information. Generalization to other speech-signal processing applications is proposed. By Tianyu Tom Wang. Ph.D.
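
The abstract describes the GCT as a 2-D Fourier transform taken over a local time-frequency region of a spectrogram. The following is a minimal sketch of that idea only, not the thesis's implementation: the function names, the Hann tapering, the mean removal, the patch size and the synthetic test signal are all assumptions chosen for illustration.

```python
import numpy as np


def spectrogram(x, win_len, hop):
    """Magnitude STFT with a Hann window; returns shape (freq_bins, frames)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def local_gct(spec, f_start, t_start, height, width):
    """2-D Fourier magnitude of one tapered time-frequency patch (assumed form)."""
    patch = spec[f_start : f_start + height, t_start : t_start + width]
    patch = patch - patch.mean()  # remove the patch's DC offset
    taper = np.outer(np.hanning(height), np.hanning(width))
    # fftshift moves zero modulation frequency to index (height // 2, width // 2)
    return np.abs(np.fft.fftshift(np.fft.fft2(patch * taper)))


# Hypothetical test signal: a steady 200 Hz pitch with 15 equal harmonics.
fs = 8000
t = np.arange(fs) / fs
x = sum(np.cos(2 * np.pi * k * 200 * t) for k in range(1, 16))

# A 32 ms window resolves individual harmonics (narrowband spectrogram).
spec = spectrogram(x, win_len=256, hop=32)
gct = local_gct(spec, f_start=10, t_start=20, height=64, width=32)

# Locate the strongest 2-D modulation component relative to the centre.
row, col = np.unravel_index(np.argmax(gct), gct.shape)
print("peak offset from centre (freq axis, time axis):", row - 32, col - 16)
```

With a steady pitch the harmonics form horizontal stripes in the narrowband spectrogram, so the patch behaves like a grating that varies along frequency only, and its 2-D transform concentrates energy on the frequency-modulation axis at a distance set by the harmonic spacing. A changing pitch would tilt the stripes and move that peak off the axis.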

    Extraction and representation of semantic information in digital media


    Early abductive reasoning for blind signal separation

    We demonstrate that explicit and systematic incorporation of abductive reasoning capabilities into algorithms for blind signal separation can yield significant performance improvements. Our formulated mechanisms apply to the output data of signal processing modules in order to conjecture the structure of time-frequency interactions between the signal components that are to be separated. The conjectured interactions are used to drive subsequent signal separation processes that are as a result less blind to the interacting signal components and, therefore, more effective. We refer to this type of process as early abductive reasoning (EAR); the "early" refers to the fact that, in contrast to classical Artificial Intelligence paradigms, the reasoning process here is utilized before the signal processing transformations are completed. We have used our EAR approach to formulate a practical algorithm that is more effective in realistically noisy conditions than reference algorithms that are representative of the current state of the art in two-speaker pitch tracking. Our algorithm uses the Blackboard architecture from Artificial Intelligence to control EAR and advanced signal processing modules. The algorithm has been implemented in MATLAB and successfully tested on a database of 570 mixture signals representing simultaneous speakers in a variety of real-world, noisy environments. With 0 dB Target-to-Masking Ratio (TMR) and no noise, the Gross Error Rate (GER) for our algorithm is 5% in comparison to the best GER performance of 11% among the reference algorithms. In diffuse noisy environments (such as street or restaurant environments), we find that our algorithm on average outperforms the best reference algorithm by 9.4%. With directional noise, our algorithm also outperforms the best reference algorithm by 29%.
The extracted pitch tracks from our algorithm were also used to carry out comb filtering for separating the harmonics of the two speakers from each other and from the other sound sources in the environment. The separated signals were evaluated subjectively by a set of 20 listeners to be of reasonable quality.

    Transient absorption imaging of hemeprotein in fresh muscle fibers

    2022 Summer. Includes bibliographical references. Mitochondrial diseases affect 1 in 4000 individuals in the U.S. among adults and children of all races and genders. Nevertheless, these diseases are hard to diagnose because they affect each person differently. Meanwhile, the gold standard diagnosis methods are usually invasive and time-consuming. Therefore, a non-invasive and in-vivo diagnosis method is highly demanded in this area. Our goal is to develop a non-invasive diagnosis method based on the endogenous nonlinear optical effect of the live tissues. Mitochondrial disease is frequently the result of a defective electron transport chain (ETC). Our goal is to develop a non-invasive way to measure redox within the ETC, specifically, of cytochromes. Cytochromes are iron porphyrins that are essential to the ETC. Their redox states can indicate cellular oxygen consumption and mitochondrial ATP production. Being able to differentiate the redox states of cytochromes will therefore offer us a method to characterize mitochondrial function. Meanwhile, Chergui's group found that the two redox states of cytochrome c have different pump-probe spectroscopic responses, meaning that the transient absorption (TA) decay lifetime can be a potential molecular contrast for cytochrome redox state discrimination. Their research leads us to utilize the pump-probe spectroscopic idea to develop a time-resolved optical microscopic method to differentiate not only cytochromes from other chemical compounds but also reduced cytochromes from oxidized ones. This dissertation describes groundbreaking experiments where transient absorption is used to reveal excited-state lifetime differences between healthy controls and an animal model of mitochondrial disease, in addition to differences between reduced and oxidized ETC in isolated mitochondria and fresh preparations of muscle fibers.
For our initial experiments, we built a pump-probe microscopic system with a fiber laser source, producing a 530 nm pump and a 490 nm probe using a 3.5 kHz laser scanning rate. The pulse durations of pump and probe are both 800 fs. For the preliminary results, we have successfully achieved TA decay contrast between reduced and oxidized cytochromes in solution form. We then achieved an SNR-enhanced pump-probe image of BGO crystal particles with the help of a software-based adaptive filter noise canceling method. We also installed an FPGA-based adaptive filter to enhance the pump-probe signals of the electrophoresis gels that contain different mitochondrial respiratory chain supercomplexes. However, because the noise floor was still 30 dB higher than the shot noise limit, cytochrome imaging in live tissues was still problematic. We then built another pump-probe microscope with a solid-state ultrafast laser source. In that way, we do not need to worry about laser relative intensity noise (RIN) anymore, since the noise floor of the solid-state laser source can reach the shot noise limit in the MHz region. One other advantage of the new laser source is that it can provide one tunable laser output that can be directly converted to the probe pulse with tunable center wavelength. Its tunability can cover the entire visible spectrum. We realized a pump-probe microscope with a 520 nm pump pulse and a tunable probe pulse. The tunability on the probe arm allows us to explore better pump-probe contrast between the two redox states. What's more, I will introduce my preliminary results of utilizing supercontinuum generation in a photonic crystal fiber (PCF) to realize tunability on the pump wavelength. In that way, more possibilities will be unlocked, and the hyperspectral pump-probe microscope will be able to distinguish more molecules.

    Audio source separation for music in low-latency and high-latency scenarios

    This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
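
The abstract names Tikhonov regularization as a spectrum decomposition method but does not give its formulation. As a hedged illustration, the sketch below uses the textbook closed form for minimizing ||x - Bg||^2 + lam * ||g||^2, namely g = (B^T B + lam * I)^(-1) B^T x. The harmonic-comb basis, the candidate pitches and the value of `lam` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def tikhonov_decompose(basis, spectrum, lam):
    """Closed-form minimiser of ||spectrum - basis @ g||^2 + lam * ||g||^2."""
    n_components = basis.shape[1]
    gram = basis.T @ basis + lam * np.eye(n_components)
    return np.linalg.solve(gram, basis.T @ spectrum)


def harmonic_comb(freqs, f0, sigma=10.0):
    """Hypothetical pitch template: Gaussian peaks at multiples of f0."""
    harmonics = np.arange(f0, freqs[-1], f0)
    peaks = np.exp(-0.5 * ((freqs[:, None] - harmonics[None, :]) / sigma) ** 2)
    return peaks.sum(axis=1)


# Frequency grid and one template per candidate pitch (all assumed values).
freqs = np.arange(0.0, 4000.0, 5.0)
candidates = [100.0, 150.0, 200.0, 250.0]
basis = np.stack([harmonic_comb(freqs, f0) for f0 in candidates], axis=1)

# Synthetic observed spectrum: a mix of the 150 Hz and 200 Hz templates.
true_gains = np.array([0.0, 1.0, 0.5, 0.0])
spectrum = basis @ true_gains

gains = tikhonov_decompose(basis, spectrum, lam=1e-6)   # light regularisation
shrunk = tikhonov_decompose(basis, spectrum, lam=1e3)   # heavy regularisation
print("estimated gains:", np.round(gains, 3))
```

One reason such a formulation suits a low-latency setting is that each frame costs a single linear solve against a matrix that depends only on the fixed basis, so the expensive factorization can be done once in advance. Larger `lam` values shrink the gains toward zero, trading fit for stability.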

    Micro/Nano Manufacturing

    Micro manufacturing involves dealing with the fabrication of structures in the size range of 0.1 to 1000 µm. The scope of nano manufacturing extends the size range of manufactured features to even smaller length scales, below 100 nm. A strict borderline between micro and nano manufacturing can hardly be drawn, such that both domains are treated as complementary and mutually beneficial within a closely interconnected scientific community. Both micro and nano manufacturing can be considered as important enablers for high-end products. This Special Issue of Applied Sciences is dedicated to recent advances in research and development within the field of micro and nano manufacturing. The included papers report recent findings and advances in manufacturing technologies for producing products with micro and nano scale features and structures as well as applications underpinned by the advances in these technologies.

    Speech Synthesis Based on Hidden Markov Models


    Nanoscale coherent diffractive imaging using high-harmonic XUV sources

    Imaging using sources in the XUV and X-ray spectral range combines high resolution with longer penetration depth (compared to electron/ion microscopy) and has found applications in many areas of science and technology. Coherent diffractive imaging (CDI) techniques, in addition, lift the performance limitation of conventional XUV/X-ray microscopes imposed by image-forming optics and enable diffraction-limited resolutions. Until recently, CDI techniques were mainly confined to large-scale facilities, e.g. synchrotrons and X-ray free electron lasers, due to the unavailability of suitable table-top XUV/X-ray sources. Table-top sources based on high-order harmonic generation (HHG) nowadays offer high and coherent photon flux, which has widened the accessibility of CDI techniques. So far, table-top CDI systems were not able to resolve sub-100 nm features using performance metrics that can qualify these systems for real-world applications. In this work, CDI experiments with the highest resolutions in different modalities using a high-flux fiber-laser-driven HHG source are presented. In conventional CDI, a record-high resolution of 13 nm is demonstrated together with the possibility of high-speed acquisition with sub-30 nm resolution. In a holographic implementation of CDI, features with a half-distance of 23 nm are resolved, which are the smallest features ever resolved with a table-top XUV/X-ray imaging system. Ptychographic imaging of extended samples is also performed using a reliable Rayleigh-like resolution metric, and resolving of features as small as 2.5 wavelengths is demonstrated. These systems can find applications in material and biological sciences, study of ultrafast dynamics, imaging of semiconductor structures and EUV lithographic mask inspection.

    Target recognition techniques for multifunction phased array radar

    This thesis, submitted for the degree of Doctor of Philosophy at University College London, is a discussion and analysis of combined stepped-frequency and pulse-Doppler target recognition methods which enable a multifunction phased array radar designed for automatic surveillance and multi-target tracking to offer a Non Cooperative Target Recognition (NCTR) capability. The primary challenge is to investigate the feasibility of NCTR via the use of high range resolution profiles. Given that stepped-frequency waveforms effectively trade time for enhanced bandwidth, and thus resolution, attention is paid to the design of a compromise between resolution and dwell time. A secondary challenge is to investigate the additional benefits to overall target classification when the number of coherent pulses within an NCTR waveform is expanded to enable the extraction of spectral features which can help to differentiate particular classes of target. As with increased range resolution, the price for this extra information is a further increase in dwell time. The response to the primary and secondary challenges described above has involved the development of a number of novel techniques, which are summarized below:
    • Design and execution of a series of experiments to further the understanding of multifunction phased array radar NCTR techniques
    • Development of a 'Hybrid' stepped frequency technique which enables a significant extension of range profiles without the proportional trade in resolution as experienced with 'Classical' techniques
    • Development of an 'end to end' NCTR processing and visualization pipeline
    • Use of 'Doppler fraction' spectral features to enable aircraft target classification via propulsion mechanism. Combination of Doppler fraction and physical length features to enable broad aircraft type classification.
    • Optimization of NCTR method classification performance as a function of feature and waveform parameters.
    • Generic waveform design tools to enable delivery of time-costly NCTR waveforms within operational constraints.
    The thesis is largely based upon an analysis of experimental results obtained using the multifunction phased array radar MESAR2, based at BAE Systems on the Isle of Wight. The NCTR mode of MESAR2 consists of the transmission and reception of successive multi-pulse coherent bursts upon each target being tracked. Each burst is stepped in frequency, resulting in an overall bandwidth sufficient to provide sub-metre range resolution. A sequence of experiments (static trials, moving point target trials and full aircraft trials) is described, and an analysis of the robustness of target length and Doppler spectra feature measurements from NCTR mode data recordings is presented. A recorded data archive of 1498 NCTR looks upon 17 different trials aircraft using five different varieties of stepped frequency waveform is used to determine classification performance as a function of various signal processing parameters and extent (numbers of pulses) of the data used. From analysis of the trials data, recommendations are made with regards to the design of an NCTR mode for an operational system that uses stepped frequency techniques by design choice.
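
The trade between bandwidth, resolution and dwell time mentioned in the abstract can be made concrete with the standard stepped-frequency relations: range resolution c / (2 N Δf) for N steps of size Δf, and an unambiguous range window of c / (2 Δf). The sketch below is illustrative only. The step count, step size, carrier frequency, pulse count and pulse repetition interval are hypothetical values and are not MESAR2 parameters, and the single ideal point target ignores noise and target motion.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def range_resolution(n_steps, step_hz):
    """Resolution from the total synthesised bandwidth n_steps * step_hz."""
    return C / (2.0 * n_steps * step_hz)


def unambiguous_window(step_hz):
    """Extent of the range profile before it wraps around."""
    return C / (2.0 * step_hz)


def dwell_time(n_steps, pulses_per_burst, pri_s):
    """Time on target: one multi-pulse burst per frequency step."""
    return n_steps * pulses_per_burst * pri_s


def range_profile(target_range_m, n_steps, step_hz, start_hz):
    """High range resolution profile of one ideal point target via inverse FFT."""
    freqs = start_hz + step_hz * np.arange(n_steps)
    # Two-way propagation delay 2R/c gives each step a frequency-dependent phase.
    echoes = np.exp(-1j * 2.0 * np.pi * freqs * 2.0 * target_range_m / C)
    return np.abs(np.fft.ifft(echoes))


# Hypothetical waveform: 64 steps of 4 MHz (256 MHz total) from 3 GHz.
n_steps, step_hz = 64, 4e6
resolution = range_resolution(n_steps, step_hz)
profile = range_profile(10.0, n_steps, step_hz, start_hz=3e9)
peak_bin = int(np.argmax(profile))

print(f"range resolution: {resolution:.3f} m")
print(f"unambiguous window: {unambiguous_window(step_hz):.2f} m")
print(f"target at about {peak_bin * resolution:.2f} m within the window")
print(f"dwell time: {dwell_time(n_steps, 8, 1e-4) * 1e3:.1f} ms")
```

Doubling the number of steps halves the resolution cell but doubles the dwell time, and adding pulses per burst for Doppler features multiplies it again, which is the compromise the abstract refers to.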