887 research outputs found

    A combined statistical and machine learning approach for single channel speech enhancement

    University of Minnesota Ph.D. dissertation. May 2015. Major: Electrical Engineering. Advisor: Zhi-Quan Luo. 1 computer file (PDF); ix, 116 pages. In this thesis, we study the single-channel speech enhancement problem, the goal of which is to recover the desired speech from a monaural noisy recording. Speech enhancement is an important problem because of its widespread use in speech-related applications, such as hearing aids, mobile communications, and speech recognition systems. Three speech enhancement algorithms are proposed. In the first algorithm, the Wiener Non-negative Matrix Factorization (WNMF), we combine traditional Wiener filtering and NMF into a single optimization problem. The objective is to minimize the mean square error, as in Wiener filtering, and the constraints ensure that the enhanced speech is sparsely representable by the speech model learned by NMF. WNMF is novel because it uses NMF to capture the speech-specific structure while simultaneously leveraging that structure to improve the Wiener filtering. For the second algorithm, we propose a Sparse Gaussian Mixture Model (SGMM) that extends the traditional NMF and the Gaussian model. SGMM captures the complex structure of speech better than the traditional NMF. To limit overrepresentation in SGMM, we impose sparsity so that only a few Gaussian models are simultaneously active. Computationally, this is achieved by using an l0-norm in the constraint of the maximum-likelihood (ML) estimation. The contribution of SGMM lies in solving the constrained ML estimation, which has a closed-form update even with the non-convex and non-smooth l0-norm constraint. The final algorithm proposed is the Sparse NMF + Deep Neural Network (SNMF-DNN), in which we treat speech enhancement as a supervised regression problem whose goal is to estimate the optimal enhancement gain. SNMF, originally designed for source separation, is used to extract features from the noisy recording. A DNN is subsequently trained to estimate the optimal enhancement gain. Although our system is simple and does not require any sophisticated handcrafted features, we are able to demonstrate a substantial improvement in both intelligibility and enhanced speech quality.
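
    For readers unfamiliar with the Wiener filtering that the abstract builds on, the following is a minimal sketch of the classical per-bin Wiener gain, which minimizes the mean square error when the speech and noise power spectra are known. It is not the thesis's WNMF, SGMM, or SNMF-DNN method; the function names `wiener_gain` and `enhance` and the toy numbers are assumptions made for illustration only.

```python
def wiener_gain(speech_psd, noise_psd):
    """Classical Wiener gain for one frequency bin: S / (S + N)."""
    if speech_psd < 0 or noise_psd < 0:
        raise ValueError("power spectral densities must be non-negative")
    total = speech_psd + noise_psd
    # A bin with no energy at all gets zero gain.
    return speech_psd / total if total > 0 else 0.0


def enhance(noisy_magnitudes, speech_psds, noise_psds):
    """Scale each noisy spectral magnitude by its Wiener gain."""
    return [
        wiener_gain(s, n) * y
        for y, s, n in zip(noisy_magnitudes, speech_psds, noise_psds)
    ]


if __name__ == "__main__":
    # Hypothetical two-bin example: the second bin contains only noise.
    print(enhance([2.0, 4.0], [1.0, 0.0], [1.0, 1.0]))
```

    In practice the speech power spectrum is unknown and must be estimated, which is where a learned speech model such as NMF can enter.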

    A circular elastic cylinder under its own weight

    An exact analysis of deformation and stress field in a finite circular elastic cylinder under its own weight is presented, with emphasis on the end effect. The problem is formulated on the basis of the state space formalism for axisymmetric deformation of a transversely isotropic body. Upon delineating the Hamiltonian characteristics of the formulation, a rigorous solution which satisfies the end conditions is determined by using eigenfunction expansion. The results show that the end effect is significant but confined to a local region near the base where the displacement and stress distributions are remarkably different from those according to the simplified solution that gives a uniaxial stress state. It is more pronounced in the cylinder with the bottom plane being perfectly bonded than in smooth contact with a rigid base.
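
    The simplified uniaxial solution that the abstract compares against can be written down directly. The sketch below assumes a cylinder of height $L$ and uniform density $\rho$ standing on its base at $z = 0$ with a traction-free top at $z = L$; these symbols and the coordinate choice are assumptions for illustration and do not come from the paper.

```latex
% Vertical equilibrium with body force -\rho g along z (z measured upward):
\frac{d\sigma_{zz}}{dz} - \rho g = 0, \qquad \sigma_{zz}(L) = 0
% Integrating from z to L gives a purely compressive axial stress:
\sigma_{zz}(z) = -\rho g \,(L - z), \qquad
\sigma_{rr} = \sigma_{\theta\theta} = \sigma_{rz} = 0
% The largest compression occurs at the base:
\sigma_{zz}(0) = -\rho g L
```

    This elementary state generally cannot satisfy a bonded-base condition, since the Poisson expansion it implies is nonzero at $z = 0$, which is consistent with the end effect the abstract reports near the base.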

    A Flexible-Frame-Rate Vision-Aided Inertial Object Tracking System for Mobile Devices

    Real-time object pose estimation and tracking is challenging but essential for emerging augmented reality (AR) applications. In general, state-of-the-art methods address this problem using deep neural networks which indeed yield satisfactory results. Nevertheless, the high computational cost of these methods makes them unsuitable for mobile devices where real-world applications usually take place. In addition, head-mounted displays such as AR glasses require at least 90 FPS to avoid motion sickness, which further complicates the problem. We propose a flexible-frame-rate object pose estimation and tracking system for mobile devices. It is a monocular visual-inertial-based system with a client-server architecture. Inertial measurement unit (IMU) pose propagation is performed on the client side for high-speed tracking, and RGB image-based 3D pose estimation is performed on the server side to obtain accurate poses, after which the pose is sent to the client side for visual-inertial fusion, where we propose a bias self-correction mechanism to reduce drift. We also propose a pose inspection algorithm to detect tracking failures and incorrect pose estimation. Connected by high-speed networking, our system supports flexible frame rates up to 120 FPS and guarantees high precision and real-time tracking on low-end devices. Both simulations and real-world experiments show that our method achieves accurate and robust object tracking.
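
    The client-side idea of propagating a pose from IMU samples between slower visual updates can be illustrated with a deliberately simplified one-dimensional sketch. This is not the paper's algorithm: the `State` class, the `propagate` and `fuse` functions, and the fixed blending weight are hypothetical, and the bias self-correction and pose inspection steps are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class State:
    position: float  # metres along a single axis
    velocity: float  # metres per second


def propagate(state, accel, dt):
    """Advance one IMU sample, assuming constant acceleration over dt."""
    new_position = state.position + state.velocity * dt + 0.5 * accel * dt * dt
    new_velocity = state.velocity + accel * dt
    return State(new_position, new_velocity)


def fuse(state, visual_position, weight):
    """Pull the propagated position toward a visual estimate.

    weight = 0 ignores the visual estimate, weight = 1 trusts it fully.
    """
    blended = (1.0 - weight) * state.position + weight * visual_position
    return State(blended, state.velocity)


if __name__ == "__main__":
    s = State(position=0.0, velocity=1.0)
    s = propagate(s, accel=2.0, dt=0.5)   # fast, runs on every IMU sample
    s = fuse(s, visual_position=1.0, weight=0.5)  # slow, runs when a pose arrives
    print(s)
```

    Because propagation is cheap, it can run at the display rate, while the correction step runs only when a server-side pose arrives.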

    Integrin-mediated membrane blebbing is dependent on the NHE1 and NCX1 activities.

    Integrin-mediated signal transduction and membrane blebbing are both well known to modulate cell adhesion, spreading and migration^1-6^. However, the relationship between membrane blebbing and integrin signaling has not been explored. Here we show that integrin-ligand interaction induces membrane blebbing and a change in membrane permeability. We found that sodium-proton exchanger 1 (NHE1) and sodium-calcium exchanger 1 (NCX1) are located at the membrane blebbing sites, and that inhibition of NHE1 disrupts membrane blebbing and decreases the membrane permeability change. However, inhibition of NCX1 enhances cell blebbing and causes cell swelling, which is correlated with an intracellular sodium accumulation induced by NHE1^7^. These data suggest that sodium influx induced by NHE1 is a driving force for membrane bleb growth, while sodium efflux induced by NCX1 in reverse mode causes membrane bleb retraction. Together, these data reveal a novel function of NHE1 and NCX1 in membrane permeability change and blebbing and provide a link between integrin signaling and membrane blebbing.

    BN-embedded monolayer graphene with tunable electronic and topological properties

    Finding an effective and controllable way to create a sizable energy gap in graphene-based systems has been a challenging topic of intensive research. We propose that the hybrid of boron nitride and graphene (h-BNC) at low BN doping serves as an ideal platform for band-gap engineering and valleytronic applications. We report a systematic first-principles study of the atomic configurations and band gap opening for energetically favorable BN patches embedded in graphene. Based on first-principles calculations, we construct a tight-binding model to simulate general doping configurations in large supercells. Unexpectedly, the calculations find a linear dependence of the band gap on the effective BN concentration at low doping, arising from an induced effective on-site energy difference at the two C sublattices as they are substituted by B and N dopants alternately. The significant and tunable band gap of a few hundred meV, with preserved topological properties of graphene and feasible sample preparation in the laboratory, presents great opportunities to realize valley physics applications in graphene systems at room temperature.
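
    The link between a sublattice on-site energy difference and a band gap can be seen in the standard two-band nearest-neighbour tight-binding model of graphene. The sketch below is a textbook illustration, not the paper's fitted model: the hopping $t$, the staggered on-site energy $\pm m$, and the proportionality constant $\alpha$ relating $m$ to the effective BN concentration $x$ are all assumed symbols.

```latex
% Two-band Hamiltonian in the (A, B) sublattice basis, with
% f(\mathbf{k}) = \sum_{j=1}^{3} e^{i \mathbf{k} \cdot \boldsymbol{\delta}_j}
% summed over the three nearest-neighbour vectors \boldsymbol{\delta}_j:
H(\mathbf{k}) =
\begin{pmatrix}
  +m & -t\, f(\mathbf{k}) \\
  -t\, f^{*}(\mathbf{k}) & -m
\end{pmatrix}
% Eigenvalues:
E_{\pm}(\mathbf{k}) = \pm \sqrt{m^{2} + t^{2}\, |f(\mathbf{k})|^{2}}
% At the Dirac points f(\mathbf{K}) = 0, so the gap is
E_{g} = E_{+}(\mathbf{K}) - E_{-}(\mathbf{K}) = 2\,|m|
% If the induced on-site difference scales as m = \alpha x at low doping
% (an assumption chosen to match the linear trend reported above), then
E_{g} = 2\,|\alpha|\, x
```

    Under this assumption the gap grows linearly with the effective concentration, in line with the dependence the abstract attributes to the induced on-site energy difference.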

    An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks

    Speech representations learned from self-supervised learning (SSL) models have been found beneficial for various speech processing tasks. However, utilizing SSL representations usually requires fine-tuning the pre-trained models or designing task-specific downstream models and loss functions, which incurs substantial memory usage and human labor. On the other hand, prompting in Natural Language Processing (NLP) is an efficient and widely used technique to leverage pre-trained language models (LMs). Nevertheless, such a paradigm is little studied in the speech community. We report in this paper the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM). Experimental results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models. We further study the technique in challenging sequence generation tasks. Prompt tuning also demonstrates its potential there, while its limitations and possible research directions are discussed in this paper. Comment: Submitted to Interspeech 202