A combined statistical and machine learning approach for single channel speech enhancement
University of Minnesota Ph.D. dissertation. May 2015. Major: Electrical Engineering. Advisor: Zhi-Quan Luo. 1 computer file (PDF); ix, 116 pages. In this thesis, we study the single-channel speech enhancement problem, whose goal is to recover the desired speech from a monaural noisy recording. Speech enhancement is a central problem because of its widespread use in speech-related applications such as hearing aids, mobile communications, and speech recognition systems. Three speech enhancement algorithms are proposed. In the first algorithm, Wiener Non-negative Matrix Factorization (WNMF), we combine traditional Wiener filtering and NMF into a single optimization problem. The objective is to minimize the mean square error, as in Wiener filtering, and the constraints ensure that the enhanced speech is sparsely representable by the speech model learned by NMF. WNMF is novel because it uses NMF to capture the speech-specific structure while simultaneously leveraging that structure to improve Wiener filtering. In the second algorithm, we propose a Sparse Gaussian Mixture Model (SGMM) that extends traditional NMF and the Gaussian model. SGMM captures the complex structure of speech better than traditional NMF. To control overrepresentation in SGMM, we impose sparsity so that only a few Gaussian models are active simultaneously. Computationally, this is achieved by placing an ℓ0-norm constraint in the maximum-likelihood (ML) estimation. The contribution of SGMM lies in solving this constrained ML estimation, which admits a closed-form update despite the non-convex and non-smooth ℓ0-norm constraint. The final algorithm is Sparse NMF + Deep Neural Network (SNMF-DNN), in which we treat speech enhancement as a supervised regression problem whose goal is to estimate the optimal enhancement gain. SNMF, originally designed for source separation, is used to extract features from the noisy recording, and a DNN is then trained to estimate the optimal enhancement gain. Although our system is simple and does not require any sophisticated handcrafted features, we demonstrate a substantial improvement in both the intelligibility and the quality of the enhanced speech.
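The abstract does not spell out the WNMF optimization, so the following is only a minimal sketch of the simpler, widely used two-stage idea it builds on: decompose a noisy magnitude frame with fixed, pre-trained NMF bases for speech and noise, then form a Wiener-type gain from the two estimates. The bases `Ws` and `Wn`, the Euclidean NMF cost, the power-ratio gain, and all function names are illustrative assumptions, not the thesis's joint formulation.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # row-major F x K
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Return A @ x for an F x K matrix A and a length-K vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A: Matrix, y: Vector) -> Vector:
    """Return A^T @ y for an F x K matrix A and a length-F vector y."""
    return [sum(A[f][k] * y[f] for f in range(len(A))) for k in range(len(A[0]))]


def nmf_activations(v: Vector, W: Matrix, iters: int = 100, eps: float = 1e-12) -> Vector:
    """Estimate nonnegative h with v ~= W h while keeping W fixed.

    Uses the Euclidean multiplicative update h <- h * (W^T v) / (W^T W h).
    """
    h = [1.0] * len(W[0])
    wtv = rmatvec(W, v)
    for _ in range(iters):
        wtwh = rmatvec(W, matvec(W, h))
        h = [hk * num / (den + eps) for hk, num, den in zip(h, wtv, wtwh)]
    return h


def nmf_wiener_frame(v: Vector, Ws: Matrix, Wn: Matrix, eps: float = 1e-12) -> Tuple[Vector, Vector]:
    """Enhance one magnitude frame using hypothetical speech/noise bases and a Wiener gain."""
    W = [rs + rn for rs, rn in zip(Ws, Wn)]  # stack speech and noise bases column-wise
    h = nmf_activations(v, W)
    ks = len(Ws[0])
    s_hat = matvec(Ws, h[:ks])  # estimated speech magnitude per frequency bin
    n_hat = matvec(Wn, h[ks:])  # estimated noise magnitude per frequency bin
    gain = [s * s / (s * s + n * n + eps) for s, n in zip(s_hat, n_hat)]
    return [g * x for g, x in zip(gain, v)], gain
```

For example, `nmf_wiener_frame([3.0, 1.0], [[1.0], [0.0]], [[0.0], [1.0]])` keeps the bin explained by the speech basis (about 3.0) and suppresses the bin explained by the noise basis (about 0.0). The gain stays in [0, 1] per bin, which is the usual form of a per-bin enhancement gain.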
A circular elastic cylinder under its own weight
An exact analysis of the deformation and stress field in a finite circular elastic cylinder under its own weight is presented, with emphasis on the end effect. The problem is formulated on the basis of the state-space formalism for axisymmetric deformation of a transversely isotropic body. After the Hamiltonian characteristics of the formulation are delineated, a rigorous solution satisfying the end conditions is obtained by eigenfunction expansion. The results show that the end effect is significant but confined to a local region near the base, where the displacement and stress distributions differ markedly from those of the simplified solution, which gives a uniaxial stress state. The end effect is more pronounced when the bottom plane is perfectly bonded to the rigid base than when it is in smooth contact with it.
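As a reference point for the end effect, the simplified uniaxial solution can be written out in closed form. This is a sketch under our own assumptions, not taken from the paper: a cylinder of length $L$ and density $\rho$ rests on its base at $z = 0$ with $z$ pointing upward, gravity $g$ acts in the $-z$ direction, and $E_z$ denotes the axial Young's modulus of the transversely isotropic material. The stress satisfies $\partial\sigma_{zz}/\partial z = \rho g$ (equilibrium with body force $-\rho g$) and vanishes at the free top; the displacement is given on the axis $r = 0$ with $u_z(0,0) = 0$.

```latex
\begin{aligned}
\sigma_{zz}(z) &= -\rho g\,(L - z), \qquad \sigma_{rr} = \sigma_{\theta\theta} = \sigma_{rz} = 0, \qquad 0 \le z \le L,\\
\varepsilon_{zz}(z) &= \frac{\sigma_{zz}(z)}{E_z} = -\frac{\rho g}{E_z}\,(L - z),\\
u_z(0,z) &= \int_0^{z} \varepsilon_{zz}(\zeta)\,d\zeta = -\frac{\rho g}{E_z}\left(L z - \frac{z^2}{2}\right), \qquad u_z(0,L) = -\frac{\rho g L^2}{2E_z}.
\end{aligned}
```

Because this stress state is accompanied by lateral (Poisson) strain wherever $\sigma_{zz} \neq 0$, including at the base, it generally cannot satisfy the end conditions at $z = 0$ exactly, which is why an exact solution must depart from it near the base.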
A Flexible-Frame-Rate Vision-Aided Inertial Object Tracking System for Mobile Devices
Real-time object pose estimation and tracking is challenging but essential
for emerging augmented reality (AR) applications. In general, state-of-the-art
methods address this problem using deep neural networks which indeed yield
satisfactory results. Nevertheless, the high computational cost of these
methods makes them unsuitable for mobile devices where real-world applications
usually take place. In addition, head-mounted displays such as AR glasses
require at least 90 FPS to avoid motion sickness, which further complicates the
problem. We propose a flexible-frame-rate object pose estimation and tracking
system for mobile devices. It is a monocular visual-inertial-based system with
a client-server architecture. Inertial measurement unit (IMU) pose propagation
is performed on the client side for high-speed tracking, and RGB image-based 3D
pose estimation is performed on the server side to obtain accurate poses, after
which the pose is sent to the client side for visual-inertial fusion, where we
propose a bias self-correction mechanism to reduce drift. We also propose a
pose inspection algorithm to detect tracking failures and incorrect pose
estimation. Connected by high-speed networking, our system supports flexible
frame rates up to 120 FPS and guarantees high precision and real-time tracking
on low-end devices. Both simulations and real-world experiments show that our
method achieves accurate and robust object tracking.
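The abstract does not detail the visual-inertial fusion or the bias self-correction mechanism, so here is only a toy 1-D illustration of the general client-server pattern it describes: the client integrates IMU acceleration at a high rate, and each slower absolute position (standing in for the server's image-based pose) removes the accumulated drift and nudges an accelerometer-bias estimate. The class, its parameters, and the bias-update rule are hypothetical; a real system works with 6-DoF poses, gyroscope data, gravity compensation, and network latency.

```python
from dataclasses import dataclass


@dataclass
class ToyImuTracker1D:
    """Hypothetical 1-D client-side tracker: fast IMU dead reckoning plus slow absolute fixes."""

    pos: float = 0.0
    vel: float = 0.0
    bias: float = 0.0  # current estimate of the accelerometer bias
    bias_gain: float = 0.5  # hypothetical smoothing factor for bias updates
    t_since_fix: float = 0.0

    def propagate(self, accel_meas: float, dt: float) -> None:
        """High-rate step: integrate the bias-corrected acceleration (exact for constant accel)."""
        a = accel_meas - self.bias
        self.pos += self.vel * dt + 0.5 * a * dt * dt
        self.vel += a * dt
        self.t_since_fix += dt

    def correct(self, server_pos: float) -> None:
        """Low-rate step: fuse an absolute position and re-estimate the bias.

        Assumes the state was exact at the previous fix, so a residual constant bias b
        produces position error b*t^2/2 and velocity error b*t after time t.
        """
        t = self.t_since_fix
        err = self.pos - server_pos
        if t > 0.0:
            self.vel -= 2.0 * err / t  # remove the velocity error implied by the drift
            self.bias += self.bias_gain * 2.0 * err / (t * t)
        self.pos = server_pos
        self.t_since_fix = 0.0
```

For a stationary device whose accelerometer reads a constant 0.2, running 100 propagation steps of 0.01 s between each fix makes the bias estimate converge geometrically toward 0.2. The smoothing factor trades convergence speed against sensitivity to noisy server poses.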
Integrin-mediated membrane blebbing is dependent on the NHE1 and NCX1 activities.
Integrin-mediated signal transduction and membrane blebbing have been well studied as modulators of cell adhesion, spreading and migration^1-6^. However, the relationship between membrane blebbing and integrin signaling has not been explored. Here we show that integrin-ligand interaction induces membrane blebbing and a change in membrane permeability. We found that sodium-proton exchanger 1 (NHE1) and sodium-calcium exchanger 1 (NCX1) are located at membrane blebbing sites, and that inhibition of NHE1 disrupts membrane blebbing and decreases the membrane permeability change. In contrast, inhibition of NCX1 enhances blebbing and causes cell swelling, which correlates with intracellular sodium accumulation induced by NHE1^7^. These data suggest that sodium influx mediated by NHE1 is a driving force for bleb growth, while sodium efflux mediated by NCX1 operating in reverse mode causes bleb retraction. Together, these data reveal a novel function of NHE1 and NCX1 in membrane permeability change and blebbing, and provide a link between integrin signaling and membrane blebbing.
BN-embedded monolayer graphene with tunable electronic and topological properties
Finding an effective and controllable way to create a sizable energy gap in
graphene-based systems has been a challenging topic of intensive research. We
propose that the hybrid of boron nitride and graphene (h-BNC) at low BN doping
serves as an ideal platform for band-gap engineering and valleytronic
applications. We report a systematic first-principles study of the atomic
configurations and band gap opening for energetically favorable BN patches
embedded in graphene. Based on first-principles calculations, we construct a
tight-binding model to simulate general doping configurations in large
supercells. Unexpectedly, the calculations find a linear dependence of the band
gap on the effective BN concentration at low doping, arising from an induced
effective on-site energy difference at the two C sublattices as they are
substituted by B and N dopants alternately. The significant and tunable band
gap of a few hundred meV, with the topological properties of graphene preserved
and feasible sample preparation in the laboratory, presents great opportunities
to realize valley physics applications in graphene systems at room temperature.
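A minimal sketch of the mechanism the abstract describes, assuming the standard nearest-neighbour tight-binding model of graphene (C-C distance set to 1, hopping t = 2.7 eV as a typical assumed value): giving the two carbon sublattices on-site energies +Δ/2 and −Δ/2 opens a direct gap of exactly |Δ| at the K point, so an effective Δ that grows linearly with BN concentration yields a gap that grows linearly too. This is not the paper's supercell model, and the function names are illustrative.

```python
import cmath
import math
from typing import Tuple

# Vectors from an A-sublattice site to its three B neighbours (C-C distance = 1).
DELTAS = [(0.0, 1.0), (math.sqrt(3) / 2, -0.5), (-math.sqrt(3) / 2, -0.5)]
# A Dirac (K) point for this orientation, where the three phase factors cancel.
K_POINT = (4 * math.pi / (3 * math.sqrt(3)), 0.0)


def bands(kx: float, ky: float, t: float = 2.7,
          eps_a: float = 0.0, eps_b: float = 0.0) -> Tuple[float, float]:
    """Eigenvalues of H(k) = [[eps_a, -t f(k)], [-t f*(k), eps_b]] for the two-site cell."""
    f = sum(cmath.exp(1j * (kx * dx + ky * dy)) for dx, dy in DELTAS)
    mean = 0.5 * (eps_a + eps_b)
    half_diff = 0.5 * (eps_a - eps_b)
    root = math.sqrt(half_diff ** 2 + (t * abs(f)) ** 2)
    return mean - root, mean + root


def gap_at_k(delta: float, t: float = 2.7) -> float:
    """Direct gap at K for a sublattice on-site energy difference delta (same units as t)."""
    lower, upper = bands(*K_POINT, t=t, eps_a=delta / 2, eps_b=-delta / 2)
    return upper - lower
```

`gap_at_k(0.0)` is zero (pristine graphene), `gap_at_k(0.3)` is 0.3 up to rounding, and doubling `delta` doubles the gap, which is the linear relation between on-site energy difference and gap that the abstract relies on.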
An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks
Speech representations learned by self-supervised learning (SSL) models
have been found beneficial for various speech processing tasks. However,
utilizing SSL representations usually requires fine-tuning the pre-trained
models or designing task-specific downstream models and loss functions, which
incurs substantial memory usage and human effort. On the other hand, prompting in Natural
Language Processing (NLP) is an efficient and widely used technique to leverage
pre-trained language models (LMs). Nevertheless, such a paradigm is little
studied in the speech community. We report in this paper the first exploration
of the prompt tuning paradigm for speech processing tasks based on Generative
Spoken Language Model (GSLM). Experiment results show that the prompt tuning
technique achieves competitive performance in speech classification tasks with
fewer trainable parameters than fine-tuning specialized downstream models. We
further study the technique in challenging sequence generation tasks, where prompt
tuning also shows potential; we discuss its limitations and possible research
directions.
Comment: Submitted to Interspeech 202
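The core idea of prompt tuning, training only a few vectors prepended to the input while the pre-trained model stays frozen, can be illustrated with a toy example. Here a hypothetical `frozen_model` (mean pooling followed by a fixed linear readout) stands in for GSLM, and the prompts are fit by gradient descent on a squared-error loss; in the real setting the gradient flows through a frozen Transformer language model over discrete speech units. All names and hyperparameters are illustrative assumptions.

```python
from typing import List

Vec = List[float]


def frozen_model(seq: List[Vec], w: Vec) -> float:
    """Stand-in for a frozen pre-trained model: mean-pool the sequence, then a fixed linear readout."""
    n = len(seq)
    pooled = [sum(x[d] for x in seq) / n for d in range(len(w))]
    return sum(wd * pd for wd, pd in zip(w, pooled))


def prompt_tune(inputs: List[Vec], target: float, w: Vec,
                n_prompts: int = 2, lr: float = 0.5, steps: int = 200) -> List[Vec]:
    """Fit only the prompt vectors; the model weights w are never modified."""
    dim = len(w)
    prompts = [[0.0] * dim for _ in range(n_prompts)]
    for _ in range(steps):
        seq = prompts + inputs  # trainable prompts are prepended to the input embeddings
        y = frozen_model(seq, w)
        dloss_dy = 2.0 * (y - target)  # derivative of the squared error (y - target)^2
        scale = dloss_dy / len(seq)  # each prompt enters the mean pool with weight 1/len(seq)
        for p in prompts:
            for d in range(dim):
                p[d] -= lr * scale * w[d]  # dy/dp[d] = w[d] / len(seq)
    return prompts
```

With `w = [1.0, 0.5]`, three 2-D input vectors, and a target of 2.0, only `n_prompts * dim = 4` numbers are trained while `w` stays untouched, which mirrors why prompt tuning needs far fewer trainable parameters than fine-tuning.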