End to End Deep Neural Network Frequency Demodulation of Speech Signals
Frequency modulation (FM) is a radio broadcasting technique that has been in
wide use for almost a century. We suggest a
software-defined-radio (SDR) receiver for FM demodulation that adopts an
end-to-end learning based approach and utilizes the prior information of
transmitted speech message in the demodulation process. The receiver detects
and enhances speech from the in-phase and quadrature components of the baseband
version of the received signal. The new system yields high-performance detection under both
acoustical disturbances and communication channel noise, and is expected to
outperform the established methods in low signal-to-noise ratio (SNR)
conditions, in terms of both mean square error and perceptual evaluation of speech
quality score.
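For context, the kind of conventional processing that a learned receiver replaces can be sketched as follows. This is a minimal illustration of classical phase-difference FM demodulation from in-phase and quadrature (I/Q) samples, not the end-to-end network described in the abstract. The function names, the frequency deviation and the sample rate are assumptions chosen for illustration.

```python
import cmath
import math


def fm_modulate(message, freq_dev_hz, sample_rate_hz):
    """Build complex baseband I/Q samples from a real message in [-1, 1].

    Hypothetical helper, used here only to produce test input.
    """
    phase = 0.0
    iq = []
    for m in message:
        # The instantaneous frequency offset is freq_dev_hz * m.
        phase += 2.0 * math.pi * freq_dev_hz * m / sample_rate_hz
        iq.append(cmath.exp(1j * phase))
    return iq


def fm_demodulate(iq, freq_dev_hz, sample_rate_hz):
    """Recover the message from the phase step between consecutive samples.

    Requires freq_dev_hz < sample_rate_hz / 2 so each step stays within (-pi, pi).
    """
    scale = sample_rate_hz / (2.0 * math.pi * freq_dev_hz)
    recovered = []
    for prev, cur in zip(iq, iq[1:]):
        # The angle of cur * conj(prev) is the phase advance over one sample.
        recovered.append(cmath.phase(cur * prev.conjugate()) * scale)
    return recovered


if __name__ == "__main__":
    fs = 48000.0   # assumed sample rate
    dev = 5000.0   # assumed peak frequency deviation
    msg = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(200)]
    rec = fm_demodulate(fm_modulate(msg, dev, fs), dev, fs)
    # rec[k] corresponds to msg[k + 1] because the first sample has no predecessor.
    print(max(abs(r - m) for r, m in zip(rec, msg[1:])))
```

On a noiseless signal this recovers the message almost exactly. The phase-difference step is sensitive to noise at low SNR, which is the regime the abstract targets.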
A Prototype Toolkit For Evaluating Indoor Environmental Quality In Commercial Buildings
Measurement of building environmental parameters is often complex, expensive, and not easily proceduralized in a manner that covers all commercial buildings. Evaluating building indoor environmental quality performance is therefore not standard practice. This project developed a prototype toolkit that addressed existing barriers to widespread indoor environmental quality performance evaluation. A toolkit with both hardware and software elements was designed for practitioners around the indoor environmental quality requirements of the American Society of Heating, Refrigerating and Air-Conditioning Engineers / Chartered Institution of Building Services Engineers / United States Green Building Council Performance Measurement Protocols. This toolkit was built on a wireless mesh network with a web-based data collection, analysis, and reporting application. The toolkit provided fast, robust deployment of sensors, real-time data analysis, Performance Measurement Protocol-based analysis methods, and scorecard and report generation tools. A web-enabled Geographic Information System-based metadata collection system also reduced field-study deployment time. The toolkit was evaluated through three case studies, which are discussed in this report.
Voice input/output capabilities at Perception Technology Corporation
Condensed resumes of key company personnel at the Perception Technology Corporation are presented. The staff has expertise in recognition, speech synthesis, speaker authentication, and language identification. The capabilities of the hardware and software engineers are included.
Objective dysphonia quantification in vocal fold paralysis: comparing nonlinear with classical measures
Clinical acoustic voice recording analysis is usually performed using classical perturbation measures including jitter, shimmer and noise-to-harmonic ratios. However, restrictive mathematical limitations of these measures prevent analysis for severely dysphonic voices. Previous studies of alternative nonlinear random measures addressed wide varieties of vocal pathologies. Here, we analyze a single vocal pathology cohort, testing the performance of these alternative measures alongside classical measures.

We present pre- and post-operative voice analysis of unilateral vocal fold paralysis (UVFP) patients undergoing standard medialisation thyroplasty surgery and of healthy controls, using jitter, shimmer and noise-to-harmonic ratio (NHR), together with the nonlinear measures recurrence period density entropy (RPDE), detrended fluctuation analysis (DFA) and correlation dimension. Systematizing the preparative editing of the recordings, we found that the novel measures were more stable, and hence more reliable, than the classical measures on healthy controls.

RPDE and jitter are sensitive to improvements pre- to post-operation. Shimmer, NHR and DFA showed no significant change (p > 0.05). All measures detect statistically significant and clinically important differences between controls and patients, both treated and untreated (p < 0.001, AUC > 0.7). Pre- to post-operation, GRBAS ratings show statistically significant and clinically important improvement in overall dysphonia grade (G) (AUC = 0.946, p < 0.001).

Re-calculating AUCs from other study data, we compare these results in terms of clinical importance. We conclude that, when preparative editing is systematized, nonlinear random measures may be useful UVFP treatment effectiveness monitoring tools, and there may be applications for other forms of dysphonia.
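
As a rough illustration of the classical perturbation measures named in this abstract, the sketch below computes local jitter and local shimmer using their common textbook definitions: the mean absolute difference between consecutive cycles, divided by the mean value. This is an assumption about the definitions, not the study's actual analysis pipeline. The function name and the example values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value.

    Applied to glottal cycle periods this gives local jitter.
    Applied to cycle peak amplitudes this gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


if __name__ == "__main__":
    # Made-up cycle periods in seconds and peak amplitudes, for illustration only.
    periods = [0.0100, 0.0102, 0.0099, 0.0101]
    amplitudes = [0.80, 0.78, 0.82, 0.79]
    print("jitter:", local_perturbation(periods))
    print("shimmer:", local_perturbation(amplitudes))
```

Both measures presuppose that individual cycles can be identified, which is one reason they become unreliable for severely dysphonic voices.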

Using Transcoding for Hidden Communication in IP Telephony
The paper presents a new steganographic method for IP telephony called
TranSteg (Transcoding Steganography). Typically, in steganographic
communication it is advised for covert data to be compressed in order to limit
its size. In TranSteg it is the overt data that is compressed to make space for
the steganogram. The main innovation of TranSteg is to find, for a chosen voice
stream, a codec that gives similar voice quality but a smaller
voice payload size than the originally selected codec. Then, the voice stream is
transcoded. At this step the original voice payload size is intentionally
unaltered and the change of the codec is not indicated. Instead, after placing
the transcoded voice payload, the remaining free space is filled with hidden
data. A proof-of-concept implementation of TranSteg was designed and developed, and
the experimental results obtained are presented in this paper. They show that the
proposed method is feasible and offers a high steganographic bandwidth.
TranSteg detection is difficult when inspection is performed at a
single network location. Comment: 17 pages, 16 figures, 4 tables
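
The payload-filling step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, and so is the zero padding used when the hidden data runs out. The 160-byte and 80-byte sizes in the demo correspond to 20 ms frames of a 64 kbit/s and a 32 kbit/s codec and are chosen only as an example.

```python
from typing import Tuple


def embed(transcoded_voice: bytes, hidden: bytes, original_size: int) -> Tuple[bytes, bytes]:
    """Fill a payload of the original size with transcoded voice plus hidden data.

    Returns the new payload and whatever hidden data did not fit.
    """
    free = original_size - len(transcoded_voice)
    if free < 0:
        raise ValueError("transcoded voice is larger than the original payload")
    chunk = hidden[:free]
    # Assumption: leftover space is zero-padded so the payload size never changes.
    padding = bytes(free - len(chunk))
    return transcoded_voice + chunk + padding, hidden[free:]


def extract(payload: bytes, transcoded_size: int) -> Tuple[bytes, bytes]:
    """Split a received payload into the voice part and the hidden part."""
    return payload[:transcoded_size], payload[transcoded_size:]


if __name__ == "__main__":
    voice = bytes(80)               # stand-in for one transcoded 20 ms frame
    secret = b"covert message " * 8
    payload, remaining = embed(voice, secret, original_size=160)
    print(len(payload), len(remaining))
    # 80 hidden bytes every 20 ms is 80 * 8 / 0.020 = 32000 bit/s in this example.
```

In this sketch the receiver must already know the transcoded frame size, and it cannot tell padding from hidden data. A real scheme would need some framing or length signalling agreed between the two ends.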