Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)
Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990, in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.
Near-Instantaneously Adaptive HSDPA-Style OFDM Versus MC-CDMA Transceivers for WiFi, WiMAX, and Next-Generation Cellular Systems
Burst-by-burst (BbB) adaptive high-speed downlink packet access (HSDPA) style multicarrier systems are reviewed, identifying their most critical design aspects. These systems exhibit numerous attractive features, rendering them eminently eligible for employment in next-generation wireless systems. It is argued that BbB-adaptive or symbol-by-symbol adaptive orthogonal frequency division multiplexing (OFDM) modems counteract the near-instantaneous channel quality variations and hence attain an increased throughput or robustness in comparison to their fixed-mode counterparts. Although they act quite differently, various diversity techniques, such as Rake receivers and space-time block coding (STBC), are also capable of mitigating the channel quality variations in their effort to reduce the bit error ratio (BER), provided that the individual antenna elements experience independent fading. By contrast, in the presence of correlated fading imposed by shadowing or time-variant multiuser interference, the benefits of space-time coding erode, and it is unrealistic to expect that a fixed-mode space-time coded system remains capable of maintaining a near-constant BER.
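The burst-by-burst adaptive scheme this abstract describes can be sketched as a threshold-based mode switch: estimate the SNR for the next transmission burst and pick the highest-order constellation whose target BER is still met. A minimal sketch, assuming illustrative mode names and SNR thresholds (none of these values come from the paper):

```python
# Burst-by-burst (BbB) adaptive modulation sketch: pick the most spectrally
# efficient mode whose minimum SNR is satisfied by the current burst estimate.
# The modes and dB thresholds below are illustrative assumptions.

MODES = [                      # (name, bits/symbol, minimum SNR in dB)
    ("no-tx", 0, float("-inf")),
    ("BPSK", 1, 3.0),
    ("QPSK", 2, 6.0),
    ("16QAM", 4, 12.0),
    ("64QAM", 6, 18.0),
]

def select_mode(snr_db: float):
    """Return the highest-throughput mode usable at this SNR estimate."""
    chosen = MODES[0]
    for mode in MODES:
        if snr_db >= mode[2]:
            chosen = mode
    return chosen

def adaptive_throughput(snr_per_burst):
    """Total bits/symbol accumulated over a sequence of burst SNR estimates."""
    return sum(select_mode(s)[1] for s in snr_per_burst)

# A fading channel: deep fade (2 dB), average (10 dB), strong (20 dB) bursts.
fades = [2.0, 10.0, 20.0]
# adaptive_throughput(fades) accumulates 0 + 2 + 6 = 8 bits/symbol.
```

In a deep fade the modem drops to a robust low-order mode (or disables transmission), while strong bursts carry 64QAM; this is the sense in which adaptivity trades instantaneous throughput for robustness relative to a fixed-mode transceiver.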
New Directions in Subband Coding
Two very different subband coders are described. The first is a modified dynamic bit-allocation subband coder (D-SBC) designed for variable-rate coding situations and easily adaptable to noisy channel environments. It can operate at rates as low as 12 kb/s and still give good-quality speech. The second coder is a 16-kb/s waveform coder based on a combination of subband coding and vector quantization (VQ-SBC). The key feature of this coder is its short coding delay, which makes it suitable for real-time communication networks. The speech quality of both coders has been enhanced by adaptive postfiltering. The coders have been implemented on a single AT&T DSP32 signal processor.
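The dynamic bit allocation underlying a D-SBC can be illustrated with a greedy loop: each successive bit goes to the subband with the largest remaining quantization noise. A minimal sketch, assuming a divide-by-4 noise-reduction model per allocated bit (roughly 6 dB/bit); the variance values and function name are illustrative, not the coder's actual algorithm:

```python
def allocate_bits(subband_vars, total_bits):
    """Greedy dynamic bit allocation: repeatedly give one bit to the subband
    with the largest remaining variance, then divide that variance by 4
    (each bit cuts quantization noise power by about 6 dB).
    Illustrative sketch only."""
    vars_ = list(subband_vars)
    bits = [0] * len(vars_)
    for _ in range(total_bits):
        i = max(range(len(vars_)), key=lambda k: vars_[k])
        bits[i] += 1
        vars_[i] /= 4.0
    return bits

# Four subbands with strongly unequal energies: most bits land in the
# energetic low band, and the quiet high bands may get none.
allocation = allocate_bits([16.0, 4.0, 1.0, 1.0], 4)
```

Making the allocation adapt frame by frame to the measured subband energies is what lets the coder track the speech spectrum at low rates.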
Improved compactly computable objective measures for predicting the acceptability of speech communications systems
Issued as Monthly status reports [1-7], and Final report, Project no. E-21-61
BigEAR: Inferring the Ambient and Emotional Correlates from Smartphone-based Acoustic Big Data
This paper presents a novel BigEAR big data framework that employs a
psychological audio processing chain (PAPC) to process smartphone-based
acoustic big data collected while the user holds social conversations in
naturalistic scenarios. The overarching goal of BigEAR is to identify the
moods of the wearer from vocal activities such as laughing, singing, crying,
arguing, and sighing. These annotations are based on ground truth relevant
for psychologists who intend to monitor or infer the social context of
individuals coping with breast cancer. We pursued a case study on couples
coping with breast cancer to understand how their conversations affect
emotional and social well-being. With state-of-the-art methods, psychologists
and their teams have to listen to the audio recordings and make these
inferences by subjective evaluations, which are not only time-consuming and
costly but also demand manual data coding for thousands of audio files. The
BigEAR framework automates this audio analysis. We computed the accuracy of
BigEAR with respect to ground truth obtained from a human rater. Our approach
yielded an overall average accuracy of 88.76% on real-world data from couples
coping with breast cancer.
Comment: 6 pages, 10 equations, 1 table, 5 figures, IEEE International
Workshop on Big Data Analytics for Smart and Connected Health 2016, June 27,
2016, Washington, DC, USA
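The accuracy figure reported in this abstract is an agreement rate between automatic labels and a human rater's ground-truth labels. A minimal sketch with made-up mood labels (the label set and values are assumptions, not data from the paper):

```python
def accuracy(predicted, ground_truth):
    """Fraction of automatic labels that match the human rater's labels."""
    assert len(predicted) == len(ground_truth)
    hits = sum(p == g for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)

# Hypothetical labels for four audio clips; one disagreement out of four.
pred = ["laugh", "cry", "sigh", "laugh"]
truth = ["laugh", "cry", "laugh", "laugh"]
# accuracy(pred, truth) -> 0.75
```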
The intensity JND comes from Poisson neural noise: Implications for image coding
While the problems of image coding and audio coding have frequently
been assumed to have similarities, specific sets of relationships
have remained vague. One area where there should be a meaningful
comparison is with central masking noise estimates, which
define the codec's quantizer step size.
In the past few years, progress has been made on this problem
in the auditory domain (Allen and Neely, J. Acoust. Soc. Am. 102,
1997, 3628-46; Allen, 1999, Wiley Encyclopedia of Electrical and
Electronics Engineering, Vol. 17, pp. 422-437, Ed. Webster, J.G.,
John Wiley & Sons, Inc., NY).
It is possible that some useful insights might now be obtained
by comparing the auditory and visual cases.
In the auditory case it has been shown, directly from psychophysical
data, that below about 5 sones
(a measure of loudness, a unit of psychological intensity),
the loudness JND is proportional to the square root of the loudness
$\Delta L(L) \propto \sqrt{L(I)}$.
This holds for both wideband noise and for tones with a frequency
of 250 Hz or greater.
Allen and Neely interpret this to mean that the internal noise is
Poisson, as would be expected from neural point process noise.
It follows directly that the Ekman fraction (the relative loudness JND)
decreases as one over the square root of the loudness, namely
$\Delta L / L \propto 1/\sqrt{L}$.
Above $L = 5$ sones, the relative loudness JND is
$\Delta L / L \approx 0.03$ (i.e., Ekman's law).
It would be very interesting to know if this same
relationship holds in the visual case between brightness $B(I)$
and the brightness JND $\Delta B(I)$. This might be tested by measuring
both the brightness JND and the brightness as a function of
intensity, and transforming the intensity JND into a brightness JND, namely
$\Delta B(I) = B(I + \Delta I) - B(I) \approx \Delta I \, \frac{dB}{dI}$.
If the Poisson nature of the loudness relation (below 5 sones)
is a general result of central neural noise, as is anticipated,
then one would expect it to hold in vision as well,
namely that $\Delta B(B) \propto \sqrt{B(I)}$.
It is well documented that the exponent in S. S. Stevens's power
law is the same for loudness and brightness (Stevens, 1961),
i.e., both brightness $B(I)$ and loudness $L(I)$ are proportional
to the same power of the intensity $I$. Furthermore, the brightness
JND data are more like Riesz's near-miss data than recent 2AFC
studies of JND measures (Hecht, 1934; Gescheider, 1997).
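The transformation from an intensity JND to a brightness JND, $\Delta B(I) = B(I+\Delta I) - B(I) \approx \Delta I \, dB/dI$, is easy to check numerically under an assumed Stevens power law $B(I) = I^p$. The exponent and the intensity JND below are illustrative values, not data from the abstract:

```python
# Numeric check that the first-order JND transformation agrees with the
# exact difference for a small intensity JND. B(I) = I**p is an assumed
# Stevens power law; p = 0.33 and the 1% JND at I = 100 are illustrative.

def B(I, p=0.33):
    """Assumed Stevens power-law brightness function."""
    return I ** p

def jnd_brightness(I, dI, p=0.33):
    """Exact brightness JND from an intensity JND: B(I + dI) - B(I)."""
    return B(I + dI, p) - B(I, p)

def jnd_brightness_linearized(I, dI, p=0.33):
    """First-order approximation dI * dB/dI, with dB/dI = p * I**(p - 1)."""
    return dI * p * I ** (p - 1)

I, dI = 100.0, 1.0   # a 1% intensity JND at I = 100 (assumed)
exact = jnd_brightness(I, dI)
approx = jnd_brightness_linearized(I, dI)
# For a JND this small, the exact and linearized values agree to
# well under 1%, so either form can be used to convert measured
# intensity JNDs into brightness JNDs.
```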
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband-based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratios (SNR) below 12 dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front-ends across the full range of noise
levels.
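The ensemble-combination step for the subband classifiers can be sketched as majority voting over per-subband decisions. The stand-in labels below replace what would be trained SVM outputs in the paper; the phoneme labels are hypothetical:

```python
# Ensemble combination sketch: each subband classifier emits a class label
# and the front-end takes a majority vote. In the paper these votes would
# come from SVMs trained on bandpass-filtered waveform components; here
# they are just given labels.

from collections import Counter

def majority_vote(subband_labels):
    """Combine per-subband decisions by majority; ties resolve to the
    label that reaches the winning count first in input order."""
    counts = Counter(subband_labels)
    return counts.most_common(1)[0][0]

# Three of four subband classifiers agree despite one noisy band,
# so the ensemble output is robust to a single corrupted subband.
votes = ["aa", "aa", "ae", "aa"]
# majority_vote(votes) -> "aa"
```

This is one reason a subband ensemble can beat a full-band classifier under additive noise: a disturbance confined to a few bands corrupts only those bands' votes.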
Perceptual models in speech quality assessment and coding
The ever-increasing demand for good communications/toll-quality
speech has created renewed interest in the
perceptual impact of rate compression. Two general areas are
investigated in this work, namely speech quality assessment
and speech coding.
In the field of speech quality assessment, a model is
developed which simulates the processing stages of the
peripheral auditory system. At the output of the model a
"running" auditory spectrum is obtained. This represents
the auditory (spectral) equivalent of any acoustic sound such
as speech. Auditory spectra from coded speech segments serve
as inputs to a second model. This model simulates the
information centre in the brain which performs the speech
quality assessment. [Continues.]