92 research outputs found
A Comparison of Front-Ends for Bitstream-Based ASR over IP
Automatic speech recognition (ASR) is expected to play an important role in the provision of spoken interfaces for IP-based applications. However, because the speech signal must travel over these networks, ASR systems face two new challenges: the degradation of speech quality caused by the compression needed to fit the channel capacity, and the inevitable occurrence of packet losses.
In this framework, bitstream-based approaches that obtain the ASR feature vectors directly from the coded bitstream, avoiding the speech decoding process, have been proposed ([S.H. Choi, H.K. Kim, H.S. Lee, Speech recognition using quantized LSP parameters and their transformations in digital communications, Speech Commun. 30 (4) (2000) 223–233. A. Gallardo-Antolín, C. Peláez-Moreno, F. Díaz-de-María, Recognizing GSM digital speech, IEEE Trans. Speech Audio Process., to appear. H.K. Kim, R.V. Cox, R.C. Rose, Performance improvement of a bitstream-based front-end for wireless speech recognition in adverse environments, IEEE Trans. Speech Audio Process. 10 (8) (2002) 591–604. C. Peláez-Moreno, A. Gallardo-Antolín, F. Díaz-de-María, Recognizing voice over IP networks: a robust front-end for speech recognition on the WWW, IEEE Trans. Multimedia 3(2) (2001) 209–218], among others) to improve the robustness of ASR systems. LSP (Line Spectral Pairs) are the preferred set of parameters for the description of the speech spectral envelope in most of the modern speech coders. Nevertheless, LSP have proved to be unsuitable for ASR, and they must be transformed into cepstrum-type parameters. In this paper we comparatively evaluate the robustness of the most significant LSP to cepstrum transformations in a simulated VoIP (voice over IP) environment which includes two of the most popular codecs used in that network (G.723.1 and G.729) and several network conditions. In particular, we compare “pseudocepstrum” [H.K. Kim, S.H. Choi, H.S. Lee, On approximating Line Spectral Frequencies to LPC cepstral coefficients, IEEE Trans. Speech Audio Process. 8 (2) (2000) 195–199], an approximated but straightforward transformation of LSP into LP cepstral coefficients, with a more computationally demanding but exact one.
Our results show that pseudocepstrum is preferable when network conditions are good or computational resources low, while the exact procedure is recommended when network conditions become more adverse.
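To make the two kinds of transformation in the abstract above concrete, here is a minimal Python sketch. It is not the paper's implementation. The function names are hypothetical, `pseudocepstrum` assumes the commonly quoted closed form for the approximation (cepstral coefficient n taken as the average of cos(n·ω) terms over the line spectral frequencies plus an even-order correction), and `lpc_to_cepstrum` is the standard recursion from predictor coefficients to LP cepstral coefficients, which an exact route would apply after first converting LSP back to LP coefficients (that conversion step is omitted here).

```python
import math


def pseudocepstrum(lsf, n_ceps):
    """Approximate LP cepstrum directly from line spectral frequencies (radians).

    Assumed form: c_n ~ (1/n) * [ (1 + (-1)^n) / 2 + sum_i cos(n * w_i) ].
    """
    return [
        ((1 + (-1) ** n) / 2 + sum(math.cos(n * w) for w in lsf)) / n
        for n in range(1, n_ceps + 1)
    ]


def lpc_to_cepstrum(a, n_ceps):
    """Exact LP cepstrum of H(z) = 1 / (1 - sum_k a_k z^-k), with a[k-1] = a_k.

    Recursion: c_n = a_n + (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k},
    where a_n is taken as 0 for n greater than the predictor order.
    The gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if 1 <= n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

The approximation needs only cosines of the decoded LSP values, which is why it is cheap; the exact route pays for an LSP-to-LP conversion before the recursion can run.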
Graded quantization for multiple description coding of compressive measurements
Compressed sensing (CS) is an emerging paradigm for acquisition of compressed
representations of a sparse signal. Its low complexity is appealing for
resource-constrained scenarios like sensor networks. However, such scenarios
are often coupled with unreliable communication channels and providing robust
transmission of the acquired data to a receiver is an issue. Multiple
description coding (MDC) effectively combats channel losses for systems without
feedback, thus raising the interest in developing MDC methods explicitly
designed for the CS framework, and exploiting its properties. We propose a
method called Graded Quantization (CS-GQ) that leverages the democratic
property of compressive measurements to effectively implement MDC, and we
provide methods to optimize its performance. A novel decoding algorithm based
on the alternating directions method of multipliers is derived to reconstruct
signals from a limited number of received descriptions. Simulations are
performed to assess the performance of CS-GQ against other methods in the presence
of packet losses. The proposed method is successful at providing robust coding
of CS measurements and outperforms other schemes for the considered test
metrics.
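The abstract above does not spell out how the descriptions are built, so the following Python sketch shows only one plausible reading of graded quantization, under stated assumptions: two descriptions each carry every measurement, one half quantized finely and the other half coarsely, with the roles swapped between descriptions. The function names, the uniform quantizer, and the step sizes are all hypothetical, and the ADMM-based decoder mentioned in the abstract is not reproduced.

```python
import random


def quantize(x, step):
    """Uniform mid-tread quantizer; the error is at most step / 2."""
    return step * round(x / step)


def make_descriptions(y, fine, coarse):
    """Build two complementary descriptions of the measurement vector y."""
    half = len(y) // 2
    d1 = [quantize(v, fine if i < half else coarse) for i, v in enumerate(y)]
    d2 = [quantize(v, coarse if i < half else fine) for i, v in enumerate(y)]
    return d1, d2


def merge(d1, d2):
    """Combine whatever arrived; a lost description is passed as None."""
    if d1 is None and d2 is None:
        raise ValueError("no description received")
    if d1 is None:
        return list(d2)
    if d2 is None:
        return list(d1)
    half = len(d1) // 2
    # With both descriptions, keep the finely quantized copy of each measurement.
    return d1[:half] + d2[half:]


def max_error(y, y_hat):
    """Largest absolute difference between original and decoded measurements."""
    return max(abs(a - b) for a, b in zip(y, y_hat))


if __name__ == "__main__":
    random.seed(0)
    y = [random.gauss(0.0, 1.0) for _ in range(64)]
    d1, d2 = make_descriptions(y, fine=0.05, coarse=0.4)
    print("both received:", max_error(y, merge(d1, d2)))
    print("one received: ", max_error(y, merge(d1, None)))
```

Because compressive measurements are roughly equally informative (the democratic property named in the abstract), the choice of which half is finely quantized in each description is arbitrary in this sketch.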
Regular Topologies for Gigabit Wide-Area Networks
In general terms, this project aimed at the analysis and design of techniques for very high-speed networking. The formal objectives of the project were to: (1) Identify switch and network technologies for wide-area networks that interconnect a large number of users and can provide individual data paths at gigabit/s rates; (2) Quantitatively evaluate and compare existing and proposed architectures and protocols, identify their strength and growth potentials, and ascertain the compatibility of competing technologies; and (3) Propose new approaches to existing architectures and protocols, and identify opportunities for research to overcome deficiencies and enhance performance. The project was organized into two parts: 1. The design, analysis, and specification of techniques and protocols for very-high-speed network environments. In this part, SRI has focused on several key high-speed networking areas, including Forward Error Control (FEC) for high-speed networks in which data distortion is the result of packet loss, and the distribution of broadband, real-time traffic in multiple user sessions. 2. Congestion Avoidance Testbed Experiment (CATE). This part of the project was done within the framework of the DARTnet experimental T1 national network. The aim of the work was to advance the state of the art in benchmarking DARTnet's performance and traffic control by developing support tools for network experimentation, by designing benchmarks that allow various algorithms to be meaningfully compared, and by investigating new queueing techniques that better satisfy the needs of best-effort and reserved-resource traffic. This document is the final technical report describing the results obtained by SRI under this project. The report consists of three volumes: Volume 1 contains a technical description of the network techniques developed by SRI in the areas of FEC and multicast of real-time traffic. Volume 2 describes the work performed under CATE. 
Volume 3 contains the source code of all software developed under CATE.
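The report summary above mentions forward error control for networks where data distortion comes from packet loss, but does not name a specific code. As an illustration only, the Python sketch below uses the simplest erasure code, a single XOR parity packet per group, which lets a receiver rebuild any one lost packet without retransmission. The function names are hypothetical and all packets in a group are assumed to have equal length.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets):
    """Return the data packets followed by one XOR parity packet."""
    parity = bytes(len(packets[0]))  # all-zero start value
    for p in packets:
        parity = xor_bytes(parity, p)
    return list(packets) + [parity]


def recover(received):
    """Rebuild a group in which exactly one packet is missing (marked None)."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("single-parity FEC repairs exactly one loss")
    size = len(next(p for p in received if p is not None))
    acc = bytes(size)
    for p in received:
        if p is not None:
            acc = xor_bytes(acc, p)
    repaired = list(received)
    repaired[missing[0]] = acc  # XOR of the survivors equals the lost packet
    return repaired


if __name__ == "__main__":
    group = add_parity([b"abcd", b"efgh", b"ijkl"])
    group[1] = None  # simulate a packet dropped by the network
    print(recover(group)[1])
```

One parity packet per group costs 1/k extra bandwidth for a group of k data packets and tolerates a single loss per group; stronger codes trade more redundancy for more tolerated losses.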
Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications
Communication systems to date primarily aim at reliably communicating bit
sequences. Such an approach provides efficient engineering designs that are
agnostic to the meanings of the messages or to the goal that the message
exchange aims to achieve. Next generation systems, however, can be potentially
enriched by folding message semantics and goals of communication into their
design. Further, these systems can be made cognizant of the context in which
communication exchange takes place, providing avenues for novel design
insights. This tutorial summarizes the efforts to date, starting from its early
adaptations, semantic-aware and task-oriented communications, covering the
foundations, algorithms and potential implementations. The focus is on
approaches that utilize information theory to provide the foundations, as well
as the significant role of learning in semantics and task-aware communications.
Comment: 28 pages, 14 figures
Error Resilient Video Coding Using Bitstream Syntax And Iterative Microscopy Image Segmentation
There has been a dramatic increase in the amount of video traffic over the Internet in the past several years. For applications like real-time video streaming and video conferencing, retransmission of lost packets is often not permitted. Popular video coding standards such as H.26x and VPx make use of spatial-temporal correlations for compression, typically making compressed bitstreams vulnerable to errors. We propose several adaptive spatial-temporal error concealment approaches for subsampling-based multiple description video coding. These adaptive methods are based on motion and mode information extracted from the H.26x video bitstreams. We also present an error resilience method using data duplication in VPx video bitstreams.
A recent challenge in image processing is the analysis of biomedical images acquired using optical microscopy. Due to the size and complexity of the images, automated segmentation methods are required to obtain quantitative, objective and reproducible measurements of biological entities. In this thesis, we present two techniques for microscopy image analysis. Our first method, “Jelly Filling”, is intended to provide 3D segmentation of biological images that contain incompleteness in dye labeling. Intuitively, this method is based on filling disjoint regions of an image with jelly-like fluids to iteratively refine segments that represent separable biological entities. Our second method selectively uses a shape-based function optimization approach and a 2D marked point process simulation to quantify nuclei by their locations and sizes. Experimental results show that our proposed methods are effective in addressing the aforementioned challenges.
Progressively communicating rich telemetry from autonomous underwater vehicles via relays
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, June 2012
As analysis of imagery and environmental data plays a greater role in mission construction
and execution, there is an increasing need for autonomous marine vehicles
to transmit this data to the surface. Without access to the data acquired by a
vehicle, surface operators cannot fully understand the state of the mission. Communicating
imagery and high-resolution sensor readings to surface observers remains
a significant challenge; as a result, current telemetry from free-roaming
autonomous marine vehicles remains limited to “heartbeat” status messages, with
minimal scientific data available until after recovery. Increasing the challenge, long-distance
communication may require relaying data across multiple acoustic hops
between vehicles, yet fixed infrastructure is not always appropriate or possible.
In this thesis I present an analysis of the unique considerations facing telemetry
systems for free-roaming Autonomous Underwater Vehicles (AUVs) used in exploration.
These considerations include high-cost vehicle nodes with persistent storage
and significant computation capabilities, combined with human surface operators
monitoring each node. I then propose mechanisms for interactive, progressive
communication of data across multiple acoustic hops. These mechanisms include
wavelet-based embedded coding methods, and a novel image compression scheme
based on texture classification and synthesis. The specific characteristics of underwater
communication channels, including high latency, intermittent communication,
the lack of instantaneous end-to-end connectivity, and a broadcast medium,
inform these proposals. Human feedback is incorporated by allowing operators to
identify segments of data that warrant higher quality refinement, ensuring efficient
use of limited throughput. I then analyze the performance of these mechanisms
relative to current practices.
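The progressive, embedded style of transmission described above can be illustrated with a small Python sketch. It is a stand-in, not the thesis's method: instead of wavelet coefficients it sends raw 8-bit samples one bit plane at a time, most significant first, so the receiver can rebuild a coarse version early and refine it as more planes arrive. The function names and the midpoint reconstruction rule are assumptions made for this example.

```python
def bit_planes(samples, bits=8):
    """Yield bit planes from most to least significant; each plane is a list of 0/1."""
    for b in range(bits - 1, -1, -1):
        yield [(s >> b) & 1 for s in samples]


def reconstruct(planes, n, bits=8):
    """Rebuild n samples from the planes received so far.

    Bits not yet received are replaced by the midpoint of the remaining
    interval, so each extra plane halves the worst-case error.
    """
    values = [0] * n
    for k, plane in enumerate(planes):
        b = bits - 1 - k
        for i, bit in enumerate(plane):
            values[i] |= bit << b
    remaining = bits - len(planes)
    if remaining > 0:
        half = 1 << (remaining - 1)
        values = [v + half for v in values]
    return values


if __name__ == "__main__":
    samples = [0, 17, 128, 200, 255]
    planes = list(bit_planes(samples))
    for k in range(len(planes) + 1):
        approx = reconstruct(planes[:k], len(samples))
        worst = max(abs(a - s) for a, s in zip(approx, samples))
        print(k, "planes received, worst-case error", worst)
```

Stopping the stream after any plane still yields a usable approximation, and sending every plane gives an error-free reconstruction, which is the property that lets an operator ask for refinement of only the data that matters.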
Finally, I present CAPTURE, a telemetry architecture that builds on this analysis.
CAPTURE draws on advances in compression and delay tolerant networking to
enable progressive transmission of scientific data, including imagery, across multiple acoustic hops. In concert with a physical layer, CAPTURE provides an end-to-end
networking solution for communicating science data from autonomous marine
vehicles. Automatically selected imagery, sonar, and time-series sensor data
are progressively transmitted across multiple hops to surface operators. Human
operators can request arbitrarily high-quality refinement of any resource, up to an
error-free reconstruction. The components of this system are then demonstrated
through three field trials in diverse environments on SeaBED, OceanServer and
Bluefin AUVs, each in different software architectures.
Thanks to the National Science Foundation and the
National Oceanic and Atmospheric Administration for
their funding of my education and this work.