103 research outputs found

    Android real-time audio communications over local wireless

    This paper describes an Android mobile application that allows voice communication over short-range wireless networks, mainly Bluetooth and Wi-Fi. The application replicates as closely as possible the behavior of a two-way radio device. It is designed to receive audio streams from multiple devices simultaneously and to send them. The main design considerations of the application, such as audio recording and playback, audio coding, and data transmission, are explained throughout the paper.
    Belda Ortega, R.; Arce Vila, P.; De Fez Lava, I.; Fraile Gil, F.; Guerri Cebollada, JC. (2012). Android real-time audio communications over local wireless. Waves. (4):35-42. http://hdl.handle.net/10251/57677

    EVA Radio DRATS 2011 Report

    In the fall of 2011, the National Aeronautics and Space Administration (NASA) Glenn Research Center (GRC) participated in the Desert Research and Technology Studies (DRATS) field experiments held near Flagstaff, Arizona. The objective of the DRATS outing is to provide analog mission testing of candidate technologies for space exploration, especially those applicable to human exploration of extraterrestrial rocky bodies. These activities are performed at locations with similarities to extraterrestrial conditions. This report describes the Extravehicular Activity (EVA) Dual-Band Radio Communication System, which was demonstrated during the 2011 outing. The EVA radio system is designed to transport both voice and telemetry data through a mobile ad hoc wireless network and employs a dual-band radio configuration. Key characteristics of this system include: (1) a dual-band radio configuration; (2) intelligent switching between two wireless networks of different capability; (3) a self-healing network; and (4) simultaneous data and voice communication.

    Characterization of speaker recognition in noisy channels

    Speaker recognition is a frequently overlooked form of biometric security. Text-independent speaker identification is used by financial services, forensic experts, and human-computer interaction developers to extract information that is transmitted along with a spoken message, such as the identity, gender, age, or emotional state of a speaker. Speech features are classified as either low-level or high-level characteristics. High-level speech features are associated with syntax, dialect, and the overall meaning of a spoken message. In contrast, low-level features such as pitch and phonemic spectra are associated much more with the physiology of the human vocal tract. These low-level features are also the easiest and least computationally intensive characteristics of speech to extract. Once the features are extracted, modern speaker recognition systems attempt to fit them to statistical classification models. One such widely used model is the Gaussian Mixture Model (GMM). Testing of speaker recognition systems is currently standardized by NIST in the frequently updated NIST Speaker Recognition Evaluation (NIST-SRE) standard. The results measured by the tests outlined in the standard are ultimately presented as Detection Error Tradeoff (DET) curves and detection cost function scores. This thesis presents a new method of measuring the effects of channel impediments on the quality of identifications made by GMM-based speaker recognition systems. With the exception of the NIST-SRE, no standardized or extensive testing of speaker recognition systems in noisy channels has been conducted. Thorough testing of speaker recognition systems will be conducted in channel model simulators. Additionally, the NIST-SRE error metric will be evaluated against a newly proposed metric for gauging the performance and improvements of speaker recognition systems.
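The GMM-based identification step mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes scalar features (for example, one pitch value per frame), whereas real systems typically use multi-dimensional feature vectors, and the function names and model parameters are hypothetical.

```python
import math


def gmm_log_likelihood(features, weights, means, variances):
    """Total log-likelihood of scalar features under a 1-D Gaussian mixture."""
    total = 0.0
    for x in features:
        # Log of (weight * Gaussian density) for each mixture component.
        comps = [
            math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp keeps the sum numerically stable.
        peak = max(comps)
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def identify(features, speaker_models):
    """Return the name of the speaker model that best explains the features."""
    return max(
        speaker_models,
        key=lambda name: gmm_log_likelihood(features, *speaker_models[name]),
    )


# Hypothetical models: (weights, means, variances) per enrolled speaker.
models = {
    "speaker_a": ([0.5, 0.5], [100.0, 200.0], [100.0, 100.0]),
    "speaker_b": ([0.5, 0.5], [150.0, 250.0], [100.0, 100.0]),
}
best = identify([101.0, 198.0, 105.0], models)
```

Channel noise would perturb the feature values before scoring, which is the effect the thesis sets out to measure.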

    Interactive Real-Time Embedded Systems Education Infused with Applied Internet Telephony

    The transition from traditional circuit-switched phone systems to modern packet-based Internet telephony networks demands tools to support Voice over Internet Protocol (VoIP) development. In this paper, we introduce the XinuPhone, an integrated hardware/software approach for educating users about VoIP technology on a real-time embedded platform. We propose modular course topics for design-oriented, hands-on laboratory exercises: filter design, timing, serial communications, interrupts and resource budgeting, network transmission, and system benchmarking. Unlike similar commercial solutions, our open-source software platform encourages development and testing of new CODECs alongside existing standards. Furthermore, the supporting hardware features inexpensive, readily available components designed specifically for educational and research users on a limited budget. The XinuPhone is especially well suited to experimenting with design trade-offs and with interactions between real-time software and hardware components.

    Quality of service for VoIP in wireless communications

    Ever since telephone services became available to the public, technologies have evolved towards more efficient methods of handling phone calls. Originally, circuit-switched networks were a breakthrough for voice services, but today most technologies have adopted packet-switched networks, improving efficiency at a cost in Quality of Service (QoS). A good example of a packet-switched network is the Internet, a resource created to handle data over the Internet Protocol (IP) that can also handle voice services, known as Voice over Internet Protocol (VoIP). The combination of wireless networks and free VoIP services is very popular; however, its limitations in security and network overload are still a handicap for most practical applications. This thesis investigates network performance under VoIP sessions. The aim is to compare the performance of a variety of audio codecs that diminish the impact of VoIP on the network. The contribution of this research is therefore twofold: first, to study and analyse the extension of speech quality predictors by a new speech quality model that accurately estimates whether the network can handle a VoIP session; and second, to implement a new application of network coding for VoIP to increase throughput. The analysis and study of speech quality predictors is based on the mathematical model of the E-model. A case study of an embedded Session Initiation Protocol (SIP) proxy, merged with a Media Gateway that bridges mobile networks to wired networks, has been developed to understand its effects on QoS. Experimental speech quality measurements under wired and wireless scenarios were compared with the mathematical speech predictor, resulting in an extended mathematical solution of the E-model. A new speech quality model for cascaded networks was designed and implemented out of this research.
    Provided that each channel is modelled by a Markov chain packet loss model, the methodology can predict expected speech quality and inform the QoS manager to take action. From a data rate perspective, a VoIP session has a very specific characteristic: the data exchanged between two end nodes is often symmetrical. This opens up a new opportunity for centralised VoIP sessions, where network coding techniques can be applied to increase throughput performance at the channel. An application layer based on network coding has been implemented; it is fully compatible with existing protocols and successfully achieves the network capacity.
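As a rough illustration of what a Markov chain packet loss model looks like, the sketch below uses a two-state (Gilbert-style) chain. This is an assumption: the abstract does not say which Markov model the thesis uses. The function names, the rule that every packet sent in the "bad" state is lost, and the independence assumption for cascaded links are all illustrative choices, not the thesis's method.

```python
import random


def stationary_loss_rate(p_good_to_bad, p_bad_to_good):
    """Long-run fraction of lost packets for a two-state chain.

    Assumes every packet sent in the 'bad' state is lost, none in 'good'.
    """
    return p_good_to_bad / (p_good_to_bad + p_bad_to_good)


def cascaded_loss_rate(loss_rates):
    """End-to-end loss over links in series, assuming independent losses."""
    survive = 1.0
    for loss in loss_rates:
        survive *= 1.0 - loss
    return 1.0 - survive


def simulate_losses(p_good_to_bad, p_bad_to_good, n_packets, seed=0):
    """Simulate the chain; returns a list of booleans (True = packet lost)."""
    rng = random.Random(seed)
    bad = False  # start in the 'good' state
    lost = []
    for _ in range(n_packets):
        if bad:
            bad = rng.random() >= p_bad_to_good  # stay bad unless we recover
        else:
            bad = rng.random() < p_good_to_bad   # enter a loss burst
        lost.append(bad)
    return lost


wireless = stationary_loss_rate(0.05, 0.45)      # hypothetical wireless hop
end_to_end = cascaded_loss_rate([wireless, 0.2])  # followed by a lossier hop
trace = simulate_losses(0.05, 0.45, 200_000)
empirical = sum(trace) / len(trace)
```

A predicted loss rate of this kind is the sort of input a speech quality predictor such as the E-model takes when estimating whether a session remains acceptable.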

    DeepVoCoder: A CNN model for compression and coding of narrow band speech

    This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code speech signals directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time-domain speech samples as inputs and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. The paper demonstrates that this compact representation is sufficient to reconstruct the original speech signal in high quality using the CNN decoder. It also discusses the theoretical background of why and how CNNs may be used for end-to-end speech compression and coding. The complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.
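The encoder structure described in this abstract (convolutional layers, some followed by pooling that halves the sequence length) can be sketched in plain Python. This is a toy single-channel illustration, not the DeepVoCoder architecture: the kernels, the ReLU activation, the use of max pooling, and the function names are all assumptions, and a real model would have many learned filters per layer.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Expects an odd-length kernel.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(xs):
    """Element-wise rectifier."""
    return [max(0.0, x) for x in xs]


def max_pool2(xs):
    """Max pooling with window 2, stride 2: halves the length."""
    return [max(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]


def encode(signal, layers):
    """Apply (kernel, pool) layers in order; pool=True downsamples by half."""
    out = list(signal)
    for kernel, pool in layers:
        out = relu(conv1d_same(out, kernel))
        if pool:
            out = max_pool2(out)
    return out


# Hypothetical 3-tap smoothing kernel; pooling after two of three layers.
smooth = [0.25, 0.5, 0.25]
frame = [float(n % 5) for n in range(16)]
code = encode(frame, [(smooth, True), (smooth, False), (smooth, True)])
```

With two pooled layers, the 16-sample frame is reduced to 4 values, which shows how each pooling stage trades sequence length for compactness on the way to the bottleneck.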