Demonstrating the feasibility of standardized application program interfaces that will allow mobile/portable terminals to receive services combining UMTS and DVB-T
Crucial to the commercial exploitation of any service combining UMTS and DVB-T is the availability of standardized APIs adapted to the hybrid UMTS and DVB-T network and to the technical limitations of mobile/portable terminals. This paper describes work carried out in the European Commission Framework Program 5 (FP5) project CONFLUENT to demonstrate the feasibility of such Application Program Interfaces (APIs) by enabling the reception of a Multimedia Home Platform (MHP) based application transmitted over DVB-T on five different terminals, with parts of the service running on a mobile phone
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified
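The abstract above names log-mel spectra as a dominant feature representation in audio deep learning. As an illustrative sketch only (not code from the reviewed paper), a log-mel spectrogram can be computed from a waveform with plain NumPy, assuming the common HTK-style mel-scale formula and a triangular filterbank:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=40):
    # Short-time Fourier transform: Hann-windowed frames + rFFT power
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, n_fft//2+1)

    # Triangular mel filterbank spanning 0 Hz to Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)

    # Project power spectrum onto mel bands, then compress with log
    return np.log(power @ fb.T + 1e-10)  # (frames, n_mels)

# Example: 1 s of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(y, sr)
print(S.shape)  # 30 frames x 40 mel bands
```

For a pure tone, the energy concentrates in the few mel bands whose filters cover 440 Hz; production systems typically use a tuned library implementation (e.g. librosa) rather than a hand-rolled filterbank like this one.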
NoiseSPY: a real-time mobile phone platform for urban noise monitoring and mapping
In this paper we present the design, implementation, evaluation, and user experiences of the NoiseSpy application, our sound sensing system that turns the mobile phone into a low-cost data logger for monitoring environmental noise. It allows users to explore a city area while collaboratively visualizing noise levels in real-time. The software combines the sound levels with GPS data in order to generate a map of sound levels that were encountered during a journey. We report early findings from the trials which have been carried out by cycling couriers who were given Nokia mobile phones equipped with the NoiseSpy software to collect noise data around Cambridge city. Indications are that, not only is the functionality of this personal environmental sensing tool engaging for users, but aspects such as personalization of data, contextual information, and reflection upon both the data and its collection, are important factors in obtaining and retaining their interest
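The abstract above describes combining microphone sound levels with GPS fixes to build a noise map. A minimal sketch of that pairing, with all names and parameters hypothetical rather than NoiseSpy's actual API, is to convert each PCM frame to a relative dB level and average levels within coarse latitude/longitude grid cells:

```python
import math
from collections import defaultdict

def rms_db(samples, ref=32768.0):
    # RMS level of 16-bit PCM samples, in dB relative to full scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-9) / ref)

def grid_key(lat, lon, cell_deg=0.001):
    # ~100 m grid cells at mid latitudes (hypothetical resolution)
    return (round(lat / cell_deg), round(lon / cell_deg))

def noise_map(readings):
    # readings: iterable of (lat, lon, pcm_samples);
    # returns the mean dB level per grid cell
    cells = defaultdict(list)
    for lat, lon, samples in readings:
        cells[grid_key(lat, lon)].append(rms_db(samples))
    return {k: sum(v) / len(v) for k, v in cells.items()}

# Two readings fall in the same cell, one in another
quiet = [100] * 512     # low-amplitude frame
loud = [20000] * 512    # high-amplitude frame
m = noise_map([(52.2050, 0.1190, quiet),
               (52.2051, 0.1190, loud),
               (52.3000, 0.2000, quiet)])
print(len(m))  # prints 2
```

A real deployment would calibrate the microphone to absolute dB(A) and apply frequency weighting; this sketch only shows the aggregation structure.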
Multimodal system for recording individual-level behaviors in songbird groups
In longitudinal observations of animal groups, the goal is to identify individuals and to reliably detect their interactive behaviors including their vocalizations. However, to reliably extract individual vocalizations from their mixtures and other environmental sounds remains a serious challenge. Promising approaches are multi-modal systems that make use of animal-borne wireless sensors and that exploit the inherent signal redundancy. In this vein, we designed a modular recording system (BirdPark) that yields synchronized data streams and contains a custom software-defined radio receiver. We record pairs of songbirds with multiple cameras and microphones and record their body vibrations with custom low-power frequency-modulated (FM) radio transmitters. Our custom multi-antenna radio demodulation technique increases the signal-to-noise ratio of the received radio signals by 6 dB and reduces the signal loss rate by a factor of 87 to only 0.03% of the recording time compared to standard single-antenna demodulation techniques. Nevertheless, neither a single vibration channel nor a single sound channel is sufficient by itself to signal the complete vocal output of an individual, with each sensor modality missing on average about 3.7% of vocalizations. Our work emphasizes the need for high-quality recording systems and for multi-modal analysis of social behavior
Potential of mobile applications in human-centric production and logistics management
With the increasing market penetration of smart devices (smartphones, smartwatches, and tablets), various mobile applications (apps) have been developed to fulfill tasks in daily life. Recently, efforts have been made to develop apps to support human operators in industrial work. When apps installed on commercial devices are utilized, tasks that were formerly done purely manually or with the help of investment-intensive specific devices can be performed more efficiently and/or at a lower cost and with reduced errors. Despite their advantages, smart devices have limitations because embedded sensors (e.g., accelerometers) and components (e.g., cameras) are usually designed for nonindustrial use. Hence, validation experiments and case studies for industrial applications are needed to ensure the reliability of app usage. In this study, a systematic literature review was employed to identify the state of knowledge about the use of mobile apps in production and logistics management. The results show how apps can support human centricity based on the enabling technologies and components of smart devices. An outlook for future research and applications is provided, including the need for proper validation studies to ensure the diversity and reliability of apps and more research on psychosocial aspects of human-technology interaction
Emotions in context: examining pervasive affective sensing systems, applications, and analyses
Pervasive sensing has opened up new opportunities for measuring our feelings and understanding our behavior by monitoring our affective states while mobile. This review paper surveys pervasive affect sensing by examining three major elements of affective pervasive systems, namely "sensing", "analysis", and "application". Sensing investigates the different sensing modalities used in existing real-time affective applications; Analysis explores different approaches to emotion recognition and visualization based on different types of collected data; and Application investigates different leading areas of affective applications. For each of the three aspects, the paper includes an extensive survey of the literature and finally outlines some of the challenges and future research opportunities of affective sensing in the context of pervasive computing
Facilitating In-Car Use of Multi-Context Mobile Services: The Case of Mobile Telephone Conversations
Evolutionary Speech Recognition
Automatic speech recognition systems are becoming ever more common and are increasingly deployed in more variable acoustic conditions, by very different speakers. So these systems, generally conceived in a laboratory, must be robust in order to provide optimal performance in real situations. This article explores the possibility of gaining robustness by designing speech recognition systems able to auto-modify in real time, in order to adapt to the changes of acoustic environment. As a starting point, the adaptive capacities of living organisms were considered in relation to their environment. Analogues of these mechanisms were then applied to automatic speech recognition systems. It appeared to be interesting to imagine a system adapting to the changing acoustic conditions in order to remain effective regardless of its conditions of use