5,883 research outputs found

Software Defined Media: Virtualization of Audio-Visual Services

Authors: Esaki Hiroshi; Ikeda Masahiro; Kasuya Takashi; Niwa Kenta; Ogawa Keiko; Saito Shoichiro; Sone Takuro; Sunahara Hideki; Tsukada Manabu
Publication date: 23/02/2017

Internet-native audio-visual services are developing rapidly. Among these services, object-based audio-visual services are gaining importance. In 2014, we established the Software Defined Media (SDM) consortium to target new research areas and markets involving object-based digital media and Internet-by-design audio-visual environments. In this paper, we introduce the SDM architecture, which virtualizes networked audio-visual services alongside the development of smart buildings and smart cities that use Internet of Things (IoT) devices and smart building facilities. We design the SDM architecture as a layered architecture to promote the development of innovative applications, building on rapid advances in software-defined networking (SDN). We then implement a prototype system based on the architecture, present the system at an exhibition, and provide it as an SDM API to application developers at hackathons. Various types of applications were developed using the API at these events. An evaluation of SDM API access shows that the prototype SDM platform effectively provides 3D audio reproducibility and interactiveness for SDM applications.

Comment: IEEE International Conference on Communications (ICC2017), Paris, France, 21-25 May 2017

arXiv.org e-Print Archive

Crossref

Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

Authors: Macias-Guarasa Javier; Pizarro Daniel; Vera-Diaz Juan Manuel
Publication venue: 'MDPI AG'
Publication date: 29/07/2018

This paper presents a novel approach to indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). The proposed solution is, to the best of our knowledge, the first published work in which the CNN is designed to directly estimate the three-dimensional position of an acoustic source, using the raw audio signal as input and avoiding hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train our network using semi-synthetic data generated from close-talk speech recordings, in which we simulate the time delays and distortion that the signal suffers as it propagates from the source to the microphone array. We then fine-tune this network using a small amount of real data. Our experimental results show that this strategy produces networks that significantly improve on existing localization methods based on SRP-PHAT strategies. In addition, our experiments show that our CNN method is more robust to variations in speaker gender and window size than the other methods.

Comment: 18 pages, 3 figures, 8 tables
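As a rough illustration of the semi-synthetic data step mentioned in the abstract, the following minimal sketch delays and attenuates a mono recording for each microphone of an array. It is not the authors' implementation: it assumes free-field propagation, whole-sample delays, and 1/distance attenuation, and the names `simulate_array_signals` and `SPEED_OF_SOUND`, along with all parameters, are hypothetical.

```python
import math

# Assumed speed of sound in air at room temperature, in metres per second.
SPEED_OF_SOUND = 343.0


def simulate_array_signals(source_signal, source_pos, mic_positions, sample_rate):
    """Return one delayed, attenuated copy of source_signal per microphone.

    Each channel is the source signal shifted by the propagation delay
    (rounded to whole samples) and scaled by 1/distance. Output channels
    are truncated to the length of the input signal.
    """
    channels = []
    for mic in mic_positions:
        distance = math.dist(source_pos, mic)
        delay = round(distance / SPEED_OF_SOUND * sample_rate)
        gain = 1.0 / max(distance, 1e-6)  # guard against a zero distance
        delayed = [0.0] * delay + [gain * s for s in source_signal]
        channels.append(delayed[:len(source_signal)])
    return channels


# Usage: an impulse emitted at the origin, observed by two microphones.
impulse = [1.0] + [0.0] * 31
mics = [(3.43, 0.0, 0.0), (0.0, 6.86, 0.0)]
channels = simulate_array_signals(impulse, (0.0, 0.0, 0.0), mics, sample_rate=1000)
```

A more realistic generator would also model reverberation and fractional-sample delays; this sketch covers only the direct path.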

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

RGB-D datasets using Microsoft Kinect or similar sensors: a survey

Authors: Galili; Guan; Hu; Kolner; Mulvad; Nakazawa; Palushani; Palushani
Publication venue: Springer
Publication date: 01/01/2015

RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image and video datasets dedicated to various applications have become available, and they are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in selecting suitable datasets for evaluating their algorithms.

Northumbria Research Link

Crossref

Springer - Publisher Connector

Online Research Database In Technology

Onboard Audio and Video Processing for Secure Detection, Localization, and Tracking in Counter-UAV Applications

Authors: Andrea Toma; Carlo Drioli; Gian Luca Foresti; Gianluigi Sechi; Giovanni Ferrin; Giuseppe Oliva; Niccolò Cecchinato
Publication date: 01/01/2022

UAVs are now of fundamental importance in numerous civil applications, such as search and rescue, and in military applications, such as monitoring, patrolling, and counter-UAV operations, in which remote UAV nodes collect sensor data. In the counter-UAV case, flying UAVs collect environmental data that is used to counter attacks launched by adversary drones. However, because of the limited computing resources on board the acquisition UAVs, most of the signal processing is still performed on a central ground unit to which the sensor data is sent wirelessly. This exposes the system to serious security threats, such as cyber attacks by malicious entities that exploit vulnerabilities at the application level. One way to reduce the risk is to move part of the computation on board the remote nodes. In this context, we propose a framework in which the detection of nearby drones, and their localization and tracking, can be performed in real time on the small computing devices mounted on board the drones. Background subtraction is applied to the video frames as a pre-processing step for on-board UAV detection using machine-vision algorithms. For the localization and tracking of the detected UAV, multi-channel acoustic signals are used instead, and direction-of-arrival (DOA) estimates are obtained through the MUSIC algorithm. In this work, the proposed idea is described in detail along with some experiments, and methods for effective implementation are then provided.
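The background-subtraction pre-processing step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which background model the authors use, so this example assumes a simple running-average model over grayscale frames stored as nested lists; the function names `update_background` and `foreground_mask`, and the `alpha` and `threshold` defaults, are hypothetical.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1 - alpha) * b + alpha * p for b, p in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [abs(p - b) > threshold for b, p in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


# Usage: a uniform 3x3 background and a frame with one bright pixel.
background = [[100.0] * 3 for _ in range(3)]
frame = [[100.0, 100.0, 100.0],
         [100.0, 200.0, 100.0],
         [100.0, 100.0, 100.0]]
mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

Only the centre pixel is flagged as foreground here; a small `alpha` lets the model absorb slow scene changes while still exposing fast-moving objects.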

Archivio istituzionale della ricerca - Università degli Studi di Udine