Fog Computing in Medical Internet-of-Things: Architecture, Implementation, and Applications
In an era when the Internet of Things (IoT) tops the market-segment charts in business reports, the field of medicine stands to benefit greatly from the explosion of wearables and internet-connected sensors that surround us, acquiring and communicating unprecedented data on symptoms, medication, food intake, and daily-life activities affecting one's health and wellness. However, IoT-driven healthcare must overcome several barriers: 1) the growing demand for data storage on cloud servers, where analysis of medical big data becomes increasingly complex; 2) the vulnerability of communicated data to security and privacy breaches; 3) the high cost and energy consumption of continuously transmitting the collected data; and 4) the non-trivial task of operating and maintaining the sensors directly from the cloud servers. This book chapter defines Fog Computing in the context of medical IoT. Conceptually, Fog Computing is a service-oriented intermediate layer in IoT that provides interfaces between the sensors and cloud servers, facilitating connectivity, data transfer, and a queryable local database. The centerpiece of Fog Computing is a low-power, intelligent, wireless, embedded computing node that carries out signal conditioning and data analytics on raw data collected from wearables or other medical sensors, and offers an efficient means of serving telehealth interventions. We implemented and tested a fog computing system using the Intel Edison and Raspberry Pi that allows acquisition, computing, storage, and communication of various medical data, such as pathological speech data of individuals with speech disorders, phonocardiogram (PCG) signals for heart rate estimation, and electrocardiogram (ECG)-based Q, R, S detection.
Comment: 29 pages, 30 figures, 5 tables. Keywords: Big Data, Body Area Network, Body Sensor Network, Edge Computing, Fog Computing, Medical Cyberphysical Systems, Medical Internet-of-Things, Telecare, Tele-treatment, Wearable Devices. Chapter in Handbook of Large-Scale Distributed Computing in Smart Healthcare (2017), Springer.
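The node-level analytics the chapter describes (e.g. R-peak detection for heart rate estimation) can be illustrated with a brief sketch. This is not the chapter's implementation: the synthetic signal, threshold, and refractory period below are all illustrative, and a real fog node would read samples from its sensor hardware.

```python
# Minimal sketch of on-node ECG analytics: threshold-based R-peak detection
# followed by heart-rate estimation. All parameters are illustrative.

FS = 250  # assumed sampling rate in Hz

def synthetic_ecg(beats=5, fs=FS):
    """Crude synthetic ECG: flat baseline with one sharp spike per beat."""
    samples = []
    for _ in range(beats):
        for n in range(fs):  # one second per beat -> 60 bpm
            samples.append(1.0 if n == fs // 2 else 0.0)
    return samples

def detect_r_peaks(x, fs=FS, threshold=0.5, refractory=0.2):
    """Return sample indices of R peaks, using a fixed amplitude threshold
    and a refractory period (seconds) to suppress double detections."""
    peaks, last = [], -10**9
    for i, v in enumerate(x):
        if v > threshold and (i - last) > refractory * fs:
            peaks.append(i)
            last = i
    return peaks

def heart_rate_bpm(peaks, fs=FS):
    """Mean heart rate from successive R-R intervals."""
    if len(peaks) < 2:
        return None
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(intervals) / len(intervals))

ecg = synthetic_ecg()
peaks = detect_r_peaks(ecg)
print(len(peaks), heart_rate_bpm(peaks))  # 5 60.0
```

A production detector would add band-pass filtering and adaptive thresholding (e.g. along the lines of Pan-Tompkins); the fixed threshold here only shows the structure of the computation a fog node would run locally before forwarding summaries to the cloud.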
Non-linear analysis of cello pitch and timbre
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Architecture, 1992. Includes bibliographical references (leaves 62-65). By Andrew Choon-ki Hong, M.S.
Portfolio of Compositions with Commentaries
This portfolio analyses the creative means by which a number of audio and visual
compositions were realised. It attempts to dissect the influential factors in the
creation of such pieces and to explore the technological processes involved in the
creation of such work. It is a personal analysis of a body of work which represents
a hybrid of influences, spanning several years. It is supported by three DVDs,
which contain audio and visual material and software files which were used in the
composition and performance of these works.
Automatic annotation of musical audio for interactive applications
PhD
As machines become more and more portable, and part of our everyday life, it becomes
apparent that developing interactive and ubiquitous systems is an important
aspect of new music applications created by the research community. We are interested
in developing a robust layer for the automatic annotation of audio signals, to
be used in various applications, from music search engines to interactive installations,
and in various contexts, from embedded devices to audio content servers. We
propose adaptations of existing signal processing techniques to a real-time context.
Amongst these annotation techniques, we concentrate on low and mid-level tasks
such as onset detection, pitch tracking, tempo extraction and note modelling. We
present a framework to extract these annotations and evaluate the performances of
different algorithms.
The first task is to detect onsets and offsets in audio streams within short latencies.
manipulations and analyses of metrical structure. Evaluation of different algorithms
The segmentation of audio streams into temporal objects enables various
and their adaptation to real time are described. We then tackle the problem of
fundamental frequency estimation, again trying to reduce both the delay and the
computational cost. Different algorithms are implemented for real time and tested
on monophonic recordings and complex signals. Spectral analysis can be
used to label the temporal segments; the estimation of higher level descriptions is
approached. Techniques for modelling of note objects and localisation of beats are
implemented and discussed.
Applications of our framework include live and interactive music installations,
and, more generally, tools for composers and sound engineers. Speed optimisations
may bring a significant improvement to various automated tasks, such as
automatic classification and recommendation systems. We describe the design of
our software solution, for our research purposes and in view of its integration within
other systems.
Funding: EU-FP6-IST-507142 project SIMAC (Semantic Interaction with Music Audio Contents); EPSRC grants GR/R54620 and GR/S75802/01.
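The onset detection task described above can be sketched in simplified form. This is not one of the thesis's algorithms: a real detector would operate on a spectral detection function, whereas the energy-difference version below, with illustrative frame size, hop size, and threshold, only shows the general two-stage shape (detection function, then peak picking).

```python
# Hedged sketch of onset detection: half-wave rectified energy difference
# between frames, flagging the first frame of each rise. All parameters
# are illustrative, not taken from the thesis.

def frame_energies(x, frame=512, hop=256):
    """Energy of each analysis frame (the raw detection function)."""
    energies = []
    for start in range(0, len(x) - frame + 1, hop):
        energies.append(sum(s * s for s in x[start:start + frame]))
    return energies

def onsets(energies, threshold=0.1):
    """Flag frames where energy rises by more than `threshold` and the
    previous frame was not already rising (crude peak picking)."""
    found = []
    for i in range(1, len(energies)):
        rise = energies[i] - energies[i - 1]
        prev_rise = energies[i - 1] - energies[i - 2] if i >= 2 else 0.0
        if rise > threshold and prev_rise <= threshold:
            found.append(i)
    return found

# Silence followed by a burst: exactly one onset expected, at the burst.
signal = [0.0] * 1024 + [0.5] * 1024
print(onsets(frame_energies(signal)))  # [3]
```

The latency concern raised in the thesis maps directly onto this structure: the detector can only report an onset once the frame containing the rise has been filled, so frame and hop sizes trade accuracy against delay.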
Proceedings of the Linux Audio Conference 2018
These proceedings contain all papers presented at the Linux Audio Conference 2018. The conference took place at c-base, Berlin, from June 7th to 10th, 2018, and was organized in cooperation with the Electronic Music Studio at TU Berlin.
Perceptual strategies in active and passive hearing of neotropical bats
Basic spectral and temporal sound properties, such as frequency content and timing, are evaluated by the auditory system to build an internal representation of the external world and to generate auditory guided behaviour. Using echolocating bats as model system, I investigated aspects of spectral and temporal processing during echolocation and in relation to passive listening, and the echo-acoustic object recognition for navigation.
In the first project (chapter 2), spectral processing during passive and active hearing was compared in the echolocating bat Phyllostomus discolor. Sounds are ubiquitously used for many vital behaviours, such as communication, predator and prey detection, or echolocation.
The frequency content of a sound is one major component for the correct perception of the transmitted information, but it is distorted while travelling from the sound source to the receiver. In order to correctly determine the frequency content of an acoustic signal, the receiver needs to compensate for these distortions. We first investigated whether P. discolor compensates for distortions of the spectral shape of transmitted sounds during passive listening. Bats were trained to discriminate lowpass filtered from highpass filtered acoustic impulses, while hearing a continuous white noise background with a flat spectral shape. We then assessed their spontaneous classification of acoustic impulses with varying spectral content depending on the background’s spectral shape (flat or lowpass filtered). Lowpass filtered noise background increased the proportion of highpass classifications of the same filtered impulses, compared to white noise background. Like humans, the bats thus compensated for the background’s spectral shape. In an active-acoustic version of the identical experiment, the bats had to classify filtered playbacks of their emitted echolocation calls instead of passively presented impulses. During echolocation, the classification of the filtered echoes was independent of the spectral shape of the passively presented background noise. Likewise, call structure did not change to compensate for the background’s spectral shape. Hence, auditory processing differs between passive and active hearing, with echolocation representing an independent mode with its own rules of auditory spectral analysis.
The second project (chapter 3) was concerned with the accurate measurement of the time of occurrence of auditory signals, and thus of distance in echolocation. In addition, the importance of passive listening relative to echolocation turned out to be an unexpected factor in this study. To measure the distance to objects, called ranging, bats measure the time delay between an outgoing call and its returning echo. Ranging accuracy has received considerable interest in echolocation research for several reasons: (i) behaviourally, it is important for the bat's ability to locate objects and navigate its surroundings; (ii) physiologically, the neuronal implementation of precise measurements of very short time intervals is a challenge; and (iii) the conjectured echo-acoustic receiver of bats is of interest for signal processing. Here, I trained the nectarivorous bat Glossophaga soricina to detect a jittering real target and found a biologically plausible distance accuracy of 4–7 mm, corresponding to a temporal accuracy of 20–40 μs. However, the bats presumably did not learn to use the jittering echo delay as the first and most prominent cue, but relied first on passive acoustic listening, which could only be prevented by the playback of masking noise. This shows that even a non-gleaning bat relies heavily on passive acoustic cues, and that measuring short time intervals is difficult. This result calls into question other studies reporting a sub-microsecond time jitter threshold.
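The correspondence between the distance and temporal accuracies follows from the two-way travel of the echo: a jitter of d in target range changes the echo delay by 2d/c. A quick check, assuming a round speed of sound of 343 m/s (a value not stated in the abstract):

```python
# Converting the reported range jitter (4-7 mm) into echo-delay jitter.
# c is an assumed round value for the speed of sound in air.
C = 343.0  # m/s, dry air at roughly 20 degrees C

def echo_delay_jitter_us(d_m, c=C):
    """Two-way delay change, in microseconds, for a range jitter of d_m metres."""
    return 2.0 * d_m / c * 1e6

for d_mm in (4, 7):
    print(d_mm, round(echo_delay_jitter_us(d_mm / 1000.0), 1))
```

This yields roughly 23 and 41 μs, consistent with the 20–40 μs temporal accuracy reported above.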
The third project (chapter 4) linked the perception of echo-acoustic stimuli to the appropriate behavioural reactions, namely evasive flight manoeuvres around virtual objects presented in the flight paths of wild, untrained bats. Echolocating bats are able to orient in complete darkness solely by analysing the echoes of their emitted calls. They detect, recognize and classify objects based on the spectro-temporal reflection pattern received at the two ears. Auditory object analysis, however, is inevitably more complicated than visual object analysis, because the one-dimensional acoustic time signal only transmits range information, i.e., the object's distance and its longitudinal extent. All other object dimensions, like width and height, have to be inferred from comparative analysis of the signals at both ears and over time. The purpose of this study was to measure perceived object dimensions in wild, experimentally naïve bats by video-recording and analysing the bats' evasive flight manoeuvres in response to the presentation of virtual echo-acoustic objects with independently manipulated acoustic parameters. Flight manoeuvres were analysed by extracting the flight paths of all passing bats. As a control for our method, we also recorded the flight paths of bats in response to a real object. Bats avoided the real object by flying around it. However, we did not find any flight path changes in response to the presentation of several virtual objects. We assume that the missing spatial extent of virtual echo-acoustic objects, due to playback from only one loudspeaker, was the main reason for the failure to evoke evasive flight manoeuvres. This study therefore emphasises for the first time the importance of the spatial dimension of virtual objects, which has until now been neglected in virtual-object presentations.
Deep Learning Methods for Instrument Separation and Recognition
This thesis explores deep learning methods for timbral information processing in polyphonic music analysis. It encompasses two primary tasks: Music Source Separation (MSS) and Instrument Recognition, with a focus on applying domain knowledge and utilising dense arrangements of skip-connections in the frameworks in order to reduce the number of trainable parameters and create more efficient models. Musically motivated Convolutional Neural Network (CNN) architectures are introduced, emphasising kernels with vertical, square, and horizontal shapes. This design choice allows for the extraction of essential harmonic and percussive features, which enhances the discrimination of different instruments. Notably, this methodology proves valuable for Harmonic-Percussive Source Separation (HPSS) and instrument recognition tasks. A significant challenge in MSS is generalising to new instrument types and music styles. To address this, a versatile framework for adversarial unsupervised domain adaptation for source separation is proposed, particularly beneficial when labelled data for specific instruments is unavailable. The curation of the Tap & Fiddle dataset is another contribution of the research, offering mixed and isolated stem recordings of traditional Scandinavian fiddle tunes, along with foot-tapping accompaniments, fostering research in source separation and metrical expression analysis within these musical styles. Since our perception of timbre is affected in different ways by the transient and stationary parts of sound, the research investigates the potential of Transient Stationary-Noise Decomposition (TSND) as a preprocessing step for frame-level recognition. A method that performs TSND of spectrograms and feeds the decomposed spectrograms to a neural classifier is proposed. Furthermore, this thesis introduces a novel deep learning-based approach for pitch streaming, treating the task as note-level instrument classification.
Such an approach is modular, meaning that it can successfully stream predicted note events, and not only labelled ground-truth note-event information, to the corresponding instruments. The proposed pitch streaming method therefore enables third-party multi-pitch estimation algorithms to perform multi-instrument Automatic Music Transcription (AMT).
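The kernel-shape intuition above (horizontal structures in a spectrogram are harmonic, vertical structures are percussive) echoes classical median-filtering HPSS, which the shaped CNN kernels generalise. A minimal sketch of that classical baseline on a toy spectrogram, not taken from the thesis:

```python
# Hedged sketch of median-filter HPSS: in a time-frequency grid (rows =
# frequency bins, columns = time frames), a horizontal median keeps
# sustained tones and a vertical median keeps broadband clicks.
from statistics import median

def horizontal_median(S, k=3):
    """Median-filter each frequency bin across time (enhances harmonics)."""
    out = []
    for row in S:
        filtered = []
        for t in range(len(row)):
            lo, hi = max(0, t - k // 2), min(len(row), t + k // 2 + 1)
            filtered.append(median(row[lo:hi]))
        out.append(filtered)
    return out

def vertical_median(S, k=3):
    """Median-filter each time frame across frequency (enhances percussion)."""
    transposed = [list(col) for col in zip(*S)]
    filtered = horizontal_median(transposed, k)
    return [list(col) for col in zip(*filtered)]

# Toy spectrogram: a broadband click at frame 2, plus a sustained tone in bin 1.
S = [[1.0 if t == 2 else 0.0 for t in range(6)] for f in range(4)]
for t in range(6):
    S[1][t] += 2.0

H = horizontal_median(S)  # keeps the sustained tone, suppresses the click
P = vertical_median(S)    # keeps the click, suppresses the tone
```

In the interior of the grid, H retains the tone row while zeroing the click column, and P does the opposite; the thesis's vertical and horizontal kernels let a network learn this kind of orientation selectivity rather than hard-coding it.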
RADIC Voice Authentication: Replay Attack Detection using Image Classification for Voice Authentication Systems
Systems like Google Home, Alexa, and Siri that use voice-based authentication to verify their users' identities are vulnerable to voice replay attacks. These attacks gain unauthorized access to voice-controlled devices or systems by replaying recordings of passphrases and voice commands, underscoring the need for more resilient voice-based authentication systems that can detect them.
This thesis implements a system that detects voice replay attacks by using deep learning and image classification of voice spectrograms to differentiate between live and recorded speech. Tests of this system indicate that the approach is a promising direction for detecting voice replay attacks.
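The spectrogram-as-image front end such a system relies on can be sketched with standard-library tools only. The naive DFT below stands in for a real FFT or mel front end, the classifier itself is out of scope, and all parameter values are illustrative:

```python
# Hedged sketch of the preprocessing step: turning audio samples into a
# magnitude spectrogram, i.e. the "image" a CNN classifier would consume.
# A naive DFT is used for self-containment; real systems would use an FFT.
import cmath
import math

def magnitude_spectrogram(x, frame=64, hop=32):
    """List of per-frame magnitude spectra (positive frequencies only)."""
    frames = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = x[start:start + frame]
        spectrum = []
        for k in range(frame // 2):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame)
                      for n in range(frame))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# A pure tone at bin 4 of a 64-point frame: energy concentrates in that bin.
tone = [math.sin(2 * math.pi * 4 * n / 64) for n in range(256)]
spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # 4
```

Replay-attack detection then hinges on the classifier finding channel artefacts (loudspeaker and microphone colouration) in such images that distinguish recorded from live speech.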