Efficient Acoustic Feature Computation Using FPGAs
Many recent advances in music information retrieval (MIR) have been data-driven. Widespread performance evaluations on common data sets, like the annual MIREX events, have been instrumental in advancing the field. Such endeavors incur large computational costs and could potentially benefit from faster calculation of acoustic features. Traditional cluster-based solutions are expensive and space- and power-inefficient. The massively parallel architecture of the field programmable gate array (FPGA) makes it possible to design lower-cost, application-specific chips rivaling cluster speed for large-scale acoustic feature computation. Such devices also show potential for implementations of MIR systems on embedded devices where hardware acceleration is a necessity. We present a prototype Xilinx System Generator (XSG) library for acoustic feature calculation. We use a genre classification task to compare the performance of simulated hardware features to those computed using standard methods. Finally, we discuss ongoing efforts toward a working hardware design.
Efficient audio signal processing for embedded systems
We investigated two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller," using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management by suppressing signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. A machine learning algorithm, AdaBoost, is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier.
PhD
Committee Chair: Anderson, David; Committee Member: Hasler, Jennifer; Committee Member: Hunt, William; Committee Member: Lanterman, Aaron; Committee Member: Minch, Bradle
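The AdaBoost-based feature selection mentioned in this abstract can be illustrated with a small software sketch. The abstract does not say which weak learners the thesis uses, so the single-feature threshold classifiers (decision stumps) below are an assumption, and the function names `stump_predict`, `train_adaboost_stumps`, `selected_features`, and `predict` are hypothetical. The idea shown is that each boosting round picks one feature, so the set of features picked across rounds is a candidate reduced feature set for the front-end.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Weak learner (assumed): +polarity if the feature exceeds the threshold."""
    return polarity if row[feature] > threshold else -polarity


def train_adaboost_stumps(X, y, rounds):
    """Discrete AdaBoost over decision stumps; labels must be -1 or +1.

    Returns a list of (feature, threshold, polarity, alpha) tuples.
    """
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            values = sorted({row[f] for row in X})
            # Candidate thresholds: midpoints between distinct feature values.
            for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
                for polarity in (1, -1):
                    err = sum(w for w, row, label in zip(weights, X, y)
                              if stump_predict(row, f, t, polarity) != label)
                    if best is None or err < best[0]:
                        best = (err, f, t, polarity)
        if best is None:  # every feature is constant, nothing to split on
            break
        err, f, t, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((f, t, polarity, alpha))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(row, f, t, polarity))
                   for w, row, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def selected_features(ensemble):
    """Indices of the features that at least one boosting round picked."""
    return sorted({f for f, _, _, _ in ensemble})


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, t, p) for f, t, p, alpha in ensemble)
    return 1 if score >= 0 else -1
```

In a hardware setting, only the features returned by `selected_features` would need analog circuitry, which is one plausible reading of how feature selection simplifies the front-end.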
Automatic annotation of musical audio for interactive applications
PhD
As machines become more and more portable, and part of our everyday life, it becomes
apparent that developing interactive and ubiquitous systems is an important
aspect of new music applications created by the research community. We are interested
in developing a robust layer for the automatic annotation of audio signals, to
be used in various applications, from music search engines to interactive installations,
and in various contexts, from embedded devices to audio content servers. We
propose adaptations of existing signal processing techniques to a real time context.
Amongst these annotation techniques, we concentrate on low and mid-level tasks
such as onset detection, pitch tracking, tempo extraction and note modelling. We
present a framework to extract these annotations and evaluate the performances of
different algorithms.
The first task is to detect onsets and offsets in audio streams within short latencies.
The segmentation of audio streams into temporal objects enables various
manipulations and analyses of metrical structure. Evaluation of different algorithms
and their adaptation to real time are described. We then tackle the problem of
fundamental frequency estimation, again trying to reduce both the delay and the
computational cost. Different algorithms are implemented for real time and tested
on monophonic recordings and complex signals. Spectral analysis can be
used to label the temporal segments; the estimation of higher level descriptions is
approached. Techniques for modelling of note objects and localisation of beats are
implemented and discussed.
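As a rough illustration of the onset detection task described above, the following sketch uses a simple energy-based detection function with peak picking. This is a generic textbook approach and is not claimed to be the method evaluated in the thesis; the function names and the `threshold_ratio` parameter are hypothetical. Note that the peak picker looks one frame ahead, so the detection latency of this sketch is one hop.

```python
def frame_energies(signal, frame_size, hop_size):
    """Sum of squared samples for each full frame of the signal."""
    return [sum(s * s for s in signal[start:start + frame_size])
            for start in range(0, len(signal) - frame_size + 1, hop_size)]


def detect_onsets(signal, frame_size, hop_size, threshold_ratio=0.5):
    """Return onset positions in samples.

    The detection function is the half-wave rectified frame-to-frame
    energy difference; onsets are its local maxima above a threshold
    set relative to the largest value (an assumed, simplistic rule).
    """
    energies = frame_energies(signal, frame_size, hop_size)
    detection = [0.0] + [max(0.0, cur - prev)
                         for prev, cur in zip(energies, energies[1:])]
    peak = max(detection, default=0.0)
    if peak == 0.0:
        return []
    threshold = threshold_ratio * peak
    onsets = []
    for i, value in enumerate(detection):
        left = detection[i - 1] if i > 0 else 0.0
        right = detection[i + 1] if i + 1 < len(detection) else 0.0
        if value > threshold and value > left and value >= right:
            onsets.append(i * hop_size)
    return onsets
```

Shorter frames and hops reduce latency at the cost of a noisier detection function, which is the kind of trade-off a real-time adaptation has to balance.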
Applications of our framework include live and interactive music installations,
and more generally tools for the composers and sound engineers. Speed optimisations
may bring a significant improvement to various automated tasks, such as
automatic classification and recommendation systems. We describe the design of
our software solution, for our research purposes and in view of its integration within
other systems.
EU-FP6-IST-507142 project SIMAC (Semantic Interaction with Music Audio Contents);
EPSRC grants GR/R54620; GR/S75802/01
A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION
Video segmentation facilitates efficient video indexing and navigation in large
digital video archives. It is an important process in a content-based video
indexing and retrieval (CBVIR) system. Many automated solutions performed
segmentation by utilizing information about the "facts" of the video. These "facts"
come in the form of labels that describe the objects which are captured by the
camera. Solutions of this type were able to achieve good and consistent results for some
video genres such as news programs and informational presentations. The content
format of videos of this type is generally quite standard, and automated solutions
were designed to follow these format rules. For example, in [1], the presence of news
anchor persons was used as a cue to determine the start and end of a meaningful
news segment.
The same cannot be said for video genres such as movies and feature films.
This is because makers of videos of this type utilize different filming techniques to
design their videos in order to elicit certain affective responses from their targeted
audience. Humans usually perform manual video segmentation by trying to relate
changes in time and locale to discontinuities in meaning [2]. As a result, viewers
usually have doubts about the boundary locations of a meaningful video segment
due to their different affective responses.
This thesis presents an entirely new view to the problem of high level video
segmentation. We developed a novel probabilistic method for affective level video
content analysis and segmentation. Our method had two stages. In the first stage,
affective content labels were assigned to video shots by means of a dynamic Bayesian
network (DBN). A novel hierarchical-coupled dynamic Bayesian network (HCDBN)
topology was proposed for this stage. The topology was based on the
pleasure-arousal-dominance (P-A-D) model of affect representation [3]. In principle, this
model can represent a large number of emotions. In the second stage, the visual,
audio and affective information of the video was used to compute a statistical feature
vector to represent the content of each shot. Affective level video segmentation was
achieved by applying spectral clustering to the feature vectors.
We evaluated the first stage of our proposal by comparing its emotion detection
ability with all the existing works which are related to the field of affective video
content analysis. To evaluate the second stage, we used the time adaptive clustering
(TAC) algorithm as our performance benchmark. The TAC algorithm was the best
high level video segmentation method [2]. However, it is a very computationally
intensive algorithm. To accelerate its computation speed, we developed a modified
TAC (modTAC) algorithm which was designed to be mapped easily onto a field
programmable gate array (FPGA) device. Both the TAC and modTAC algorithms
were used as performance benchmarks for our proposed method.
Since affective video content is a perceptual concept, the segmentation
performance and human agreement rates were used as our evaluation criteria. To obtain
our ground truth data and viewer agreement rates, a pilot panel study which was
based on the work of Gross et al. [4] was conducted. Experimental results show
the feasibility of our proposed method. For the first stage of our proposal, our
experimental results show that an average improvement of as high as 38% was
achieved over previous works. As for the second stage, an improvement of as high
as 37% was achieved over the TAC algorithm.
Implementation of Deep Learning Models on an SoC-FPGA Device for Real-Time Music Genre Classification
Data Availability Statement: https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification (accessed on 30 June 2023).
Copyright © 2023 by the authors.
Deep neural networks (DNNs) are complex machine learning models designed for decision-making tasks with high accuracy. However, DNNs require high computational power and memory, which makes it difficult to fit such models on edge devices and results in unnecessary processing delays and high energy consumption. Graphics processing units (GPUs) offer reliable hardware acceleration, but their bulky sizes prevent their utilization in portable equipment. System-on-chip field-programmable gate arrays (SoC-FPGAs) provide considerable computational power with low energy consumption, making them ideal for edge computing applications, owing to their innovative, flexible, and small design. In this paper, we implement a deep-learning-based music genre classification system on a SoC-FPGA board, evaluate the model’s performance, and provide a comparative analysis across different platforms. Specifically, we compare the performance of long short-term memory (LSTM), convolutional neural networks (CNNs), and a hybrid model (CNN-LSTM) on an Intel Core i7-8550U by Intel Corporation. The models are fed an acoustic feature called the Mel-frequency cepstral coefficient (MFCC) for training and testing (inference). Then, by using the advanced Vitis AI tool, a deployable version of the model is generated. The experimental results show that the execution speed is increased by 80%, and the throughput rises fourfold when the CNN-based music genre classification system is implemented on SoC-FPGA.
British Heart Foundation (The development of a sophisticated cardiac pacing simulator: a training tool to enhance the management of post cardiac surgical patient care) under grant number FS/19/73/34690
Accelerating Audio Data Analysis with In-Network Computing
Digital transformation will experience massive connections and massive data handling. This will imply a growing demand for computing in communication networks due to network softwarization. Moreover, digital transformation will host very sensitive verticals, requiring high end-to-end reliability and low latency. Accordingly, the concept of “in-network computing” has emerged. This means integrating the network communications with computing and also performing computations on the transport path of the network. This can be used to deliver actionable information directly to end users instead of raw data.
However, this change of paradigm to in-network computing raises disruptive challenges to the current communication networks. In-network computing (i) expects the network to host general-purpose softwarized network functions and (ii) encourages the packet payload to be modified. Yet, today’s networks are designed to focus on packet forwarding functions, and packet payloads should not be touched in the forwarding path, under the current end-to-end transport mechanisms. This dissertation presents full-stack in-network computing solutions, jointly designed from network and computing perspectives to accelerate data analysis applications, specifically for acoustic data analysis.
In the computing domain, two design paradigms of computational logic, namely progressive computing and traffic filtering, are proposed in this dissertation for data reconstruction and feature extraction tasks. Two widely used practical use cases, Blind Source Separation (BSS) and anomaly detection, are selected to demonstrate the design of computing modules for data reconstruction and feature extraction tasks in the in-network computing scheme, respectively. Following these two design paradigms of progressive computing and traffic filtering, this dissertation designs two computing modules: progressive ICA (pICA) and You only hear once (Yoho) for BSS and anomaly detection, respectively. These lightweight computing modules can cooperatively perform computational tasks along the forwarding path. In this way, computational virtual functions can be introduced into the network, addressing the first challenge mentioned above, namely that the network should be able to host general-purpose softwarized network functions. In this dissertation, quantitative simulations have shown that the computing time of pICA and Yoho in in-network computing scenarios is significantly reduced, since pICA and Yoho are performed simultaneously with the data forwarding. At the same time, pICA guarantees the same computing accuracy, and Yoho’s computing accuracy is improved.
Furthermore, this dissertation proposes a stateful transport module in the network domain to support in-network computing under the end-to-end transport architecture. The stateful transport module extends the IP packet header, so that network packets carry message-related metadata (message-based packaging). Additionally, the forwarding layer of the network device is optimized to be able to process the packet payload based on the computational state (state-based transport component). The second challenge posed by in-network computing has been tackled by supporting the modification of packet payloads.
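The message-based packaging idea above, in which each packet carries message-related metadata, can be sketched in a few lines. The abstract does not specify the header layout, so every field here (`message_id`, `chunk_index`, `chunk_count`, `compute_state`) and the 9-byte format are assumptions chosen for illustration. The sketch also prepends the metadata to the payload bytes rather than extending a real IP header, which keeps it runnable without raw sockets.

```python
import struct
from typing import NamedTuple

# Hypothetical layout: uint32 message id, uint16 chunk index,
# uint16 chunk count, uint8 compute state; network byte order.
HEADER_FORMAT = "!IHHB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


class MessageHeader(NamedTuple):
    message_id: int
    chunk_index: int
    chunk_count: int
    compute_state: int


def packetize(message_id, payload, chunk_size, compute_state=0):
    """Split one message into packets that each carry the metadata header."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    return [struct.pack(HEADER_FORMAT, message_id, index, len(chunks), compute_state)
            + chunk
            for index, chunk in enumerate(chunks)]


def parse_packet(packet):
    """Separate the metadata header from the payload bytes."""
    header = MessageHeader(*struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE]))
    return header, packet[HEADER_SIZE:]


def reassemble(packets):
    """Rebuild the message payload, tolerating out-of-order arrival."""
    parsed = sorted((parse_packet(p) for p in packets),
                    key=lambda item: item[0].chunk_index)
    return b"".join(body for _, body in parsed)
```

With metadata like this, a forwarding node could tell which message a payload belongs to and what computation state it is in before deciding whether to modify it, which is roughly the role the abstract assigns to the state-based transport component.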
The two computational modules mentioned above and the stateful transport module form the designed in-network computing solutions. By merging pICA and Yoho with the stateful transport module, respectively, two emulation systems, i.e., in-network pICA and in-network Yoho, have been implemented in the Communication Networks Emulator (ComNetsEmu). Through quantitative emulations, the experimental results showed that in-network pICA accelerates the overall service time of BSS by up to 32.18%. On the other hand, using in-network Yoho accelerates the overall service time of anomaly detection by a maximum of 30.51%. These are promising results for the design and actual realization of future communication networks.