Search CORE

Efficient Acoustic Feature Computation Using FPGAs

Author: Schmidt Erik M.
Speck Jacquelin A.
Publication date: 01/02/2012

Many recent advances in music information retrieval (MIR) have been data-driven. Widespread performance evaluations on common data sets, such as the annual MIREX events, have been instrumental in advancing the field. Such endeavors incur large computational costs and could benefit from faster calculation of acoustic features. Traditional cluster-based solutions are expensive and space- and power-inefficient. The massively parallel architecture of the field programmable gate array (FPGA) makes it possible to design lower-cost, application-specific chips that rival cluster speed for large-scale acoustic feature computation. Such devices also show potential for implementing MIR systems on embedded devices, where hardware acceleration is a necessity. We present a prototype Xilinx System Generator (XSG) library for acoustic feature calculation. We use a genre classification task to compare the performance of simulated hardware features with that of features computed using standard methods. Finally, we discuss ongoing efforts toward a working hardware design.

Drexel Libraries E-Repository and Archives
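The abstract does not specify which acoustic features the XSG library implements. To make the idea of a frame-based acoustic feature concrete, the following is a minimal NumPy sketch of a software reference computation for one common feature, the spectral centroid (the magnitude-weighted mean frequency of each frame). The choice of feature, the frame and hop sizes, and the function names are assumptions for illustration, not the authors' hardware design.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, one per row.

    Samples after the last complete frame are dropped; assumes len(x) >= frame_len.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_centroid(x: np.ndarray, sr: float,
                      frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Per-frame spectral centroid in Hz (magnitude-weighted mean frequency)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, frame_len // 2 + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)   # bin centre frequencies in Hz
    # The small epsilon avoids dividing by zero on silent frames.
    return (mag @ freqs) / (mag.sum(axis=1) + 1e-12)

# Usage: a 1 kHz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
centroids = spectral_centroid(tone, sr)
```

For a pure 1 kHz tone, every frame's centroid should sit close to 1000 Hz. The per-frame pipeline (window, FFT, weighted sum) is the same for many other spectral features.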

Efficient audio signal processing for embedded systems

Author: Chiu Leung Kin
Publication venue: Georgia Institute of Technology
Publication date: 21/05/2012

We investigated two design strategies for efficiently processing audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm that makes piezoelectric loudspeakers sound "richer" and "fuller" using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management that suppresses signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store the classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end: the machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier.

PhD thesis. Committee Chair: Anderson, David; Committee Member: Hasler, Jennifer; Committee Member: Hunt, William; Committee Member: Lanterman, Aaron; Committee Member: Minch, Bradle

Scholarly Materials And Research @ Georgia Tech
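AdaBoost performs feature selection implicitly when each weak learner looks at a single feature. Every boosting round picks the feature and threshold with the lowest weighted error, then reweights the training examples so later rounds focus on the mistakes. The sketch below assumes discrete AdaBoost with one-feature threshold stumps. The abstract does not state which weak learners or how many rounds the thesis used, and the function names are hypothetical.

```python
import numpy as np

def _stump(X, j, thr, pol):
    """One-feature threshold classifier returning labels in {-1, +1}."""
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost_select(X, y, n_rounds):
    """Discrete AdaBoost with decision stumps.

    X: (n_samples, n_features); y: labels in {-1, +1}.
    Returns a list of (feature_index, threshold, polarity, alpha).
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                    # example weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        # Exhaustive search over features, candidate thresholds, and polarities.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    err = w[_stump(X, j, thr, pol) != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = np.clip(err, 1e-10, 1 - 1e-10)   # keep alpha finite
        alpha = 0.5 * np.log((1 - err) / err)
        # Up-weight misclassified examples, down-weight correct ones.
        w = w * np.exp(-alpha * y * _stump(X, j, thr, pol))
        w /= w.sum()
        stumps.append((j, thr, pol, alpha))
    return stumps

def adaboost_predict(stumps, X):
    """Sign of the alpha-weighted vote of the selected stumps."""
    score = sum(alpha * _stump(X, j, thr, pol) for j, thr, pol, alpha in stumps)
    return np.where(score >= 0, 1, -1)
```

The selected feature indices are `[s[0] for s in stumps]`. Features that are never chosen do not contribute to the vote, which is how boosting can shrink the set of features a front-end has to compute.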

Deep Neural Networks

Publication venue: Springer

Springer - Publisher Connector

Deep Neural Networks

Author: Mariette Awad
Rahul Khanna
Publication venue: Springer Science and Business Media LLC

Crossref

Springer - Publisher Connector

Automatic annotation of musical audio for interactive applications

Author: Brossier Paul M.
Publication date: 01/01/2006

PhD thesis. As machines become more portable and part of our everyday life, it becomes apparent that developing interactive and ubiquitous systems is an important aspect of new music applications created by the research community. We are interested in developing a robust layer for the automatic annotation of audio signals, to be used in various applications, from music search engines to interactive installations, and in various contexts, from embedded devices to audio content servers. We propose adaptations of existing signal processing techniques to a real-time context. Amongst these annotation techniques, we concentrate on low- and mid-level tasks such as onset detection, pitch tracking, tempo extraction and note modelling. We present a framework to extract these annotations and evaluate the performance of different algorithms. The first task is to detect onsets and offsets in audio streams with short latency. The segmentation of audio streams into temporal objects enables various manipulations and analyses of metrical structure. Evaluation of different algorithms and their adaptation to real time are described. We then tackle the problem of fundamental frequency estimation, again trying to reduce both the delay and the computational cost. Different algorithms are implemented for real time and tested on monophonic recordings and complex signals. Spectral analysis can be used to label the temporal segments, and the estimation of higher-level descriptions is also explored. Techniques for modelling note objects and localising beats are implemented and discussed. Applications of our framework include live and interactive music installations and, more generally, tools for composers and sound engineers. Speed optimisations may bring a significant improvement to various automated tasks, such as automatic classification and recommendation systems. We describe the design of our software solution, both for our research purposes and in view of its integration within other systems.

EU-FP6-IST-507142 project SIMAC (Semantic Interaction with Music Audio Contents); EPSRC grants GR/R54620; GR/S75802/01

CiteSeerX

Queen Mary Research Online
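Onset detection, the first task in the abstract, is commonly built from an onset detection function followed by peak picking. The thesis evaluates several detection functions. The following sketch uses one widely used option, half-wave-rectified spectral flux, with a fixed threshold relative to the maximum. It processes a whole signal offline, so unlike the real-time setting described above it is not causal. The threshold ratio, frame sizes, and function names are illustrative assumptions.

```python
import numpy as np

def spectral_flux(x, frame_len=512, hop=256):
    """Half-wave-rectified spectral flux, one value per frame (frame 0 gets 0)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    mags = np.array([np.abs(np.fft.rfft(win * x[i * hop:i * hop + frame_len]))
                     for i in range(n_frames)])
    # Keep only increases in magnitude: onsets add energy, decays should not count.
    rises = np.maximum(np.diff(mags, axis=0), 0.0)
    return np.concatenate([[0.0], rises.sum(axis=1)])

def pick_onsets(flux, hop, sr, threshold_ratio=0.3):
    """Return onset times (s) at local maxima above threshold_ratio * max(flux)."""
    thr = threshold_ratio * flux.max()
    return [i * hop / sr for i in range(1, len(flux) - 1)
            if flux[i] > thr and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]

# Usage: half a second of silence followed by a 1 kHz tone.
sr = 8000
x = np.zeros(sr)
x[4000:] = np.sin(2 * np.pi * 1000 * np.arange(4000) / sr)
onsets = pick_onsets(spectral_flux(x), hop=256, sr=sr)
```

Reported times are frame start times, so they can precede the true onset by up to one frame length. A real-time variant would compute the flux frame by frame and replace the global maximum with an adaptive threshold over past frames.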

A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION

Author: Arifin Sutjipoto
Publication venue: Electrical & Electronic Engineering, Imperial College London
Publication date: 01/10/2008

Video segmentation facilitates efficient video indexing and navigation in large digital video archives. It is an important process in a content-based video indexing and retrieval (CBVIR) system. Many automated solutions performed segmentation using information about the "facts" of the video. These "facts" come in the form of labels that describe the objects captured by the camera. Solutions of this type achieved good and consistent results for some video genres, such as news programs and informational presentations. The content format of these videos is generally quite standard, and automated solutions were designed to follow these format rules. For example, in [1], the presence of news anchorpersons was used as a cue to determine the start and end of a meaningful news segment. The same cannot be said for video genres such as movies and feature films, because their makers use different filming techniques to elicit particular affective responses from their target audience. Humans usually perform manual video segmentation by relating changes in time and locale to discontinuities in meaning [2]. As a result, viewers are usually uncertain about the boundary locations of a meaningful video segment because of their different affective responses. This thesis presents an entirely new view of the problem of high-level video segmentation. We developed a novel probabilistic method for affective-level video content analysis and segmentation. Our method had two stages. In the first stage, affective content labels were assigned to video shots by means of a dynamic Bayesian network (DBN). A novel hierarchical-coupled dynamic Bayesian network (HCDBN) topology was proposed for this stage. The topology was based on the pleasure-arousal-dominance (P-A-D) model of affect representation [3]. In principle, this model can represent a large number of emotions.
In the second stage, the visual, audio and affective information of the video was used to compute a statistical feature vector representing the content of each shot. Affective-level video segmentation was achieved by applying spectral clustering to the feature vectors. We evaluated the first stage of our proposal by comparing its emotion detection ability with all existing works related to the field of affective video content analysis. To evaluate the second stage, we used the time adaptive clustering (TAC) algorithm as our performance benchmark. The TAC algorithm was the best high-level video segmentation method [2]. However, it is very computationally intensive. To accelerate it, we developed a modified TAC (modTAC) algorithm designed to be mapped easily onto a field programmable gate array (FPGA) device. Both the TAC and modTAC algorithms were used as performance benchmarks for our proposed method. Since affective video content is a perceptual concept, segmentation performance and human agreement rates were used as our evaluation criteria. To obtain our ground truth data and viewer agreement rates, a pilot panel study based on the work of Gross et al. [4] was conducted. Experimental results show the feasibility of our proposed method. For the first stage, an average improvement of up to 38% was achieved over previous works. For the second stage, an improvement of up to 37% was achieved over the TAC algorithm.

Spiral - Imperial College Digital Repository
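The second stage applies spectral clustering to per-shot feature vectors. The abstract does not give the affinity function or the number of clusters, so the following is a minimal NumPy sketch of the two-cluster case only. It assumes a Gaussian affinity with a hand-picked width `sigma` and splits shots by the sign of the second eigenvector (the Fiedler vector) of the normalized graph Laplacian. A k-way version would instead run k-means on the first k eigenvectors. The function name is hypothetical.

```python
import numpy as np

def spectral_bipartition(F, sigma=1.0):
    """Split the rows of F (one feature vector per shot) into two clusters.

    Uses a Gaussian affinity and the sign of the Fiedler vector of the
    symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    """
    sq_dist = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                       # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(F)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                    # eigenvalues in ascending order
    # Second-smallest eigenvector, rescaled by D^{-1/2} (normalized-cut form).
    fiedler = d_inv_sqrt * vecs[:, 1]
    return (fiedler > 0).astype(int)

# Usage: two well-separated groups of 2-D "shot" features.
rng = np.random.default_rng(1)
F = np.vstack([rng.normal(0.0, 0.3, size=(10, 2)),
               rng.normal(3.0, 0.3, size=(10, 2))])
labels = spectral_bipartition(F)
```

The cluster labels 0 and 1 are arbitrary, because an eigenvector's sign is not fixed. Only the grouping is meaningful.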

Implementation of Deep Learning Models on an SoC-FPGA Device for Real-Time Music Genre Classification

Author: Cretu I
Faizan M
Intzes I
Meng H
Publication venue: MDPI
Publication date: 10/07/2023

Data Availability Statement: https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification (accessed on 30 June 2023).

Copyright © 2023 by the authors.

Deep neural networks (DNNs) are complex machine learning models designed for decision-making tasks with high accuracy. However, DNNs require high computational power and memory, which makes such models difficult to fit on edge devices and results in unnecessary processing delays and high energy consumption. Graphics processing units (GPUs) offer reliable hardware acceleration, but their bulky size prevents their use in portable equipment. System-on-chip field programmable gate arrays (SoC-FPGAs) provide considerable computational power with low energy consumption, making them ideal for edge computing applications owing to their innovative, flexible, and compact design. In this paper, we implement a deep-learning-based music genre classification system on a SoC-FPGA board, evaluate the model's performance, and provide a comparative analysis across different platforms. Specifically, we compare the performance of long short-term memory (LSTM) networks, convolutional neural networks (CNNs), and a hybrid model (CNN-LSTM) on an Intel Core i7-8550U processor. The models are fed an acoustic feature, the Mel-frequency cepstral coefficients (MFCCs), for training and testing (inference). Then, using the Vitis AI tool, a deployable version of the model is generated. The experimental results show that execution speed increases by 80% and throughput rises fourfold when the CNN-based music genre classification system is implemented on the SoC-FPGA.

British Heart Foundation (The development of a sophisticated cardiac pacing simulator: a training tool to enhance the management of post cardiac surgical patient care) under grant number FS/19/73/34690

Brunel University Research Archive
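The paper feeds MFCCs to its models, but the abstract does not give the extraction parameters. The sketch below computes MFCCs from scratch with NumPy under common but assumed settings: 512-sample Hamming frames, 26 triangular mel filters using the HTK-style mel formula, and 13 coefficients from an orthonormal DCT-II. The function names are hypothetical. Libraries such as librosa provide equivalent routines, but their defaults differ from these settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                 # rising edge
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                # falling edge
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(x, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """MFCC matrix of shape (n_frames, n_mfcc)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    # Log compression (epsilon guards against log(0)), then decorrelate with the DCT.
    return dct_ii(np.log(mel_energy + 1e-10), n_mfcc)

# Usage: one second of a 440 Hz tone at 16 kHz.
sr = 16000
coeffs = mfcc(np.sin(2 * np.pi * 440 * np.arange(sr) / sr), sr)
```

Each row of `coeffs` is one frame's 13-dimensional feature vector. A sequence of such rows is the kind of input an LSTM consumes directly, or that a CNN treats as a 2-D time-by-coefficient image.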

Accelerating Audio Data Analysis with In-Network Computing

Author: Wu Huanzhuo
Publication date: 19/07/2023

Digital transformation will involve massive numbers of connections and massive volumes of data. This implies a growing demand for computing in communication networks due to network softwarization. Moreover, digital transformation will host very sensitive verticals requiring high end-to-end reliability and low latency. Accordingly, the concept of “in-network computing” has emerged. It means integrating network communications with computing and performing computations along the transport path of the network, which can be used to deliver actionable information directly to end users instead of raw data. However, this paradigm shift to in-network computing raises disruptive challenges for current communication networks. In-network computing (i) expects the network to host general-purpose softwarized network functions and (ii) encourages the packet payload to be modified. Yet today's networks are designed to focus on packet forwarding, and under current end-to-end transport mechanisms, packet payloads should not be touched in the forwarding path. This dissertation presents full-stack in-network computing solutions, jointly designed from network and computing perspectives, to accelerate data analysis applications, specifically acoustic data analysis. In the computing domain, two design paradigms for computational logic, namely progressive computing and traffic filtering, are proposed for data reconstruction and feature extraction tasks. Two widely used practical use cases, Blind Source Separation (BSS) and anomaly detection, are selected to demonstrate the design of computing modules for data reconstruction and feature extraction in the in-network computing scheme, respectively. Following these two paradigms, this dissertation designs two computing modules, progressive ICA (pICA) and You only hear once (Yoho), for BSS and anomaly detection, respectively.
These lightweight computing modules can cooperatively perform computational tasks along the forwarding path. In this way, computational virtual functions can be introduced into the network, addressing the first challenge mentioned above, namely that the network should be able to host general-purpose softwarized network functions. Quantitative simulations in this dissertation show that the computing time of pICA and Yoho in in-network computing scenarios is significantly reduced, since both are performed simultaneously with data forwarding. At the same time, pICA guarantees the same computing accuracy, and Yoho's computing accuracy is improved. Furthermore, this dissertation proposes a stateful transport module in the network domain to support in-network computing under the end-to-end transport architecture. The stateful transport module extends the IP packet header so that network packets carry message-related metadata (message-based packaging). Additionally, the forwarding layer of the network device is optimized to process the packet payload based on the computational state (state-based transport component). This tackles the second challenge posed by in-network computing by supporting the modification of packet payloads. The two computing modules and the stateful transport module together form the designed in-network computing solutions. By merging pICA and Yoho with the stateful transport module, two emulation systems, in-network pICA and in-network Yoho, have been implemented in the Communication Networks Emulator (ComNetsEmu). In quantitative emulations, in-network pICA accelerates the overall service time of BSS by up to 32.18%, and in-network Yoho accelerates the overall service time of anomaly detection by up to 30.51%. These are promising results for the design and actual realization of future communication networks.

Technische Universität Dresden: Qucosa
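pICA is described as a progressive variant of independent component analysis (ICA) for blind source separation, but the abstract gives no algorithmic detail. For orientation, here is a minimal NumPy sketch of standard, non-progressive symmetric FastICA with a tanh nonlinearity, a well-known batch ICA method. It is not the dissertation's pICA, and the function names are our own. The sketch whitens the mixtures, then repeats the FastICA fixed-point update followed by symmetric decorrelation.

```python
import numpy as np

def whiten(X):
    """Center X (channels x samples) and transform it to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ Xc

def sym_decorrelate(W):
    """Return (W W^T)^{-1/2} W, which makes the rows of W orthonormal."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W

def fastica(X, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with g = tanh; returns sources up to order, sign, and scale."""
    Z = whiten(X)
    n, m = Z.shape
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update for every row w: E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = sym_decorrelate(G @ Z.T / m - np.diag((1.0 - G ** 2).mean(axis=1)) @ W)
        # Converged when each new row is (anti)parallel to its previous version.
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W @ Z

# Usage: unmix a sine and a square wave from two linear mixtures.
t = np.linspace(0, 1, 2000)
S = np.vstack([np.sin(2 * np.pi * 5 * t), np.sign(np.sin(2 * np.pi * 3 * t))])
A = np.array([[1.0, 0.5], [0.6, 1.0]])
S_est = fastica(A @ S)
```

Recovered sources match the originals only up to permutation, sign, and scale. This ambiguity is inherent to ICA, so the results are typically compared by absolute correlation.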

Design of large polyphase filters in the Quadratic Residue Number System

Author: Cardarilli G
Nannarelli A
Oster Y
Petricca M
Re M
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2010

Crossref

ART

Online Research Database In Technology