Search CORE

24 research outputs found

Multi-sensor data fusion in mobile devices for the identification of Activities of Daily Living

Author: Pires Ivan Miguel Serrano
Publication venue
Publication date: 01/11/2018
Field of study

Following the recent advances in technology and the growing use of mobile devices such as smartphones, several solutions may be developed to improve the quality of life of users in the context of Ambient Assisted Living (AAL). Mobile devices have different available sensors, e.g., accelerometer, gyroscope, magnetometer, microphone and Global Positioning System (GPS) receiver, which allow the acquisition of physical and physiological parameters for the recognition of different Activities of Daily Living (ADL) and the environments in which they are performed. The definition of ADL includes a well-known set of tasks, which include basic selfcare tasks, based on the types of skills that people usually learn in early childhood, including feeding, bathing, dressing, grooming, walking, running, jumping, climbing stairs, sleeping, watching TV, working, listening to music, cooking, eating and others. On the context of AAL, some individuals (henceforth called user or users) need particular assistance, either because the user has some sort of impairment, or because the user is old, or simply because users need/want to monitor their lifestyle. The research and development of systems that provide a particular assistance to people is increasing in many areas of application. In particular, in the future, the recognition of ADL will be an important element for the development of a personal digital life coach, providing assistance to different types of users. To support the recognition of ADL, the surrounding environments should be also recognized to increase the reliability of these systems. The main focus of this Thesis is the research on methods for the fusion and classification of the data acquired by the sensors available in off-the-shelf mobile devices in order to recognize ADL in almost real-time, taking into account the large diversity of the capabilities and characteristics of the mobile devices available in the market. In order to achieve this objective, this Thesis started with the review of the existing methods and technologies to define the architecture and modules of the method for the identification of ADL. With this review and based on the knowledge acquired about the sensors available in off-the-shelf mobile devices, a set of tasks that may be reliably identified was defined as a basis for the remaining research and development to be carried out in this Thesis. This review also identified the main stages for the development of a new method for the identification of the ADL using the sensors available in off-the-shelf mobile devices; these stages are data acquisition, data processing, data cleaning, data imputation, feature extraction, data fusion and artificial intelligence. One of the challenges is related to the different types of data acquired from the different sensors, but other challenges were found, including the presence of environmental noise, the positioning of the mobile device during the daily activities, the limited capabilities of the mobile devices and others. Based on the acquired data, the processing was performed, implementing data cleaning and feature extraction methods, in order to define a new framework for the recognition of ADL. The data imputation methods were not applied, because at this stage of the research their implementation does not have influence in the results of the identification of the ADL and environments, as the features are extracted from a set of data acquired during a defined time interval and there are no missing values during this stage. The joint selection of the set of usable sensors and the identifiable set of tasks will then allow the development of a framework that, considering multi-sensor data fusion technologies and context awareness, in coordination with other information available from the user context, such as his/her agenda and the time of the day, will allow to establish a profile of the tasks that the user performs in a regular activity day. The classification method and the algorithm for the fusion of the features for the recognition of ADL and its environments needs to be deployed in a machine with some computational power, while the mobile device that will use the created framework, can perform the identification of the ADL using a much less computational power. Based on the results reported in the literature, the method chosen for the recognition of the ADL is composed by three variants of Artificial Neural Networks (ANN), including simple Multilayer Perceptron (MLP) networks, Feedforward Neural Networks (FNN) with Backpropagation, and Deep Neural Networks (DNN). Data acquisition can be performed with standard methods. After the acquisition, the data must be processed at the data processing stage, which includes data cleaning and feature extraction methods. The data cleaning method used for motion and magnetic sensors is the low pass filter, in order to reduce the noise acquired; but for the acoustic data, the Fast Fourier Transform (FFT) was applied to extract the different frequencies. When the data is clean, several features are then extracted based on the types of sensors used, including the mean, standard deviation, variance, maximum value, minimum value and median of raw data acquired from the motion and magnetic sensors; the mean, standard deviation, variance and median of the maximum peaks calculated with the raw data acquired from the motion and magnetic sensors; the five greatest distances between the maximum peaks calculated with the raw data acquired from the motion and magnetic sensors; the mean, standard deviation, variance, median and 26 Mel- Frequency Cepstral Coefficients (MFCC) of the frequencies obtained with FFT based on the raw data acquired from the microphone data; and the distance travelled calculated with the data acquired from the GPS receiver. After the extraction of the features, these will be grouped in different datasets for the application of the ANN methods and to discover the method and dataset that reports better results. The classification stage was incrementally developed, starting with the identification of the most common ADL (i.e., walking, running, going upstairs, going downstairs and standing activities) with motion and magnetic sensors. Next, the environments were identified with acoustic data, i.e., bedroom, bar, classroom, gym, kitchen, living room, hall, street and library. After the environments are recognized, and based on the different sets of sensors commonly available in the mobile devices, the data acquired from the motion and magnetic sensors were combined with the recognized environment in order to differentiate some activities without motion, i.e., sleeping and watching TV. The number of recognized activities in this stage was increased with the use of the distance travelled, extracted from the GPS receiver data, allowing also to recognize the driving activity. After the implementation of the three classification methods with different numbers of iterations, datasets and remaining configurations in a machine with high processing capabilities, the reported results proved that the best method for the recognition of the most common ADL and activities without motion is the DNN method, but the best method for the recognition of environments is the FNN method with Backpropagation. Depending on the number of sensors used, this implementation reports a mean accuracy between 85.89% and 89.51% for the recognition of the most common ADL, equals to 86.50% for the recognition of environments, and equals to 100% for the recognition of activities without motion, reporting an overall accuracy between 85.89% and 92.00%. The last stage of this research work was the implementation of the structured framework for the mobile devices, verifying that the FNN method requires a high processing power for the recognition of environments and the results reported with the mobile application are lower than the results reported with the machine with high processing capabilities used. Thus, the DNN method was also implemented for the recognition of the environments with the mobile devices. Finally, the results reported with the mobile devices show an accuracy between 86.39% and 89.15% for the recognition of the most common ADL, equal to 45.68% for the recognition of environments, and equal to 100% for the recognition of activities without motion, reporting an overall accuracy between 58.02% and 89.15%. Compared with the literature, the results returned by the implemented framework show only a residual improvement. However, the results reported in this research work comprehend the identification of more ADL than the ones described in other studies. The improvement in the recognition of ADL based on the mean of the accuracies is equal to 2.93%, but the maximum number of ADL and environments previously recognized was 13, while the number of ADL and environments recognized with the framework resulting from this research is 16. In conclusion, the framework developed has a mean improvement of 2.93% in the accuracy of the recognition for a larger number of ADL and environments than previously reported. In the future, the achievements reported by this PhD research may be considered as a start point of the development of a personal digital life coach, but the number of ADL and environments recognized by the framework should be increased and the experiments should be performed with different types of devices (i.e., smartphones and smartwatches), and the data imputation and other machine learning methods should be explored in order to attempt to increase the reliability of the framework for the recognition of ADL and its environments.Após os recentes avanços tecnológicos e o crescente uso dos dispositivos móveis, como por exemplo os smartphones, várias soluções podem ser desenvolvidas para melhorar a qualidade de vida dos utilizadores no contexto de Ambientes de Vida Assistida (AVA) ou Ambient Assisted Living (AAL). Os dispositivos móveis integram vários sensores, tais como acelerómetro, giroscópio, magnetómetro, microfone e recetor de Sistema de Posicionamento Global (GPS), que permitem a aquisição de vários parâmetros físicos e fisiológicos para o reconhecimento de diferentes Atividades da Vida Diária (AVD) e os seus ambientes. A definição de AVD inclui um conjunto bem conhecido de tarefas que são tarefas básicas de autocuidado, baseadas nos tipos de habilidades que as pessoas geralmente aprendem na infância. Essas tarefas incluem alimentar-se, tomar banho, vestir-se, fazer os cuidados pessoais, caminhar, correr, pular, subir escadas, dormir, ver televisão, trabalhar, ouvir música, cozinhar, comer, entre outras. No contexto de AVA, alguns indivíduos (comumente chamados de utilizadores) precisam de assistência particular, seja porque o utilizador tem algum tipo de deficiência, seja porque é idoso, ou simplesmente porque o utilizador precisa/quer monitorizar e treinar o seu estilo de vida. A investigação e desenvolvimento de sistemas que fornecem algum tipo de assistência particular está em crescente em muitas áreas de aplicação. Em particular, no futuro, o reconhecimento das AVD é uma parte importante para o desenvolvimento de um assistente pessoal digital, fornecendo uma assistência pessoal de baixo custo aos diferentes tipos de pessoas. pessoas. Para ajudar no reconhecimento das AVD, os ambientes em que estas se desenrolam devem ser reconhecidos para aumentar a fiabilidade destes sistemas. O foco principal desta Tese é o desenvolvimento de métodos para a fusão e classificação dos dados adquiridos a partir dos sensores disponíveis nos dispositivos móveis, para o reconhecimento quase em tempo real das AVD, tendo em consideração a grande diversidade das características dos dispositivos móveis disponíveis no mercado. Para atingir este objetivo, esta Tese iniciou-se com a revisão dos métodos e tecnologias existentes para definir a arquitetura e os módulos do novo método de identificação das AVD. Com esta revisão da literatura e com base no conhecimento adquirido sobre os sensores disponíveis nos dispositivos móveis disponíveis no mercado, um conjunto de tarefas que podem ser identificadas foi definido para as pesquisas e desenvolvimentos desta Tese. Esta revisão também identifica os principais conceitos para o desenvolvimento do novo método de identificação das AVD, utilizando os sensores, são eles: aquisição de dados, processamento de dados, correção de dados, imputação de dados, extração de características, fusão de dados e extração de resultados recorrendo a métodos de inteligência artificial. Um dos desafios está relacionado aos diferentes tipos de dados adquiridos pelos diferentes sensores, mas outros desafios foram encontrados, sendo os mais relevantes o ruído ambiental, o posicionamento do dispositivo durante a realização das atividades diárias, as capacidades limitadas dos dispositivos móveis. As diferentes características das pessoas podem igualmente influenciar a criação dos métodos, escolhendo pessoas com diferentes estilos de vida e características físicas para a aquisição e identificação dos dados adquiridos a partir de sensores. Com base nos dados adquiridos, realizou-se o processamento dos dados, implementando-se métodos de correção dos dados e a extração de características, para iniciar a criação do novo método para o reconhecimento das AVD. Os métodos de imputação de dados foram excluídos da implementação, pois não iriam influenciar os resultados da identificação das AVD e dos ambientes, na medida em que são utilizadas as características extraídas de um conjunto de dados adquiridos durante um intervalo de tempo definido. A seleção dos sensores utilizáveis, bem como das AVD identificáveis, permitirá o desenvolvimento de um método que, considerando o uso de tecnologias para a fusão de dados adquiridos com múltiplos sensores em coordenação com outras informações relativas ao contexto do utilizador, tais como a agenda do utilizador, permitindo estabelecer um perfil de tarefas que o utilizador realiza diariamente. Com base nos resultados obtidos na literatura, o método escolhido para o reconhecimento das AVD são as diferentes variantes das Redes Neuronais Artificiais (RNA), incluindo Multilayer Perceptron (MLP), Feedforward Neural Networks (FNN) with Backpropagation and Deep Neural Networks (DNN). No final, após a criação dos métodos para cada fase do método para o reconhecimento das AVD e ambientes, a implementação sequencial dos diferentes métodos foi realizada num dispositivo móvel para testes adicionais. Após a definição da estrutura do método para o reconhecimento de AVD e ambientes usando dispositivos móveis, verificou-se que a aquisição de dados pode ser realizada com os métodos comuns. Após a aquisição de dados, os mesmos devem ser processados no módulo de processamento de dados, que inclui os métodos de correção de dados e de extração de características. O método de correção de dados utilizado para sensores de movimento e magnéticos é o filtro passa-baixo de modo a reduzir o ruído, mas para os dados acústicos, a Transformada Rápida de Fourier (FFT) foi aplicada para extrair as diferentes frequências. Após a correção dos dados, as diferentes características foram extraídas com base nos tipos de sensores usados, sendo a média, desvio padrão, variância, valor máximo, valor mínimo e mediana de dados adquiridos pelos sensores magnéticos e de movimento, a média, desvio padrão, variância e mediana dos picos máximos calculados com base nos dados adquiridos pelos sensores magnéticos e de movimento, as cinco maiores distâncias entre os picos máximos calculados com os dados adquiridos dos sensores de movimento e magnéticos, a média, desvio padrão, variância e 26 Mel-Frequency Cepstral Coefficients (MFCC) das frequências obtidas com FFT com base nos dados obtidos a partir do microfone, e a distância calculada com os dados adquiridos pelo recetor de GPS. Após a extração das características, as mesmas são agrupadas em diferentes conjuntos de dados para a aplicação dos métodos de RNA de modo a descobrir o método e o conjunto de características que reporta melhores resultados. O módulo de classificação de dados foi incrementalmente desenvolvido, começando com a identificação das AVD comuns com sensores magnéticos e de movimento, i.e., andar, correr, subir escadas, descer escadas e parado. Em seguida, os ambientes são identificados com dados de sensores acústicos, i.e., quarto, bar, sala de aula, ginásio, cozinha, sala de estar, hall, rua e biblioteca. Com base nos ambientes reconhecidos e os restantes sensores disponíveis nos dispositivos móveis, os dados adquiridos dos sensores magnéticos e de movimento foram combinados com o ambiente reconhecido para diferenciar algumas atividades sem movimento (i.e., dormir e ver televisão), onde o número de atividades reconhecidas nesta fase aumenta com a fusão da distância percorrida, extraída a partir dos dados do recetor GPS, permitindo também reconhecer a atividade de conduzir. Após a implementação dos três métodos de classificação com diferentes números de iterações, conjuntos de dados e configurações numa máquina com alta capacidade de processamento, os resultados relatados provaram que o melhor método para o reconhecimento das atividades comuns de AVD e atividades sem movimento é o método DNN, mas o melhor método para o reconhecimento de ambientes é o método FNN with Backpropagation. Dependendo do número de sensores utilizados, esta implementação reporta uma exatidão média entre 85,89% e 89,51% para o reconhecimento das AVD comuns, igual a 86,50% para o reconhecimento de ambientes, e igual a 100% para o reconhecimento de atividades sem movimento, reportando uma exatidão global entre 85,89% e 92,00%. A última etapa desta Tese foi a implementação do método nos dispositivos móveis, verificando que o método FNN requer um alto poder de processamento para o reconhecimento de ambientes e os resultados reportados com estes dispositivos são inferiores aos resultados reportados com a máquina com alta capacidade de processamento utilizada no desenvolvimento do método. Assim, o método DNN foi igualmente implementado para o reconhecimento dos ambientes com os dispositivos móveis. Finalmente, os resultados relatados com os dispositivos móveis reportam uma exatidão entre 86,39% e 89,15% para o reconhecimento das AVD comuns, igual a 45,68% para o reconhecimento de ambientes, e igual a 100% para o reconhecimento de atividades sem movimento, reportando uma exatidão geral entre 58,02% e 89,15%. Com base nos resultados relatados na literatura, os resultados do método desenvolvido mostram uma melhoria residual, mas os resultados desta Tese identificam mais AVD que os demais estudos disponíveis na literatura. A melhoria no reconhecimento das AVD com base na média das exatidões é igual a 2,93%, mas o número máximo de AVD e ambientes reconhecidos pelos estudos disponíveis na literatura é 13, enquanto o número de AVD e ambientes reconhecidos com o método implementado é 16. Assim, o método desenvolvido tem uma melhoria de 2,93% na exatidão do reconhecimento num maior número de AVD e ambientes. Como trabalho futuro, os resultados reportados nesta Tese podem ser considerados um ponto de partida para o desenvolvimento de um assistente digital pessoal, mas o número de ADL e ambientes reconhecidos pelo método deve ser aumentado e as experiências devem ser repetidas com diferentes tipos de dispositivos móveis (i.e., smartphones e smartwatches), e os métodos de imputação e outros métodos de classificação de dados devem ser explorados de modo a tentar aumentar a confiabilidade do método para o reconhecimento das AVD e ambientes

UBibliorum repositorio digital da ubi

Applications of fuzzy counterpropagation neural networks to non-linear function approximation and background noise elimination

Author: Wiryana I. M.
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/1994
Field of study

An adaptive filter which can operate in an unknown environment by performing a learning mechanism that is suitable for the speech enhancement process. This research develops a novel ANN model which incorporates the fuzzy set approach and which can perform a non-linear function approximation. The model is used as the basic structure of an adaptive filter. The learning capability of ANN is expected to be able to reduce the development time and cost of the designing adaptive filters based on fuzzy set approach. A combination of both techniques may result in a learnable system that can tackle the vagueness problem of a changing environment where the adaptive filter operates. This proposed model is called Fuzzy Counterpropagation Network (Fuzzy CPN). It has fast learning capability and self-growing structure. This model is applied to non-linear function approximation, chaotic time series prediction and background noise elimination

Research Online @ ECU

Speech dereverberation and speaker separation using microphone arrays in realistic environments

Author: Krikke Teun F.
Publication venue: The University of Edinburgh
Publication date: 01/06/2021
Field of study

This thesis concentrates on comparing novel and existing dereverberation and speaker separation techniques using multiple corpora, including a new corpus collected using a microphone array. Many corpora currently used for these techniques are recorded using head-mounted microphones in anechoic chambers. This novel corpus contains recordings with noise and reverberation made in office and workshop environments. Novel algorithms present a different way of approximating the reverberation, producing results that are competitive with existing algorithms. Dereverberation is evaluated using seven correlation-based algorithms and applied to two different corpora. Three of these are novel algorithms (Hs NTF, Cauchy WPE and Cauchy MIMO WPE). Both non-learning and learning algorithms are tested, with the learning algorithms performing better. For single and multi-channel speaker separation, unsupervised non-negative matrix factorization (NMF) algorithms are compared using three cost functions combined with sparsity, convolution and direction of arrival. The results show that the choice of cost function is important for improving the separation result. Furthermore, six different supervised deep learning algorithms are applied to single channel speaker separation. Historic information improves the result. When comparing NMF to deep learning, NMF is able to converge faster to a solution and provides a better result for the corpora used in this thesis

ROS: The Research Output Service. Heriot-Watt University Edinburgh

Recommended from our members

Identification and prediction of abnormal behaviour activities of daily living in intelligent environments

Author: Mahmoud SM
Publication venue
Publication date: 01/01/2012
Field of study

The aim of this research is to investigate efficient mining of useful information from a sensor network forming an Ambient Intelligence (AmI) environment. In this thesis, we investigate methods for supporting independent living of the elderly (and specifically patients who are suffering from dementia) by means of equipping their home with a simple sensor network to monitor their behaviour and identify their Activities of Daily Living (ADL). Dementia is considered to be one of the most important causes of disability in the elderly. Mostpatients would prefer to use non-intrusive technology to help them tomaintain their independence. Such monitoring and prediction would allow the caregiver to see any trend in the behaviour of the elderly person and to be informed of any abnormal behaviour

Nottingham Trent Institutional Repository (IRep)

Spatial features of reverberant speech: estimation and application to recognition and diarization

Author: Peso Pablo
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/11/2016
Field of study

Distant talking scenarios, such as hands-free calling or teleconference meetings, are essential for natural and comfortable human-machine interaction and they are being increasingly used in multiple contexts. The acquired speech signal in such scenarios is reverberant and affected by additive noise. This signal distortion degrades the performance of speech recognition and diarization systems creating troublesome human-machine interactions.This thesis proposes a method to non-intrusively estimate room acoustic parameters, paying special attention to a room acoustic parameter highly correlated with speech recognition degradation: clarity index. In addition, a method to provide information regarding the estimation accuracy is proposed. An analysis of the phoneme recognition performance for multiple reverberant environments is presented, from which a confusability metric for each phoneme is derived. This confusability metric is then employed to improve reverberant speech recognition performance. Additionally, room acoustic parameters can as well be used in speech recognition to provide robustness against reverberation. A method to exploit clarity index estimates in order to perform reverberant speech recognition is introduced. Finally, room acoustic parameters can also be used to diarize reverberant speech. A room acoustic parameter is proposed to be used as an additional source of information for single-channel diarization purposes in reverberant environments. In multi-channel environments, the time delay of arrival is a feature commonly used to diarize the input speech, however the computation of this feature is affected by reverberation. A method is presented to model the time delay of arrival in a robust manner so that speaker diarization is more accurately performed.Open Acces

Lirias

Spiral - Imperial College Digital Repository

A learning-by-example method for reducing BDCT compression artifacts in high-contrast images.

Author
Publication venue
Publication date: 01/01/2004
Field of study

Wang, Guangyu.Thesis submitted in: December 2003.Thesis (M.Phil.)--Chinese University of Hong Kong, 2004.Includes bibliographical references (leaves 70-75).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- BDCT Compression Artifacts --- p.1Chapter 1.2 --- Previous Artifact Removal Methods --- p.3Chapter 1.3 --- Our Method --- p.4Chapter 1.4 --- Structure of the Thesis --- p.4Chapter 2 --- Related Work --- p.6Chapter 2.1 --- Image Compression --- p.6Chapter 2.2 --- A Typical BDCT Compression: Baseline JPEG --- p.7Chapter 2.3 --- Existing Artifact Removal Methods --- p.10Chapter 2.3.1 --- Post-Filtering --- p.10Chapter 2.3.2 --- Projection onto Convex Sets --- p.12Chapter 2.3.3 --- Learning by Examples --- p.13Chapter 2.4 --- Other Related Work --- p.14Chapter 3 --- Contamination as Markov Random Field --- p.17Chapter 3.1 --- Markov Random Field --- p.17Chapter 3.2 --- Contamination as MRF --- p.18Chapter 4 --- Training Set Preparation --- p.22Chapter 4.1 --- Training Images Selection --- p.22Chapter 4.2 --- Bit Rate --- p.23Chapter 5 --- Artifact Vectors --- p.26Chapter 5.1 --- Formation of Artifact Vectors --- p.26Chapter 5.2 --- Luminance Remapping --- p.29Chapter 5.3 --- Dominant Implication --- p.29Chapter 6 --- Tree-Structured Vector Quantization --- p.32Chapter 6.1 --- Background --- p.32Chapter 6.1.1 --- Vector Quantization --- p.32Chapter 6.1.2 --- Tree-Structured Vector Quantization --- p.33Chapter 6.1.3 --- K-Means Clustering --- p.34Chapter 6.2 --- TSVQ in Artifact Removal --- p.35Chapter 7 --- Synthesis --- p.39Chapter 7.1 --- Color Processing --- p.39Chapter 7.2 --- Artifact Removal --- p.40Chapter 7.3 --- Selective Rejection of Synthesized Values --- p.42Chapter 8 --- Experimental Results --- p.48Chapter 8.1 --- Image Quality Assessments --- p.48Chapter 8.1.1 --- Peak Signal-Noise Ratio --- p.48Chapter 8.1.2 --- Mean Structural SIMilarity --- p.49Chapter 8.2 --- Performance --- p.50Chapter 8.3 --- How Size of Training Set Affects the Performance --- p.52Chapter 8.4 --- How Bit Rates Affect the Performance --- p.54Chapter 8.5 --- Comparisons --- p.56Chapter 9 --- Conclusion --- p.61Chapter A --- Color Transformation --- p.63Chapter B --- Image Quality --- p.64Chapter B.1 --- Image Quality vs. Quantization Table --- p.64Chapter B.2 --- Image Quality vs. Bit Rate --- p.66Chapter C --- Arti User's Manual --- p.68Bibliography --- p.7

CUHK Digital Repository

The Use of Optimal Cue Mapping to Improve the Intelligibility and Quality of Speech in Complex Binaural Sound Mixtures.

Author: Gao Jingbo
Publication venue: University of York
Publication date: 01/01/2016
Field of study

A person with normal hearing has the ability to follow a particular conversation of interest in a noisy and reverberant environment, whilst simultaneously ignoring the interfering sounds. This task often becomes more challenging for individuals with a hearing impairment. Attending selectively to a sound source is difficult to replicate in machines, including devices such as hearing aids. A correctly set up hearing aid will work well in quiet conditions, but its performance may deteriorate seriously in the presence of competing sounds. To be of help in these more challenging situations the hearing aid should be able to segregate the desired sound source from any other, unwanted sounds. This thesis explores a novel approach to speech segregation based on optimal cue mapping (OCM). OCM is a signal processing method for segregating a sound source based on spatial and other cues extracted from the binaural mixture of sounds arriving at a listener's ears. The spectral energy fraction of the target speech source in the mixture is estimated frame-by-frame using artificial neural networks (ANNs). The resulting target speech magnitude estimates for the left and right channels are combined with the corresponding original phase spectra to produce the final binaural output signal. The performance improvements delivered by the OCM algorithm are evaluated using the STOI and PESQ metrics for speech intelligibility and quality, respectively. A variety of increasingly challenging binaural mixtures are synthesised involving up to five spatially separate sound sources in both anechoic and reverberant environments. The segregated speech consistently exhibits gains in intelligibility and quality and compares favourably with a leading, somewhat more complex approach. The OCM method allows the selection and integration of multiple cues to be optimised and provides scalable performance benefits to suit the available computational resources. The ability to determine the varying relative importance of each cue in different acoustic conditions is expected to facilitate computationally efficient solutions suitable for use in a hearing aid, allowing the aid to operate effectively in a range of typical acoustic environments. Further developments are proposed to achieve this overall goal

White Rose E-theses Online

Efficient FPGA-Based Inference Architectures for Deep Learning Networks

Author: Abdelsalam Ahmed
Publication venue
Publication date: 01/11/2019
Field of study

L’apprentissage profond est devenu la technique de pointe pour de nombreuses applications de classification et de régression. Les modèles d’apprentissage profond, tels que les réseaux de neurones profonds (Deep Neural Network - DNN) et les réseaux de neurones convolutionnels (Convolutional Neural Network - CNN), déploient des dizaines de couches cachées avec des centaines de neurones pour obtenir une représentation significative des données d’entrée. La puissance des DNN et des CNN provient du fait qu’ils sont formés par apprentissage de caractéristiques extraites plutôt que par des algorithmes spécifiques à une tâche. Cependant, cela se fait aux dépens d’un coût de calcul élevé pour les processus d’apprentissage et d’inférence. Cela nécessite des accélérateurs avec de hautes performances et économes en énergie, en particulier pour les inférences lorsque le traitement en temps réel est important. Les FPGA offrent une plateforme attrayante pour accélérer l’inférence des DNN et des CNN en raison de leurs performances, dû à leur configurabilité et de leur efficacité énergétique. Dans cette thèse, nous abordons trois problèmes principaux. Premièrement, nous examinons le problème de la mise en oeuvre précise et efficace des DNN traditionnels entièrement connectés sur les FPGA. Bien que les réseaux de neurones binaires (Binary Neural Network - BNN) utilisent une représentation de données compacte sur un bit par rapport aux données à virgule fixe et à virgule flottante pour les DNN et les CNN traditionnels, ils peuvent encore nécessiter trop de ressources de calcul et de mémoire. Par conséquent, nous étudions le problème de l’implémentation des BNN sur FPGA en tant que deuxième problème. Enfin, nous nous concentrons sur l’introduction des FPGA en tant qu’accélérateurs matériels pour un plus grand nombre de développeurs de logiciels, en particulier ceux qui ne maîtrisent pas les connaissances en programmation sur FPGA. Pour résoudre le premier problème, et dans la mesure où l’implémentation efficace de fonctions d’activation non linéaires est essentielle à la mise en oeuvre de modèles d’apprentissage profond sur les FPGA, nous introduisons une implémentation de fonction d’activation non linéaire basée sur le filtre à interpolation de la transformée cosinus discrète (Discrete Cosine Transform Interpolation Filter - DCTIF). L’architecture d’interpolation proposée combine des opérations arithmétiques sur des échantillons stockés de la fonction de tangente hyperbolique et sur les données d’entrée. Cette solution offre une précision 3× supérieure à celle des travaux précédents, tout en utilisant une quantité similaire des ressources de calculs et une petite quantité de mémoire. Différentes combinaisons de paramètres du filtre DCTIF peuvent être choisies pour compenser la précision et la complexité globale du circuit de la fonction tangente hyperbolique.----------ABSTRACT: Deep learning has evolved to become the state-of-the-art technique for numerous classification and regression applications. Deep learning models, such as Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs), deploy dozens of hidden layers with hundreds of neurons to learn a meaningful representation of the input data. The power of DNNs and CNNs comes from the fact that they are trained through feature learning rather than task-specific algorithms. However, this comes at the expense of high computational cost for both training and inference processes. This necessitates high-performance and energyefficient accelerators, especially for inference where real-time processing matters. FPGAs offer an appealing platform for accelerating the inference of DNNs and CNNs due to their performance, configurability and energy-efficiency. In this thesis, we address three main problems. Firstly, we consider the problem of realizing a precise but efficient implementation of traditional fully connected DNNs in FPGAs. Although Binary Neural Networks (BNNs) use compact data representation (1-bit) compared to fixedpoint data and floating-point representation in traditional DNNs and CNNs, they may still need too many computational and memory resources. Therefore, we study the problem of implementing BNNs in FPGAs as the second problem. Finally, we focus on introducing FPGAs as accelerators to a wider range of software developers, especially those who do not posses FPGA programming knowledge. To address the first problem, and since efficient implementation of non-linear activation functions is essential to the implementation of deep learning models on FPGAs, we introduce a non-linear activation function implementation based on the Discrete Cosine Transform Interpolation Filter (DCTIF). The proposed interpolation architecture combines arithmetic operations on the stored samples of the hyperbolic tangent function and on input data. It achieves almost 3× better precision than previous works while using a similar amount of computational resources and a small amount of memory. Various combinations of DCTIF parameters can be chosen to trade off the accuracy and the overall circuit complexity of the tanh function. In an attempt to address the first and third problems, we introduce a Single hidden layer Neural Network (SNN) multiplication-free overlay architecture with fully connected DNN-level performance. This FPGA inference overlay can be used for applications that are normally solved with fully connected DNNs. The overlay avoids the time needed to synthesize, place, route and regenerate a new bitstream when the application changes. The SNN overlay in puts and activations are quantized to power-of-two values, which allows utilizing shift units instead of multipliers. Since the overlay is a SNN, we fill the FPGA chip with the maximum possible number of neurons that can work in parallel in the hidden layer. We evaluate the proposed architecture on typical benchmark datasets and demonstrate higher throughput with respect to the state-of-the-art while achieving the same accuracy. In addition, the SNN overlay makes the power and versatility of FPGAs available to a wider DNN user community and to improve DNN design efficiency

PolyPublie