142 research outputs found

    Learning-Based Reference-Free Speech Quality Assessment for Normal Hearing and Hearing Impaired Applications

    Accurate speech quality measures are highly attractive and beneficial in the design, fine-tuning, and benchmarking of speech processing algorithms, devices, and communication systems. The telecommunication industry's switch from narrowband to wideband telephony gives users a better speech quality experience but introduces a number of challenges in speech processing. Noise is the most common distortion affecting audio signals, and as a result many studies have focused on developing high-performance noise reduction algorithms. Assistive hearing devices are designed to decrease communication difficulties for people with hearing loss. As the algorithms within these devices become more advanced, it becomes increasingly crucial to develop accurate and robust quality metrics to assess their performance. Objective speech quality measurements are more attractive than subjective assessments because they are cost-effective and eliminate subjective variability. Although there has been extensive research on objective speech quality evaluation for narrowband speech, those methods are unsuitable for wideband telephony. For hearing-impaired applications, objective quality assessment is challenging because it must distinguish between desired modifications that make signals audible and undesired artifacts. This thesis proposes a model that extracts two sets of features from the distorted signal only. This approach, called reference-free (non-intrusive) assessment, is attractive because it does not need access to the reference signal. Although this benefit makes non-intrusive assessment suitable for real-time applications, more features need to be extracted and combined intelligently to reach accuracy comparable to intrusive metrics. Two feature vectors are proposed to extract information from distorted signals, and their performance is examined in three studies.
In the first study, both feature vectors are trained on various portions of a noise reduction database for normal hearing applications. In the second study, the same investigation is performed on two sets of databases acquired through several hearing aids. The third study examines the generalizability of the proposed metrics by benchmarking four wireless remote microphones in a variety of environmental conditions. Machine learning techniques are deployed to train the models in all three studies. The studies show that one of the feature sets is robust when trained on different portions of the data from different databases, and that it provides good quality-prediction accuracy for both normal hearing and hearing-impaired applications.
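A minimal sketch of the kind of reference-free pipeline described above: signal statistics computed from the distorted signal alone, combined by a learned regressor. The specific features and the mock MOS-style labels below are illustrative assumptions, not the thesis's actual feature vectors or training data.

```python
import numpy as np

def nonintrusive_features(x, sr=16000, frame=512):
    """Toy reference-free feature vector: statistics computed from the
    distorted signal alone, with no access to a clean reference."""
    frames = x[: len(x) // frame * frame].reshape(-1, frame)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)  # spectral centroid
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)  # zero-crossing rate
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-12)  # log frame energy
    per_frame = np.stack([centroid, zcr, energy], axis=1)
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])

# Combining the features here is just linear least squares fit against
# mock subjective scores; real systems use stronger learners.
rng = np.random.default_rng(0)
X = np.stack([nonintrusive_features(rng.normal(size=16000)) for _ in range(20)])
y = rng.uniform(1, 5, size=20)          # mock mean-opinion-score labels
A = np.c_[X, np.ones(len(X))]           # add bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ w                            # predicted quality scores
```

An intrusive metric would additionally take the clean reference signal as input; here only the distorted signal is seen, which is what makes the approach deployable in real time.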

    Supervisory Wireless Control for Critical Industrial Applications


    Deep learning for speech to text transcription for the Portuguese language

    Automatic speech recognition (ASR) is the process of transcribing audio recordings into text, i.e. transforming speech into the corresponding sequence of words. This process is also commonly known as speech-to-text. Machine learning (ML), the ability of machines to learn from examples, is one of the most relevant areas of artificial intelligence in today's world. Deep learning is a subset of ML which makes use of Deep Neural Networks, a particular type of Artificial Neural Network intended to mimic human neurons, with a large number of layers. This dissertation reviews the state of the art in automatic speech recognition over time, from early systems that used Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) to the most up-to-date end-to-end (E2E) deep neural models. Considering the context of the present work, some deep learning algorithms used in state-of-the-art approaches are explained in additional detail. The current work aims to develop an ASR system for the European Portuguese language using deep learning. This is achieved by implementing a pipeline composed of stages responsible for data acquisition, data analysis, data pre-processing, model creation, and evaluation of results. With the NVIDIA NeMo framework it was possible to implement the QuartzNet15x5 architecture, based on 1D time-channel separable convolutions. Following a data-centric methodology, the model developed yielded a state-of-the-art Word Error Rate (WER) of 0.0503.
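The figure WER = 0.0503 uses the standard definition of Word Error Rate, which can be computed with a word-level edit distance; a minimal sketch:

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance (substitutions +
    deletions + insertions) divided by the number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(r)][len(h)] / len(r)
```

A WER of 0.0503 therefore corresponds to roughly one word error per twenty reference words.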

    Machine Learning and Signal Processing Design for Edge Acoustic Applications


    Acoustic Monitoring for Leaks in Water Distribution Networks

    Water distribution networks (WDNs) are complex systems that are subjected to stresses from a number of hydraulic and environmental loads. Small leaks can run continuously for extended periods, sometimes indefinitely, undetected because of their minimal impact on the global system characteristics. As a result, system leaks remain an unavoidable reality, and water loss estimates range from 10% to 25% between treatment and delivery. This is a significant economic loss due to non-revenue water and a waste of a valuable natural resource. Leaks produce perceptible changes in the sound and vibration fields in their vicinity, and this has been exploited in various techniques used to detect leaks today. For example, the vibrations induced in the pipe wall of metal pipes and the acoustic energy in the vicinity of the leak have both been exploited to develop inspection tools. However, most techniques in use today suffer from the following: (i) they are primarily inspection techniques (not monitoring) and often require an expert user to interpret inspection data; (ii) they employ intrusive procedures to gain access into the WDN; and (iii) their algorithms remain closed, and publicly available blind benchmark tests have shown that the detection rates are quite low. The main objective of this thesis is to address each of these three problems. First, a technology conducive to long-term monitoring will be developed, which can be deployed year-round in a live WDN. Second, this technology will be developed around existing access locations in a WDN, specifically fire hydrant locations. To make this technology suitable for cold climates such as Canada's, it will be deployed from dry-barrel hydrants. Finally, the technology will be tested with a range of powerful machine learning algorithms, some new and some well-proven, and the results published in the open scientific literature.
In terms of the technology itself, unlike the majority of technologies that rely on accelerometer or pressure data, this technology relies on measurement of the acoustic (sound) field within the water column. The problem of leak detection and localization is addressed through a technique called linear prediction (LP). Extensively used in speech processing, LP is shown in this work to be effective in capturing the composite spectral effects of radiation, the pipe system, and leak-induced excitation, with and without leaks, and thus has the potential to be an effective tool for detecting leaks. The relatively simple mathematical formulation of LP lends itself well to online implementation in long-term monitoring applications and hence motivates an in-depth investigation. For comparison purposes, model-free methods are employed, including a powerful signal processing technique and a technique from machine learning. For leak detection, three data-driven anomaly detection approaches are employed, and the LP method is explored for leak localization as well. Tests were conducted on several laboratory test beds of increasing complexity, and in a live WDN in the city of Guelph, Ontario, Canada. Results from this study show that the LP method developed in this thesis provides a unified framework for both leak detection and localization when used in conjunction with semi-supervised anomaly detection algorithms. A novel two-part localization approach is developed which utilizes LP pre-processed data in tandem with the traditional cross-correlation approach. Results of the field study show that the presented method is able to perform both leak detection and localization using relatively short signal lengths. This is advantageous in continuous monitoring situations, as it minimizes the data transmission requirements, one of the main impediments to full-scale implementation and deployment of leak-detection technology.
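A minimal sketch of the two signal-processing ingredients named above, assuming nothing about the thesis's actual pre-processing: LP coefficients via the autocorrelation method and the Levinson-Durbin recursion, plus the traditional cross-correlation time-delay estimate used for localization between two sensors.

```python
import numpy as np

def lp_coefficients(x, order):
    """Linear-prediction (LP) coefficients via the autocorrelation method
    and the Levinson-Durbin recursion; returns the prediction-error filter
    [1, a1, ..., ap] and the final prediction-error power."""
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err  # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
        err *= 1.0 - k * k
    return a, err

def leak_delay(s1, s2):
    """Classic cross-correlation time-delay estimate: the lag (in samples)
    by which s2 trails s1, as used to localize a source between sensors."""
    c = np.correlate(s2, s1, mode="full")
    return np.argmax(c) - (len(s1) - 1)
```

With two hydrant-mounted sensors bracketing a suspected leak, the delay estimate combined with the known acoustic propagation speed places the leak along the pipe between them; the LP filter compactly summarizes the spectrum for the anomaly detectors.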

    Computational Labeling, Partitioning, and Balancing of Molecular Networks

    Recent advances in high-throughput techniques enable large-scale molecular quantification with high accuracy, including mRNAs, proteins, and metabolites. Differential expression of these molecules in case and control samples provides a way to select phenotype-associated molecules with statistically significant changes. However, given the significance ranking of molecular changes, how those molecules work together to drive phenotype formation is still unclear. In particular, changes in molecular quantities are insufficient to interpret changes in functional behavior. My study aims to answer this question by integrating molecular network data to systematically model and estimate the changes in molecular functional behaviors. We build three computational models to label, partition, and balance molecular networks using modern machine learning techniques. (1) Because protein functional annotation is incomplete, we develop AptRank, an adaptive PageRank model for protein function prediction on bilayer networks. By integrating the Gene Ontology (GO) hierarchy with the protein-protein interaction network, AptRank outperforms four state-of-the-art methods in a comprehensive evaluation on benchmark datasets. (2) We next extend AptRank into a network partitioning method, BioSweeper, to identify functional network modules in which molecules share similar functions and are densely connected to each other. Compared to traditional network partitioning methods that use only network connections, BioSweeper, which integrates the GO hierarchy, can automatically identify functionally enriched network modules. (3) Finally, we conduct a differential interaction analysis, named difFBA, on protein-protein interaction networks by simulating protein fluxes using flux balance analysis (FBA).
We test difFBA using quantitative proteomic data from colon cancer, and demonstrate that it offers more insight into functional changes in molecular behavior than protein quantity changes alone do. We conclude that our integrative network model increases the observational dimensions of complex biological systems and enables us to understand more deeply the causal relationships between genotypes and phenotypes.
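AptRank is described as an adaptive PageRank model; for intuition, plain PageRank by power iteration on an adjacency matrix is sketched below (the bilayer GO/PPI construction and the adaptive weighting that distinguish AptRank are not shown).

```python
import numpy as np

def pagerank(adj, alpha=0.85, tol=1e-10):
    """Plain PageRank by power iteration on a (possibly weighted)
    adjacency matrix; dangling columns are spread uniformly."""
    n = adj.shape[0]
    out = adj.sum(axis=0)                           # out-weight of each column
    # Column-stochastic transition matrix; zero-out-degree columns -> uniform
    P = np.where(out > 0, adj / np.where(out > 0, out, 1), 1.0 / n)
    r = np.full(n, 1.0 / n)
    while True:
        r_new = alpha * (P @ r) + (1 - alpha) / n   # walk + teleport term
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
```

On a protein network the stationary scores rank proteins by their connectivity to the rest of the network; AptRank's adaptation replaces the fixed damping with learned mixing weights across diffusion steps.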

    Development of a sensory substitution API

    2018 Summer. Includes bibliographical references. Sensory substitution – or the practice of mapping information from one sensory modality to another – has been shown to be a viable technique for non-invasive sensory replacement and augmentation. With the rise in popularity, ubiquity, and capability of mobile devices and wearable electronics, sensory substitution research has seen a resurgence in recent years. Due to the standard features of mobile/wearable electronics such as Bluetooth, multicore processing, and audio recording, these devices can be used to drive sensory substitution systems. Therefore, there exists a need for a flexible, extensible software package capable of performing the required real-time data processing for sensory substitution on modern mobile devices. The primary contribution of this thesis is the development and release of an Open Source Application Programming Interface (API) capable of managing an audio stream from the source of sound to a sensory stimulus interface on the body. The API (named Tactile Waves) is written in the Java programming language and packaged as both a Java library (JAR) and Android library (AAR). The development and design of the library is presented, and its primary functions are explained. Implementation details for each primary function are discussed. Performance evaluation of all processing routines is performed to ensure real-time capability, and the results are summarized. Finally, future improvements to the library and additional applications of sensory substitution are proposed.

    Modeling and fault diagnosis of broken rotor bar faults in induction motors

    Due to their vast industrial applications, induction motors are often referred to as the “workhorse” of industry. To detect incipient faults and improve reliability, condition monitoring and fault diagnosis of induction motors are very important. In this thesis, the focus is on modeling and detecting broken rotor bar (BRB) faults in induction motors through finite element analysis and machine learning. The most successfully deployed method for BRB fault detection is Motor Current Signature Analysis (MCSA), because it is non-invasive, easy to implement, low-cost, reliable, and effective. However, MCSA has its own limitations. To overcome them, fault diagnosis using machine learning has attracted more research interest lately. Feature selection is an important part of machine learning techniques. The main contributions of the thesis are: 1) modeling a healthy motor and motors with different numbers of BRBs using the finite element analysis software ANSYS; 2) analyzing BRB faults of induction motors using various spectral analysis algorithms (parametric and non-parametric) by processing stator current signals obtained from the finite element analysis; 3) conducting feature selection and classification of BRB faults using support vector machines (SVM) and artificial neural networks (ANN); 4) analyzing neighbouring and spaced BRB faults using Burg and Welch PSD analysis.
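The non-parametric analysis in item 4 can be sketched with Welch's method (averaged Hann-windowed periodograms). A BRB fault appears as sidebands at (1 ± 2s)·f0 around the supply frequency f0, where s is the slip; all signal parameters below are illustrative, not taken from the thesis.

```python
import numpy as np

def welch_psd(x, fs, nperseg=1024):
    """Welch PSD estimate: average the periodograms of Hann-windowed,
    50%-overlapping segments."""
    win = np.hanning(nperseg)
    step = nperseg // 2
    segs = [x[i:i + nperseg] * win
            for i in range(0, len(x) - nperseg + 1, step)]
    psd = np.mean([np.abs(np.fft.rfft(s)) ** 2 for s in segs], axis=0)
    psd /= fs * (win ** 2).sum()                 # density normalization
    return np.fft.rfftfreq(nperseg, 1.0 / fs), psd

def band_power(f, psd, fc, bw=1.0):
    """Total PSD in a narrow band around frequency fc."""
    return psd[(f > fc - bw) & (f < fc + bw)].sum()

# Synthetic stator current: 50 Hz supply plus weak BRB sidebands
fs, f0, slip = 5000, 50.0, 0.03
t = np.arange(0, 10, 1.0 / fs)
rng = np.random.default_rng(0)
i_stator = (np.sin(2 * np.pi * f0 * t)
            + 0.02 * np.sin(2 * np.pi * f0 * (1 - 2 * slip) * t)  # lower sideband
            + 0.02 * np.sin(2 * np.pi * f0 * (1 + 2 * slip) * t)  # upper sideband
            + 0.01 * rng.normal(size=t.size))
f, psd = welch_psd(i_stator, fs, nperseg=8192)
# Fault feature: lower-sideband-to-fundamental power ratio in dB
ratio_db = 10 * np.log10(band_power(f, psd, f0 * (1 - 2 * slip))
                         / band_power(f, psd, f0))
```

A classifier such as an SVM or ANN would then be trained on features like this sideband ratio, extracted from healthy and faulty runs.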

    Design of approximate overclocked datapath

    Embedded applications often demand stringent latency requirements. While high degrees of parallelism within custom FPGA-based accelerators may help to some extent, it may also be necessary to limit the precision used in the datapath to boost the operating frequency of the implementation. However, by reducing the precision, the engineer introduces quantisation error into the design. In this thesis, we describe an alternative circuit design methodology for trading off accuracy, performance, and silicon area. We compare two approaches that trade accuracy for performance. One is the traditional approach, in which the precision used in the datapath is limited to meet a target latency. The other is a proposed new approach which simply allows the datapath to operate without timing closure. We demonstrate analytically and experimentally that for many applications it is preferable to simply overclock the design and accept that timing violations may arise. Since the errors introduced by timing violations occur rarely, they cause less noise than quantisation errors. Furthermore, we show that conventional forms of computer arithmetic do not fail gracefully when pushed beyond the deterministic clocking region. In this thesis we take a fresh look at Online Arithmetic, originally proposed for digit-serial operation, and synthesize unrolled digit-parallel online arithmetic operators to allow for graceful degradation. We quantify the impact of timing violations on key arithmetic primitives and show that substantial performance benefits can be obtained in comparison to binary arithmetic. Since timing errors are caused by long carry chains, with online arithmetic they result in errors in the least significant digits, causing less impact than in conventional implementations.
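The quantisation error referred to above is bounded by half a unit in the last place of the chosen precision; a tiny illustrative sketch of fixed-point rounding (not the thesis's datapath):

```python
def quantise(x, frac_bits):
    """Round x to the nearest fixed-point value with `frac_bits`
    fractional bits; the error is at most 2**-(frac_bits + 1)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale
```

Unlike this ever-present, bounded rounding error, the timing-violation errors studied in the thesis occur only rarely, which is the intuition behind preferring overclocking over precision reduction.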