626 research outputs found

    Privacy and Accountability in Black-Box Medicine

    Get PDF
    Black-box medicine—the use of big data and sophisticated machine learning techniques for health-care applications—could be the future of personalized medicine. Black-box medicine promises to make it easier to diagnose rare diseases and conditions, identify the most promising treatments, and allocate scarce resources among different patients. But to succeed, it must overcome two separate, but related, problems: patient privacy and algorithmic accountability. Privacy is a problem because researchers need access to huge amounts of patient health information to generate useful medical predictions. And accountability is a problem because black-box algorithms must be verified by outsiders to ensure they are accurate and unbiased, but this means giving outsiders access to this health information. This article examines the tension between the twin goals of privacy and accountability and develops a framework for balancing that tension. It proposes three pillars for an effective system of privacy-preserving accountability: substantive limitations on the collection, use, and disclosure of patient information; independent gatekeepers regulating information sharing between those developing and verifying black-box algorithms; and information-security requirements to prevent unintentional disclosures of patient information. The article examines and draws on a similar debate in the field of clinical trials, where disclosing information from past trials can lead to new treatments but also threatens patient privacy

    Pré-distorção neuronal analógica de amplificadores de potência

    Get PDF
    Mestrado em Engenharia Electrónica e TelecomunicaçõesAs especificações das redes de telecomunicações de quinta geração ultrapassam largamente as capacidades das técnicas mais modernas de linearização de amplificadores de potência como a pré-distorção digital. Por esta razão, esta tese propõe um método de linearização alternativo: um prédistorçor analógico, à banda base, constituído por uma rede neuronal artificial. A rede foi treinada usando três métodos distintos: avaliação de política através de TD(λ), otimização por estratégias de evolução como CMA-ES, e um algoritmo original de aproximações sucessivas. Apesar do TD(λ) não ter produzido resultados de simulação satisfatórios, os resultados dos outros dois métodos foram excelentes: um NMSE entre as funções de transferência pretendida e efetiva do amplificador pré-distorcido até -70 dB, e uma redução total das componentes de distorção do espetro de frequência de um sinal GSM de teste. Apesar das estratégias de evolução terem alcançado este nível de linearização após cerca de 4 horas de execução contínua, o algoritmo original consegue fazê-lo numa questão de segundos. Desta forma, esta tese abre caminho para que se cumpram as exigências das redes de nova geração.Fifth-generation telecommunications networks are expected to have technical requirements which far outpace the capabilities of modern power amplifier (PA) linearization techniques such as digital predistortion. For this reason, this thesis proposes an alternative linearization method: a base band analog predistorter consisting of an artificial neural network. The network was trained through three very distinct methods: policy evaluation using TD(λ), optimization using evolution strategies such as CMA-ES, and an original algorithm of successive approximations. While TD(λ) proved to be unsuccessful, the other two methods produced excellent simulation results: an NMSE between the target and the predistorted PA transfer functions up to -70 dB, and the complete elimination of distortion components in the frequency spectrum of a GSM test signal. While the evolution strategies achieved this level of linearization after about 4 hours of continuous work, the original algorithm consistently does so in a matter of seconds. In effect, this thesis outlines a way towards the meeting of the specifications of next-generation networks

    Multiuser detection employing recurrent neural networks for DS-CDMA systems.

    Get PDF
    Thesis (M.Sc.Eng.)-University of KwaZulu-Natal, 2006.Over the last decade, access to personal wireless communication networks has evolved to a point of necessity. Attached to the phenomenal growth of the telecommunications industry in recent times is an escalating demand for higher data rates and efficient spectrum utilization. This demand is fuelling the advancement of third generation (3G), as well as future, wireless networks. Current 3G technologies are adding a dimension of mobility to services that have become an integral part of modem everyday life. Wideband code division multiple access (WCDMA) is the standardized multiple access scheme for 3G Universal Mobile Telecommunication System (UMTS). As an air interface solution, CDMA has received considerable interest over the past two decades and a great deal of current research is concerned with improving the application of CDMA in 3G systems. A factoring component of CDMA is multiuser detection (MUD), which is aimed at enhancing system capacity and performance, by optimally demodulating multiple interfering signals that overlap in time and frequency. This is a major research problem in multipoint-to-point communications. Due to the complexity associated with optimal maximum likelihood detection, many different sub-optimal solutions have been proposed. This focus of this dissertation is the application of neural networks for MUD, in a direct sequence CDMA (DS-CDMA) system. Specifically, it explores how the Hopfield recurrent neural network (RNN) can be employed to give yet another suboptimal solution to the optimization problem of MUD. There is great scope for neural networks in fields encompassing communications. This is primarily attributed to their non-linearity, adaptivity and key function as data classifiers. In the context of optimum multiuser detection, neural networks have been successfully employed to solve similar combinatorial optimization problems. The concepts of CDMA and MUD are discussed. The use of a vector-valued transmission model for DS-CDMA is illustrated, and common linear sub-optimal MUD schemes, as well as the maximum likelihood criterion, are reviewed. The performance of these sub-optimal MUD schemes is demonstrated. The Hopfield neural network (HNN) for combinatorial optimization is discussed. Basic concepts and techniques related to the field of statistical mechanics are introduced and it is shown how they may be employed to analyze neural classification. Stochastic techniques are considered in the context of improving the performance of the HNN. A neural-based receiver, which employs a stochastic HNN and a simulated annealing technique, is proposed. Its performance is analyzed in a communication channel that is affected by additive white Gaussian noise (AWGN) by way of simulation. The performance of the proposed scheme is compared to that of the single-user matched filter, linear decorrelating and minimum mean-square error detectors, as well as the classical HNN and the stochastic Hopfield network (SHN) detectors. Concluding, the feasibility of neural networks (in this case the HNN) for MUD in a DS-CDMA system is explored by quantifying the relative performance of the proposed model using simulation results and in view of implementation issues

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique

    Towards an interactive framework for robot dancing applications

    Get PDF
    Estágio realizado no INESC-Porto e orientado pelo Prof. Doutor Fabien GouyonTese de mestrado integrado. Engenharia Electrotécnica e de Computadores - Major Telecomunicações. Faculdade de Engenharia. Universidade do Porto. 200

    Mask-based enhancement of very noisy speech

    Get PDF
    When speech is contaminated by high levels of additive noise, both its perceptual quality and its intelligibility are reduced. Studies show that conventional approaches to speech enhancement are able to improve quality but not intelligibility. However, in recent years, algorithms that estimate a time-frequency mask from noisy speech using a supervised machine learning approach and then apply this mask to the noisy speech have been shown to be capable of improving intelligibility. The most direct way of measuring intelligibility is to carry out listening tests with human test subjects. However, in situations where listening tests are impractical and where some additional uncertainty in the results is permissible, for example during the development phase of a speech enhancer, intrusive intelligibility metrics can provide an alternative to listening tests. This thesis begins by outlining a new intrusive intelligibility metric, WSTOI, that is a development of the existing STOI metric. WSTOI improves STOI by weighting the intelligibility contributions of different time-frequency regions with an estimate of their intelligibility content. The prediction accuracies of WSTOI and STOI are compared for a range of noises and noise suppression algorithms and it is found that WSTOI outperforms STOI in all tested conditions. The thesis then investigates the best choice of mask-estimation algorithm, target mask, and method of applying the estimated mask. A new target mask, the HSWOBM, is proposed that optimises a modified version of WSTOI with a higher frequency resolution. The HSWOBM is optimised for a stochastic noise signal to encourage a mask estimator trained on the HSWOBM to generalise better to unseen noise conditions. A high frequency resolution version of WSTOI is optimised as this gives improvements in predicted quality compared with optimising WSTOI. Of the tested approaches to target mask estimation, the best-performing approach uses a feed-forward neural network with a loss function based on WSTOI. The best-performing feature set is based on the gains produced by a classical speech enhancer and an estimate of the local voiced-speech-plus-noise to noise ratio in different time-frequency regions, which is obtained with the aid of a pitch estimator. When the estimated target mask is applied in the conventional way, by multiplying the speech by the mask in the time-frequency domain, it can result in speech with very poor perceptual quality. The final chapter of this thesis therefore investigates alternative approaches to applying the estimated mask to the noisy speech, in order to improve both intelligibility and quality. An approach is developed that uses the mask to supply prior information about the speech presence probability to a classical speech enhancer that minimises the expected squared error in the log spectral amplitudes. The proposed end-to-end enhancer outperforms existing algorithms in terms of predicted quality and intelligibility for most noise types.Open Acces

    Bio-motivated features and deep learning for robust speech recognition

    Get PDF
    Mención Internacional en el título de doctorIn spite of the enormous leap forward that the Automatic Speech Recognition (ASR) technologies has experienced over the last five years their performance under hard environmental condition is still far from that of humans preventing their adoption in several real applications. In this thesis the challenge of robustness of modern automatic speech recognition systems is addressed following two main research lines. The first one focuses on modeling the human auditory system to improve the robustness of the feature extraction stage yielding to novel auditory motivated features. Two main contributions are produced. On the one hand, a model of the masking behaviour of the Human Auditory System (HAS) is introduced, based on the non-linear filtering of a speech spectro-temporal representation applied simultaneously to both frequency and time domains. This filtering is accomplished by using image processing techniques, in particular mathematical morphology operations with an specifically designed Structuring Element (SE) that closely resembles the masking phenomena that take place in the cochlea. On the other hand, the temporal patterns of auditory-nerve firings are modeled. Most conventional acoustic features are based on short-time energy per frequency band discarding the information contained in the temporal patterns. Our contribution is the design of several types of feature extraction schemes based on the synchrony effect of auditory-nerve activity, showing that the modeling of this effect can indeed improve speech recognition accuracy in the presence of additive noise. Both models are further integrated into the well known Power Normalized Cepstral Coefficients (PNCC). The second research line addresses the problem of robustness in noisy environments by means of the use of Deep Neural Networks (DNNs)-based acoustic modeling and, in particular, of Convolutional Neural Networks (CNNs) architectures. A deep residual network scheme is proposed and adapted for our purposes, allowing Residual Networks (ResNets), originally intended for image processing tasks, to be used in speech recognition where the network input is small in comparison with usual image dimensions. We have observed that ResNets on their own already enhance the robustness of the whole system against noisy conditions. Moreover, our experiments demonstrate that their combination with the auditory motivated features devised in this thesis provide significant improvements in recognition accuracy in comparison to other state-of-the-art CNN-based ASR systems under mismatched conditions, while maintaining the performance in matched scenarios. The proposed methods have been thoroughly tested and compared with other state-of-the-art proposals for a variety of datasets and conditions. The obtained results prove that our methods outperform other state-of-the-art approaches and reveal that they are suitable for practical applications, specially where the operating conditions are unknown.El objetivo de esta tesis se centra en proponer soluciones al problema del reconocimiento de habla robusto; por ello, se han llevado a cabo dos líneas de investigación. En la primera líınea se han propuesto esquemas de extracción de características novedosos, basados en el modelado del comportamiento del sistema auditivo humano, modelando especialmente los fenómenos de enmascaramiento y sincronía. En la segunda, se propone mejorar las tasas de reconocimiento mediante el uso de técnicas de aprendizaje profundo, en conjunto con las características propuestas. Los métodos propuestos tienen como principal objetivo, mejorar la precisión del sistema de reconocimiento cuando las condiciones de operación no son conocidas, aunque el caso contrario también ha sido abordado. En concreto, nuestras principales propuestas son los siguientes: Simular el sistema auditivo humano con el objetivo de mejorar la tasa de reconocimiento en condiciones difíciles, principalmente en situaciones de alto ruido, proponiendo esquemas de extracción de características novedosos. Siguiendo esta dirección, nuestras principales propuestas se detallan a continuación: • Modelar el comportamiento de enmascaramiento del sistema auditivo humano, usando técnicas del procesado de imagen sobre el espectro, en concreto, llevando a cabo el diseño de un filtro morfológico que captura este efecto. • Modelar el efecto de la sincroní que tiene lugar en el nervio auditivo. • La integración de ambos modelos en los conocidos Power Normalized Cepstral Coefficients (PNCC). La aplicación de técnicas de aprendizaje profundo con el objetivo de hacer el sistema más robusto frente al ruido, en particular con el uso de redes neuronales convolucionales profundas, como pueden ser las redes residuales. Por último, la aplicación de las características propuestas en combinación con las redes neuronales profundas, con el objetivo principal de obtener mejoras significativas, cuando las condiciones de entrenamiento y test no coinciden.Programa Oficial de Doctorado en Multimedia y ComunicacionesPresidente: Javier Ferreiros López.- Secretario: Fernando Díaz de María.- Vocal: Rubén Solera Ureñ

    Optimal use of computing equipment in an automated industrial inspection context

    Get PDF
    This thesis deals with automatic defect detection. The objective was to develop the techniques required by a small manufacturing business to make cost-efficient use of inspection technology. In our work on inspection techniques we discuss image acquisition and the choice between custom and general-purpose processing hardware. We examine the classes of general-purpose computer available and study popular operating systems in detail. We highlight the advantages of a hybrid system interconnected via a local area network and develop a sophisticated suite of image-processing software based on it. We quantitatively study the performance of elements of the TCP/IP networking protocol suite and comment on appropriate protocol selection for parallel distributed applications. We implement our own distributed application based on these findings. In our work on inspection algorithms we investigate the potential uses of iterated function series and Fourier transform operators when preprocessing images of defects in aluminium plate acquired using a linescan camera. We employ a multi-layer perceptron neural network trained by backpropagation as a classifier. We examine the effect on the training process of the number of nodes in the hidden layer and the ability of the network to identify faults in images of aluminium plate. We investigate techniques for introducing positional independence into the network's behaviour. We analyse the pattern of weights induced in the network after training in order to gain insight into the logic of its internal representation. We conclude that the backpropagation training process is sufficiently computationally intensive so as to present a real barrier to further development in practical neural network techniques and seek ways to achieve a speed-up. Weconsider the training process as a search problem and arrive at a process involving multiple, parallel search "vectors" and aspects of genetic algorithms. We implement the system as the mentioned distributed application and comment on its performance

    Extrinsic and Intrinsic Control of Integrative Processes in Neural Systems

    Get PDF
    At the simplest dynamical level, neurons can be understood as integrators. That is, neurons accumulate excitation from afferent neurons until, eventually, a threshold is reached and they produce a spike. Here, we consider the control of integrative processes in neural circuits in two contexts. First, we consider the problem of extrinsic neurocontrol, or modulating the spiking activity of neural circuits using stimulation, as is desired in a wide range of neural engineering applications. From a control-theoretic standpoint, such a problem presents several interesting nuances, including discontinuity in the dynamics due to the spiking process, and the technological limitations associated with underactuation (i.e., many neurons controlled by the same stimulation input). We consider these factors in a canonical problem of selective spiking, wherein a particular integrative neuron is controlled to a spike, while other neurons remain below threshold. This problem is solved in an optimal control framework, wherein several new geometric phenomena associated with the aforementioned nuances are revealed. Further, in an effort to enable scaling to large populations, we develop relaxations and alternative approaches, including the use of statistical models within the control design framework. Following this treatment of extrinsic control, we turn attention to a scientifically-driven question pertaining to intrinsic control, i.e., how neurons in the brain may themselves be controlling higher-level perceptual processes. We specifically postulate that neural activity is decoded, or “read-out” in terms of a drift-diffusion process, so that spiking activity drives a latent state towards a detection/perception threshold. Under this premise, we optimize the neural spiking trajectories according to several empirical cost functions and show that the optimal responses are physiologically plausible. In this vein, we also examine the nature of \u27optimal evidence\u27 for the general class of threshold-based integrative decision problems
    corecore