
    Unveiling the frontiers of deep learning: innovations shaping diverse domains

    Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language processing, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate its latest developments and applications in these disciplines. However, the existing literature falls short of surveying deep learning applications across all potential sectors. This paper therefore extensively investigates the potential applications of deep learning across all major fields of study, together with the associated benefits and challenges. As evidenced in the literature, DL exhibits high accuracy in prediction and analysis, which makes it a powerful computational tool, and it is able to self-organize and optimize, making it effective at processing data without handcrafted preprocessing. At the same time, deep learning requires massive amounts of data for effective analysis and processing. To handle the challenge of compiling huge volumes of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures such as LSTMs and GRUs can be utilized. For multimodal learning, a network combining neurons shared across all tasks with neurons specialized for particular tasks is necessary. Comment: 64 pages, 3 figures, 3 tables
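    The shared-plus-specialized-neuron architecture mentioned for multimodal learning can be made concrete with a minimal sketch. The following is an illustrative PyTorch example, not code from the paper; all names (SharedMultiTaskNet, trunk, heads) and dimensions are assumptions chosen for demonstration.

```python
# A minimal sketch of multi-task learning with a shared trunk (neurons used by
# all tasks) and task-specific heads (neurons specialized for each task).
import torch
import torch.nn as nn

class SharedMultiTaskNet(nn.Module):
    def __init__(self, in_dim: int, shared_dim: int, task_dims: dict):
        super().__init__()
        # Neurons shared across all tasks.
        self.trunk = nn.Sequential(nn.Linear(in_dim, shared_dim), nn.ReLU())
        # Neurons specialized for each particular task.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(shared_dim, out_dim)
             for task, out_dim in task_dims.items()}
        )

    def forward(self, x: torch.Tensor) -> dict:
        h = self.trunk(x)  # shared representation
        return {task: head(h) for task, head in self.heads.items()}

# One forward pass produces an output per task from the shared representation.
net = SharedMultiTaskNet(in_dim=64, shared_dim=128,
                         task_dims={"audio": 10, "image": 5})
outputs = net(torch.randn(8, 64))
print({task: out.shape for task, out in outputs.items()})
```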

    Air Traffic Management Abbreviation Compendium

    As in all fields of work, an unmanageable number of abbreviations are used today in aviation for terms, definitions, commands, standards, and technical descriptions. This applies in general to the areas of aeronautical communication, navigation and surveillance, cockpit and air traffic control working positions, passenger and cargo transport, and all other areas of flight planning, organization, and guidance. In addition, many abbreviations are used more than once or have different meanings in different languages. In order to provide an overview of the most common abbreviations used in air traffic management, organizations such as EUROCONTROL, FAA, DWD, and DLR have published lists of abbreviations in the past, which have been incorporated into this document. In addition, abbreviations from some larger international aviation-related projects have been included to provide users with a directory that is as complete as possible. As a result, the second edition of the Air Traffic Management Abbreviation Compendium now includes around 16,500 abbreviations and acronyms from the field of aviation.

    Network monitoring and performance assessment: from statistical models to neural networks

    Máster en Investigación e Innovación en Tecnologías de la Información y las Comunicaciones. In the last few years, computer networks have been playing a key role in many different fields. Companies have also evolved around the internet, taking advantage of its huge capacity for diffusion. Nevertheless, this also means that computer networks and IT systems have become a critical element for business: any interruption or malfunction of these systems can have a devastating economic impact. In this light, it is necessary to provide models to properly evaluate and characterize computer networks. Focusing on modeling, there are many different alternatives, from classical options based on statistics to recent alternatives based on machine learning and deep learning. In this work, we study the models available for each context, paying attention to their advantages and disadvantages in order to provide the best solution for each case. To cover the majority of the spectrum, three cases have been studied: time-unaware phenomena, where we examine the bias-variance trade-off; time-dependent phenomena, where we pay attention to the trends of the time series; and text processing, to handle attributes obtained by DPI. For each case, several alternatives have been studied, and the solutions have been tested with both synthetic and real-world data, demonstrating the success of the proposal.
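    The bias-variance trade-off studied for time-unaware phenomena can be illustrated numerically. The sketch below is not from the thesis; it fits polynomials of increasing degree to noisy data and shows test error first falling (less bias) and then rising again (more variance).

```python
# Bias-variance trade-off: model complexity vs. generalization error.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)  # noisy observations
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)                     # noiseless truth

for degree in (1, 3, 9, 12):
    coeffs = np.polyfit(x, y, degree)                   # least-squares fit
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```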

    Bayesian and echoic log-surprise for auditory saliency detection

    Mención Internacional en el título de doctor. Attention is defined as the mechanism that allows the brain to categorize and prioritize information acquired through our senses and to act according to the environmental context and the available mental resources. The attention mechanism can be subdivided into two types: top-down and bottom-up. Top-down attention is goal- or task-driven and implies that a participant has some previous knowledge about the task that he or she is trying to solve. Alternatively, bottom-up attention depends only on the perceived features of the target object and its surroundings; it is a very fast mechanism that is believed to be crucial for human survival. Bottom-up attention is commonly known as saliency or salience, and can be defined as a property of the signals perceived by our senses that makes them attentionally prominent for some reason. This thesis concerns the detection of saliency in audio signals using automatic algorithms. In recent years, progress in visual saliency research, where the goal is to detect which objects or content in a visual scene are prominent enough to capture a spectator's attention, has been remarkable. However, this progress has not carried over to other modalities. This is the case for auditory saliency, where there is still no consensus on how to measure the saliency of an event, and consequently there are no specifically labeled datasets on which to compare new algorithms and proposals. In this work, two new auditory saliency detection algorithms are presented and evaluated. For their evaluation, we make use of Acoustic Event Detection/Classification datasets, whose labels include onset times among other aspects. We use such datasets and labeling because there is psychological evidence suggesting that human beings are quite sensitive to the spontaneous appearance of acoustic objects. We use three datasets: DCASE 2016 (Task 2), MIVIA road audio events, and UPC-TALP, totalling 3400 labeled acoustic events. The algorithms we employ for benchmarking comprise the saliency detection techniques designed by Kayser and Kalinli, a voice activity detector, an energy thresholding method, and four music information retrieval onset detectors: NWPD, WPD, CD, and SF. We put forward two auditory saliency algorithms: Bayesian Log-surprise and Echoic Log-surprise. The former is an evolution of Bayesian Surprise, a methodology that detects anomalous or salient events by means of the Kullback-Leibler divergence computed between two consecutive temporal windows. As the output Surprise signal has some drawbacks, we introduce several improvements that lead to the approach we named Bayesian Log-surprise; these include an amplitude compression stage and the addition of perceptual knowledge to pre-process the input signal. The latter, named Echoic Log-surprise, fuses several Bayesian Log-surprise signals computed with different memory lengths that represent different temporal scales. The fusion is performed using statistical divergences, resulting in saliency signals with certain advantages, such as a significant reduction in the background noise level and a noticeable increase in detection scores.
Moreover, since the original Echoic Log-surprise presents certain limitations, we propose a set of improvements: we test alternative statistical divergences, we introduce a new fusion strategy, and we replace the static thresholding mechanism used to determine whether the final output signal is salient with a dynamic thresholding algorithm. Results show that the most significant modification in terms of performance is the latter, a proposal that reduces the dispersion observed in the scores produced by the system and enables online operation. Finally, our last analysis concerns the robustness of all the algorithms presented in this thesis against environmental noise. We use noises of different natures, from stationary noise to noises pre-recorded in real environments such as cafeterias, train stations, etc. The results suggest that, across different signal-to-noise ratios, the most robust algorithm is Echoic Log-surprise, since its detection capabilities are the least influenced by noise. Programa de Doctorado en Multimedia y Comunicaciones, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. Committee: Chair: Fernando Díaz de María; Secretary: Rubén Solera Ureña; Member: José Luis Pérez Córdoba.
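    The core of Bayesian Surprise, the KL divergence between consecutive temporal windows, is easy to sketch. The toy below is illustrative only (not the thesis code): it fits a Gaussian to each window of a signal, measures the KL divergence from the previous window, and applies the log-compression that motivates the "Log-surprise" name.

```python
# Toy windowed Kullback-Leibler surprise detector for a 1-D signal.
import numpy as np

def gaussian_kl(mu1, var1, mu0, var0):
    """KL( N(mu1, var1) || N(mu0, var0) ) in nats, elementwise."""
    return 0.5 * (np.log(var0 / var1) + (var1 + (mu1 - mu0) ** 2) / var0 - 1.0)

def log_surprise(signal, win=256, eps=1e-8):
    frames = signal[: len(signal) // win * win].reshape(-1, win)
    mus, vars_ = frames.mean(axis=1), frames.var(axis=1) + eps
    kl = gaussian_kl(mus[1:], vars_[1:], mus[:-1], vars_[:-1])
    return np.log1p(kl)  # amplitude compression of the surprise signal

rng = np.random.default_rng(1)
x = rng.normal(0, 0.1, 16384)                                      # background
x[8000:8300] += np.sin(2 * np.pi * 440 / 16000 * np.arange(300))   # sudden event
s = log_surprise(x)
print("most surprising window:", int(np.argmax(s)))  # near the event onset
```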

    Application of generative models in speech processing tasks

    Generative probabilistic and neural models of the speech signal are shown to be effective in speech synthesis and speech enhancement, where generating natural and clean speech is the goal. This thesis develops two probabilistic signal processing algorithms based on the source-filter model of speech production, and two based on neural generative models of the speech signal. They are: a model-based speech enhancement algorithm for ad-hoc microphone arrays, called GRAB; a probabilistic generative model of speech called PAT; a neural generative F0 model called TEReTA; and a Bayesian enhancement network, called BaWN, that incorporates a neural generative model of speech called WaveNet. PAT and TEReTA aim to develop better generative models for speech synthesis. BaWN and GRAB aim to improve the naturalness and noise robustness of speech enhancement algorithms. Probabilistic Acoustic Tube (PAT) is a probabilistic generative model for speech whose basis is the source-filter model. The highlights of the model are threefold. First, it is among the very first works to build a complete probabilistic model for speech. Second, it has a well-designed model for the phase spectrum of speech, which has been hard to model and is often neglected. Third, it models the AM-FM effects in speech, which are perceptually significant but often ignored in frame-based speech processing algorithms. Experiments show that the proposed model has good potential for a number of speech processing tasks. TEReTA generates pitch contours by incorporating a theoretical model of pitch planning, the piece-wise linear target approximation (TA) model, as the output layer of a deep recurrent neural network. It aims to model semantic variations in the F0 contour, which is challenging for existing networks. By incorporating the TA model, TEReTA is able to memorize semantic context and capture semantic variations. Experiments on contrastive focus verify TEReTA's ability in semantic modeling. BaWN is a neural-network-based algorithm for single-channel enhancement. The biggest challenges for neural-network-based speech enhancement algorithms are poor generalizability to unseen noises and the unnaturalness of the output speech. By incorporating a neural generative model, WaveNet, in a Bayesian framework, where WaveNet predicts the prior for speech and a separate enhancement network incorporates the likelihood function, BaWN is able to achieve satisfactory generalizability and a good intelligibility score for its output, even when the noisy training set is small. GRAB is a beamforming algorithm for ad-hoc microphone arrays. Enhancing speech with an ad-hoc microphone array is challenging because of inaccuracy in position and interference calibration. Inspired by the source-filter model, GRAB does not rely on any position or interference calibration; instead, it incorporates a source-filter speech model and minimizes the energy that cannot be accounted for by the model. Objective and subjective evaluations on both simulated and real-world data show that GRAB is able to suppress noise effectively while keeping the speech natural and dry. The final chapters discuss the implications of this work for future research in speech processing.
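    The source-filter model underlying PAT and GRAB can be demonstrated numerically. The following is a minimal sketch, not the thesis implementation: voiced speech is modeled as a periodic glottal excitation (the source) passed through an all-pole vocal-tract filter; the formant frequencies and bandwidths below are arbitrary illustrative values.

```python
# Toy source-filter synthesis: impulse-train excitation through an all-pole filter.
import numpy as np
from scipy.signal import lfilter

fs, f0, dur = 16000, 120, 0.5
n = int(fs * dur)
excitation = np.zeros(n)
excitation[:: fs // f0] = 1.0              # impulse train at the pitch period

# A toy all-pole vocal-tract filter: two resonances (formants) as pole pairs.
poles = []
for freq, bw in [(700, 80), (1200, 100)]:
    r = np.exp(-np.pi * bw / fs)           # pole radius from bandwidth
    poles += [r * np.exp(2j * np.pi * freq / fs),
              r * np.exp(-2j * np.pi * freq / fs)]
a = np.real(np.poly(poles))                # filter denominator coefficients

speech = lfilter([1.0], a, excitation)     # source shaped by the filter
print(speech.shape, np.abs(speech).max())
```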

    Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement

    A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on M integer data streams (M ≥ 3), such as scaling, additions/subtractions, inner or outer vector products, permutations, and convolutions. In the proposed method, the M input integer data streams are linearly superimposed to form M numerically-entangled integer data streams that are stored in place of the original inputs. A series of LSB operations can then be performed directly on these entangled data streams. The results are extracted from the M entangled output streams by additions and arithmetic shifts. Any soft errors affecting any single disentangled output stream are guaranteed to be detectable via a specific post-computation reliability check. In addition, when utilizing a separate processor core for each of the M streams, the proposed approach can recover all outputs after any single fail-stop failure. Importantly, unlike algorithm-based fault tolerance (ABFT) methods, the number of operations required for the entanglement, extraction, and validation of the results is linearly related to the number of inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal on an Intel processor (Haswell architecture with AVX2 support) via fast Fourier transforms, circular convolutions, and matrix multiplication operations. Our analysis and experiments reveal that the proposed approach incurs between a 0.03% and 7% reduction in processing throughput for a wide variety of LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating processor hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy. Comment: to appear in IEEE Trans. on Signal Processing, 201
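    The key property exploited here is that a linear operation applied to a superposition of streams acts on each stream independently. The toy below is a deliberately simplified illustration of that idea, not the paper's entanglement construction (which avoids duplicating work and handles M ≥ 3 streams): two small non-negative integer streams are packed with arithmetic shifts, a convolution runs on the entangled data, results are recovered by masks and shifts, and a cross-check detects an injected soft error.

```python
# Toy illustration: linear superposition commutes with linear (LSB) operations.
import numpy as np

K = 8                                   # shift; per-stream outputs stay below 2**K
MASK = (1 << K) - 1

rng = np.random.default_rng(2)
a = rng.integers(0, 16, size=8)         # small non-negative integer inputs
b = rng.integers(0, 16, size=8)
kernel = np.array([1, 2, 1])            # the linear operation: integer convolution

e1 = a + (b << K)                       # entangled streams, stored in place
e2 = b + (a << K)                       # of the original inputs

o1 = np.convolve(e1, kernel)            # the op runs on entangled data only;
o2 = np.convolve(e2, kernel)            # linearity keeps the packing intact

a1, b1 = o1 & MASK, o1 >> K             # extraction via mask and arithmetic shift
b2, a2 = o2 & MASK, o2 >> K

assert np.array_equal(a1, a2) and np.array_equal(b1, b2)  # reliability check
assert np.array_equal(a1, np.convolve(a, kernel))         # matches direct result

o1[3] ^= 1                              # inject a single soft error...
a1 = o1 & MASK
print("error detected:", not np.array_equal(a1, a2))      # ...and detect it
```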

    Low-dose radiography image processing: joint denoising and contrast enhancement, and automatic detection of anatomical landmarks for image quality estimation

    We aim to reduce the ALARA (As Low As Reasonably Achievable) dose limits for images acquired with the EOS full-body system by means of image processing techniques. Two complementary approaches are studied. First, we define a post-processing method that optimizes the trade-off between acquired image quality and X-ray dose. The Non-Local Means filter is extended to restore EOS images. We then study how to combine it with a multi-scale contrast enhancement technique. The image quality for diagnosis is optimized by defining non-parametric noise containment maps that limit the increase of noise depending on the amount of local redundant information captured by the filter. Secondly, we estimate exposure index (EI) values on EOS images, which give immediate feedback on image quality to help radiographers verify the correct exposure level of an X-ray examination. We propose a landmark-detection-based approach that is more robust to potential outliers than existing methods, as it exploits the redundancy of local estimates. In conclusion, the proposed joint denoising and contrast enhancement technique significantly increases image quality with respect to an algorithm used in clinical routine. Robust image quality indicators can be automatically associated with clinical EOS images. Given the consistency of the measures assessed on preview images, these indices could be used to drive an exposure management system in charge of defining the optimal radiation exposure.
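    The two processing stages combined in the thesis, Non-Local Means denoising followed by contrast enhancement, can be approximated with off-the-shelf tools. The sketch below is only a baseline illustration using scikit-image on a stand-in image; the thesis's actual method extends NLM for EOS images and adds non-parametric noise containment maps, which are not reproduced here.

```python
# Baseline pipeline: NLM denoising, then a crude contrast enhancement step.
import numpy as np
from skimage import data, img_as_float
from skimage.restoration import denoise_nl_means, estimate_sigma
from skimage.filters import unsharp_mask

# Stand-in for a low-dose radiograph: a test image with added Gaussian noise.
noisy = img_as_float(data.camera()) + \
    np.random.default_rng(3).normal(0, 0.08, (512, 512))

sigma = estimate_sigma(noisy)                      # noise level estimate
denoised = denoise_nl_means(noisy, patch_size=7, patch_distance=11,
                            h=0.8 * sigma, sigma=sigma, fast_mode=True)
enhanced = unsharp_mask(denoised, radius=5, amount=1.0)  # contrast boost
print(f"estimated sigma: {sigma:.3f}")
```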

    Precision-Energy-Throughput Scaling Of Generic Matrix Multiplication and Convolution Kernels Via Linear Projections

    Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique applies linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs, and the results are accumulated to form the final outputs. Throughput and energy scaling take place by changing the number of projections computed by each kernel, which in turn produces approximate results, i.e., changes the precision of the performed computation. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition and music matching algorithms demonstrate that the proposed approach allows for a 280% to 440% increase in processing throughput and a 75% to 80% decrease in energy consumption against optimized GEMM and CONV kernels, without any impact on the obtained recognition or matching accuracy. Even higher gains can be obtained if one is willing to tolerate some reduction in the accuracy of the recognition and matching applications.
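    The precision-throughput trade-off can be illustrated with plain NumPy. The sketch below uses generic Gaussian random projections, whereas the paper builds structured projections into the GEMM blocking and reordering; it shows how computing fewer projections (smaller k) lowers both the work and the precision of an approximate matrix product.

```python
# Approximate GEMM via linear projections: C = A @ B approximated through a
# k-dimensional projection, so k controls both cost and precision.
import numpy as np

rng = np.random.default_rng(4)
n = 256
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, n))
C_exact = A @ B

for k in (16, 64, 128, 256):
    P = rng.normal(scale=1.0 / np.sqrt(k), size=(n, k))  # E[P @ P.T] = I
    C_approx = (A @ P) @ (P.T @ B)      # ~3*n*n*k multiplies instead of n**3
    rel_err = np.linalg.norm(C_approx - C_exact) / np.linalg.norm(C_exact)
    print(f"k={k:3d}  relative error = {rel_err:.3f}")
```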

    Research on performance enhancement for electromagnetic analysis and power analysis in cryptographic LSI

    Degree system: new; Report number: Kou 3785; Degree type: Doctor of Engineering; Date conferred: 2012/11/19; Waseda University degree record number: Shin 6161. Waseda University