Unveiling the frontiers of deep learning: innovations shaping diverse domains
Deep learning (DL) enables the development of computer models that are
capable of learning, visualizing, optimizing, refining, and predicting data. In
recent years, DL has been applied in a range of fields, including audio-visual
data processing, agriculture, transportation prediction, natural language,
biomedicine, disaster management, bioinformatics, drug design, genomics, face
recognition, and ecology. To explore the current state of deep learning, it is
necessary to investigate the latest developments and applications of deep
learning in these disciplines. However, the literature is lacking in exploring
the applications of deep learning in all potential sectors. This paper thus
extensively investigates the potential applications of deep learning across all
major fields of study as well as the associated benefits and challenges. As
evidenced in the literature, DL achieves high accuracy in prediction and analysis, which makes it a powerful computational tool, and it can learn and optimize its own feature representations, allowing it to process data without handcrafted features. At the same time, deep learning requires massive amounts of data for effective analysis and processing. To handle the challenge of compiling huge amounts of medical,
scientific, healthcare, and environmental data for use in deep learning, gated
architectures like LSTMs and GRUs can be utilized. For multimodal learning, a neural network needs shared neurons that serve all tasks alongside specialized neurons for particular tasks.
Comment: 64 pages, 3 figures, 3 tables
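The shared-plus-specialized layout mentioned above can be sketched as a tiny forward pass. All sizes and task names here are illustrative assumptions, not taken from any surveyed system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multimodal setup: one shared trunk feeds two task-specific heads.
D_IN, D_SHARED = 8, 16

W_shared = rng.normal(size=(D_IN, D_SHARED))      # shared neurons: used by every task
W_task = {                                        # specialized neurons: one head per task
    "caption": rng.normal(size=(D_SHARED, 4)),
    "classify": rng.normal(size=(D_SHARED, 3)),
}

def forward(x, task):
    """Run the shared trunk, then the head specific to `task`."""
    h = np.maximum(0.0, x @ W_shared)             # ReLU over the shared representation
    return h @ W_task[task]

x = rng.normal(size=(D_IN,))
print(forward(x, "caption").shape)   # (4,)
print(forward(x, "classify").shape)  # (3,)
```

Only the heads differ per task; gradients from every task would flow into `W_shared`, which is what makes the trunk "shared".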
Air Traffic Management Abbreviation Compendium
As in all fields of work, an unmanageable number of abbreviations are used today in aviation for terms, definitions, commands, standards and technical descriptions. This applies in general to the areas of aeronautical communication, navigation and surveillance, cockpit and air traffic control working positions, passenger and cargo transport, and all other areas of flight planning, organization and guidance. In addition, many abbreviations are used more than once or have different meanings in different languages.
In order to obtain an overview of the most common abbreviations used in air traffic management, organizations like EUROCONTROL, FAA, DWD and DLR have published lists of abbreviations in the past, which are also included in this document. In addition, abbreviations from some larger international projects related to aviation have been included to provide users with a directory that is as complete as possible. As a result, the second edition of the Air Traffic Management Abbreviation Compendium now includes around 16,500 abbreviations and acronyms from the field of aviation.
Network monitoring and performance assessment: from statistical models to neural networks
Máster en Investigación e Innovación en Tecnologías de la Información y las Comunicaciones
In the last few years, computer networks have been playing a key role in many
different fields. Companies have also evolved around the internet, taking advantage of its huge capacity for diffusion. However, this also means that computer networks and IT systems have become critical elements for business: an interruption or malfunction of these systems could have a devastating economic impact.
In this light, it is necessary to provide models to properly evaluate and characterize computer networks. Focusing on modeling, there are many different alternatives: from classical options based on statistics to recent alternatives based on machine learning and deep learning. In this work, we study the models available for each context, paying attention to their advantages and disadvantages in order to provide the best solution for each case.
To cover the majority of the spectrum, three cases have been studied: time-unaware phenomena, where we look at the bias-variance trade-off; time-dependent phenomena, where we pay attention to the trends of the time series; and text processing, applied to attributes obtained by DPI. For each case, several alternatives have been studied and solutions have been tested with both synthetic and real-world data, showing the success of the proposal.
Bayesian and echoic log-surprise for auditory saliency detection
Mención Internacional en el título de doctor
Attention is defined as the mechanism that allows the brain to categorize and prioritize information acquired using our senses and to act according to the environmental context and the available mental resources. The attention mechanism can be further subdivided into two types: top-down and bottom-up.
Top-down attention is goal or task-driven and implies that a participant
has some previous knowledge about the task that he or she is trying to solve.
Alternatively, bottom-up attention only depends on the perceived features
of the target object and its surroundings and is a very fast mechanism that
is believed to be crucial for human survival.
Bottom-up attention is commonly known as saliency or salience, and can be defined as a property of the signals perceived by our senses that makes them attentionally prominent for some reason.
This thesis concerns saliency detection in audio signals using automatic algorithms. In recent years, progress in visual saliency research has been remarkable; there, the goal is to detect which objects or content in a visual scene are prominent enough to capture a spectator's attention. However, this progress has not carried over to other modalities. This is the case for auditory saliency, where there is still no consensus on how to measure the saliency of an event, and consequently no specific labeled datasets exist for comparing new algorithms and proposals.
In this work two new auditory saliency detection algorithms are presented
and evaluated. For their evaluation, we make use of Acoustic Event
Detection/Classification datasets, whose labels include onset times among
other aspects. We use such datasets and labeling since there is psychological
evidence suggesting that human beings are quite sensitive to the spontaneous
appearance of acoustic objects. We use three datasets: DCASE 2016
(Task 2), MIVIA road audio events and UPC-TALP, totalling 3400 labeled
acoustic events. The algorithms we employ for benchmarking comprise the saliency detection techniques designed by Kayser and Kalinli, a voice activity detector, an energy thresholding method and four music information retrieval onset detectors: NWPD, WPD, CD and SF.
We put forward two auditory saliency algorithms: Bayesian Log-surprise and Echoic Log-surprise. The former is an evolution of Bayesian Surprise, a methodology that detects anomalous or salient events by means of the Kullback-Leibler divergence computed between two consecutive temporal windows. Since the output Surprise signal has some drawbacks, we introduce several improvements, leading to the approach we named Bayesian Log-surprise: an amplitude compression stage and the use of perceptual knowledge to pre-process the input signal.
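A minimal sketch of the surprise idea described above, assuming Gaussian window statistics and an illustrative window length (the thesis's actual perceptual front-end and pre-processing are omitted):

```python
import numpy as np

def kl_gauss(mu0, var0, mu1, var1):
    """KL divergence between univariate Gaussians N(mu0, var0) || N(mu1, var1)."""
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def log_surprise(signal, win=64, eps=1e-8):
    """Toy Bayesian Log-surprise: KL between Gaussian fits of consecutive
    windows, followed by logarithmic amplitude compression."""
    out = []
    for start in range(win, len(signal) - win + 1, win):
        prev = signal[start - win:start]
        curr = signal[start:start + win]
        kl = kl_gauss(prev.mean(), prev.var() + eps, curr.mean(), curr.var() + eps)
        out.append(np.log(1.0 + kl))  # amplitude compression stage
    return np.array(out)

rng = np.random.default_rng(1)
quiet = rng.normal(0.0, 0.1, 512)
burst = rng.normal(0.0, 2.0, 64)          # a sudden acoustic "object"
s = log_surprise(np.concatenate([quiet, burst, quiet]))
print(s.argmax())  # window index at the burst boundary
```

The peak appears at the window transition touching the burst, which is the behaviour the onset-based evaluation in this thesis relies on.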
The latter, named Echoic Log-surprise, fuses several Bayesian Log-surprise signals computed considering different memory lengths that represent different
temporal scales. The fusion process is performed using statistical
divergences, resulting in saliency signals with certain advantages such as a
significant reduction in the background noise level and a noticeable increase
in the detection scores.
Moreover, since the original Echoic Log-surprise presents certain limitations, we propose a set of improvements: we test alternative statistical divergences, we introduce a new fusion strategy, and we replace the static thresholding mechanism used to decide whether the final output signal is salient with a dynamic thresholding algorithm. Results show that the last modification is the most significant in terms of performance, as it reduces the dispersion observed in the scores produced by the system and enables online operation.
Finally, our last analysis concerns the robustness of all the algorithms
presented in this thesis against environmental noise. We use noises of different
natures, from stationary noise to pre-recorded noises acquired in real
environments such as cafeterias, train stations, etc. The results suggest
that for different signal-to-noise ratios the most robust algorithm is Echoic
Log-surprise, since its detection capabilities are the least influenced by noise.
Programa de Doctorado en Multimedia y Comunicaciones por la Universidad Carlos III de Madrid y la Universidad Rey Juan Carlos. Presidente: Fernando Díaz de María. Secretario: Rubén Solera Ureña. Vocal: José Luis Pérez Córdob
Application of generative models in speech processing tasks
Generative probabilistic and neural models of the speech signal are shown to be effective in speech synthesis and speech enhancement, where the goal is to generate natural and clean speech. This thesis develops two probabilistic signal processing algorithms based on the source-filter model of speech production, and two based on neural generative models of the speech signal. They are a model-based speech enhancement algorithm for ad-hoc microphone arrays, called GRAB; a probabilistic generative model of speech, called PAT; a neural generative F0 model, called TEReTA; and a Bayesian enhancement network, called BaWN, that incorporates a neural generative model of speech called WaveNet. PAT and TEReTA aim to develop better generative models for speech synthesis. BaWN and GRAB aim to improve the naturalness and noise robustness of speech enhancement algorithms.
Probabilistic Acoustic Tube (PAT) is a probabilistic generative model for speech, whose basis is the source-filter model. The highlights of the model are threefold. First, it is among the very first works to build a complete probabilistic model for speech. Second, it has a well-designed model for the phase spectrum of speech, which has been hard to model and often neglected. Third, it models the AM-FM effects in speech, which are perceptually significant but often ignored in frame-based speech processing algorithms. Experiments show that the proposed model has good potential for a number of speech processing tasks.
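The source-filter view underlying PAT can be illustrated with a toy synthesis: an impulse-train glottal source passed through a single two-pole resonance standing in for the vocal tract. All rates and formant values below are illustrative assumptions, not PAT's actual parameterization:

```python
import numpy as np

FS = 8000          # sample rate (Hz)
F0 = 100           # pitch of the glottal source (Hz)
DUR = 0.05         # duration (s)

# Source: periodic impulse train modelling glottal pulses.
n = int(FS * DUR)
source = np.zeros(n)
source[::FS // F0] = 1.0

# Filter: one resonance (two-pole IIR) standing in for the vocal tract.
freq, bw = 500.0, 100.0                 # formant frequency and bandwidth (Hz)
r = np.exp(-np.pi * bw / FS)            # pole radius from bandwidth
a1 = -2 * r * np.cos(2 * np.pi * freq / FS)
a2 = r * r

speech = np.zeros(n)
for i in range(n):                      # y[i] = x[i] - a1*y[i-1] - a2*y[i-2]
    speech[i] = source[i]
    if i >= 1:
        speech[i] -= a1 * speech[i - 1]
    if i >= 2:
        speech[i] -= a2 * speech[i - 2]

print(speech.shape)  # (400,)
```

PAT builds a full probabilistic model on top of this decomposition, including phase and AM-FM effects that this deterministic sketch leaves out.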
TEReTA generates pitch contours by incorporating a theoretical model of pitch planning, the piece-wise linear target approximation (TA) model, as the output layer of a deep recurrent neural network. It aims to model semantic variations in the F0 contour, which is challenging for existing networks. By incorporating the TA model, TEReTA is able to memorize semantic context and capture these semantic variations. Experiments on contrastive focus verify TEReTA's ability in semantics modeling.
BaWN is a neural network based algorithm for single-channel enhancement. The biggest challenges of the neural network based speech enhancement algorithm are the poor generalizability to unseen noises and unnaturalness of the output speech. By incorporating a neural generative model, WaveNet, in the Bayesian framework, where WaveNet predicts the prior for speech, and where a separate enhancement network incorporates the likelihood function, BaWN is able to achieve satisfactory generalizability and a good intelligibility score of its output, even when the noisy training set is small.
GRAB is a beamforming algorithm for ad-hoc microphone arrays. The task of enhancing speech with ad-hoc microphone array is challenging because of the inaccuracy in position and interference calibration. Inspired by the source-filter model, GRAB does not rely on any position or interference calibration. Instead, it incorporates a source-filter speech model and minimizes the energy that cannot be accounted for by the model. Objective and subjective evaluations on both simulated and real-world data show that GRAB is able to suppress noise effectively while keeping the speech natural and dry.
Final chapters discuss the implications of this work for future research in speech processing.
Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement
A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on integer data streams, such as scaling, additions/subtractions, inner or outer vector products, permutations and convolutions. In the proposed method, the input integer data streams
are linearly superimposed to form numerically-entangled integer data
streams that are stored in-place of the original inputs. A series of LSB
operations can then be performed directly using these entangled data streams.
The results are extracted from the entangled output streams by additions
and arithmetic shifts. Any soft errors affecting any single disentangled output
stream are guaranteed to be detectable via a specific post-computation
reliability check. In addition, when utilizing a separate processor core for
each of the streams, the proposed approach can recover all outputs after
any single fail-stop failure. Importantly, unlike algorithm-based fault
tolerance (ABFT) methods, the number of operations required for the
entanglement, extraction and validation of the results is linearly related to
the number of the inputs and does not depend on the complexity of the performed
LSB operations. We have validated our proposal in an Intel processor (Haswell
architecture with AVX2 support) via fast Fourier transforms, circular
convolutions, and matrix multiplication operations. Our analysis and experiments reveal that the proposed approach incurs only a small reduction in processing throughput for a wide variety of LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating processor hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy.
Comment: to appear in IEEE Trans. on Signal Processing, 201
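The entanglement idea can be illustrated with a toy two-stream version: superimpose two streams with a guard-band shift, run one linear operation directly on the superposition, then recover both outputs with masks and shifts. This sketch assumes nonnegative results that fit in the guard band, and it deliberately omits the redundancy the paper adds for error detection and fail-stop recovery:

```python
import numpy as np

SHIFT = 32  # guard band; assumes all results stay nonnegative and < 2**32

def entangle(a, b):
    """Superimpose two integer streams into one, stored in place of the inputs
    (toy version; the actual scheme adds checkable redundancy)."""
    return a.astype(object) + (b.astype(object) << SHIFT)

def disentangle(e):
    """Extract both output streams via masking and arithmetic shifts."""
    mask = (1 << SHIFT) - 1
    low = np.array([int(x) & mask for x in e])
    high = np.array([int(x) >> SHIFT for x in e])
    return low, high

a = np.array([3, 1, 4, 1, 5])
b = np.array([9, 2, 6, 5, 3])
e = entangle(a, b)

# One LSB operation (scaling by 7) applied directly to the entangled stream:
# by linearity it acts on both superimposed streams at once.
out_a, out_b = disentangle(7 * e)
print(out_a.tolist())  # [21, 7, 28, 7, 35]
print(out_b.tolist())  # [63, 14, 42, 35, 21]
```

The cost of `entangle` and `disentangle` is linear in the number of inputs regardless of the operation performed in between, which is the property contrasted with ABFT in the abstract.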
Low-dose radiography image processing: joint denoising and contrast enhancement, and automatic detection of anatomical landmarks for image quality estimation
We aim at reducing the ALARA (As Low As Reasonably Achievable) dose limits for images acquired with the EOS full-body system by means of image processing techniques. Two complementary approaches are studied. First, we define a post-processing method that optimizes the trade-off between acquired image quality and X-ray dose. The Non-Local Means filter is extended to restore EOS images. We then study how to combine it with a multi-scale contrast enhancement technique. The image quality for diagnosis is optimized by defining non-parametric noise containment maps that limit the increase of noise depending on the amount of local redundant information captured by the filter. Secondly, we estimate exposure index (EI) values on EOS images, which give an immediate feedback on image quality to help radiographers verify the correct exposure level of the X-ray examination. We propose a landmark detection based approach that is more robust to potential outliers than existing methods, as it exploits the redundancy of local estimates. Finally, the proposed joint denoising and contrast enhancement technique significantly increases the image quality with respect to an algorithm used in clinical routine. Robust image quality indicators can be automatically associated with clinical EOS images. Given the consistency of the measures assessed on preview images, these indices could be used to drive an exposure management system in charge of defining the optimal radiation exposure.
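The Non-Local Means principle extended in this work can be sketched in one dimension: each sample is replaced by a weighted average of all samples whose surrounding patches look similar. Patch size and filtering strength below are illustrative, and the proposed noise-containment maps are omitted:

```python
import numpy as np

def nl_means_1d(x, patch=3, h=0.5):
    """Toy 1-D non-local means: weights decay with squared patch distance,
    so averaging happens mostly among structurally similar samples."""
    half = patch // 2
    xp = np.pad(x, half, mode="edge")
    patches = np.stack([xp[i:i + patch] for i in range(len(x))])
    out = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        d2 = ((patches - patches[i]) ** 2).mean(axis=1)   # patch dissimilarity
        w = np.exp(-d2 / (h * h))                          # similarity weights
        out[i] = (w * x).sum() / w.sum()
    return out

rng = np.random.default_rng(2)
clean = np.repeat([0.0, 1.0, 0.0], 50)          # piecewise-constant "image row"
noisy = clean + rng.normal(0.0, 0.2, clean.size)
den = nl_means_1d(noisy)
print(np.abs(den - clean).mean() < np.abs(noisy - clean).mean())  # True
```

Because weights are driven by patch similarity rather than spatial proximity, edges are preserved while flat regions are averaged, the property that makes the filter attractive for low-dose restoration.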
Precision-Energy-Throughput Scaling Of Generic Matrix Multiplication and Convolution Kernels Via Linear Projections
Generic matrix multiplication (GEMM) and one-dimensional
convolution/cross-correlation (CONV) kernels often constitute the bulk of the
compute- and memory-intensive processing within image/audio recognition and
matching systems. We propose a novel method to scale the energy and processing
throughput of GEMM and CONV kernels for such error-tolerant multimedia
applications by adjusting the precision of computation. Our technique employs
linear projections to the input matrix or signal data during the top-level GEMM
and CONV blocking and reordering. The GEMM and CONV kernel processing then uses
the projected inputs and the results are accumulated to form the final outputs.
Throughput and energy scaling takes place by changing the number of projections
computed by each kernel, which in turn produces approximate results, i.e.
changes the precision of the performed computation. Results derived from a
voltage- and frequency-scaled ARM Cortex A15 processor running face recognition
and music matching algorithms demonstrate that the proposed approach allows for a 280%~440% increase in processing throughput and a 75%~80% decrease in energy consumption against optimized GEMM and CONV kernels, without any impact on the obtained recognition or matching accuracy. Even higher gains can be obtained if one is willing to tolerate some reduction in the accuracy of the recognition and matching applications.
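The precision-for-throughput knob can be sketched with random projections standing in for the linear projections of the method (an assumption; the authors' projection construction and blocking strategy differ). The inner dimension is projected down to k directions, the multiply runs in the reduced space, and k controls both work and accuracy:

```python
import numpy as np

def approx_gemm(A, B, k, rng):
    """Approximate A @ B by projecting the inner dimension onto k random
    directions; more projections -> more work but higher precision."""
    n = A.shape[1]
    G = rng.normal(size=(n, k))          # random projection matrix
    return (A @ G) @ (G.T @ B) / k       # unbiased estimate of A @ B

rng = np.random.default_rng(3)
A = rng.normal(size=(32, 256))
B = rng.normal(size=(256, 32))
exact = A @ B

err = lambda k: np.linalg.norm(approx_gemm(A, B, k, rng) - exact) / np.linalg.norm(exact)
print(err(32) > err(512))  # expect True: more projections, lower error
```

Changing k at runtime is what enables the throughput and energy scaling reported in the abstract: error-tolerant applications simply accept the approximate result.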
Research on performance enhancement for electromagnetic analysis and power analysis in cryptographic LSI
Degree system: new; report number: Kō 3785; degree type: Doctor of Engineering; date conferred: 2012/11/19; Waseda degree number: Shin 6161. Waseda University