20 research outputs found
Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions
Generative convolutional deep neural networks, e.g. popular GAN
architectures, rely on convolution-based up-sampling methods to produce
non-scalar outputs such as images or video sequences. In this paper, we show that
common up-sampling methods, known as up-convolution or transposed
convolution, prevent such models from correctly reproducing the spectral
distributions of natural training data. This effect is independent of
the underlying architecture, and we show that it can be used to easily detect
generated data such as deepfakes with up to 100% accuracy on public benchmarks.
To overcome this drawback of current generative models, we propose adding a
novel spectral regularization term to the training optimization objective. We
show that this approach not only allows training spectrally consistent GANs that
avoid high-frequency errors, but also that a correct approximation
of the frequency spectrum has positive effects on the training stability and
output quality of generative networks.
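The "spectral distribution" of an image can be summarized as a one-dimensional profile by averaging its 2D power spectrum over rings of equal spatial frequency. The following is a minimal sketch of that idea, assuming a grayscale image stored as a NumPy array; the function name `azimuthal_power_spectrum` and the integer-radius binning are illustrative choices, not necessarily the procedure used in the paper.

```python
import numpy as np

def azimuthal_power_spectrum(image):
    """Return the radially averaged power spectrum of a 2D grayscale image."""
    # 2D FFT with the zero-frequency component shifted to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(spectrum) ** 2

    # Integer radial distance of every frequency bin from the center.
    h, w = image.shape
    y, x = np.indices((h, w))
    radius = np.hypot(y - h // 2, x - w // 2).astype(int)

    # Average the power over all bins that share the same integer radius.
    total = np.bincount(radius.ravel(), weights=power.ravel())
    count = np.bincount(radius.ravel())
    return total / np.maximum(count, 1)

# Hypothetical input: a random 64x64 image standing in for real data.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
profile = azimuthal_power_spectrum(image)
```

Comparing such profiles for real and generated images at the highest radii is one way to look for the high-frequency discrepancies the abstract describes.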
Music-STAR: a Style Translation system for Audio-based Rearrangement
Music style translation has recently gained attention among music processing
studies. It aims to generate variations of existing music pieces by altering the style-variant characteristics of the original music piece, while content such as the melody
remains unchanged. These alterations could involve timbre translation, reharmonization,
or music rearrangement.
In this thesis, we address music rearrangement, focusing on instrumentation, by processing waveforms of two-instrument pieces. Previous studies have achieved promising results using time-frequency and symbolic music representations. Music translation on raw audio has also been investigated using single-instrument pieces. Although processing raw audio is more challenging, it carries more detailed information about the performance, timbre, and dynamics of a music piece. To this end, we introduce Music-STAR, the first audio-based model that can transform the instruments of a multi-track piece into another set of instruments, resulting in a rearranged piece.
Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model
The task of bandwidth extension addresses the generation of missing high
frequencies of audio signals based on knowledge of the low-frequency part of
the sound. This task applies to various problems, such as audio coding or audio
restoration. In this article, we focus on efficient bandwidth extension of
monophonic and polyphonic musical signals using a differentiable digital signal
processing (DDSP) model. Such a model is composed of a neural network part with
relatively few parameters trained to infer the parameters of a differentiable
digital signal processing model, which efficiently generates the output
full-band audio signal.
We first address bandwidth extension of monophonic signals, and then propose
two methods to explicitly handle polyphonic signals. The benefits of the
proposed models are first demonstrated on monophonic and polyphonic synthetic
data against a baseline and a deep-learning-based ResNet model. The models are
next evaluated on recorded monophonic and polyphonic data, for a wide variety
of instruments and musical genres. We show that all proposed models surpass a
higher complexity deep learning model for an objective metric computed in the
frequency domain. A MUSHRA listening test confirms the superiority of the
proposed approach in terms of perceptual quality.
Comment: Accepted for publication in EURASIP Journal on Audio, Speech, and
Music Processing
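A harmonic plus noise model represents a sound as a sum of sinusoids at integer multiples of a fundamental frequency, plus a noise component. The sketch below illustrates only this synthesis step. It assumes fixed harmonic amplitudes and a single white-noise gain, whereas a DDSP model would have a neural network predict time-varying parameters and would typically filter the noise. The function name and all parameter names are hypothetical.

```python
import math
import random

def harmonic_plus_noise(f0, amplitudes, noise_gain,
                        n_samples=160, sample_rate=16000, seed=0):
    """Synthesize a harmonic-plus-noise signal as a list of samples."""
    rng = random.Random(seed)
    nyquist = sample_rate / 2
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        # Sum of sinusoids at k * f0, skipping harmonics above Nyquist.
        harmonic = sum(
            a * math.sin(2 * math.pi * k * f0 * t)
            for k, a in enumerate(amplitudes, start=1)
            if k * f0 < nyquist
        )
        # Simplification: unfiltered white noise scaled by one gain.
        noise = noise_gain * rng.uniform(-1.0, 1.0)
        signal.append(harmonic + noise)
    return signal

# Hypothetical parameters: a 440 Hz tone with three decaying harmonics.
samples = harmonic_plus_noise(440.0, [1.0, 0.5, 0.25], noise_gain=0.05)
```

In a bandwidth extension setting, the parameters would be inferred from the low-frequency part of the signal, and the synthesizer would then generate the missing high-frequency harmonics and noise.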
Banknote Authentication and Medical Image Diagnosis Using Feature Descriptors and Deep Learning Methods
Banknote recognition and medical image analysis have been the foci of image processing and pattern recognition research. Counterfeiters have taken advantage of innovations in print media technologies to reproduce fake money, hence the need to design systems that can reassure citizens of the authenticity of banknotes in circulation. Similarly, many physicians must interpret medical images, but image analysis by humans is susceptible to error due to wide variations across interpreters, lethargy, and human subjectivity. Computer-aided diagnosis is vital to improvements in medical analysis, as it facilitates the identification of findings that need treatment and assists the expert's workflow. This thesis is therefore organized around three problems related to Banknote Authentication and Medical Image Diagnosis. In our first research problem, we proposed a new banknote recognition approach that classifies the principal components of extracted HOG features. We further experimented with computing HOG descriptors from cells created from image patch vertices of SURF points and designed a feature reduction approach based on a high correlation and low variance filter. In our second research problem, we developed a mobile app for banknote identification and counterfeit detection using the Unity 3D software and evaluated its performance based on a Cascaded Ensemble approach. The algorithm was then extended to a client-server architecture using SIFT and SURF features reduced by Bag of Words and high correlation-based HOG vectors. In our third research problem, experiments were conducted on a pre-trained mobile app for medical image diagnosis using three convolutional layers with an Ensemble Classifier comprising PCA and bagging of five base learners.
We also implemented a Bidirectional Generative Adversarial Network to mitigate the effect of the Binary Cross Entropy loss, using a Deep Convolutional Generative Adversarial Network as the generator and encoder and a Capsule Network as the discriminator, while experimenting on images with random composition and translation inferences. Lastly, we proposed a variant of Single Image Super-Resolution for medical analysis by redesigning the Super Resolution Generative Adversarial Network to increase the Peak Signal to Noise Ratio during image reconstruction, incorporating a loss function based on the mean square error of pixel space and Super Resolution Convolutional Neural Network layers.
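The Peak Signal to Noise Ratio mentioned above is derived directly from the pixel-space mean square error: PSNR = 10 log10(MAX^2 / MSE), where MAX is the largest possible pixel value. A minimal sketch follows, assuming 8-bit grayscale images given as nested lists; the function name `psnr` and the sample images are illustrative only.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal to Noise Ratio in dB between two equally sized images."""
    flat_ref = [p for row in reference for p in row]
    flat_rec = [p for row in reconstructed for p in row]
    if len(flat_ref) != len(flat_rec) or not flat_ref:
        raise ValueError("images must be non-empty and the same size")

    # Mean square error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_rec)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical 2x2 images differing by a few intensity levels.
original = [[52, 55], [61, 59]]
restored = [[54, 55], [60, 57]]
score = psnr(original, restored)
```

Because PSNR grows as the MSE shrinks, a loss term based on pixel-space MSE pushes the reconstruction toward higher PSNR.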
Multimodal information retrieval in medical imaging repositories (Recuperação de informação multimodal em repositórios de imagem médica)
The proliferation of digital medical imaging modalities in hospitals and other
diagnostic facilities has created huge repositories of valuable data, often
not fully explored. Moreover, the past few years show a growing trend
of data production. As such, studying new ways to index, process and
retrieve medical images becomes an important subject to be addressed by
the wider community of radiologists, scientists and engineers. Content-based
image retrieval, which encompasses various methods, can exploit the visual
information of a medical imaging archive, and is known to be beneficial to
practitioners and researchers. However, the integration of the latest systems
for medical image retrieval into clinical workflows is still rare, and their
effectiveness still shows room for improvement.
This thesis proposes solutions and methods for multimodal information
retrieval, in the context of medical imaging repositories. The major
contributions are a search engine for medical imaging studies supporting
multimodal queries in an extensible archive; a framework for automated
labeling of medical images for content discovery; and an assessment and
proposal of feature learning techniques for concept detection from medical
images, exhibiting greater potential than the feature extraction algorithms that
were previously used in similar tasks. These contributions, each in their
own dimension, seek to narrow the scientific and technical gap towards
the development and adoption of novel multimodal medical image retrieval
systems, to ultimately become part of the workflows of medical practitioners,
teachers, and researchers in healthcare.
Doctoral Program in Informatics
How good is good enough? Strategies for dealing with unreliable segmentation annotations of medical data
Medical image segmentation is an essential topic in computer vision and medical image analysis because it enables the precise and accurate delineation of organs and lesions for healthcare applications. Deep learning has come to dominate medical image segmentation thanks to increasingly powerful computational resources, successful neural network architecture engineering, and access to large amounts of medical imaging data with high-quality annotations. However, annotating medical imaging data is time-consuming and expensive, and sometimes the annotations are unreliable.
This DPhil thesis presents a comprehensive study that explores deep learning techniques in medical image segmentation under various challenging situations of unreliable medical imaging data. These situations include: (1) conventional supervised learning to tackle comprehensive data annotation with full dense masks, (2) semi-supervised learning to tackle partial data annotation with full dense masks, (3) noise-robust learning to tackle comprehensive data annotation with noisy dense masks, and (4) weakly-supervised learning to tackle comprehensive data annotation with sketchy contours for network training.
The proposed medical image segmentation strategies improve deep learning techniques to address a series of challenges in medical image analysis, including limited annotated data, noisy annotations, and sparse annotations. These advancements aim to bring deep learning techniques for medical image analysis into practical clinical scenarios. By overcoming these challenges, the strategies establish a more robust and reliable application of deep learning methods, which is valuable for improving diagnostic precision and patient care outcomes in real-world clinical environments.
WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM
Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments demonstrate that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25%, and 95.09% across those same environments.
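In attention-based recurrent models, an attention layer typically collapses the sequence of per-time-step hidden states into one context vector by taking a softmax-weighted sum. The sketch below shows a generic dot-product attention pooling step under that assumption. It is not the architecture from the paper: the function name, the `query` vector (standing in for learned parameters), and the toy hidden states are all hypothetical.

```python
import math

def attention_pool(hidden_states, query):
    """Softmax-weighted sum of hidden states; returns (context, weights)."""
    # Score each time step by its dot product with the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query))
              for h in hidden_states]

    # Numerically stable softmax over time steps.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of hidden states along the time axis.
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights

# Hypothetical hidden states for three time steps (two features each).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(states, query=[0.5, 0.5])
```

Time steps whose hidden state aligns more strongly with the query receive larger weights, so the classifier that follows sees a summary dominated by the most informative parts of the CSI sequence.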
BIOLOGICALLY-INFORMED COMPUTATIONAL MODELS OF HARMONIC SOUND DETECTION AND IDENTIFICATION
Harmonic sounds or harmonic components of sounds are often fused into a single percept by the auditory system. Although the exact neural mechanisms for harmonic sensitivity remain unclear, it arises presumably in the auditory cortex because subcortical neurons typically prefer only a single frequency. Pitch sensitive units and harmonic template units found in awake marmoset auditory cortex are sensitive to temporal and spectral periodicity, respectively. This thesis is a study of possible computational mechanisms underlying cortical harmonic selectivity.
To examine whether harmonic selectivity is related to statistical regularities of natural sounds, simulated auditory nerve responses to natural sounds were used in principal component analysis in comparison with independent component analysis, which yielded harmonic-sensitive model units with a population distribution similar to that of real cortical neurons in terms of harmonic selectivity metrics. This result suggests that the variability of cortical harmonic selectivity may provide an efficient population representation of natural sounds.
Several network models of spectral selectivity mechanisms are investigated. As a side study, adding synaptic depletion to an integrate-and-fire model could explain the observed modulation-sensitive units, which are related to pitch-sensitive units but cannot account for precise temporal regularity. When a feed-forward network is trained to detect harmonics, the result is always a sieve, which is excited by integer multiples of the fundamental frequency and inhibited by half-integer multiples. The sieve persists over a wide variety of conditions, including changing evaluation criteria, incorporating Dale's principle, and adding a hidden layer. A recurrent network trained by Hebbian learning produces harmonic selectivity by a novel dynamical mechanism that could be explained by a Lyapunov function which favors inputs that match the learned frequency correlations. These model neurons have sieve-like weights like the harmonic template units when probed by random harmonic stimuli, despite there being no sieve pattern anywhere in the network's weights.
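The sieve described above can be illustrated with a toy weighting function that is positive at integer multiples of the fundamental frequency and negative at half-integer multiples. The sketch below uses a cosine for this purpose, which is an assumption made for illustration and not the weight profile learned by the trained networks; the function names and test frequencies are hypothetical.

```python
import math

def sieve_weight(frequency, f0):
    """Toy sieve: +1 at integer multiples of f0, -1 at half-integer multiples."""
    return math.cos(2 * math.pi * frequency / f0)

def sieve_response(components, f0):
    """Summed sieve weight over the frequency components of a stimulus."""
    return sum(sieve_weight(f, f0) for f in components)

f0 = 200.0
harmonic = [200.0, 400.0, 600.0]   # integer multiples of f0
shifted = [300.0, 500.0, 700.0]    # half-integer multiples of f0

excited = sieve_response(harmonic, f0)     # strongly positive
inhibited = sieve_response(shifted, f0)    # strongly negative
```

A unit with such weights responds most strongly to a harmonic complex built on its preferred fundamental and is suppressed when the components fall halfway between harmonics.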
Online stimulus design has the potential to facilitate future experiments on nonlinear sensory neurons. We accelerated the sound-from-texture algorithm to enable online adaptive experimental design that maximizes the activities of sparsely responding cortical units. We calculated the optimal stimuli for harmonic-selective units and investigated a model-based information-theoretic method for stimulus optimization.