513 research outputs found
Implementation of the Discrete Wavelet Transform Used in the Calibration of the Enzymatic Biosensors
A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.Comment: Revised survey includes expanded discussion section and reworked
introductory section on common deep architectures. Added missed papers from
before Feb 1st 201
Recommended from our members
A novel framework for high-quality voice source analysis and synthesis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing owing to the fact humans produce a wide range of voice source realizations and that the voice source estimates commonly contain artifacts due to the non-linear time-varying source-filter coupling. Currently, the most widely adopted representation of voice source signal is Liljencrants-Fant's (LF) model which was developed in late 1985. Due to the overly simplistic interpretation of voice source dynamics, LF model can not represent the fine temporal structure of glottal flow derivative realizations nor can it carry the sufficient spectral richness to facilitate a truly natural sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM) which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In comparative evaluation of CGPWPM and LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker dependant information from the voice source estimates and realize a more natural sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on consistent basis, across speakers, gender. We have applied CGPWPM to voice quality profiling and text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects and the modified
speech remained as natural sounding and intelligible as natural speech. In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate
Patient-Specific Method of Generating Parametric Maps of Patlak K(i) without Blood Sampling or Metabolite Correction: A Feasibility Study.
Currently, kinetic analyses using dynamic positron emission tomography (PET) experience very limited use despite their potential for improving quantitative accuracy in several clinical and research applications. For targeted volume applications, such as radiation treatment planning, treatment monitoring, and cerebral metabolic studies, the key to implementation of these methods is the determination of an arterial input function, which can include time-consuming analysis of blood samples for metabolite correction. Targeted kinetic applications would become practical for the clinic if blood sampling and metabolite correction could be avoided. To this end, we developed a novel method (Patlak-P) of generating parametric maps that is identical to Patlak K(i) (within a global scalar multiple) but does not require the determination of the arterial input function or metabolite correction. In this initial study, we show that Patlak-P (a) mimics Patlak K(i) images in terms of visual assessment and target-to-background (TB) ratios of regions of elevated uptake, (b) has higher visual contrast and (generally) better image quality than SUV, and (c) may have an important role in improving radiotherapy planning, therapy monitoring, and neurometabolism studies
Registration and statistical analysis of the tongue shape during speech production
This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method allows to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text.Diese Dissertation beschäftigt sich mit der Analyse der menschlichen Zungenform während der Sprachproduktion. Zunächst wird ein semi-überwachtes Verfahren vorgestellt, mit dessen Hilfe sich Zungenformen von volumetrischen Magnetresonanztomographie- Aufnahmen des menschlichen Vokaltrakts schätzen lassen. Die Ergebnisse dieses Extraktionsverfahrens werden genutzt, um ein parametrisches Zungenmodell zu konstruieren. Danach wird eine Methode hergeleitet, die ein solches Modell nutzt, um spärliche Bewegungsaufnahmen der Zunge zu registrieren. Dieser Ansatz erlaubt es, dreidimensionale Animationen der Zunge zu erstellen. Zuletzt wird ein multimodales und statistisches Text-to-Speech-System entwickelt, das in der Lage ist, Audio und die dazu synchrone Zungenbewegung zu synthetisieren.German Research Foundatio
Time-domain speech enhancement using generative adversarial networks
Speech enhancement improves recorded voice utterances to eliminate noise that might be impeding their intelligibility or compromising their quality. Typical speech enhancement systems are based on regression approaches that subtract noise or predict clean signals. Most of them do not operate directly on waveforms. In this work, we propose a generative approach to regenerate corrupted signals into a clean version by using generative adversarial networks on the raw signal. We also explore several variations of the proposed system, obtaining insights into proper architectural choices for an adversarially trained, convolutional autoencoder applied to speech. We conduct both objective and subjective evaluations to assess the performance of the proposed method. The former helps us choose among variations and better tune hyperparameters, while the latter is used in a listening experiment with 42 subjects, confirming the effectiveness of the approach in the real world. We also demonstrate the applicability of the approach for more generalized speech enhancement, where we have to regenerate voices from whispered signals.Peer ReviewedPostprint (author's final draft
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Generative models have recently exhibited exceptional capabilities in
text-to-image generation, but still struggle to generate image sequences
coherently. In this work, we focus on a novel, yet challenging task of
generating a coherent image sequence based on a given storyline, denoted as
open-ended visual storytelling. We make the following three contributions: (i)
to fulfill the task of visual storytelling, we propose a learning-based
auto-regressive image generation model, termed as StoryGen, with a novel
vision-language context module, that enables to generate the current frame by
conditioning on the corresponding text prompt and preceding image-caption
pairs; (ii) to address the data shortage of visual storytelling, we collect
paired image-text sequences by sourcing from online videos and open-source
E-books, establishing processing pipeline for constructing a large-scale
dataset with diverse characters, storylines, and artistic styles, named
StorySalon; (iii) Quantitative experiments and human evaluations have validated
the superiority of our StoryGen, where we show StoryGen can generalize to
unseen characters without any optimization, and generate image sequences with
coherent content and consistent character. Code, dataset, and models are
available at https://haoningwu3639.github.io/StoryGen_Webpage/Comment: Accepted by CVPR 2024. Project Page:
https://haoningwu3639.github.io/StoryGen_Webpage
Integrated Segmentation and Interpolation of Sparse Data
International audienceWe address the two inherently related problems of segmentation and interpolation of 3D and 4D sparse data and propose a new method to integrate these stages in a level set framework. The interpolation process uses segmentation information rather than pixel intensities for increased robustness and accuracy. The method supports any spatial configurations of sets of 2D slices having arbitrary positions and orientations. We achieve this by introducing a new level set scheme based on the interpolation of the level set function by radial basis functions. The proposed method is validated quantitatively and/or subjectively on artificial data and MRI and CT scans, and is compared against the traditional sequential approach which interpolates the images first, using a state-of-the-art image interpolation method, and then segments the interpolated volume in 3D or 4D. In our experiments, the proposed framework yielded similar segmentation results to the sequential approach, but provided a more robust and accurate interpolation. In particular, the interpolation was more satisfactory in cases of large gaps, due to the method taking into account the global shape of the object, and it recovered better topologies at the extremities of the shapes where the objects disappear from the image slices. As a result, the complete integrated framework provided more satisfactory shape reconstructions than the sequential approach
- …