513 research outputs found

Implementation of the Discrete Wavelet Transform Used in the Calibration of the Enzymatic Biosensors

Author: Alonso Gustavo A.
Gutiérrez Juan Manuel
Marty Jean-Louis
Muñoz Roberto
Publication venue: 'IntechOpen'
Publication date: 12/09/2011
IntechOpen

Crossref

A Survey on Deep Learning in Medical Image Analysis

Author: Bejnordi Babak Ehteshami
Ciompi Francesco
Ghafoorian Mohsen
Kooi Thijs
Litjens Geert
Setio Arnaud Arindra Adiyoso
Sánchez Clara I.
van der Laak Jeroen A. W. M.
van Ginneken Bram
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks and provide concise overviews of studies per application area. Open challenges and directions for future research are discussed.

Comment: Revised survey includes expanded discussion section and reworked introductory section on common deep architectures. Added missed papers from before Feb 1st 201

arXiv.org e-Print Archive

Radboud Repository

Computational Analysis of Magnetic Resonance Images of the Upper Airways: Algorithms and Applications

Author: Jessica Condesso Delmoral
Publication date: 12/10/2015
Repositório Aberto da Universidade do Porto

A novel framework for high-quality voice source analysis and synthesis

Author: Turajlic Emir
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2006
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing, owing to the fact that humans produce a wide range of voice source realizations and that the voice source estimates commonly contain artifacts due to the non-linear time-varying source-filter coupling. Currently, the most widely adopted representation of the voice source signal is the Liljencrants-Fant (LF) model, which was developed in late 1985. Due to its overly simplistic interpretation of voice source dynamics, the LF model cannot represent the fine temporal structure of glottal flow derivative realizations, nor can it carry sufficient spectral richness to facilitate truly natural-sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker-dependent information from the voice source estimates and realize more natural-sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis across speakers and genders. We have applied CGPWPM to voice quality profiling and to a text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects, and the modified speech remains as natural-sounding and intelligible as natural speech.
In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate
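
As a rough illustration of what wavelet thresholding does to a signal, the sketch below applies generic soft thresholding to one level of Haar detail coefficients. It is not the optimal strategy developed in the thesis, which the abstract does not specify; the Haar wavelet, the single decomposition level, the function names, and the threshold value are all assumptions made for illustration.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input assumed)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`, zeroing small ones."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Threshold only the detail band, keeping the slow (approximation) part."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))
```

With a threshold larger than every detail coefficient, each pair of samples collapses to its mean, so a threshold that is too aggressive would also remove the rapid variations the abstract says should be retained.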

Brunel University Research Archive

Patient-Specific Method of Generating Parametric Maps of Patlak K(i) without Blood Sampling or Metabolite Correction: A Feasibility Study.

Author: Franc Benjamin L
Sayre George A
Seo Youngho
Publication venue: eScholarship, University of California
Publication date: 01/01/2011
Currently, kinetic analyses using dynamic positron emission tomography (PET) experience very limited use despite their potential for improving quantitative accuracy in several clinical and research applications. For targeted volume applications, such as radiation treatment planning, treatment monitoring, and cerebral metabolic studies, the key to implementation of these methods is the determination of an arterial input function, which can include time-consuming analysis of blood samples for metabolite correction. Targeted kinetic applications would become practical for the clinic if blood sampling and metabolite correction could be avoided. To this end, we developed a novel method (Patlak-P) of generating parametric maps that is identical to Patlak K(i) (within a global scalar multiple) but does not require the determination of the arterial input function or metabolite correction. In this initial study, we show that Patlak-P (a) mimics Patlak K(i) images in terms of visual assessment and target-to-background (TB) ratios of regions of elevated uptake, (b) has higher visual contrast and (generally) better image quality than SUV, and (c) may have an important role in improving radiotherapy planning, therapy monitoring, and neurometabolism studies
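
For context on the baseline that Patlak-P is compared against, the following is a minimal sketch of conventional Patlak graphical analysis, in which K(i) is the slope of a straight-line fit and a plasma input function is required. It is not the Patlak-P method from the abstract. The function names, the trapezoidal integration, and the assumption that the first sample marks the start of the scan are illustrative choices.

```python
def cumulative_trapezoid(times, values):
    """Running trapezoidal integral of `values` over `times`, starting at 0."""
    total = 0.0
    out = [0.0]
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
        out.append(total)
    return out


def patlak_fit(times, plasma, tissue, start_index=0):
    """Least-squares line through the Patlak plot.

    x = integral of plasma activity up to t, divided by plasma activity at t
    y = tissue activity at t, divided by plasma activity at t
    Returns (slope, intercept); the slope is the Patlak K(i) estimate.
    `start_index` skips early frames before the plot becomes linear.
    """
    integral = cumulative_trapezoid(times, plasma)
    idx = range(start_index, len(times))
    xs = [integral[i] / plasma[i] for i in idx]
    ys = [tissue[i] / plasma[i] for i in idx]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Applied voxel by voxel, the fitted slope would give a parametric K(i) map. The dependence on `plasma` in this sketch is the arterial input function requirement that the abstract says Patlak-P avoids.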

CiteSeerX

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Registration and statistical analysis of the tongue shape during speech production

Author: Hewer Alexander
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2019
This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text.

German Research Foundatio

Universaar

Time-domain speech enhancement using generative adversarial networks

Author: Bonafonte Cávez Antonio
Pascual de la Puente Santiago
Serra Joan
Publication venue: 'Elsevier BV'
Publication date: 01/11/2019
Speech enhancement improves recorded voice utterances to eliminate noise that might be impeding their intelligibility or compromising their quality. Typical speech enhancement systems are based on regression approaches that subtract noise or predict clean signals. Most of them do not operate directly on waveforms. In this work, we propose a generative approach to regenerate corrupted signals into a clean version by using generative adversarial networks on the raw signal. We also explore several variations of the proposed system, obtaining insights into proper architectural choices for an adversarially trained, convolutional autoencoder applied to speech. We conduct both objective and subjective evaluations to assess the performance of the proposed method. The former helps us choose among variations and better tune hyperparameters, while the latter is used in a listening experiment with 42 subjects, confirming the effectiveness of the approach in the real world. We also demonstrate the applicability of the approach for more generalized speech enhancement, where we have to regenerate voices from whispered signals.

Peer Reviewed. Postprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

Author: Liu Chang
Wang Yanfeng
Wu Haoning
Xie Weidi
Zhang Xiaoyun
Zhong Yujie
Publication date: 04/03/2024
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently. In this work, we focus on a novel yet challenging task: generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling. We make the following three contributions: (i) to fulfill the task of visual storytelling, we propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module that generates the current frame by conditioning on the corresponding text prompt and the preceding image-caption pairs; (ii) to address the data shortage in visual storytelling, we collect paired image-text sequences sourced from online videos and open-source E-books, establishing a processing pipeline for constructing a large-scale dataset with diverse characters, storylines, and artistic styles, named StorySalon; (iii) quantitative experiments and human evaluations have validated the superiority of StoryGen, where we show that StoryGen can generalize to unseen characters without any optimization and generate image sequences with coherent content and consistent characters. Code, dataset, and models are available at https://haoningwu3639.github.io/StoryGen_Webpage/

Comment: Accepted by CVPR 2024. Project Page: https://haoningwu3639.github.io/StoryGen_Webpage

arXiv.org e-Print Archive

Integrated Segmentation and Interpolation of Sparse Data

Author: Hamilton Mark
Mirmehdi Majid
Paiement Adeline
Xie Xianghua
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
We address the two inherently related problems of segmentation and interpolation of 3D and 4D sparse data and propose a new method to integrate these stages in a level set framework. The interpolation process uses segmentation information rather than pixel intensities for increased robustness and accuracy. The method supports any spatial configurations of sets of 2D slices having arbitrary positions and orientations. We achieve this by introducing a new level set scheme based on the interpolation of the level set function by radial basis functions. The proposed method is validated quantitatively and/or subjectively on artificial data and MRI and CT scans, and is compared against the traditional sequential approach which interpolates the images first, using a state-of-the-art image interpolation method, and then segments the interpolated volume in 3D or 4D. In our experiments, the proposed framework yielded similar segmentation results to the sequential approach, but provided a more robust and accurate interpolation. In particular, the interpolation was more satisfactory in cases of large gaps, due to the method taking into account the global shape of the object, and it recovered better topologies at the extremities of the shapes where the objects disappear from the image slices. As a result, the complete integrated framework provided more satisfactory shape reconstructions than the sequential approach