It Takes (Only) Two: Adversarial Generator-Encoder Networks
We present a new autoencoder-type architecture that is trainable in an
unsupervised mode, supports both generation and inference, and has the quality
of conditional and unconditional samples boosted by adversarial learning.
Unlike previous hybrids of autoencoders and adversarial networks, the
adversarial game in our approach is set up directly between the encoder and the
generator, and no external mappings are trained in the process of learning. The
game objective compares the divergences of the real and the generated
data distributions from the prior distribution in the latent space. We show
that this direct generator-vs-encoder game leads to a tight coupling of the two
components, resulting in samples and reconstructions of a quality comparable to
some recently proposed, more complex architectures.
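The game objective above can be made concrete with a toy computation: fit a diagonal Gaussian to a batch of latent codes and measure its KL divergence from a standard-normal prior; the objective compares exactly such divergences for the real and the generated data distributions. This is a minimal sketch, not the paper's implementation; batch size and latent dimensionality are arbitrary assumptions.

```python
import numpy as np

def kl_to_standard_normal(codes):
    """KL( N(mu, diag(var)) || N(0, I) ) of a Gaussian fitted to latent codes."""
    mu = codes.mean(axis=0)
    var = codes.var(axis=0) + 1e-8
    return 0.5 * np.sum(var + mu**2 - 1.0 - np.log(var))

rng = np.random.default_rng(0)
prior_like = rng.standard_normal((4096, 8))  # codes already matching the prior
shifted = prior_like + 2.0                   # codes whose distribution drifted

# Codes that match the prior have near-zero divergence; drifted codes do not.
assert kl_to_standard_normal(prior_like) < kl_to_standard_normal(shifted)
```

In the actual adversarial game, the encoder and generator push these divergences in opposite directions until the two latent distributions are tightly coupled.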
Describing Videos by Exploiting Temporal Structure
Recent progress in using recurrent neural networks (RNNs) for image
description has motivated the exploration of their application for video
description. However, while images are static, working with videos requires
modeling their dynamic temporal structure and then properly integrating that
information into a natural language description. In this context, we propose an
approach that successfully takes into account both the local and global
temporal structure of videos to produce descriptions. First, our approach
incorporates a spatio-temporal 3-D convolutional neural network (3-D CNN)
representation of the short temporal dynamics. The 3-D CNN representation is
trained on video action recognition tasks, so as to produce a representation
that is tuned to human motion and behavior. Second, we propose a temporal
attention mechanism that goes beyond local temporal modeling and learns
to automatically select the most relevant temporal segments given the
text-generating RNN. Our approach exceeds the current state-of-the-art for both
BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on
a new, larger and more challenging dataset of paired video and natural language
descriptions.
Comment: Accepted to ICCV15. This version comes with code release and
supplementary material.
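The temporal attention mechanism can be sketched as a softmax over per-segment relevance scores conditioned on the current decoder state, followed by a weighted sum of segment features. The bilinear scoring matrix `W` below is an illustrative assumption, not the paper's exact parameterisation.

```python
import numpy as np

def temporal_attention(segment_feats, decoder_state, W):
    """Soft attention over temporal segments: score each segment against the
    decoder state, normalise with a softmax over time, return the weighted sum."""
    scores = segment_feats @ W @ decoder_state        # (T,) relevance per segment
    scores -= scores.max()                            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over time
    context = weights @ segment_feats                 # (D,) attended feature
    return context, weights

rng = np.random.default_rng(1)
feats = rng.standard_normal((5, 16))   # 5 temporal segments, 16-dim features
state = rng.standard_normal(16)        # current RNN decoder state
W = rng.standard_normal((16, 16)) * 0.1
ctx, w = temporal_attention(feats, state, W)
assert np.isclose(w.sum(), 1.0) and ctx.shape == (16,)
```

At every word-generation step the decoder recomputes these weights, so different words can attend to different temporal segments of the video.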
Rapid feasibility assessment of components to be formed through hot stamping: A deep learning approach
The novel non-isothermal Hot Forming and cold die Quenching (HFQ) process can enable the cost-effective production of complex-shaped, high-strength aluminium alloy panel components. However, the unfamiliarity of designing for the new process prevents its wide-scale adoption in industrial settings. Recent research efforts focus on the development of advanced material models for finite element simulations, used to assess the feasibility of new component designs for the HFQ process. However, FE simulations take place late in design processes, require forming-process expertise and are unsuitable for early-stage design exploration. To address these limitations, this study presents a novel application of a Convolutional Neural Network (CNN) based surrogate as a means of rapid manufacturing feasibility assessment for components to be formed using the HFQ process. A diverse dataset containing variations in component geometry, blank shapes, and processing parameters, together with corresponding physical fields, is generated and used to train the model. The results show that near-indistinguishable full-field predictions are obtained in real time from the model when compared with HFQ simulations. This technique provides an invaluable tool to aid component design and decision making at the onset of a design process for complex-shaped components formed under HFQ conditions.
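As a rough illustration of the surrogate idea, the sketch below runs a single hand-set convolution over a blank-shape mask to produce a smoothed full-field output. A trained CNN surrogate would stack many layers with learned kernels; the averaging kernel here is purely a placeholder assumption.

```python
import numpy as np

def conv2d(x, k):
    """'Same'-padded 2-D convolution, the basic building block of the surrogate."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

# A blank-shape mask goes in; a smoothed field prediction comes out in one pass,
# which is what makes CNN surrogates fast enough for early-stage design loops.
mask = np.zeros((16, 16)); mask[4:12, 4:12] = 1.0
kernel = np.full((3, 3), 1 / 9)     # placeholder kernel; a real model learns these
field = conv2d(mask, kernel)
assert field.shape == mask.shape
```

The point of the sketch is the interface, not the physics: geometry and process parameters map directly to a full predicted field without running an FE solver.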
Deep Learning Framework for Spleen Volume Estimation from 2D Cross-sectional Views
Abnormal spleen enlargement (splenomegaly) is regarded as a clinical
indicator for a range of conditions, including liver disease, cancer and blood
diseases. While spleen length measured from ultrasound images is a commonly
used surrogate for spleen size, spleen volume remains the gold standard metric
for assessing splenomegaly and the severity of related clinical conditions.
Computed tomography is the main imaging modality for measuring spleen volume,
but it is less accessible in areas where there is a high prevalence of
splenomegaly (e.g., the Global South). Our objective was to enable automated
spleen volume measurement from 2D cross-sectional segmentations, which can be
obtained from ultrasound imaging. In this study, we describe a variational
autoencoder-based framework to measure spleen volume from single- or dual-view
2D spleen segmentations. We propose and evaluate three volume estimation
methods within this framework. We also demonstrate how 95% confidence intervals
of volume estimates can be produced to make our method more clinically useful.
Our best model achieved mean relative volume accuracies of 86.62% and 92.58%
for single- and dual-view segmentations, respectively, surpassing the
performance of the clinical standard approach of linear regression using manual
measurements and a comparative deep learning-based 2D-3D reconstruction-based
approach. The proposed spleen volume estimation framework can be integrated
into standard clinical workflows which currently use 2D ultrasound images to
measure spleen length. To the best of our knowledge, this is the first work to
achieve direct 3D spleen volume estimation from 2D spleen segmentations.
Comment: 22 pages, 7 figures.
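One common way to obtain the 95% confidence intervals mentioned above is to sample repeatedly from the variational autoencoder's approximate posterior and take percentiles of the decoded volumes. The linear decoder below is a hypothetical stand-in used only to make the procedure concrete; the real model's decoder and posterior parameters are learned.

```python
import numpy as np

rng = np.random.default_rng(42)

def decode_volume(z):
    """Hypothetical stand-in for a trained decoder head: latent code -> volume (ml)."""
    return 250.0 + 40.0 * z[0] + 5.0 * z[1]

# Sample latent codes from the approximate posterior q(z | segmentation)
# and read the 95% interval off the empirical distribution of decoded volumes.
mu, sigma = np.array([0.1, -0.2]), np.array([0.3, 0.5])
volumes = np.array([decode_volume(mu + sigma * rng.standard_normal(2))
                    for _ in range(2000)])
lo, hi = np.percentile(volumes, [2.5, 97.5])   # 95% confidence interval
assert lo < volumes.mean() < hi
```

Reporting the interval alongside the point estimate is what makes the volume estimate clinically actionable rather than a bare number.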
Diffusion-based Molecule Generation with Informative Prior Bridges
AI-based molecule generation provides a promising approach to a large area of
biomedical sciences and engineering, such as antibody design, hydrolase
engineering, or vaccine development. Because the molecules are governed by
physical laws, a key challenge is to incorporate prior information into the
training procedure to generate high-quality and realistic molecules. We propose
a simple and novel approach to steer the training of diffusion-based generative
models with physical and statistics prior information. This is achieved by
constructing physically informed diffusion bridges, stochastic processes that
guarantee to yield a given observation at the fixed terminal time. We develop a
Lyapunov function based method to construct and determine bridges, and propose
a number of proposals of informative prior bridges for both high-quality
molecule generation and uniformity-promoted 3D point cloud generation. With
comprehensive experiments, we show that our method provides a powerful approach
to the 3D generation task, yielding molecule structures with better quality and
stability scores and more uniformly distributed point clouds of high qualities
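The bridge construction can be illustrated with the classic Brownian bridge, the simplest stochastic process whose drift term pins the path to a given observation at the fixed terminal time; the paper's Lyapunov-based bridges generalise this idea with physically informed drifts.

```python
import numpy as np

def brownian_bridge(x0, xT, T=1.0, steps=100, rng=None):
    """Simulate dX_t = (xT - X_t) / (T - t) dt + dW_t. The drift grows as t -> T,
    forcing the path to land on the observation xT at the terminal time."""
    rng = rng or np.random.default_rng(0)
    dt = T / steps
    x = np.empty(steps + 1)
    x[0] = x0
    for i in range(steps):
        t = i * dt
        drift = (xT - x[i]) / (T - t)
        x[i + 1] = x[i] + drift * dt + np.sqrt(dt) * rng.standard_normal()
    return x

path = brownian_bridge(0.0, 3.0)
assert abs(path[-1] - 3.0) < 0.6   # the bridge lands on the target observation
```

Training a diffusion model against such bridges injects the prior (here, the terminal constraint) directly into the stochastic process rather than into a loss penalty.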
SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction with Self-Projection Optimization
The task of point cloud upsampling aims to acquire dense and uniform point
sets from sparse and irregular point sets. Although significant progress has
been made with deep learning models, they require ground-truth dense point sets
as supervision, and can therefore only be trained on synthetic paired data, not
on real-scanned sparse data. Moreover, it is expensive and tedious to obtain
large-scale paired sparse-dense point sets from real scans for training. To
address this problem,
we propose a self-supervised point cloud upsampling network, named SPU-Net, to
capture the inherent upsampling patterns of points lying on the underlying
object surface. Specifically, we propose a coarse-to-fine reconstruction
framework with two main components: point feature extraction and
point feature expansion. In the point feature extraction, we
integrate a self-attention module with a graph convolution network (GCN) to
simultaneously capture context information inside and among local regions. In
the point feature expansion, we introduce a hierarchically learnable folding
strategy to generate the upsampled point sets with learnable 2D grids.
Moreover, to further optimize the noisy points in the generated point sets, we
propose a novel self-projection optimization associated with uniform and
reconstruction terms, as a joint loss, to facilitate the self-supervised point
cloud upsampling. We conduct various experiments on both synthetic and
real-scanned datasets, and the results demonstrate that we achieve comparable
performance to state-of-the-art supervised methods.
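The grid-based folding expansion can be sketched as follows: duplicate each point feature r times and attach a distinct 2-D grid code to each copy; a subsequent learned MLP (omitted here) would fold each augmented copy into a new 3-D point. The shapes and grid range are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def grid_folding_expand(feats, r):
    """Expand n point features to n*r by concatenating r distinct 2-D grid codes,
    so every copy of a feature can be folded to a different upsampled point."""
    n, c = feats.shape
    side = int(np.ceil(np.sqrt(r)))
    g = np.linspace(-0.2, 0.2, side)
    grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)[:r]   # (r, 2)
    tiled = np.repeat(feats, r, axis=0)                              # (n*r, c)
    codes = np.tile(grid, (n, 1))                                    # (n*r, 2)
    return np.concatenate([tiled, codes], axis=1)                    # (n*r, c+2)

feats = np.ones((4, 8))                   # 4 points, 8-dim features
expanded = grid_folding_expand(feats, r=4)
assert expanded.shape == (16, 10)         # 4x upsampled, features + 2 grid dims
```

Because each of the r copies carries a different grid code, the downstream folding network can map identical features to distinct nearby surface points.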
Image Quality Improvement of Medical Images using Deep Learning for Computer-aided Diagnosis
Retina image analysis is an important screening tool for early detection of multiple diseases such as diabetic retinopathy, which greatly impairs visual function. Image analysis and pathology detection can be accomplished both by ophthalmologists and by the
use of computer-aided diagnosis systems. Advancements in hardware technology led to
more portable and less expensive imaging devices for medical image acquisition. This
promotes large scale remote diagnosis by clinicians as well as the implementation of
computer-aided diagnosis systems for local routine disease screening. However, lower-cost equipment generally results in inferior-quality images. This may jeopardize the
reliability of the acquired images and thus hinder the overall performance of the diagnostic tool. To solve this open challenge, we carried out an in-depth study on using different
deep learning-based frameworks for improving retina image quality while maintaining
the underlying morphological information for the diagnosis. Our results demonstrate
that using a Cycle Generative Adversarial Network for unpaired image-to-image translation leads to successful transformations of retina images from a low- to a high-quality
domain. The visual evidence of this improvement was quantitatively affirmed by the two
proposed validation methods. The first used a retina image quality classifier to confirm a
significant prediction label shift towards a quality enhancement. On average, a 50% increase
in images being classified as high-quality was verified. The second analysed the performance modifications of a diabetic retinopathy detection algorithm upon being trained
with the quality-improved images. The latter led to strong evidence that the proposed
solution satisfies the requirement of maintaining the images’ original information for
diagnosis, and that it ensures a pathology assessment more sensitive to the presence of
pathological signs. These experimental results confirm the potential effectiveness of our
solution in improving retina image quality for diagnosis. Along with the addressed contributions, we analysed how the construction of the data sets representing the low-quality
domain impacts the quality translation efficiency. Our findings suggest that by tackling
the problem more selectively, that is, by constructing data sets more homogeneous in terms
of their image defects, we can obtain more pronounced quality transformations.
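The unpaired low-to-high-quality translation rests on CycleGAN's cycle-consistency constraint: mapping an image low → high → low should reproduce the input, which is what preserves the morphological information needed for diagnosis. A minimal sketch with hypothetical stand-in generators (a real model learns both mappings):

```python
import numpy as np

def cycle_consistency_l1(x, g_low2high, f_high2low):
    """CycleGAN's unpaired-translation constraint: F(G(x)) should reproduce x."""
    return np.mean(np.abs(f_high2low(g_low2high(x)) - x))

# Hypothetical stand-ins for the two generators; here F is G's exact inverse,
# so the cycle loss is (numerically) zero.
g = lambda img: img * 1.5 + 0.1      # low-quality -> high-quality
f = lambda img: (img - 0.1) / 1.5    # high-quality -> low-quality
img = np.random.default_rng(2).random((32, 32))
assert cycle_consistency_l1(img, g, f) < 1e-9
```

In training, this loss is added to the usual adversarial losses so that quality improves without the generator inventing or erasing pathological structure.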
A Generative Dialogue System for Reminiscence Therapy
With people living longer than ever, the number of cases of neurodegenerative diseases such as Alzheimer's or cognitive impairment increases steadily. In Spain it affects more than 1.2 million patients, and it is estimated that by 2050 more than 100 million people will be affected. While there are no effective treatments for this terminal disease, therapies such as reminiscence, which stimulate memories of the patient's past, are recommended, as they encourage communication and produce mental and emotional benefits for the patient. Currently, reminiscence therapy takes place in hospitals or residences, where the therapists are located. Since people who receive this therapy are old and may have mobility difficulties, we present an AI solution to guide older adults through reminiscence sessions using their laptop or smartphone. Our solution consists of a generative dialogue system composed of two deep learning architectures that recognize image and text content. An encoder-decoder with attention is trained to generate questions from photos provided by the user; it is composed of a pretrained Convolutional Neural Network to encode the picture and a Long Short-Term Memory network to decode the image features and generate the question, word by word. The second architecture is a sequence-to-sequence model that provides feedback to engage the user in the conversation. Through our experiments, we find that we obtain the best performance by training the dialogue model on the Persona-Chat dataset and fine-tuning it on the Cornell Movie-Dialogues dataset. Finally, we integrate Telegram as the interface for the user to interact with Elisabot, our trained conversational agent.
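Both architectures generate text one token at a time, feeding each predicted word back in until an end-of-sequence token appears. The greedy-decoding sketch below uses a hypothetical 5-word vocabulary and a deterministic stand-in for the trained model's step function, purely to make the loop concrete.

```python
import numpy as np

def greedy_decode(step_fn, start_id, eos_id, max_len=10):
    """Greedy autoregressive decoding: feed the previous token back into the
    model and take the argmax at each step until EOS (or max_len) is reached."""
    tokens, tok = [], start_id
    for _ in range(max_len):
        logits = step_fn(tok)
        tok = int(np.argmax(logits))
        if tok == eos_id:
            break
        tokens.append(tok)
    return tokens

# Hypothetical vocabulary; the stand-in "model" deterministically emits
# token + 1 at every step and then the EOS token (id 4).
vocab = ["<s>", "what", "is", "this", "</s>"]
step = lambda prev: np.eye(5)[min(prev + 1, 4)]
assert [vocab[t] for t in greedy_decode(step, 0, 4)] == ["what", "is", "this"]
```

The real LSTM decoder replaces `step_fn` with a learned network conditioned on the encoded photo (question generation) or on the user's previous utterance (feedback).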