
    It Takes (Only) Two: Adversarial Generator-Encoder Networks

    We present a new autoencoder-type architecture that is trainable in an unsupervised mode, sustains both generation and inference, and has the quality of conditional and unconditional samples boosted by adversarial learning. Unlike previous hybrids of autoencoders and adversarial networks, the adversarial game in our approach is set up directly between the encoder and the generator, and no external mappings are trained in the process of learning. The game objective compares the divergences of each of the real and the generated data distributions with the prior distribution in the latent space. We show that the direct generator-vs-encoder game leads to a tight coupling of the two components, resulting in samples and reconstructions of a quality comparable to some recently proposed, more complex architectures.
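    To make the game objective concrete, here is a minimal sketch of how an encoder and a generator could be scored directly against each other with no external discriminator. It assumes PyTorch, a unit-Gaussian latent prior, and a simple diagonal-Gaussian KL estimate; none of these choices are taken from the paper.

```python
# Hedged sketch of a generator-vs-encoder game objective of the kind described
# above. The divergence measure and all hyper-parameters are illustrative
# assumptions, not the authors' exact implementation.
import torch

def kl_to_standard_normal(z):
    # KL divergence between the latent batch, fitted as a diagonal Gaussian,
    # and the unit Gaussian prior N(0, I).
    mu, var = z.mean(dim=0), z.var(dim=0) + 1e-6
    return 0.5 * (var + mu**2 - 1.0 - var.log()).sum()

def age_objective(encoder, generator, real_x, z_prior):
    # Divergence of encoded real data from the prior ...
    div_real = kl_to_standard_normal(encoder(real_x))
    # ... and of encoded generated data from the prior.
    div_fake = kl_to_standard_normal(encoder(generator(z_prior)))
    # The two networks play an adversarial game over the gap between these
    # divergences: one player minimises it while the other maximises it.
    return div_real - div_fake
```

    Training one network to shrink this gap while the other widens it is what couples the encoder and generator tightly without any external mapping.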

    Describing Videos by Exploiting Temporal Structure

    Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application to video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatio-temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. Second, we propose a temporal attention mechanism that goes beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state of the art on both the BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger and more challenging dataset of paired videos and natural language descriptions. Comment: Accepted to ICCV15. This version comes with code release and supplementary material.
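    As a hedged illustration of such a temporal attention mechanism (the layer sizes, scoring function, and tensor shapes below are assumptions, not the authors' exact design), a soft-attention module can weight per-segment video features by their relevance to the current state of the text-generating RNN:

```python
# Soft temporal attention: the decoder hidden state scores each temporal
# segment feature, and the weighted sum forms the video context vector fed
# to the language model. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim, attn_dim=256):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)
        self.w_hidden = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, segment_feats, decoder_hidden):
        # segment_feats: (batch, n_segments, feat_dim)
        # decoder_hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(self.w_feat(segment_feats)
                                  + self.w_hidden(decoder_hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)           # attention weights over segments
        context = (alpha * segment_feats).sum(1)  # weighted video context vector
        return context, alpha.squeeze(-1)
```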

    Rapid feasibility assessment of components to be formed through hot stamping: A deep learning approach

    The novel non-isothermal Hot Forming and cold die Quenching (HFQ) process can enable the cost-effective production of complex-shaped, high-strength aluminium alloy panel components. However, unfamiliarity with designing for the new process prevents its wide-scale adoption in industrial settings. Recent research efforts focus on the development of advanced material models for finite element (FE) simulations, used to assess the feasibility of new component designs for the HFQ process. However, FE simulations take place late in design processes, require forming-process expertise, and are unsuitable for early-stage design exploration. To address these limitations, this study presents a novel application of a Convolutional Neural Network (CNN) based surrogate as a means of rapid manufacturing feasibility assessment for components to be formed using the HFQ process. A diverse dataset containing variations in component geometry, blank shapes, and processing parameters, together with corresponding physical fields, is generated and used to train the model. The results show that near-indistinguishable full-field predictions are obtained in real time from the model when compared with HFQ simulations. This technique provides an invaluable tool to aid component design and decision making at the onset of a design process for complex-shaped components formed under HFQ conditions.
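    One plausible way to realise such a CNN surrogate is an encoder-decoder network that maps rasterised geometry and process inputs directly to a full-field map. The sketch below is written under assumed inputs (a geometry mask, a blank mask, and a short vector of process parameters) and an illustrative architecture; it is not the paper's model.

```python
# Hedged sketch of a CNN surrogate for full-field prediction: encode the
# component/blank masks, inject process parameters, and decode a field map
# (e.g. a thinning field). Channel counts and layers are assumptions.
import torch
import torch.nn as nn

class HFQSurrogate(nn.Module):
    def __init__(self, n_process_params=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),   # geometry + blank masks
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.param_fc = nn.Linear(n_process_params, 64)            # e.g. temperature, speed, friction
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),     # predicted field map
        )

    def forward(self, masks, params):
        h = self.encoder(masks)
        h = h + self.param_fc(params).unsqueeze(-1).unsqueeze(-1)  # broadcast process parameters
        return self.decoder(h)
```

    Trained against simulated fields, a network of this shape returns a prediction in a single forward pass, which is what makes real-time feasibility screening possible.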

    Deep Learning Framework for Spleen Volume Estimation from 2D Cross-sectional Views

    Abnormal spleen enlargement (splenomegaly) is regarded as a clinical indicator for a range of conditions, including liver disease, cancer and blood diseases. While spleen length measured from ultrasound images is a commonly used surrogate for spleen size, spleen volume remains the gold-standard metric for assessing splenomegaly and the severity of related clinical conditions. Computed tomography is the main imaging modality for measuring spleen volume, but it is less accessible in areas where there is a high prevalence of splenomegaly (e.g., the Global South). Our objective was to enable automated spleen volume measurement from 2D cross-sectional segmentations, which can be obtained from ultrasound imaging. In this study, we describe a variational autoencoder-based framework to measure spleen volume from single- or dual-view 2D spleen segmentations. We propose and evaluate three volume estimation methods within this framework. We also demonstrate how 95% confidence intervals of volume estimates can be produced to make our method more clinically useful. Our best model achieved mean relative volume accuracies of 86.62% and 92.58% for single- and dual-view segmentations, respectively, surpassing the performance of the clinical standard approach of linear regression using manual measurements and a comparative deep-learning-based 2D-3D reconstruction approach. The proposed spleen volume estimation framework can be integrated into standard clinical workflows which currently use 2D ultrasound images to measure spleen length. To the best of our knowledge, this is the first work to achieve direct 3D spleen volume estimation from 2D spleen segmentations. Comment: 22 pages, 7 figures.
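    A minimal sketch of one plausible variant of such a framework is shown below: a variational autoencoder encodes a 2D spleen segmentation into a latent code from which a volume is regressed, and repeatedly sampling the latent posterior would give a spread from which a confidence interval could be read off. All layer sizes and the exact volume head are assumptions, not the paper's architecture.

```python
# Hedged sketch: VAE encoder over a 2D segmentation mask plus a volume
# regression head on the latent code. Illustrative only.
import torch
import torch.nn as nn

class SpleenVolumeVAE(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(),
                                 nn.Flatten())
        self.fc_mu = nn.LazyLinear(latent_dim)
        self.fc_logvar = nn.LazyLinear(latent_dim)
        self.volume_head = nn.Linear(latent_dim, 1)   # regress volume from the latent code

    def forward(self, seg_2d):
        h = self.enc(seg_2d)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation trick
        return self.volume_head(z), mu, logvar
```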

    Diffusion-based Molecule Generation with Informative Prior Bridges

    AI-based molecule generation provides a promising approach to a large area of biomedical sciences and engineering, such as antibody design, hydrolase engineering, or vaccine development. Because the molecules are governed by physical laws, a key challenge is to incorporate prior information into the training procedure to generate high-quality and realistic molecules. We propose a simple and novel approach to steer the training of diffusion-based generative models with physical and statistical prior information. This is achieved by constructing physically informed diffusion bridges, stochastic processes that are guaranteed to yield a given observation at a fixed terminal time. We develop a Lyapunov-function-based method to construct and determine bridges, and propose a number of informative prior bridges for both high-quality molecule generation and uniformity-promoted 3D point cloud generation. With comprehensive experiments, we show that our method provides a powerful approach to the 3D generation task, yielding molecule structures with better quality and stability scores and more uniformly distributed point clouds of high quality.
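    The paper's bridges are physically informed, but the simplest example of a process that is guaranteed to hit a given observation at the terminal time is the standard Brownian bridge; the textbook form below (not the paper's construction) shows the drift that enforces the terminal constraint:

```latex
% Brownian bridge to a fixed observation y at terminal time T:
% the drift (y - X_t)/(T - t) forces X_T = y almost surely.
\[
  \mathrm{d}X_t \;=\; \frac{y - X_t}{T - t}\,\mathrm{d}t \;+\; \sigma\,\mathrm{d}W_t,
  \qquad 0 \le t < T, \quad X_T = y .
\]
```

    Informative prior bridges replace this generic drift with terms encoding physical or statistical knowledge while keeping the same terminal-hitting guarantee.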

    SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction with Self-Projection Optimization

    The task of point cloud upsampling aims to acquire dense and uniform point sets from sparse and irregular point sets. Although significant progress has been made with deep learning models, they require ground-truth dense point sets as the supervision information, which means they can only be trained on synthetic paired training data and are not suitable for training on real-scanned sparse data. Moreover, it is expensive and tedious to obtain large-scale paired sparse-dense point sets from real-scanned sparse data for training. To address this problem, we propose a self-supervised point cloud upsampling network, named SPU-Net, to capture the inherent upsampling patterns of points lying on the underlying object surface. Specifically, we propose a coarse-to-fine reconstruction framework, which contains two main components: point feature extraction and point feature expansion. In point feature extraction, we integrate a self-attention module with a graph convolutional network (GCN) to simultaneously capture context information inside and among local regions. In point feature expansion, we introduce a hierarchically learnable folding strategy to generate the upsampled point sets with learnable 2D grids. Moreover, to further optimize the noisy points in the generated point sets, we propose a novel self-projection optimization, associated with uniform and reconstruction terms as a joint loss, to facilitate self-supervised point cloud upsampling. We conduct various experiments on both synthetic and real-scanned datasets, and the results demonstrate that we achieve performance comparable to state-of-the-art supervised methods.
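    As a hedged illustration of the learnable 2D-grid folding idea (layer sizes and the single-level structure below are assumptions; the paper describes a hierarchical variant), each point feature can be replicated, paired with a learnable 2D grid code, and folded into 3D offsets around the original point:

```python
# Grid-folding point expansion: duplicate each point feature r times,
# concatenate a learnable 2D grid code, and fold to 3D offsets.
import torch
import torch.nn as nn

class FoldingExpansion(nn.Module):
    def __init__(self, feat_dim=128, up_ratio=4):
        super().__init__()
        self.up_ratio = up_ratio
        self.grid = nn.Parameter(torch.rand(up_ratio, 2))           # learnable 2D grid codes
        self.fold = nn.Sequential(nn.Linear(feat_dim + 2, 128), nn.ReLU(),
                                  nn.Linear(128, 3))                 # fold to 3D offsets

    def forward(self, points, feats):
        # points: (B, N, 3); feats: (B, N, feat_dim)
        B, N, _ = points.shape
        feats = feats.unsqueeze(2).expand(-1, -1, self.up_ratio, -1)
        grid = self.grid.view(1, 1, self.up_ratio, 2).expand(B, N, -1, -1)
        offsets = self.fold(torch.cat([feats, grid], dim=-1))
        dense = points.unsqueeze(2) + offsets                        # (B, N, r, 3)
        return dense.reshape(B, N * self.up_ratio, 3)
```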

    Image Quality Improvement of Medical Images using Deep Learning for Computer-aided Diagnosis

    Retina image analysis is an important screening tool for early detection of multiple diseases such as diabetic retinopathy, which greatly impairs visual function. Image analysis and pathology detection can be accomplished both by ophthalmologists and by the use of computer-aided diagnosis systems. Advancements in hardware technology have led to more portable and less expensive imaging devices for medical image acquisition. This promotes large-scale remote diagnosis by clinicians as well as the implementation of computer-aided diagnosis systems for local routine disease screening. However, lower-cost equipment generally results in inferior-quality images. This may jeopardize the reliability of the acquired images and thus hinder the overall performance of the diagnostic tool. To address this open challenge, we carried out an in-depth study on using different deep learning-based frameworks for improving retina image quality while maintaining the underlying morphological information needed for diagnosis. Our results demonstrate that using a Cycle Generative Adversarial Network for unpaired image-to-image translation leads to successful transformations of retina images from a low- to a high-quality domain. The visual evidence of this improvement was quantitatively affirmed by the two proposed validation methods. The first used a retina image quality classifier to confirm a significant prediction-label shift towards quality enhancement; on average, a 50% increase in images being classified as high quality was verified. The second analysed the performance changes of a diabetic retinopathy detection algorithm upon being trained with the quality-improved images. The latter led to strong evidence that the proposed solution satisfies the requirement of maintaining the images' original diagnostic information, and that it yields a pathology assessment more sensitive to the presence of pathological signs. These experimental results confirm the potential effectiveness of our solution in improving retina image quality for diagnosis. Along with the addressed contributions, we analysed how the construction of the datasets representing the low-quality domain impacts the quality-translation efficiency. Our findings suggest that by tackling the problem more selectively, that is, constructing datasets that are more homogeneous in terms of their image defects, we can obtain more accentuated quality transformations.
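    As a hedged sketch of the cycle-consistency idea that makes unpaired low-to-high quality translation possible (the generator/discriminator names, least-squares adversarial term, and loss weight below are assumptions, not the thesis's exact setup): an image mapped to the high-quality domain and back should reproduce the original, which is what preserves retinal structure.

```python
# Generator-side CycleGAN loss for the low -> high quality direction.
import torch
import torch.nn.functional as F

def cycle_gan_generator_loss(G_lh, G_hl, D_h, low_img, lambda_cyc=10.0):
    fake_high = G_lh(low_img)                 # low -> high quality
    recon_low = G_hl(fake_high)               # back to low quality
    pred = D_h(fake_high)
    adv = F.mse_loss(pred, torch.ones_like(pred))  # least-squares adversarial term
    cyc = F.l1_loss(recon_low, low_img)            # cycle-consistency term
    return adv + lambda_cyc * cyc
```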

    A Generative Dialogue System for Reminiscence Therapy

    With people living longer than ever, the number of cases of neurodegenerative diseases such as Alzheimer's or cognitive impairment increases steadily. In Spain it affects more than 1.2 million patients, and it is estimated that by 2050 more than 100 million people worldwide will be affected. While there are no effective treatments for this terminal disease, therapies such as reminiscence, which stimulate memories of the patient's past, are recommended, as they encourage communication and produce mental and emotional benefits for the patient. Currently, reminiscence therapy takes place in hospitals or residences, where the therapists are located. Since people who receive this therapy are older and may have mobility difficulties, we present an AI solution to guide older adults through reminiscence sessions using their laptop or smartphone. Our solution consists of a generative dialogue system composed of two deep learning architectures to recognize image and text content. An Encoder-Decoder with Attention is trained to generate questions word by word from photos provided by the user; it is composed of a pretrained Convolutional Neural Network to encode the picture and a Long Short-Term Memory network to decode the image features and generate the question. The second architecture is a sequence-to-sequence model that provides feedback to engage the user in the conversation. Our experiments show that we obtain the best performance by training the dialogue model on the Persona-Chat dataset and fine-tuning it on the Cornell Movie-Dialogues dataset. Finally, we integrate Telegram as the interface through which the user interacts with Elisabot, our trained conversational agent.
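    A minimal sketch of the question-generation component is shown below: a pretrained CNN embeds the user's photo and an LSTM decodes a question word by word. The backbone, vocabulary size, and teacher-forced decoding are illustrative assumptions, and the attention mechanism is omitted for brevity; this is not the exact Elisabot implementation.

```python
# CNN image encoder + LSTM question decoder, sketched with assumed dimensions.
import torch
import torch.nn as nn
import torchvision.models as models

class QuestionGenerator(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)  # image embedding
        self.cnn = backbone
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image, question_tokens):
        img_feat = self.cnn(image).unsqueeze(1)       # (B, 1, embed_dim)
        words = self.embed(question_tokens)           # teacher-forced question tokens
        inputs = torch.cat([img_feat, words], dim=1)  # image feature starts the sequence
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                       # next-word logits
```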