3,123 research outputs found

A Survey on Deep Learning in Medical Image Analysis

Author: Bejnordi Babak Ehteshami
Ciompi Francesco
Ghafoorian Mohsen
Kooi Thijs
Litjens Geert
Setio Arnaud Arindra Adiyoso
Sánchez Clara I.
van der Laak Jeroen A. W. M.
van Ginneken Bram
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks and provide concise overviews of studies per application area. Open challenges and directions for future research are discussed.

Comment: Revised survey includes expanded discussion section and reworked introductory section on common deep architectures. Added missed papers from before Feb 1st 201

arXiv.org e-Print Archive

Radboud Repository

Deep Learning in Cardiology

Author: Bizopoulos Paschalis
Koutsouris Dimitrios
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/02/2021

The medical field is creating large amounts of data that physicians are unable to decipher and use efficiently. Moreover, rule-based expert systems are inefficient at solving complicated medical tasks or creating insights from big data. Deep learning has emerged as a more accurate and effective technology for a wide range of medical problems such as diagnosis, prediction and intervention. Deep learning is a representation learning method that consists of layers that transform the data non-linearly, thus revealing hierarchical relationships and structures. In this review we survey deep learning application papers that use structured data, signal and imaging modalities from cardiology. We discuss the advantages and limitations of applying deep learning in cardiology that also apply to medicine in general, while proposing certain directions as the most viable for clinical use.

Comment: 27 pages, 2 figures, 10 tables

arXiv.org e-Print Archive

Linking Image and Text with 2-Way Nets

Author: Eisenschtat Aviv
Wolf Lior
Publication date: 10/02/2017

Linking two data sources is a basic building block in numerous computer vision problems. Canonical Correlation Analysis (CCA) achieves this by utilizing a linear optimizer in order to maximize the correlation between the two views. Recent work makes use of non-linear models, including deep learning techniques, that optimize the CCA loss in some feature space. In this paper, we introduce a novel, bi-directional neural network architecture for the task of matching vectors from two data sources. Our approach employs two tied neural network channels that project the two views into a common, maximally correlated space using the Euclidean loss. We show a direct link between the correlation-based loss and Euclidean loss, enabling the use of Euclidean loss for correlation maximization. To overcome common Euclidean regression optimization problems, we adapt well-known techniques to our problem, including batch normalization and dropout. We show state-of-the-art results on a number of computer vision matching tasks including MNIST image matching and sentence-image matching on the Flickr8k, Flickr30k and COCO datasets.

Comment: 14 pages, 2 figures, 6 tables
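The link between correlation and Euclidean loss mentioned in the abstract above can be illustrated with a standard identity: for two variables standardized to zero mean and unit variance, the mean squared difference equals 2 - 2ρ, where ρ is their Pearson correlation, so minimizing the Euclidean loss maximizes the correlation. The sketch below checks this numerically on synthetic data. It is an illustration of the general identity only, not the paper's architecture or derivation; the helper names (`standardize`, `pearson`, `mean_squared_distance`) and the synthetic data are assumptions made for this example.

```python
import numpy as np


def standardize(v):
    """Shift to zero mean and scale to unit (population) variance."""
    return (v - v.mean()) / v.std()


def pearson(x, y):
    """Pearson correlation computed as the mean product of standardized values."""
    return float(np.mean(standardize(x) * standardize(y)))


def mean_squared_distance(x, y):
    """Mean squared (Euclidean) difference between the standardized variables."""
    return float(np.mean((standardize(x) - standardize(y)) ** 2))


# Synthetic, hypothetical data: y is a noisy linear function of x.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.7 * x + rng.normal(scale=0.5, size=1000)

rho = pearson(x, y)
mse = mean_squared_distance(x, y)

# For standardized variables: mean((x - y)^2) = 2 - 2 * rho.
print("correlation:", rho)
print("mean squared distance:", mse)
print("2 - 2 * correlation:", 2 - 2 * rho)
```

The last two printed values agree up to floating-point error, which is why a Euclidean objective on normalized outputs can stand in for a correlation objective in this simplified one-dimensional setting.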

arXiv.org e-Print Archive

Crossref

Domain Randomization and Generative Models for Robotic Grasping

Author: Abbeel Pieter
Andrychowicz Marcin
Biewald Lukas
Duan Rocky
Handa Ankur
Kumar Vikash
McGrew Bob
Schneider Jonas
Tobin Joshua
Welinder Peter
Zaremba Wojciech
Publication date: 03/04/2018

Deep learning-based robotic grasping has made significant progress thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object instances, and as a result generalization can be a challenge. In this work, we explore a novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis. We generate millions of unique, unrealistic procedurally generated objects, and train a deep neural network to perform grasp planning on these objects. Since the distribution of successful grasps for a given object can be highly multimodal, we propose an autoregressive grasp planning model that maps sensor inputs of a scene to a probability distribution over possible grasps. This model allows us to sample grasps efficiently at test time (or avoid sampling entirely). We evaluate our model architecture and data generation pipeline in simulation and the real world. We find we can achieve a >90% success rate on previously unseen realistic objects at test time in simulation despite having only been trained on random objects. We also demonstrate an 80% success rate on real-world grasp attempts despite having only been trained on random simulated objects.

Comment: 8 pages, 11 figures. Submitted to 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)

arXiv.org e-Print Archive

Crossref

Sequence to Sequence -- Video to Text

Author: Darrell Trevor
Donahue Jeff
Mooney Raymond
Rohrbach Marcus
Saenko Kate
Venugopalan Subhashini
Publication date: 19/10/2015

Real-world videos often have complex dynamics, and methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (sequence of frames) and output (sequence of words) of variable length. To approach this problem, we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip. Our model is naturally able to learn the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e. a language model. We evaluate several variants of our model that exploit different visual features on a standard set of YouTube videos and two movie description datasets (M-VAD and MPII-MD).

Comment: ICCV 2015 camera-ready. Includes code, project page and LSMDC challenge results

arXiv.org e-Print Archive

Crossref