The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns
visual concepts, words, and semantic parsing of sentences without explicit
supervision on any of them; instead, our model learns by simply looking at
images and reading paired questions and answers. Our model builds an
object-based scene representation and translates sentences into executable,
symbolic programs. To bridge the learning of two modules, we use a
neuro-symbolic reasoning module that executes these programs on the latent
scene representation. Analogous to human concept learning, the perception
module learns visual concepts based on the language description of the object
being referred to. Meanwhile, the learned visual concepts facilitate learning
new words and parsing new sentences. We use curriculum learning to guide the
search over the large compositional space of images and language. Extensive
experiments demonstrate the accuracy and efficiency of our model on learning
visual concepts, word representations, and semantic parsing of sentences.
Further, our method allows easy generalization to new object attributes,
compositions, language concepts, scenes and questions, and even new program
domains. It also empowers applications including visual question answering and
bidirectional image-text retrieval.
Comment: ICLR 2019 (Oral). Project page: http://nscl.csail.mit.edu
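The neuro-symbolic execution step the abstract describes can be sketched as a toy interpreter that runs a symbolic program over an object-based scene representation. This is our illustration, not the authors' code: the `filter`/`count`/`query` operations and the hard attribute labels are simplifications (in NS-CL, objects carry soft concept scores produced by the perception module, and programs are parsed from questions).

```python
# Hedged sketch of neuro-symbolic program execution: a sequence of
# (op, *args) steps is applied to a list of object dicts representing a scene.

def execute(program, scene):
    """Run a symbolic program (list of (op, *args) tuples) over a scene."""
    objects = list(scene)
    for op, *args in program:
        if op == "filter":            # keep objects whose attribute matches
            attr, value = args
            objects = [o for o in objects if o.get(attr) == value]
        elif op == "count":           # reduce the current object set to a number
            return len(objects)
        elif op == "query":           # return an attribute of the unique object
            (attr,) = args
            assert len(objects) == 1, "query expects exactly one object"
            return objects[0][attr]
        else:
            raise ValueError(f"unknown op: {op}")
    return objects

scene = [
    {"shape": "cube", "color": "red"},
    {"shape": "sphere", "color": "red"},
    {"shape": "cube", "color": "blue"},
]

# "How many red objects are there?"
print(execute([("filter", "color", "red"), ("count",)], scene))        # 2
# "What is the shape of the blue object?"
print(execute([("filter", "color", "blue"), ("query", "shape")], scene))  # cube
```

Because execution is deterministic given the scene and program, the learning signal in NS-CL flows only through the (soft) concept scores and the parsed program, which is what lets perception and parsing be trained from question-answer pairs alone.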
ScaleTrotter: Illustrative Visual Travels Across Negative Scales
We present ScaleTrotter, a conceptual framework for an interactive,
multi-scale visualization of biological mesoscale data and, specifically,
genome data. ScaleTrotter allows viewers to smoothly transition from the
nucleus of a cell to the atomistic composition of the DNA, while bridging
several orders of magnitude in scale. The challenges in creating an interactive
visualization of genome data are fundamentally different in several ways from
those in other domains like astronomy that require a multi-scale representation
as well. First, genome data has intertwined scale levels---the DNA is an
extremely long, connected molecule that manifests itself at all scale levels.
Second, elements of the DNA do not disappear as one zooms out---instead the
scale levels at which they are observed group these elements differently.
Third, we have detailed information and thus geometry for the entire dataset
and for all scale levels, posing a challenge for interactive visual
exploration. Finally, the conceptual scale levels for genome data are close in
scale space, requiring us to find ways to visually embed a smaller scale into a
coarser one. We address these challenges by creating a new multi-scale
visualization concept. We use a scale-dependent camera model that controls the
visual embedding of the scales into their respective parents, the rendering of
a subset of the scale hierarchy, and the location, size, and scope of the view.
In traversing the scales, ScaleTrotter roams between 2D and 3D visual
representations that are depicted in integrated visuals. We discuss,
specifically, how this form of multi-scale visualization follows from the
specific characteristics of the genome data and describe its implementation.
Finally, we discuss the implications of our work for the general illustrative
depiction of multi-scale data.
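The scale-dependent camera model can be sketched as follows. This is our illustration under assumed level sizes, not ScaleTrotter's implementation: the level names and characteristic sizes are hypothetical, and the key idea shown is only that the viewer's scale parameter is handled logarithmically (the levels span orders of magnitude) and determines which two adjacent scale levels are blended on screen.

```python
# Hedged sketch: map a continuous view scale (meters) to a pair of adjacent
# scale levels and a blend weight, interpolating in log space.
import math

LEVELS = [  # hypothetical characteristic sizes, coarse to fine (meters)
    ("nucleus", 1e-5),
    ("chromosome", 1e-6),
    ("chromatin fiber", 3e-8),
    ("nucleosome", 1e-8),
    ("double helix", 2e-9),
]

def camera_state(scale):
    """Return (coarse level, fine level, blend in [0, 1]) for a view scale.

    blend = 0 shows only the coarse level, blend = 1 only the fine level;
    interpolation is logarithmic because adjacent levels are separated by
    orders of magnitude in scale space."""
    s = math.log10(scale)
    for (name_c, size_c), (name_f, size_f) in zip(LEVELS, LEVELS[1:]):
        lo, hi = math.log10(size_f), math.log10(size_c)
        if lo <= s <= hi:
            return name_c, name_f, (hi - s) / (hi - lo)
    # outside the hierarchy: clamp to the nearest end
    if s > math.log10(LEVELS[0][1]):
        return LEVELS[0][0], LEVELS[0][0], 0.0
    return LEVELS[-1][0], LEVELS[-1][0], 1.0

print(camera_state(1e-5))  # at nucleus scale: blend 0.0
print(camera_state(3e-6))  # partway between nucleus and chromosome
```

A real system would additionally use this state to decide which subset of the geometry hierarchy to render and how to embed the finer level's visuals into the coarser one, as the abstract describes.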
NiftyNet: a deep-learning platform for medical imaging
Medical image analysis and computer-assisted intervention problems are
increasingly being addressed with deep-learning-based solutions. Established
deep-learning platforms are flexible but do not provide specific functionality
for medical image analysis, and adapting them for this application requires
substantial implementation effort. As a result, effort has been duplicated and
incompatible infrastructure has been developed across many research groups.
This work presents the open-source NiftyNet platform for deep learning
in medical imaging. The ambition of NiftyNet is to accelerate and simplify the
development of these solutions, and to provide a common mechanism for
disseminating research outputs for the community to use, adapt and build upon.
NiftyNet provides a modular deep-learning pipeline for a range of medical
imaging applications, including segmentation, regression, image generation and
representation learning. Components of the NiftyNet pipeline
including data loading, data augmentation, network architectures, loss
functions and evaluation metrics are tailored to, and take advantage of, the
idiosyncrasies of medical image analysis and computer-assisted intervention.
NiftyNet is built on TensorFlow and supports TensorBoard visualization of 2D
and 3D images and computational graphs by default.
We present three illustrative medical image analysis applications built using
NiftyNet: (1) segmentation of multiple abdominal organs from computed
tomography; (2) image regression to predict computed tomography attenuation
maps from brain magnetic resonance images; and (3) generation of simulated
ultrasound images for specified anatomical poses.
NiftyNet enables researchers to rapidly develop and distribute deep learning
solutions for segmentation, regression, image generation and representation
learning applications, or extend the platform to new applications.
Comment: Wenqi Li and Eli Gibson contributed equally to this work. M. Jorge
Cardoso and Tom Vercauteren contributed equally to this work. 26 pages, 6
figures; update includes additional applications, updated author list and
formatting for journal submission.
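The modular decomposition the abstract describes (data loading, augmentation, network, loss, and evaluation as swappable components) can be sketched generically. This is our illustration and does not reproduce NiftyNet's actual API (NiftyNet itself is built on TensorFlow); the toy "segmentation" components here are placeholders.

```python
# Hedged sketch of a modular pipeline: each stage is an injected callable,
# so loaders, augmentations, networks, losses, and metrics can be swapped
# independently -- the design the NiftyNet abstract describes.

class Pipeline:
    def __init__(self, loader, augment, network, loss, metric):
        self.loader, self.augment = loader, augment
        self.network, self.loss, self.metric = network, loss, metric

    def run(self):
        """Evaluate (loss, metric) for every (image, label) pair."""
        results = []
        for image, label in self.loader():
            image = self.augment(image)
            pred = self.network(image)
            results.append((self.loss(pred, label), self.metric(pred, label)))
        return results

# Toy components for a 1-D "segmentation" task (all hypothetical).
def loader():
    yield [0.2, 0.8, 0.9], [0, 1, 1]

pipe = Pipeline(
    loader=loader,
    augment=lambda x: x,                                  # identity: no augmentation
    network=lambda x: [1 if v > 0.5 else 0 for v in x],   # threshold "model"
    loss=lambda p, y: sum(abs(a - b) for a, b in zip(p, y)),
    metric=lambda p, y: sum(a == b for a, b in zip(p, y)) / len(y),
)
print(pipe.run())  # [(0, 1.0)]
```

Swapping the `network` or `loss` callable changes the application (e.g. regression instead of segmentation) without touching the rest of the pipeline, which is the reuse mechanism the platform aims to provide.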