Semantic Image Synthesis via Adversarial Learning
In this paper, we propose a way of synthesizing realistic images directly
with natural language description, which has many useful applications, e.g.
intelligent image manipulation. We attempt to accomplish such synthesis: given
a source image and a target text description, our model synthesizes images to
meet two requirements: 1) being realistic while matching the target text
description; 2) maintaining other image features that are irrelevant to the
text description. The model should be able to disentangle the semantic
information from the two modalities (image and text), and generate new images
from the combined semantics. To achieve this, we propose an end-to-end neural
architecture that leverages adversarial learning to automatically learn
implicit loss functions, which are optimized to fulfill the aforementioned two
requirements. We have evaluated our model by conducting experiments on the
Caltech-200 bird dataset and the Oxford-102 flower dataset, and have demonstrated
that our model is capable of synthesizing realistic images that match the given
descriptions while still maintaining other features of the original images. Comment: Accepted to ICCV 201
Exploiting Sentence Embedding for Medical Question Answering
Despite the great success of word embedding, sentence embedding remains a
not-well-solved problem. In this paper, we present a supervised learning
framework to exploit sentence embedding for the medical question answering
task. The learning framework consists of two main parts: 1) a sentence
embedding producing module, and 2) a scoring module. The former is developed
with contextual self-attention and multi-scale techniques to encode a sentence
into an embedding tensor. This module is called Contextual
self-Attention Multi-scale Sentence Embedding (CAMSE) for short. The latter employs two
scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association
Scoring (SAS). SMS measures similarity while SAS captures association between
sentence pairs: a medical question concatenated with a candidate choice, and a
piece of corresponding supportive evidence. The proposed framework is evaluated
on two Medical Question Answering (MedicalQA) datasets collected from
real-world applications: medical exams and clinical diagnosis based on
electronic medical records (EMR). The comparison results show that our proposed
framework achieved significant improvements compared to competitive baseline
approaches. Additionally, a series of controlled experiments is conducted
to illustrate that the multi-scale strategy and the contextual self-attention
layer play important roles in producing effective sentence embeddings, and that the
two scoring strategies are highly complementary to each other for
question answering problems. Comment: 8 page
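The abstract above says that SMS "measures similarity" between a question-plus-choice sentence and a piece of supporting evidence, but it does not give the scoring function. As a rough illustration only, the sketch below uses cosine similarity between flat embedding vectors, which is a common similarity measure and an assumption here, not the paper's SMS. The names `cosine_similarity` and `best_candidate` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_candidate(candidates: List[Sequence[float]], evidence: Sequence[float]) -> int:
    """Index of the candidate embedding most similar to the evidence embedding.

    `candidates` stands in for embeddings of "question + choice" sentences;
    how those embeddings are produced (CAMSE in the paper) is out of scope here.
    """
    return max(range(len(candidates)),
               key=lambda i: cosine_similarity(candidates[i], evidence))
```

In this toy setup each answer choice gets one score and the highest-scoring choice is selected; the paper additionally combines a second, association-based score (SAS) that this sketch does not model.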
Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors
Modern object detectors often suffer from low accuracy because
foregrounds are swamped by far more numerous backgrounds and become hard examples during
training. Compared with proposal-based detectors, real-time detectors are
affected far more seriously, since to achieve real-time rates they forgo the
region-proposal stage, which filters out a majority of backgrounds.
Although foregrounds, as hard examples, urgently need to be mined
from the many backgrounds, a considerable number of state-of-the-art real-time
detectors, such as the YOLO series, have yet to profit from existing hard example
mining methods, because these methods require detectors to satisfy a series of
prerequisites. In this paper, we propose a general hard example mining method
named Loss Rank Mining (LRM) to fill the gap. LRM is a general method for
real-time detectors, as it utilizes the final feature map which exists in all
real-time detectors to mine hard examples. With LRM, some elements
representing easy examples in the final feature map are filtered out, and detectors are
forced to concentrate on hard examples during training. Extensive experiments
validate the effectiveness of our method. With our method, the YOLOv2 detector
improves by over 5% mAP on the autonomous-driving dataset KITTI and by over 2% mAP
on the more general PASCAL VOC dataset. In addition, LRM is the first
hard example mining strategy that fits YOLOv2 perfectly, making it
better suited to real scenarios where both real-time rates and
accurate detection are strongly demanded. Comment: 8 pages, 6 figure
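The filtering step described above (rank elements of the final feature map by loss, drop the easy ones, train on the rest) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-element losses are assumed to be already computed and flattened into a list, the number of elements to keep (`keep`) is a hypothetical parameter, and the function names are invented for this example.

```python
from typing import List


def loss_rank_mask(losses: List[float], keep: int) -> List[int]:
    """Return a 0/1 mask that keeps the `keep` highest-loss elements.

    Elements with the largest loss are treated as hard examples; everything
    else is masked out so it contributes nothing to the training loss.
    """
    if keep <= 0:
        return [0] * len(losses)
    # Rank element indices by loss, largest first. Python's sort is stable,
    # so tied losses keep their original relative order.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    kept = set(ranked[:keep])
    return [1 if i in kept else 0 for i in range(len(losses))]


def mined_loss(losses: List[float], keep: int) -> float:
    """Sum of the losses that survive the rank-based mask."""
    mask = loss_rank_mask(losses, keep)
    return sum(loss * m for loss, m in zip(losses, mask))


# Hypothetical per-element losses for a tiny flattened feature map.
example_losses = [0.1, 2.0, 0.05, 1.5, 0.3]
example_mask = loss_rank_mask(example_losses, keep=2)  # keeps indices 1 and 3
```

In a real detector the mask would be applied before backpropagation so that gradients flow only through the retained elements; how many elements to keep, and whether ranking is done per image or per batch, are choices the abstract does not specify.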
Phenotype-based and Self-learning Inter-individual Sleep Apnea Screening with a Level IV Monitoring System
Purpose: We propose a phenotype-based artificial intelligence system that can
self-learn and is accurate for screening purposes, and test it on a Level IV
monitoring system. Methods: Based on physiological knowledge, we
hypothesize that phenotype information will allow us to find subjects in
a well-annotated database who share similar sleep apnea patterns. Therefore,
for a newly arriving subject, we can establish a prediction model from the
existing database that is adapted to that subject. We test the proposed
algorithm on a database consisting of 62 subjects with the signals recorded
from a Level IV wearable device measuring the thoracic and abdominal movements
and the SpO2. Results: With leave-one-out cross-validation, the accuracy of the
proposed algorithm in screening subjects with an apnea-hypopnea index greater than or
equal to 15 is 93.6%, the positive likelihood ratio is 6.8, and the negative
likelihood ratio is 0.03. Conclusion: The results confirm the hypothesis and
show that the proposed algorithm has great potential to screen patients with
SAS
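For readers unfamiliar with the likelihood ratios reported in the Results, the sketch below shows the standard definitions: the positive likelihood ratio is sensitivity / (1 - specificity) and the negative likelihood ratio is (1 - sensitivity) / specificity. The confusion-matrix counts used in the example are hypothetical and are not taken from the paper; the function name is likewise invented for illustration.

```python
from typing import Tuple


def likelihood_ratios(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (LR+, LR-) from confusion-matrix counts of a binary screening test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    A degenerate denominator yields float('inf') instead of raising an error.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)   # equals 1 - specificity
    false_negative_rate = fn / (fn + tp)   # equals 1 - sensitivity
    lr_pos = sensitivity / false_positive_rate if false_positive_rate > 0 else float("inf")
    lr_neg = false_negative_rate / specificity if specificity > 0 else float("inf")
    return lr_pos, lr_neg


# Hypothetical counts: 45 true positives, 10 false positives,
# 40 true negatives, 5 false negatives (sensitivity 0.9, specificity 0.8).
lr_pos, lr_neg = likelihood_ratios(tp=45, fp=10, tn=40, fn=5)
```

With these made-up counts, LR+ is about 4.5 and LR- is about 0.125. A larger LR+ and a smaller LR- both indicate a more informative screening test.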
- …