Self-critical Sequence Training for Image Captioning
Recently it has been shown that policy-gradient methods for reinforcement
learning can be utilized to train deep end-to-end systems directly on
non-differentiable metrics for the task at hand. In this paper we consider the
problem of optimizing image captioning systems using reinforcement learning,
and show that by carefully optimizing our systems using the test metrics of the
MSCOCO task, significant gains in performance can be realized. Our systems are
built using a new optimization approach that we call self-critical sequence
training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather
than estimating a "baseline" to normalize the rewards and reduce variance,
utilizes the output of its own test-time inference algorithm to normalize the
rewards it experiences. Using this approach, estimating the reward signal (as
actor-critic methods must do) and estimating normalization (as REINFORCE
algorithms typically do) is avoided, while at the same time harmonizing the
model with respect to its test-time inference procedure. Empirically we find
that directly optimizing the CIDEr metric with SCST and greedy decoding at
test-time is highly effective. Our results on the MSCOCO evaluation server
establish a new state-of-the-art on the task, improving the best result in
terms of CIDEr from 104.9 to 114.7.
Comment: CVPR 2017 + additional analysis + fixed baseline results, 16 pages
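The self-critical idea described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `toy_reward` and `scst_loss` are hypothetical, and the word-overlap reward is only a stand-in for CIDEr. The reward of the greedy (test-time) caption is used as the baseline for the sampled caption, so no separate baseline has to be learned.

```python
def toy_reward(caption, reference):
    """Hypothetical stand-in for CIDEr: fraction of reference words found in the caption."""
    ref_words = set(reference.split())
    return len(ref_words & set(caption.split())) / len(ref_words)


def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """REINFORCE-style surrogate loss for one sampled caption.

    sample_logprobs: per-token log-probabilities of the sampled caption.
    The baseline is the reward of the model's own greedy caption.
    """
    advantage = sample_reward - greedy_reward
    # Minimizing this loss raises the log-probability of samples that beat
    # the greedy caption and lowers it for samples that do worse.
    return -advantage * sum(sample_logprobs)


reference = "a dog runs on the beach"
sampled = "a dog runs on sand"
greedy = "a dog on grass"
loss = scst_loss(
    sample_logprobs=[-0.5, -0.4, -0.9, -0.3, -1.2],
    sample_reward=toy_reward(sampled, reference),
    greedy_reward=toy_reward(greedy, reference),
)
```

In a real system the log-probabilities would come from the captioning model and the loss would be backpropagated through them; here they are plain numbers so the arithmetic is easy to follow.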
MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network
The inability to interpret the model prediction in semantically and visually
meaningful ways is a well-known shortcoming of most existing computer-aided
diagnosis methods. In this paper, we propose MDNet to establish a direct
multimodal mapping between medical images and diagnostic reports that can read
images, generate diagnostic reports, retrieve images by symptom descriptions,
and visualize attention, to provide justifications of the network diagnosis
process. MDNet includes an image model and a language model. The image model is
proposed to enhance multi-scale feature ensembles and utilization efficiency.
The language model, integrated with our improved attention mechanism, aims to
read and explore discriminative image feature descriptions from reports to
learn a direct mapping from sentence words to image pixels. The overall network
is trained end-to-end by using our developed optimization strategy. Based on a
dataset of pathology bladder cancer images and their diagnostic reports (BCIDR), we
conduct sufficient experiments to demonstrate that MDNet outperforms
comparative baselines. The proposed image model obtains state-of-the-art
performance on two CIFAR datasets as well.
Comment: CVPR2017 Oral
Deep metric learning to rank
We propose a novel deep metric learning method by revisiting the learning to rank approach. Our method, named FastAP, optimizes the rank-based Average Precision measure, using an approximation derived from distance quantization. FastAP has a low complexity compared to existing methods, and is tailored for stochastic gradient descent. To fully exploit the benefits of the ranking formulation, we also propose a new minibatch sampling scheme, as well as a simple heuristic to enable large-batch training. On three few-shot image retrieval datasets, FastAP consistently outperforms competing methods, which often involve complex optimization heuristics or costly model ensembles.
Accepted manuscript
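To make the phrase "approximation derived from distance quantization" concrete, the sketch below shows one way Average Precision can be estimated from a histogram of query-to-item distances instead of a full sort. It is an assumption-laden illustration rather than the paper's formulation: the function name `quantized_ap`, the parameters `num_bins` and `max_dist`, and the hard (non-differentiable) binning are all choices made here, and distances are assumed to lie in [0, max_dist]. A trainable version would need a soft bin assignment so that gradients can flow.

```python
def quantized_ap(distances, is_positive, num_bins=10, max_dist=1.0):
    """Approximate AP for one query by quantizing distances into bins.

    distances: distance from the query to each retrieved item, in [0, max_dist].
    is_positive: True where the item shares the query's label.
    """
    pos_hist = [0] * num_bins
    all_hist = [0] * num_bins
    for dist, pos in zip(distances, is_positive):
        b = min(int(dist / max_dist * num_bins), num_bins - 1)
        all_hist[b] += 1
        if pos:
            pos_hist[b] += 1

    n_pos = sum(pos_hist)
    if n_pos == 0:
        return 0.0

    ap, cum_pos, cum_all = 0.0, 0, 0
    for h_pos, h_all in zip(pos_hist, all_hist):
        cum_pos += h_pos
        cum_all += h_all
        if h_pos:
            # Precision at this distance level, weighted by the number of
            # positives that fall into the bin.
            ap += (cum_pos / cum_all) * h_pos
    return ap / n_pos
```

The cost is linear in the number of items plus the number of bins, which is the kind of saving over sorting-based AP that the abstract's "low complexity" claim suggests.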
Air Quality Prediction in Smart Cities Using Machine Learning Technologies Based on Sensor Data: A Review
The influence of machine learning technologies is rapidly increasing and reaching almost every field, and air pollution prediction is no exception. This paper reviews studies on air pollution prediction using machine learning algorithms based on sensor data in the context of smart cities. The most relevant papers were selected by searching the most popular databases and applying the corresponding filters. After thoroughly reviewing those papers, the main features were extracted, which served as a basis to link and compare them to each other. As a result, we can conclude that: (1) rather than simple machine learning techniques, authors currently apply advanced and sophisticated techniques, (2) China was the leading country in terms of case studies, (3) particulate matter with a diameter of 2.5 micrometers was the main prediction target, (4) in 41% of the publications the authors carried out the prediction for the next day, (5) 66% of the studies used data with an hourly rate, (6) 49% of the papers used open data, a share that has tended to increase since 2016, and (7) for efficient air quality prediction it is important to consider external factors such as weather conditions, spatial characteristics, and temporal features.