3,214 research outputs found
MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network
The inability to interpret the model prediction in semantically and visually
meaningful ways is a well-known shortcoming of most existing computer-aided
diagnosis methods. In this paper, we propose MDNet to establish a direct
multimodal mapping between medical images and diagnostic reports that can read
images, generate diagnostic reports, retrieve images by symptom descriptions,
and visualize attention, to provide justifications of the network diagnosis
process. MDNet includes an image model and a language model. The image model is
proposed to enhance multi-scale feature ensembles and utilization efficiency.
The language model, integrated with our improved attention mechanism, aims to
read and explore discriminative image feature descriptions from reports to
learn a direct mapping from sentence words to image pixels. The overall network
is trained end-to-end by using our developed optimization strategy. Based on a
pathology bladder cancer images and its diagnostic reports (BCIDR) dataset, we
conduct sufficient experiments to demonstrate that MDNet outperforms
comparative baselines. The proposed image model obtains state-of-the-art
performance on two CIFAR datasets as well.Comment: CVPR2017 Ora
Deep Architectures and Ensembles for Semantic Video Classification
This work addresses the problem of accurate semantic labelling of short
videos. To this end, a multitude of different deep nets, ranging from
traditional recurrent neural networks (LSTM, GRU), temporal agnostic networks
(FV,VLAD,BoW), fully connected neural networks mid-stage AV fusion and others.
Additionally, we also propose a residual architecture-based DNN for video
classification, with state-of-the art classification performance at
significantly reduced complexity. Furthermore, we propose four new approaches
to diversity-driven multi-net ensembling, one based on fast correlation measure
and three incorporating a DNN-based combiner. We show that significant
performance gains can be achieved by ensembling diverse nets and we investigate
factors contributing to high diversity. Based on the extensive YouTube8M
dataset, we provide an in-depth evaluation and analysis of their behaviour. We
show that the performance of the ensemble is state-of-the-art achieving the
highest accuracy on the YouTube-8M Kaggle test data. The performance of the
ensemble of classifiers was also evaluated on the HMDB51 and UCF101 datasets,
and show that the resulting method achieves comparable accuracy with
state-of-the-art methods using similar input features
- …