MAT: A Multimodal Attentive Translator for Image Captioning
In this work we formulate the problem of image captioning as a multimodal
translation task. Analogous to machine translation, we present a
sequence-to-sequence recurrent neural network (RNN) model for image caption
generation. Unlike most existing work, where the whole image is
represented by a convolutional neural network (CNN) feature, we propose to
represent the input image as a sequence of detected objects, which serves as the
source sequence of the RNN model. In this way, the sequential representation of
an image can be naturally translated to a sequence of words, as the target
sequence of the RNN model. To represent the image in a sequential way, we
extract the object features in the image and arrange them in an order using
convolutional neural networks. To further leverage the visual information from
the encoded objects, a sequential attention layer is introduced to selectively
attend to the objects that are relevant to generating the corresponding words in the
sentences. Extensive experiments are conducted to validate the proposed
approach on a popular benchmark dataset, MS COCO, and the proposed model
surpasses the state-of-the-art methods in all metrics following the dataset
splits of previous work. The proposed approach is also evaluated on the
evaluation server of the MS COCO captioning challenge, where it achieves very
competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).
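The attention step described above can be illustrated with a minimal sketch. The abstract does not specify the form of the sequential attention layer, so the example below assumes plain dot-product soft attention; the names `attend`, `query`, and `objects` are hypothetical and are not taken from the paper. It only shows how a decoder state could weight a sequence of object feature vectors when producing a word.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, object_features):
    """Dot-product soft attention (an assumed form, not the paper's layer).

    query: decoder state as a list of floats.
    object_features: one feature vector per detected object, in sequence order.
    Returns the attention weights and the weighted-average context vector.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in object_features]
    weights = softmax(scores)
    dim = len(object_features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, object_features))
        for i in range(dim)
    ]
    return weights, context


# Toy input: three "objects" with 2-dimensional features (made-up numbers).
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, objects)
```

Here the first and third objects receive equal weight because they have the same dot product with the query, and the second object receives less.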
Feature screening for clustering analysis
In this paper, we consider feature screening for ultrahigh dimensional
clustering analyses. Based on the observation that the marginal distribution of
any given feature is a mixture of its conditional distributions in different
clusters, we propose to screen clustering features by independently evaluating
the homogeneity of each feature's mixture distribution. Important
cluster-relevant features have heterogeneous components in their mixture
distributions and unimportant features have homogeneous components. The
well-known EM-test statistic is used to evaluate the homogeneity. Under general
parametric settings, we establish the tail probability bounds of the EM-test
statistic for the homogeneous and heterogeneous features, and further show that
the proposed screening procedure achieves the sure independence screening
property and even selection consistency. The limiting distribution of the
EM-test statistic is also obtained for general parametric distributions. The
proposed method is computationally efficient, accurately screens for
important cluster-relevant features, and helps to significantly improve
clustering, as demonstrated in our extensive simulation and real data analyses.
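The screening idea above (score each feature independently by how far its marginal distribution departs from a homogeneous one, then keep the top-scoring features) can be sketched as follows. This is not the EM-test used in the paper: as a simplified stand-in, it assumes a plain likelihood-ratio statistic comparing a two-component Gaussian mixture with shared variance against a single Gaussian, fitted by ordinary EM. The function names, the toy data, and the choice of `k` are all hypothetical.

```python
import math
import random


def gaussian_loglik(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def log_mixture(x, pi1, mu1, mu2, var):
    """Log density of the two-component mixture, plus the first component's log term."""
    a = math.log(pi1) + gaussian_loglik(x, mu1, var)
    b = math.log(1 - pi1) + gaussian_loglik(x, mu2, var)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m)), a


def homogeneity_stat(values, n_iter=50):
    """Likelihood-ratio statistic: 2-component vs 1-component Gaussian.

    A simplified stand-in for the EM-test; assumes the feature is not constant.
    """
    n = len(values)
    mean = sum(values) / n
    var0 = sum((v - mean) ** 2 for v in values) / n
    ll_null = sum(gaussian_loglik(v, mean, var0) for v in values)

    # Initialise the two means from the lower and upper halves of the data.
    s = sorted(values)
    mu1 = sum(s[: n // 2]) / (n // 2)
    mu2 = sum(s[n // 2:]) / (n - n // 2)
    pi1, var = 0.5, var0
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each observation.
        resp = []
        for x in values:
            log_mix, a = log_mixture(x, pi1, mu1, mu2, var)
            resp.append(math.exp(a - log_mix))
        # M-step: update mixing proportion, means, and shared variance.
        n1 = min(max(sum(resp), 1e-6), n - 1e-6)
        pi1 = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, values)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, values)) / (n - n1)
        var = sum(
            r * (x - mu1) ** 2 + (1 - r) * (x - mu2) ** 2
            for r, x in zip(resp, values)
        ) / n
        var = max(var, 1e-12)
    ll_alt = sum(log_mixture(x, pi1, mu1, mu2, var)[0] for x in values)
    return 2 * (ll_alt - ll_null)


def screen(columns, k):
    """Rank features by the statistic and keep the indices of the top k."""
    stats = [homogeneity_stat(col) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -stats[j])
    return order[:k], stats


# Toy data: feature 0 separates two clusters, features 1-4 are pure noise.
rng = random.Random(0)
informative = [rng.gauss(-3, 1) for _ in range(100)] + [rng.gauss(3, 1) for _ in range(100)]
noise = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(4)]
columns = [informative] + noise
selected, stats = screen(columns, k=1)
```

Because each feature is scored on its own, the cost grows linearly with the number of features, which is what makes marginal screening of this kind practical in very high dimensions.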