69,626 research outputs found

    DIANet: Dense-and-Implicit Attention Network

    Full text link
    Attention networks have successfully boosted the performance in various vision problems. Previous works lay emphasis on designing a new attention module and individually plug them into the networks. Our paper proposes a novel-and-simple framework that shares an attention module throughout different network layers to encourage the integration of layer-wise information and this parameter-sharing module is referred as Dense-and-Implicit-Attention (DIA) unit. Many choices of modules can be used in the DIA unit. Since Long Short Term Memory (LSTM) has a capacity of capturing long-distance dependency, we focus on the case when the DIA unit is the modified LSTM (refer as DIA-LSTM). Experiments on benchmark datasets show that the DIA-LSTM unit is capable of emphasizing layer-wise feature interrelation and leads to significant improvement of image classification accuracy. We further empirically show that the DIA-LSTM has a strong regularization ability on stabilizing the training of deep networks by the experiments with the removal of skip connections or Batch Normalization in the whole residual network. The code is released at https://github.com/gbup-group/DIANet

    An Expressive Deep Model for Human Action Parsing from A Single Image

    Full text link
    This paper aims at one newly raising task in vision and multimedia research: recognizing human actions from still images. Its main challenges lie in the large variations in human poses and appearances, as well as the lack of temporal motion information. Addressing these problems, we propose to develop an expressive deep model to naturally integrate human layout and surrounding contexts for higher level action understanding from still images. In particular, a Deep Belief Net is trained to fuse information from different noisy sources such as body part detection and object detection. To bridge the semantic gap, we used manually labeled data to greatly improve the effectiveness and efficiency of the pre-training and fine-tuning stages of the DBN training. The resulting framework is shown to be robust to sometimes unreliable inputs (e.g., imprecise detections of human parts and objects), and outperforms the state-of-the-art approaches.Comment: 6 pages, 8 figures, ICME 201

    Recognizing Focal Liver Lesions in Contrast-Enhanced Ultrasound with Discriminatively Trained Spatio-Temporal Model

    Full text link
    The aim of this study is to provide an automatic computational framework to assist clinicians in diagnosing Focal Liver Lesions (FLLs) in Contrast-Enhancement Ultrasound (CEUS). We represent FLLs in a CEUS video clip as an ensemble of Region-of-Interests (ROIs), whose locations are modeled as latent variables in a discriminative model. Different types of FLLs are characterized by both spatial and temporal enhancement patterns of the ROIs. The model is learned by iteratively inferring the optimal ROI locations and optimizing the model parameters. To efficiently search the optimal spatial and temporal locations of the ROIs, we propose a data-driven inference algorithm by combining effective spatial and temporal pruning. The experiments show that our method achieves promising results on the largest dataset in the literature (to the best of our knowledge), which we have made publicly available.Comment: 5 pages, 1 figure

    Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank

    Full text link
    Discourse parsing has long been treated as a stand-alone problem independent from constituency or dependency parsing. Most attempts at this problem are pipelined rather than end-to-end, sophisticated, and not self-contained: they assume gold-standard text segmentations (Elementary Discourse Units), and use external parsers for syntactic features. In this paper we propose the first end-to-end discourse parser that jointly parses in both syntax and discourse levels, as well as the first syntacto-discourse treebank by integrating the Penn Treebank with the RST Treebank. Built upon our recent span-based constituency parser, this joint syntacto-discourse parser requires no preprocessing whatsoever (such as segmentation or feature extraction), achieves the state-of-the-art end-to-end discourse parsing accuracy.Comment: Accepted at EMNLP 201

    Modular invariance for conformal full field algebras

    Full text link
    Let V^L and V^R be simple vertex operator algebras satisfying certain natural uniqueness-of-vacuum, complete reducibility and cofiniteness conditions and let F be a conformal full field algebra over the tensor product of V^L and V^R. We prove that the q_\tau-\bar{q_\tau}-traces (natural traces involving q_\tau=e^{2\pi i\tau} and \bar{q_\tau}=\bar{e^{2\pi i\tau}}) of geometrically modified genus-zero correlation functions for F are convergent in suitable regions and can be extended to doubly periodic functions with periods 1 and \tau. We obtain necessary and sufficient conditions for these functions to be modular invariant. In the case that V^L=V^R and F is one of those constructed by the authors in \cite{HK}, we prove that all these functions are modular invariant.Comment: 54 page
    • …
    corecore