69,626 research outputs found

DIANet: Dense-and-Implicit Attention Network

Author: Huang Zhongzhan
Liang Mingfu
Liang Senwei
Yang Haizhao
Publication date: 23/09/2019

Attention networks have successfully boosted performance in various vision problems. Previous work emphasizes designing new attention modules and plugging them individually into networks. Our paper proposes a novel and simple framework that shares an attention module throughout different network layers to encourage the integration of layer-wise information; this parameter-sharing module is referred to as the Dense-and-Implicit-Attention (DIA) unit. Many choices of modules can be used in the DIA unit. Since Long Short-Term Memory (LSTM) can capture long-distance dependencies, we focus on the case where the DIA unit is a modified LSTM (referred to as DIA-LSTM). Experiments on benchmark datasets show that the DIA-LSTM unit is capable of emphasizing layer-wise feature interrelation and leads to significant improvement in image classification accuracy. We further show empirically that the DIA-LSTM has a strong regularization ability for stabilizing the training of deep networks, through experiments that remove skip connections or Batch Normalization from the whole residual network. The code is released at https://github.com/gbup-group/DIANet
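The parameter-sharing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `SharedAttentionUnit`, the helper functions, and the toy layers are all hypothetical, and a simple tanh recurrent update stands in for the modified LSTM. The only point it shows is that one unit, with one set of parameters and a carried hidden state, is called at every layer.

```python
import math
import random


class SharedAttentionUnit:
    """Hypothetical stand-in for a shared attention unit (not the DIA-LSTM itself)."""

    def __init__(self, channels, seed=0):
        rng = random.Random(seed)
        self.channels = channels
        # One set of per-channel parameters, reused by every layer.
        self.w_in = [rng.uniform(-1, 1) for _ in range(channels)]
        self.w_hid = [rng.uniform(-1, 1) for _ in range(channels)]

    def step(self, summary, hidden):
        # The new hidden state mixes this layer's summary with the previous state,
        # so information from earlier layers reaches later ones.
        new_hidden = [
            math.tanh(wi * s + wh * h)
            for wi, wh, s, h in zip(self.w_in, self.w_hid, summary, hidden)
        ]
        # Sigmoid maps the hidden state to channel attention weights in (0, 1).
        attention = [1.0 / (1.0 + math.exp(-h)) for h in new_hidden]
        return attention, new_hidden


def global_average_pool(feature_map):
    # feature_map: list of channels, each a flat list of spatial values.
    return [sum(channel) / len(channel) for channel in feature_map]


def toy_layer(feature_map, shift):
    # Placeholder for a real convolutional block (assumption for illustration).
    return [[value + shift for value in channel] for channel in feature_map]


def forward(feature_map, num_layers, unit):
    hidden = [0.0] * unit.channels
    attention_per_layer = []
    for layer_index in range(num_layers):
        feature_map = toy_layer(feature_map, shift=0.1 * (layer_index + 1))
        summary = global_average_pool(feature_map)
        # The same unit (same parameters) is used at every layer.
        attention, hidden = unit.step(summary, hidden)
        feature_map = [
            [a * value for value in channel]
            for a, channel in zip(attention, feature_map)
        ]
        attention_per_layer.append(attention)
    return feature_map, attention_per_layer


unit = SharedAttentionUnit(channels=2)
output, attentions = forward([[1.0, 2.0], [3.0, 4.0]], num_layers=3, unit=unit)
```

Because the hidden state is threaded through the loop, the attention weights at a later layer depend on summaries from all earlier layers, which is one way to read the "integration of layer-wise information" described above.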

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

An Expressive Deep Model for Human Action Parsing from A Single Image

Author: Huang Rui
Liang Zhujin
Lin Liang
Wang Xiaolong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/02/2015

This paper addresses a newly emerging task in vision and multimedia research: recognizing human actions from still images. Its main challenges lie in the large variations in human poses and appearances, as well as the lack of temporal motion information. To address these problems, we propose an expressive deep model that naturally integrates human layout and surrounding contexts for higher-level action understanding from still images. In particular, a Deep Belief Net (DBN) is trained to fuse information from different noisy sources such as body part detection and object detection. To bridge the semantic gap, we use manually labeled data to greatly improve the effectiveness and efficiency of the pre-training and fine-tuning stages of DBN training. The resulting framework is shown to be robust to sometimes unreliable inputs (e.g., imprecise detections of human parts and objects) and outperforms state-of-the-art approaches. Comment: 6 pages, 8 figures, ICME 201

arXiv.org e-Print Archive

Crossref

Recognizing Focal Liver Lesions in Contrast-Enhanced Ultrasound with Discriminatively Trained Spatio-Temporal Model

Author: Cao Qingxing
Huang Rui
Liang Xiaodan
Lin Liang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/02/2015

The aim of this study is to provide an automatic computational framework to assist clinicians in diagnosing Focal Liver Lesions (FLLs) in Contrast-Enhanced Ultrasound (CEUS). We represent FLLs in a CEUS video clip as an ensemble of Regions of Interest (ROIs), whose locations are modeled as latent variables in a discriminative model. Different types of FLLs are characterized by both the spatial and the temporal enhancement patterns of the ROIs. The model is learned by iteratively inferring the optimal ROI locations and optimizing the model parameters. To search efficiently for the optimal spatial and temporal locations of the ROIs, we propose a data-driven inference algorithm that combines effective spatial and temporal pruning. The experiments show that our method achieves promising results on the largest dataset in the literature (to the best of our knowledge), which we have made publicly available. Comment: 5 pages, 1 figure

arXiv.org e-Print Archive

Crossref

Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank

Author: Huang Liang
Zhao Kai
Publication date: 01/01/2017

Discourse parsing has long been treated as a stand-alone problem independent from constituency or dependency parsing. Most attempts at this problem are pipelined rather than end-to-end, sophisticated, and not self-contained: they assume gold-standard text segmentations (Elementary Discourse Units), and use external parsers for syntactic features. In this paper we propose the first end-to-end discourse parser that jointly parses at both the syntax and discourse levels, as well as the first syntacto-discourse treebank, built by integrating the Penn Treebank with the RST Treebank. Built upon our recent span-based constituency parser, this joint syntacto-discourse parser requires no preprocessing whatsoever (such as segmentation or feature extraction) and achieves state-of-the-art end-to-end discourse parsing accuracy. Comment: Accepted at EMNLP 201

arXiv.org e-Print Archive

Crossref

Modular invariance for conformal full field algebras

Author: Huang Yi-Zhi
Kong Liang
Publication venue: 'American Mathematical Society (AMS)'
Publication date: 01/01/2006

Let V^L and V^R be simple vertex operator algebras satisfying certain natural uniqueness-of-vacuum, complete reducibility and cofiniteness conditions and let F be a conformal full field algebra over the tensor product of V^L and V^R. We prove that the q_\tau-\bar{q_\tau}-traces (natural traces involving q_\tau=e^{2\pi i\tau} and \bar{q_\tau}=\bar{e^{2\pi i\tau}}) of geometrically modified genus-zero correlation functions for F are convergent in suitable regions and can be extended to doubly periodic functions with periods 1 and \tau. We obtain necessary and sufficient conditions for these functions to be modular invariant. In the case that V^L=V^R and F is one of those constructed by the authors in \cite{HK}, we prove that all these functions are modular invariant. Comment: 54 page
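For readers unfamiliar with the notation, the terms in this abstract can be unpacked in a short math block. This is a generic illustration for a single-variable function f, which is an assumption made here for simplicity; the paper's correlation functions depend on several variables, and its exact transformation law is not given in the abstract.

```latex
% Nome and its complex conjugate, with \tau in the upper half-plane.
q_\tau = e^{2\pi i \tau}, \qquad
\overline{q_\tau} = \overline{e^{2\pi i \tau}} = e^{-2\pi i \bar{\tau}}.

% Double periodicity with periods 1 and \tau (illustrated for one variable z).
f(z + 1;\, \tau) = f(z;\, \tau), \qquad
f(z + \tau;\, \tau) = f(z;\, \tau).

% The modular group acts on \tau by fractional linear transformations;
% modular invariance refers to behaviour under this action.
\tau \;\longmapsto\; \frac{a\tau + b}{c\tau + d}, \qquad
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z}).
```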

arXiv.org e-Print Archive

CiteSeerX

Crossref