
    Context Perception Parallel Decoder for Scene Text Recognition

    Scene text recognition (STR) methods have struggled to attain both high accuracy and fast inference speed. Autoregressive (AR)-based STR models use the previously recognized characters to decode the next character iteratively. They show superior accuracy, but this iteration also makes inference slow. Alternatively, parallel decoding (PD)-based STR models infer all the characters in a single decoding pass. They are faster at inference but less accurate, as it is difficult to build a robust recognition context in such a pass. In this paper, we first present an empirical study of AR decoding in STR. In addition to constructing a new AR model with the top accuracy, we find that the success of the AR decoder lies also in providing guidance on visual context perception rather than language modeling as claimed in existing studies. As a consequence, we propose the Context Perception Parallel Decoder (CPPD) to decode the character sequence in a single PD pass. CPPD devises a character counting module and a character ordering module. Given a text instance, the former infers the occurrence count of each character, while the latter deduces the character reading order and placeholders. Together with the character prediction task, they construct a context that robustly tells what the character sequence is and where the characters appear, closely mimicking the context conveyed by AR decoding. Experiments on both English and Chinese benchmarks demonstrate that CPPD models achieve highly competitive accuracy. Moreover, they run approximately 7x faster than their AR counterparts, and are also among the fastest recognizers. The code will be released soon.
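The counting and ordering tasks described in this abstract can be illustrated with a small sketch of how supervision targets might be derived from a ground-truth string. This is not the authors' code: the function names, the fixed `charset`, the `max_len` padding, and the 1/0 encoding of occupied positions versus placeholders are all assumptions made for illustration.

```python
from collections import Counter


def counting_target(text, charset):
    """Occurrence count of each charset character in `text` (hypothetical encoding)."""
    counts = Counter(text)
    return [counts.get(ch, 0) for ch in charset]


def ordering_target(text, max_len):
    """1 for each occupied reading position, 0 for each placeholder (assumed encoding)."""
    if len(text) > max_len:
        raise ValueError("text is longer than max_len")
    return [1] * len(text) + [0] * (max_len - len(text))


if __name__ == "__main__":
    print(counting_target("hello", "ehlo"))
    print(ordering_target("hello", 8))
```

Under these assumptions, the count vector says what characters are present and how often, and the position mask says where characters sit, which is the kind of context the abstract attributes to the two modules.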

    Unsupervised Cross-Domain Rumor Detection with Contrastive Learning and Cross-Attention

    Large numbers of rumors usually appear along with breaking news or trending topics, seriously obscuring the truth. Existing rumor detection methods mostly focus on a single domain and thus perform poorly in cross-domain scenarios due to domain shift. In this work, we propose an end-to-end instance-wise and prototype-wise contrastive learning model with a cross-attention mechanism for cross-domain rumor detection. The model not only performs cross-domain feature alignment, but also enforces target samples to align with the corresponding prototypes of a given source domain. Since labels in the target domain are unavailable, we use a clustering-based approach, with centers carefully initialized from a batch of source domain samples, to produce pseudo labels. Moreover, we use a cross-attention mechanism on a pair of source data and target data with the same labels to learn domain-invariant representations. Because the samples in a domain pair tend to express similar semantic patterns, especially in people’s attitudes (e.g., supporting or denying) towards the same category of rumors, the discrepancy between the source domain and the target domain is decreased. We conduct experiments on four groups of cross-domain datasets and show that our proposed model achieves state-of-the-art performance.
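The pseudo-labeling step in this abstract can be sketched as follows. This is a simplified illustration, not the paper's method: it computes one center per class from labeled source features and performs a single nearest-center assignment of target features, whereas the abstract describes a clustering procedure that is only initialized this way. The function names and the Euclidean distance are assumptions.

```python
import math


def class_centers(features, labels):
    """Mean feature vector per label, computed from labeled source samples."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def pseudo_labels(target_features, centers):
    """Assign each unlabeled target vector to its nearest center (Euclidean, assumed)."""
    return [
        min(centers, key=lambda label: math.dist(vec, centers[label]))
        for vec in target_features
    ]


if __name__ == "__main__":
    source = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    source_labels = ["non-rumor", "non-rumor", "rumor", "rumor"]
    centers = class_centers(source, source_labels)
    print(pseudo_labels([[1.0, 1.0], [9.0, 9.0]], centers))
```

A full clustering loop would alternate this assignment with recomputing the centers from the target samples; that iteration is omitted here.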

    Deep Embedded Clustering with Distribution Consistency Preservation for Attributed Networks

    Many complex systems in the real world can be characterized by attributed networks. To mine the potential information in these networks, deep embedded clustering, which obtains node representations and clusters simultaneously, has received much attention in recent years. Under the assumption of consistency for data in different views, the cluster structure of network topology and that of node attributes should be consistent for an attributed network. However, many existing methods ignore this property: they encode node representations from network topology and node attributes separately, yet cluster nodes on representation vectors learnt from only one of the views. Therefore, in this study, we propose an end-to-end deep embedded clustering model for attributed networks. It utilizes a graph autoencoder and a node attribute autoencoder to respectively learn node representations and cluster assignments. In addition, a distribution consistency constraint is introduced to maintain the latent consistency of cluster distributions of the two views. Extensive experiments on several datasets demonstrate that the proposed model achieves significantly better or competitive performance compared with the state-of-the-art methods. The source code can be found at https://github.com/Zhengymm/DCP. Comment: 28 pages, 5 figures.
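One way to picture a distribution consistency constraint is as a penalty on the mismatch between the soft cluster assignments that the two views give each node. The sketch below uses a mean KL divergence for that penalty. The abstract does not state which measure the authors use, so the KL choice, the function names, and the `eps` smoothing term are all assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def consistency_loss(topology_assignments, attribute_assignments):
    """Mean per-node KL between the two views' soft cluster assignments (assumed form)."""
    pairs = list(zip(topology_assignments, attribute_assignments))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    topo = [[0.9, 0.1], [0.2, 0.8]]
    attr = [[0.5, 0.5], [0.2, 0.8]]
    print(consistency_loss(topo, attr))
```

The loss is zero when both views assign every node the same cluster distribution and grows as the views disagree, which is the behavior such a constraint would need.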

    A Generative Node-attribute Network Model for Detecting Generalized Structure

    Exploring meaningful structural regularities embedded in networks is key to understanding and analyzing the structure and function of a network. Node-attribute information can help improve such understanding and analysis. However, most existing methods focus on detecting traditional communities, i.e., groupings of nodes with dense internal connections and sparse external ones. In this paper, based on the connectivity behavior of nodes and homogeneity of attributes, we propose a principled model (named GNAN), which can generate both topology information and attribute information. The new model can detect not only community structure, but also a range of other types of structure in networks, such as bipartite structure, core-periphery structure, and their mixture structure, which are collectively referred to as generalized structure. The proposed model, which combines topological information and node-attribute information, can detect communities more accurately than a model that uses only topology information. The dependency between attributes and communities can be automatically learned by our model, and thus we can ignore the attributes that do not contain useful information. The model parameters are inferred using the expectation-maximization algorithm. A case study is provided to show the ability of our model in the semantic interpretability of communities. Experiments on both synthetic and real-world networks show that the new model is competitive with other state-of-the-art models.