263 research outputs found

    Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

    Recently, substantial research effort has focused on how to apply CNNs or RNNs to better extract temporal patterns from videos, so as to improve the accuracy of video classification. In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets. We investigate the potential of a purely attention based local feature integration. Accounting for the characteristics of such features in video classification, we propose a local feature integration framework based on attention clusters, and introduce a shifting operation to capture more diverse signals. We carefully analyze and compare the effect of different attention mechanisms, cluster sizes, and the use of the shifting operation, and also investigate the combination of attention clusters for multimodal integration. We demonstrate the effectiveness of our framework on three real-world video classification datasets. Our model achieves competitive results across all of these. In particular, on the large-scale Kinetics dataset, our framework obtains a single-model accuracy of 79.4% top-1 and 94.0% top-5 on the validation set. The attention clusters are the backbone of our winning solution at ActivityNet Kinetics Challenge 2017. Code and models will be released soon.
    Comment: The backbone of the winner solution at ActivityNet Kinetics Challenge 201
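
The idea of integrating local features purely by attention, with no temporal model, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the linear scoring vector `w`, the functions `attention_unit` and `attention_cluster`, and the choice to concatenate unit outputs. The shifting operation mentioned in the abstract is not defined there, so it is left out.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_unit(features, w):
    # features: list of local feature vectors (e.g. one per frame).
    # w: hypothetical scoring vector; score_i = dot(w, x_i).
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Attention-weighted sum of the local features.
    return [sum(a * x[d] for a, x in zip(weights, features)) for d in range(dim)]

def attention_cluster(features, unit_weights):
    # Assumed integration: concatenate the outputs of several attention units.
    out = []
    for w in unit_weights:
        out.extend(attention_unit(features, w))
    return out
```

Because each unit is a weighted sum over an unordered set, reordering the input features leaves the output unchanged, which is one way to see why such a model uses no temporal order information.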

    EAST: An Efficient and Accurate Scene Text Detector

    Previous approaches for scene text detection have already achieved promising performance across various benchmarks. However, they usually fall short when dealing with challenging scenarios, even when equipped with deep neural network models, because the overall performance is determined by the interplay of multiple stages and components in the pipelines. In this work, we propose a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes. The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images, eliminating unnecessary intermediate steps (e.g., candidate aggregation and word partitioning), with a single neural network. The simplicity of our pipeline allows concentrating efforts on designing loss functions and neural network architecture. Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR 2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2 fps at 720p resolution.
    Comment: Accepted to CVPR 2017, fix equation (3

    BigTrans: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages

    Large language models (LLMs) demonstrate promising translation performance among various natural languages. However, many LLMs, especially open-source ones such as BLOOM and LLaMA, are English-dominant and support only dozens of natural languages, leaving the potential of LLMs for language translation less explored. In this work, we present BigTrans, which adapts LLaMA, a model that covers only 20 languages, and enhances it with multilingual translation capability for more than 100 languages. BigTrans is built upon LLaMA-13B and is optimized in three steps. First, we continue training LLaMA with massive Chinese monolingual data. Second, we continue training the model with a large-scale parallel dataset that covers 102 natural languages. Third, we instruct-tune the foundation model with multilingual translation instructions, leading to our BigTrans model. Preliminary experiments on multilingual translation show that BigTrans performs comparably with ChatGPT and Google Translate in many languages and even outperforms ChatGPT in 8 language pairs. We release the BigTrans model and hope it can advance research progress.
    Comment: 12 pages, 4 figures. Our model is available at https://github.com/ZNLP/BigTran

    Conditional ablation of HDAC3 in islet beta cells results in glucose intolerance and enhanced susceptibility to STZ-induced diabetes

    Histone deacetylases (HDACs) are enzymes that regulate gene expression by modifying chromatin structure through removal of acetyl groups from target histones or non-histone proteins. Previous in vitro studies suggest that HDACs may be novel pharmacological targets in immune-mediated islet β-cell destruction. However, the role of specific HDACs in islet β-cell development and function remains unclear. Here, we generated a conditional islet β-cell-specific HDAC3 deletion mouse model to determine the consequences of HDAC3 depletion on islet β-cell differentiation, maintenance and function. Islet morphology, insulin secretion, glucose tolerance, and the incidence of diabetes induced by multiple low-dose streptozotocin (STZ) were evaluated and compared between HDAC3 knockout mice and wild-type littermate controls. Mice with β-cell-specific HDAC3 deletion displayed decreased pancreatic insulin content and disrupted glucose-stimulated insulin secretion, with intermittent spontaneous diabetes and dramatically enhanced susceptibility to STZ-induced diabetes. Furthermore, cells of the islet β-cell line MIN6 with siRNA-mediated HDAC3 silencing showed decreased insulin gene transcription, which was mediated, at least partially, through the upregulation of suppressor of cytokine signaling 3 (SOCS3). These results indicate the critical role of HDAC3 in normal β-cell differentiation, maintenance and function.

    Finding regions of interest using location based social media

    The discovery of regions of interest in city groups has become increasingly important in recent years. In this light, we propose and investigate a novel problem called the Region Discovery query (RD query), which finds regions of interest with respect to a user's current geographic location. Given a set of spatial objects O and a query location q, a circular region ω is returned by the query and recommended to users if it has high spatial-object density and is spatially close to q. This type of query can benefit users in applications such as trip planning and region recommendation. The RD query faces a major challenge: how to prune the search space in the spatial and density domains. To overcome this challenge and process the RD query efficiently, we propose a novel collaboration search method and define a pair of bounds to prune the search space effectively. The performance of the RD query is studied through extensive experiments on real and synthetic spatial data.
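
To make the problem statement concrete, here is a brute-force baseline for an RD-style query, not the collaboration search method or the pruning bounds of the paper, which the abstract does not detail. The fixed `radius`, the `max_dist` cutoff, the restriction of candidate centers to the objects themselves, and the ranking rule (highest count first, then nearest to q) are all assumptions made for illustration.

```python
import math

def rd_query_bruteforce(objects, q, radius, max_dist):
    """Return (center, count) for the densest circle near q, or None.

    objects:  list of (x, y) points, standing in for the set O.
    q:        query location (x, y).
    radius:   assumed fixed radius of a candidate circular region.
    max_dist: assumed cutoff for "spatially close to q".
    """
    best = None
    for c in objects:  # assumption: candidate centers are the objects themselves
        d_q = math.dist(c, q)
        if d_q > max_dist:
            continue  # too far from the query location
        # Density proxy: number of objects inside the circle around c.
        count = sum(1 for o in objects if math.dist(c, o) <= radius)
        key = (count, -d_q)  # prefer higher count, then smaller distance to q
        if best is None or key > best[0]:
            best = (key, c, count)
    return None if best is None else (best[1], best[2])
```

This baseline costs O(|O|²) distance computations per query, which suggests why pruning in both the spatial and density domains matters on large datasets.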

    The transition from incoherent to coherent random laser in defect waveguide based on organic/inorganic hybrid laser dye

    This paper systematically demonstrates a variety of experimental random laser (RL) phenomena in N,N′-di-(3-(isobutyl polyhedral oligomeric silsesquioxanes)propyl) perylene diimide (DPP), an organic/inorganic hybrid laser dye composed of perylene diimide (PDI) as the gain medium and polyhedral oligomeric silsesquioxanes (POSS) as the scattering medium at a mole ratio of 1:2. We observe the transition from incoherent RL, in the DPP-doped solutions and in polymer membrane systems prepared by the dip-coating method, to coherent RL in the polymer membrane system with a defect waveguide prepared by the semi-polymerization (SP) coating method. We also find that the hybrid dye DPP has a long lasing lifetime compared with traditional laser dyes, which indicates that the POSS group can suppress the photo-bleaching effect and extend the working life of laser dyes.