36 research outputs found

    Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention

    The self-attention mechanism has been a key factor in the recent progress of Vision Transformers (ViT), enabling adaptive feature extraction from global contexts. However, existing self-attention methods adopt either sparse global attention or window attention to reduce computational complexity, which may compromise local feature learning or be subject to handcrafted designs. In contrast, local attention, which restricts the receptive field of each query to its own neighboring pixels, enjoys the benefits of both convolution and self-attention, namely local inductive bias and dynamic feature selection. Nevertheless, current local attention modules either use the inefficient Im2Col function or rely on specific CUDA kernels that are hard to generalize to devices without CUDA support. In this paper, we propose a novel local attention module, Slide Attention, which leverages common convolution operations to achieve high efficiency, flexibility and generalizability. Specifically, we first re-interpret the column-based Im2Col function from a new row-based perspective and use Depthwise Convolution as an efficient substitute. On this basis, we propose a deformed shifting module based on the re-parameterization technique, which further relaxes the fixed key/value positions to deformed features in the local region. In this way, our module realizes the local attention paradigm in an efficient and flexible manner. Extensive experiments show that our Slide Attention module is applicable to a variety of advanced Vision Transformer models, is compatible with various hardware devices, and achieves consistently improved performance on comprehensive benchmarks. Code is available at https://github.com/LeapLabTHU/Slide-Transformer. Comment: Accepted to CVPR202
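
The row-based reading of Im2Col described in the abstract can be illustrated with a small sketch. It is not the authors' module: it covers only a single channel with zero padding, omits the attention computation and the deformed shifting module, and all function names are hypothetical. The point it checks is that gathering a k x k neighborhood per pixel (column view) gives the same values as producing one shifted copy of the whole map per kernel offset, where each shift is a depthwise convolution with a one-hot kernel (row view).

```python
def im2col_column_view(feat, k=3):
    """Column view: for every pixel, gather its k x k neighborhood (zero padded)."""
    h, w = len(feat), len(feat[0])
    r = k // 2
    cols = {}
    for i in range(h):
        for j in range(w):
            patch = []
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    inside = 0 <= y < h and 0 <= x < w
                    patch.append(feat[y][x] if inside else 0)
            cols[(i, j)] = patch
    return cols


def conv_single_channel(feat, kernel):
    """Zero-padded 2D cross-correlation of one channel with a k x k kernel."""
    h, w = len(feat), len(feat[0])
    k = len(kernel)
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0
            for a in range(k):
                for b in range(k):
                    y, x = i + a - r, j + b - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * feat[y][x]
            out[i][j] = acc
    return out


def im2col_row_view(feat, k=3):
    """Row view: one shifted copy of the whole map per kernel offset.

    Each shift is a convolution with a one-hot kernel, so on a multi-channel
    tensor it could be run as a depthwise convolution.
    """
    shifted = []
    for a in range(k):
        for b in range(k):
            one_hot = [[1 if (p, q) == (a, b) else 0 for q in range(k)]
                       for p in range(k)]
            shifted.append(conv_single_channel(feat, one_hot))
    return shifted


feat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
cols = im2col_column_view(feat)
rows = im2col_row_view(feat)
```

In this sketch, `rows[n][i][j]` holds the same value as `cols[(i, j)][n]` for every offset index `n`, which is the equivalence that lets a stack of fixed shifts stand in for the per-pixel gather.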

    The role studies of fixed-wings in underwater fan-wing thrusters

    Fan-wings, as a form of aircraft propulsion, have shown effectiveness for underwater propulsion. The underwater fan-wing thruster (UFT), which is composed of a fixed-wing and a cross-flow fan under the fixed-wing, can generate significant vertical force underwater. Although some experiments and simulations have been carried out to analyze the UFT, the role of the fixed-wing in the UFT has not been researched in depth. In this paper, four parameters (the front opening angle, rear opening angle, upper horizontal length, and incoming flow angle) are selected for the role study in four aspects: vertical force, thrust force, torque, and efficiency. Computational fluid dynamics (CFD) simulations are carried out for UFTs with different parameters under different rotational speeds of the cross-flow fan and different incoming flow velocities, and the results concur with towing experiments. Self-driving experiments are also carried out to study the actual cruise performance of the UFT with different parameters. The mechanisms behind the effects are then analyzed using velocity magnitude cloud maps from the CFD simulations. A physical model is used to explain some of the unique fluid phenomena. Finally, the role of the fixed-wing is summarized, and some principles for parameter setting are proposed

    I Know the Relationships: Zero-Shot Action Recognition via Two-Stream Graph Convolutional Networks and Knowledge Graphs

    Recently, with the ever-growing number of action categories, zero-shot action recognition (ZSAR) has been achieved by automatically mining the underlying concepts (e.g., actions, attributes) in videos. However, most existing methods only exploit the visual cues of these concepts and ignore external knowledge for modeling explicit relationships between them. In fact, humans have a remarkable ability to transfer knowledge learned from familiar classes to recognize unfamiliar classes. To narrow the knowledge gap between existing methods and humans, we propose an end-to-end ZSAR framework based on a structured knowledge graph, which can jointly model action-attribute, action-action, and attribute-attribute relationships. To effectively leverage the knowledge graph, we design a novel Two-Stream Graph Convolutional Network (TS-GCN) consisting of a classifier branch and an instance branch. Specifically, the classifier branch takes the semantic-embedding vectors of all the concepts as input and generates the classifiers for action categories. The instance branch maps the attribute embeddings and scores of each video instance into an attribute-feature space. Finally, the generated classifiers are evaluated on the attribute features of each video, and a classification loss is adopted for optimizing the whole network. In addition, a self-attention module is utilized to model the temporal information of videos. Extensive experimental results on three realistic action benchmarks, Olympic Sports, HMDB51 and UCF101, demonstrate the favorable performance of our proposed framework
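
To make the idea of propagating concept embeddings over a knowledge graph more concrete, here is a minimal sketch of a single generic graph-convolution step. It is not the TS-GCN architecture: the abstract does not specify the normalization, depth, or weights, so the row normalization with self-loops, the helper names, and the toy numbers are all assumptions made for illustration.

```python
def normalize_adjacency(adj):
    """Add self-loops and row-normalize: D^-1 (A + I). Assumed normalization."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]


def matmul(a, b):
    """Plain dense matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj_norm, feats, weight):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    out = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy graph with two linked concept nodes (hypothetical values).
adj = [[0.0, 1.0], [1.0, 0.0]]
embeddings = [[1.0, 0.0], [0.0, 1.0]]   # stand-in semantic embeddings
weight = [[1.0, 0.0], [0.0, 1.0]]       # identity instead of learned weights
adj_norm = normalize_adjacency(adj)
propagated = gcn_layer(adj_norm, embeddings, weight)
```

After one step each node's vector is the average of its own embedding and its neighbor's, which is the mechanism by which information from related concepts can reach a class that has no training videos.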

    Deep Relative Tracking


    Spatial Heterogeneity of Typical Ecosystem Services and Their Relationships in Different Ecological–Functional Zones in Beijing–Tianjin–Hebei Region, China

    Recognizing changes in ecosystem services (ES) and their relationships is the basis of achieving sustainable regional development. Regional collaborative development has become the core strategy for the development of the Beijing–Tianjin–Hebei (BTH) region. However, subregions differ in their ecological changes and relationships. Here, we quantify and map ES, including water yield, sediment retention, carbon sequestration and grain productive capacity, in 2000, 2005, 2010 and 2015 using several biophysical models, and explore the relationships of spatial correction, trade-offs and synergies among multiple ES at different spatial scales. Results across the four years show that the quality and variation tendency of ES in each region are spatially heterogeneous. Relationships between ES that are not significant in the entire region show different correlations in individual ecological–functional zones. From the perspective of regional disparity, the effect of the land use factor and the correlative mechanisms among ES are analyzed. Based on the spatiotemporal variations and relationships of ES observed in individual regions, land use management policies are proposed

    Application of Improved Eclat Algorithm in Students’ Evaluation of Teaching

    A system for students' evaluation of teaching aims to address the current situation according to the actual needs of students and the teaching requirements of teachers, so as to improve teachers' teaching and the quality of school education. This paper uses real evaluation samples and an association rule data mining algorithm to comprehensively analyze the large volume of evaluation data together with the basic information of teachers. The purpose is to obtain association rules between a teacher's comprehensive information and the corresponding evaluation results, and to use the evaluation data to explore the core issues. In this paper, the Eclat association rule algorithm is improved to address insufficient memory and long running time when searching for frequent itemsets in the data. A breadth-first strategy is added to save operation time and improve the efficiency of the algorithm. The effectiveness of the improved algorithm is verified by comparative experiments, and the algorithm is applied to the evaluation system to provide suggestions for the professional development of teachers from an objective perspective and to build a harmonious, "people-oriented" evaluation system for students
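
As a rough illustration of the kind of search the abstract refers to, the sketch below mines frequent itemsets in the Eclat style: the data is stored vertically (itemset to set of transaction ids) and support is counted by intersecting those id sets, here explored level by level in breadth-first order. The paper's specific improvements are not described in the abstract, so this is a generic textbook variant, and the function name and the toy evaluation records are hypothetical.

```python
from itertools import combinations


def eclat_bfs(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    # Vertical layout: each single item maps to the ids of transactions holding it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(frozenset([item]), set()).add(tid)

    level = {iset: tids for iset, tids in tidsets.items()
             if len(tids) >= min_support}
    frequent = {}
    while level:
        frequent.update({iset: len(tids) for iset, tids in level.items()})
        next_level = {}
        # Join pairs of frequent k-itemsets that differ in exactly one item.
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) == len(a) + 1 and candidate not in next_level:
                tids = tids_a & tids_b  # support via tidset intersection
                if len(tids) >= min_support:
                    next_level[candidate] = tids
        level = next_level
    return frequent


# Hypothetical records: teacher attributes plus an evaluation outcome.
records = [
    {"senior", "small_class", "high"},
    {"senior", "high"},
    {"junior", "small_class", "high"},
    {"senior", "small_class", "high"},
    {"junior", "low"},
]
result = eclat_bfs(records, min_support=3)
```

Frequent itemsets that pair a teacher attribute with an outcome, such as `{"senior", "high"}` in this toy data, are the raw material from which association rules between teacher information and evaluation results would then be derived.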

    Multimodal hyperspectral remote sensing: an overview and perspective

    Since its advent in the 1980s, hyperspectral remote sensing has made important achievements in the aerospace and aviation fields and has been applied in many other fields. A conventional hyperspectral imaging spectrometer extends the number of spectral bands to dozens or hundreds and, at the same time, provides the spatial distribution of the solar radiation reflected from the observed scene. Nowadays, with the fast development of new technology in information and photoelectric sensing and the popularity of unmanned aerial vehicles, hyperspectral remote sensing imaging shows a new trend toward multimodality: it acquires integrated information while keeping high or very high spectral resolution, in particular high-temporal or even real-time sensing and stereo sensing. Three important modes of hyperspectral imaging have therefore come into existence: (1) multitemporal hyperspectral imaging, which refers to the observation of the same region at different dates; (2) hyperspectral video imaging, which captures full-frame spectral images in real time; (3) hyperspectral stereo imaging, which obtains the full-dimension information (including 2D image, elevation, and spectra) of the observed scene. From this perspective, current research on hyperspectral remote sensing and image processing is first briefly reviewed, and then the three hyperspectral imaging modes above are described comprehensively in four aspects: the fundamental principle of the new imaging mode; the corresponding scientific data acquisition; data processing and application; and potential challenges in data representation, feature learning and interpretation. Through this analysis of development trends and the current research situation in hyperspectral imaging, we hope to provide a direction for future research on multimodal hyperspectral remote sensing