25 research outputs found

    Advancing Vision Transformers with Group-Mix Attention

    Vision Transformers (ViTs) have been shown to enhance visual recognition through modeling long-range dependencies with multi-head self-attention (MHSA), which is typically formulated as Query-Key-Value computation. However, the attention map generated from the Query and Key captures only token-to-token correlations at a single granularity. In this paper, we argue that self-attention should have a more comprehensive mechanism to capture correlations among tokens and groups (i.e., multiple adjacent tokens) for higher representational capacity. We therefore propose Group-Mix Attention (GMA) as an advanced replacement for traditional self-attention, which can simultaneously capture token-to-token, token-to-group, and group-to-group correlations with various group sizes. To this end, GMA splits the Query, Key, and Value into segments uniformly and performs different group aggregations to generate group proxies. The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value. Based on GMA, we introduce a powerful backbone, namely GroupMixFormer, which achieves state-of-the-art performance in image classification, object detection, and semantic segmentation with fewer parameters than existing models. For instance, GroupMixFormer-L (with 70.3M parameters and 384^2 input) attains 86.2% Top-1 accuracy on ImageNet-1K without external data, while GroupMixFormer-B (with 45.8M parameters) attains 51.2% mIoU on ADE20K.
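
The split-aggregate-attend idea described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a 1D token sequence, a single head, and sliding-window average pooling as the group aggregator. The function names and the `windows` parameter are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_proxies(x, window):
    """Average each token with its neighbours (odd window, same length out).

    Assumption: average pooling stands in for the paper's group aggregation.
    """
    if window == 1:
        return x  # plain tokens, no grouping
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    n = x.shape[0]
    # Stack the `window` shifted views and average them.
    return np.stack([padded[i:i + n] for i in range(window)]).mean(axis=0)

def group_mix_attention(q, k, v, windows=(1, 3, 5, 7)):
    """Single-head attention over mixtures of tokens and group proxies."""
    n, d = q.shape
    assert d % len(windows) == 0, "channels must split evenly into segments"

    def mix(x):
        # Split channels uniformly, aggregate each segment with its own window.
        segs = np.split(x, len(windows), axis=1)
        return np.concatenate(
            [group_proxies(s, w) for s, w in zip(segs, windows)], axis=1
        )

    qm, km, vm = mix(q), mix(k), mix(v)
    attn = softmax(qm @ km.T / np.sqrt(d))  # (n, n) attention map
    return attn @ vm, attn
```

With `windows=(1,)` the sketch reduces to ordinary scaled dot-product attention, which makes it easy to compare against a plain baseline.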

    MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

    Perception systems in modern autonomous driving vehicles typically take inputs from complementary multi-modal sensors, e.g., LiDAR and cameras. However, in real-world applications, sensor corruptions and failures lead to inferior performance, thus compromising autonomous safety. In this paper, we propose a robust framework, called MetaBEV, to address extreme real-world environments involving six sensor corruptions in total and two extreme sensor-missing situations. In MetaBEV, signals from multiple sensors are first processed by modal-specific encoders. Subsequently, a set of dense BEV queries, termed meta-BEV, is initialized. These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities. The updated BEV representations are further leveraged for multiple 3D prediction tasks. Additionally, we introduce a new M2oE structure to alleviate the performance drop on distinct tasks in multi-task joint learning. Finally, MetaBEV is evaluated on the nuScenes dataset with 3D object detection and BEV map segmentation tasks. Experiments show MetaBEV outperforms prior art by a large margin on both full and corrupted modalities. For instance, when the LiDAR signal is missing, MetaBEV improves detection NDS by 35.5% and segmentation mIoU by 17.7% over the vanilla BEVFusion model; and when the camera signal is absent, MetaBEV still achieves 69.2% NDS and 53.7% mIoU, which is even higher than previous works that operate on full modalities. Moreover, MetaBEV performs fairly against previous methods in both canonical perception and multi-task learning settings, refreshing state-of-the-art nuScenes BEV map segmentation with 70.4% mIoU. Comment: Project page: https://chongjiange.github.io/metabev.htm

    DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving

    Safety is the primary priority of autonomous driving. Nevertheless, no published dataset currently supports the direct and explainable safety evaluation for autonomous driving. In this work, we propose DeepAccident, a large-scale dataset generated via a realistic simulator containing diverse accident scenarios that frequently occur in real-world driving. The proposed DeepAccident dataset contains 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset with 40K annotated samples. In addition, we propose a new task, end-to-end motion and accident prediction, based on the proposed dataset, which can be used to directly evaluate the accident prediction ability of different autonomous driving algorithms. Furthermore, for each scenario, we set four vehicles along with one infrastructure to record data, thus providing diverse viewpoints for accident scenarios and enabling V2X (vehicle-to-everything) research on perception and prediction tasks. Finally, we present a baseline V2X model named V2XFormer that demonstrates superior performance for motion and accident prediction and 3D object detection compared to the single-vehicle model.

    Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

    This work aims to improve unsupervised audio-visual pre-training. Inspired by the efficacy of data augmentation in visual contrastive learning, we propose a novel speed co-augmentation method that randomly changes the playback speeds of both audio and video data. Despite its simplicity, the speed co-augmentation method possesses two compelling attributes: (1) it increases the diversity of audio-visual pairs and doubles the size of negative pairs, resulting in a significant enhancement in the learned representations, and (2) it changes the strict correlation between audio-visual pairs but introduces a partial relationship between the augmented pairs, which is modeled by our proposed SoftInfoNCE loss to further boost the performance. Experimental results show that the proposed method significantly improves the learned representations when compared to vanilla audio-visual contrastive learning. Comment: Published at the CVPR 2023 Sight and Sound workshop.
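
The augmentation step described in this abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: speeds are drawn independently for the two streams from a hypothetical set `speeds`, and nearest-index resampling is a crude stand-in for proper video and audio resampling. The SoftInfoNCE loss is not reproduced here because the abstract does not specify its form.

```python
import numpy as np

def change_speed(x, speed):
    """Resample along the time axis (axis 0) by nearest-index lookup.

    speed > 1 shortens the clip (faster playback), speed < 1 lengthens it.
    """
    n = x.shape[0]
    new_len = max(1, int(round(n / speed)))
    idx = np.minimum((np.arange(new_len) * speed).astype(int), n - 1)
    return x[idx]

def speed_co_augment(video, audio, speeds=(0.5, 1.0, 2.0), rng=None):
    """Apply a random playback speed to each stream of an audio-visual pair.

    Assumption: the two speeds are sampled independently, so the augmented
    pair may be only partially aligned in time.
    """
    rng = np.random.default_rng() if rng is None else rng
    video_speed = float(rng.choice(speeds))
    audio_speed = float(rng.choice(speeds))
    return (
        change_speed(video, video_speed),
        change_speed(audio, audio_speed),
        video_speed,
        audio_speed,
    )
```

The returned speeds can be kept alongside the augmented pair, for example to weight how strongly a pair should count as a positive in a contrastive loss.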

    AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

    Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. Information can be found at https://amos22.grand-challenge.org

    Dynamic doping and Cottrell atmosphere optimize the thermoelectric performance of n-type PbTe

    High thermoelectric energy conversion efficiency requires a large figure-of-merit, zT, over a broad temperature range. To achieve this, we optimize the carrier concentrations of n-type PbTe from room temperature up to hot-end temperatures by co-doping Bi and Ag. Bi is an efficient n-type dopant in PbTe, often leading to excessive carrier concentration at room temperature. As revealed by density functional theory calculations, the formation of Bi and Ag defect complexes is exploited to optimize the room temperature carrier concentration. At elevated temperatures, we demonstrate the dynamic dissolution of Ag2Te precipitates in PbTe in situ by heating in a scanning transmission electron microscope. The release of n-type Ag interstitials with increasing temperature fulfills the requirement of higher carrier concentrations at the hot end. Moreover, as characterized by atom probe tomography, Ag atoms aggregate along parallel dislocation arrays to form Cottrell atmospheres. This results in enhanced phonon scattering and leads to a low lattice thermal conductivity. As a result of the synergy of dynamic doping and phonon scattering at decorated dislocations, an average zT of 1.0 is achieved in n-type Bi/Ag-codoped PbTe between 400 and 825 K. Introducing dopants with temperature-dependent solubility and strong interaction with dislocation cores enables simultaneous optimization of the average power factor and thermal conductivity, providing a new concept to exploit in the field of thermoelectrics.
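
For orientation, the figure-of-merit mentioned in this abstract is conventionally defined as below. The symbols are the standard textbook ones and do not appear in the abstract itself: S is the Seebeck coefficient, sigma the electrical conductivity, T the absolute temperature, and kappa the total thermal conductivity, split into electronic and lattice parts. The average over a temperature window is taken here as the usual integral mean, which is an assumption about how the reported average of 1.0 between 400 and 825 K was computed.

```latex
zT = \frac{S^{2}\sigma}{\kappa}\,T,
\qquad
\kappa = \kappa_{e} + \kappa_{L},
\qquad
zT_{\mathrm{avg}} = \frac{1}{T_{h} - T_{c}} \int_{T_{c}}^{T_{h}} zT(T)\,\mathrm{d}T
```

The numerator S^2 sigma is the power factor, so raising the power factor while lowering the lattice term kappa_L, as the abstract describes, increases zT from both sides.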

    Associations of risk factor burden and genetic predisposition with the 10-year risk of atrial fibrillation: observations from a large prospective study of 348,904 participants

    Background: Understanding the effects of risk factor burden and genetic predisposition on the long-term risk of atrial fibrillation (AF) is important to improve public health initiatives. However, the 10-year risk of AF considering risk factor burden and genetic predisposition is unknown. Methods: A total of 348,904 genetically unrelated participants without AF at baseline from the UK were categorized into three groups: index age 45 years (n = 84,206), 55 years (n = 117,520), and 65 years (n = 147,178). Optimal, borderline, or elevated risk factor burden was determined by body mass index, blood pressure, diabetes mellitus, alcohol consumption, smoking status, and history of myocardial infarction or heart failure. Genetic predisposition was estimated using the polygenic risk score (PRS), constructed using 165 predefined genetic risk variants. The combined effects of risk factor burden and PRS on the risk of incident AF in 10 years were estimated for each index age. Fine and Gray models were developed to predict the 10-year risk of AF. Results: The overall 10-year risk of AF was 0.67% (95% CI: 0.61-0.73%) for index age 45 years, 2.05% (95% CI: 1.96-2.13%) for index age 55 years, and 6.34% (95% CI: 6.21-6.46%) for index age 65 years. An optimal risk factor burden was associated with later AF onset regardless of genetic predisposition and sex (P […]). Conclusions: Risk factor burden together with a genetic predisposition is associated with the 10-year risk of AF. Our results may be helpful in selecting high-risk individuals for primary prevention of AF and facilitating subsequent health interventions.

    Application of X-Ray Inspection for Ultra High Voltage Gas-Insulated Switchgear
