172 research outputs found

    Advancing Vision Transformers with Group-Mix Attention

    Vision Transformers (ViTs) have been shown to enhance visual recognition through modeling long-range dependencies with multi-head self-attention (MHSA), which is typically formulated as Query-Key-Value computation. However, the attention map generated from the Query and Key captures only token-to-token correlations at a single granularity. In this paper, we argue that self-attention should have a more comprehensive mechanism to capture correlations among tokens and groups (i.e., multiple adjacent tokens) for higher representational capacity. We therefore propose Group-Mix Attention (GMA) as an advanced replacement for traditional self-attention, which can simultaneously capture token-to-token, token-to-group, and group-to-group correlations with various group sizes. To this end, GMA splits the Query, Key, and Value into segments uniformly and performs different group aggregations to generate group proxies. The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value. Based on GMA, we introduce a powerful backbone, namely GroupMixFormer, which achieves state-of-the-art performance in image classification, object detection, and semantic segmentation with fewer parameters than existing models. For instance, GroupMixFormer-L (with 70.3M parameters and 384^2 input) attains 86.2% Top-1 accuracy on ImageNet-1K without external data, while GroupMixFormer-B (with 45.8M parameters) attains 51.2% mIoU on ADE20K.
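    The mechanism described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a 1-D token sequence, sliding-window averaging as the group aggregation, and group sizes (1, 3, 5, 7). The function names `sliding_mean`, `group_mix`, and `group_mix_attention` are hypothetical.

```python
import numpy as np


def sliding_mean(x, size):
    """Average each token with its neighbours in a window of `size` (odd).

    x has shape (n_tokens, dim); edges are padded by repeating the border
    token, so the output keeps the same shape as the input.
    """
    pad = size // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([padded[i:i + size].mean(axis=0) for i in range(x.shape[0])])


def group_mix(x, group_sizes=(1, 3, 5, 7)):
    """Split channels into equal segments and aggregate each at its own group size.

    A group size of 1 keeps the individual tokens; larger sizes produce
    group proxies (an assumed stand-in for the aggregation in the abstract).
    """
    segments = np.split(x, len(group_sizes), axis=1)
    mixed = [seg if g == 1 else sliding_mean(seg, g)
             for seg, g in zip(segments, group_sizes)]
    return np.concatenate(mixed, axis=1)


def group_mix_attention(q, k, v, group_sizes=(1, 3, 5, 7)):
    """Softmax attention computed on mixtures of tokens and group proxies."""
    qm, km, vm = (group_mix(t, group_sizes) for t in (q, k, v))
    scores = qm @ km.T / np.sqrt(qm.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ vm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(6, 8)) for _ in range(3))
    print(group_mix_attention(q, k, v).shape)
```

    With `group_sizes=(1,)` the sketch reduces to ordinary single-head softmax attention, which makes the added group proxies easy to isolate. The channel dimension must be divisible by the number of group sizes.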

    MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

    Perception systems in modern autonomous driving vehicles typically take inputs from complementary multi-modal sensors, e.g., LiDAR and cameras. However, in real-world applications, sensor corruptions and failures lead to inferior performance, thus compromising autonomous safety. In this paper, we propose a robust framework, called MetaBEV, to address extreme real-world environments involving six sensor corruptions in total and two extreme sensor-missing situations. In MetaBEV, signals from multiple sensors are first processed by modal-specific encoders. Subsequently, a set of dense BEV queries, termed meta-BEV, is initialized. These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from LiDAR, cameras, or both modalities. The updated BEV representations are further leveraged for multiple 3D prediction tasks. Additionally, we introduce a new M2oE structure to alleviate the performance drop on distinct tasks in multi-task joint learning. Finally, MetaBEV is evaluated on the nuScenes dataset with 3D object detection and BEV map segmentation tasks. Experiments show MetaBEV outperforms prior arts by a large margin on both full and corrupted modalities. For instance, when the LiDAR signal is missing, MetaBEV improves detection NDS by 35.5% and segmentation mIoU by 17.7% over the vanilla BEVFusion model; and when the camera signal is absent, MetaBEV still achieves 69.2% NDS and 53.7% mIoU, which is even higher than previous works that perform on full modalities. Moreover, MetaBEV performs fairly against previous methods in both canonical perception and multi-task learning settings, refreshing state-of-the-art nuScenes BEV map segmentation with 70.4% mIoU. Comment: Project page: https://chongjiange.github.io/metabev.htm

    DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving

    Safety is the top priority of autonomous driving. Nevertheless, no published dataset currently supports direct and explainable safety evaluation for autonomous driving. In this work, we propose DeepAccident, a large-scale dataset generated via a realistic simulator containing diverse accident scenarios that frequently occur in real-world driving. The proposed DeepAccident dataset contains 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset with 40K annotated samples. In addition, we propose a new task, end-to-end motion and accident prediction, based on the proposed dataset, which can be used to directly evaluate the accident prediction ability of different autonomous driving algorithms. Furthermore, for each scenario, we set four vehicles along with one infrastructure to record data, thus providing diverse viewpoints for accident scenarios and enabling V2X (vehicle-to-everything) research on perception and prediction tasks. Finally, we present a baseline V2X model named V2XFormer that demonstrates superior performance for motion and accident prediction and 3D object detection compared to the single-vehicle model.

    Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

    This work aims to improve unsupervised audio-visual pre-training. Inspired by the efficacy of data augmentation in visual contrastive learning, we propose a novel speed co-augmentation method that randomly changes the playback speeds of both audio and video data. Despite its simplicity, the speed co-augmentation method possesses two compelling attributes: (1) it increases the diversity of audio-visual pairs and doubles the size of negative pairs, resulting in a significant enhancement in the learned representations, and (2) it changes the strict correlation between audio-visual pairs but introduces a partial relationship between the augmented pairs, which is modeled by our proposed SoftInfoNCE loss to further boost the performance. Experimental results show that the proposed method significantly improves the learned representations when compared to vanilla audio-visual contrastive learning. Comment: Published at the CVPR 2023 Sight and Sound workshop.

    Comparison of Analysis and Spectral Nudging Techniques for Dynamical Downscaling with the WRF Model over China

    To overcome the problem that the horizontal resolution of global climate models may be too low to resolve features which are important at the regional or local scales, dynamical downscaling has been extensively used. However, dynamical downscaling results generally drift away from large-scale driving fields. The nudging technique can be used to balance the performance of dynamical downscaling at large and small scales, but the relative performance of the two nudging techniques (analysis nudging and spectral nudging) is debated. Moreover, dynamical downscaling is now performed at the convection-permitting scale to reduce parameterization uncertainty and obtain finer resolution. To compare the performance of the two nudging techniques in this study, three sensitivity experiments (with no nudging, analysis nudging, and spectral nudging) covering a period of two months with a grid spacing of 6 km over continental China are conducted to downscale the 1-degree National Centers for Environmental Prediction (NCEP) dataset with the Weather Research and Forecasting (WRF) model. Compared with observations, the results show that both nudging experiments decrease the bias of conventional meteorological elements near the surface and at different heights during the process of dynamical downscaling. However, spectral nudging outperforms analysis nudging for predicting precipitation, and analysis nudging outperforms spectral nudging for the simulation of air humidity and wind speed.

    Relationships of beans intake with chronic kidney disease in rural adults: A large-scale cross-sectional study

    Background and aims: Dietary factors play an important role in the development of chronic kidney disease (CKD). However, evidence on the relationship of beans consumption with CKD remains limited and inconclusive, especially in middle- and low-income populations. The current study aimed to investigate the relationships of beans intake with indicators of kidney injury and CKD prevalence in rural adults. Methods: A total of 20,733 rural adults from the Henan Rural Cohort Study in 2018–2022 were included. The total beans intake was collected using a validated food frequency questionnaire. Indicators of kidney injury and CKD were determined by the estimated glomerular filtration rate and the urinary albumin to creatinine ratio. Generalized linear regression and logistic regression models were applied to estimate the relationship of beans intake with continuous and dichotomized indicators of renal function, respectively. Results: Of the 20,733 participants, 2,676 (12.91%) subjects were identified as CKD patients. After adjusting for potential confounders, participants in the higher quartiles of beans intake had a lower prevalence of CKD (odds ratio and 95% confidence interval, OR (95% CI); Q2: 0.968 (0.866–1.082); Q3: 0.836 (0.744–0.939); Q4: 0.854 (0.751–0.970)) and albuminuria (Q2: 0.982 (0.875–1.102); Q3: 0.846 (0.750–0.954); Q4: 0.852 (0.746–0.973)), compared with Q1. Each 50 g/day increment in beans intake was significantly associated with a 5% and 4% decreased prevalence of albuminuria and CKD, respectively. These inverse relationships were also significant in the subgroups of men, elderly, and high-income participants (p < 0.05). Conclusion: Dietary beans intake was inversely associated with the prevalence of albuminuria and CKD in rural adults, suggesting that promoting soy food intake might help reduce the occurrence of CKD in rural adults.

    Simultaneous radical cystectomy and nephroureterectomy in the treatment of panurothelial carcinoma: a systematic review and single-arm meta-analysis

    Background: Panurothelial carcinoma is a rare and aggressive malignancy that requires effective treatment strategies to enhance patient outcomes. Methods: We conducted a systematic search of English publications in databases including PubMed, Embase, Cochrane Library, and Web of Science up to May 2023. The quality of the literature was assessed using the Newcastle-Ottawa Scale (NOS) and the Methodological Quality and Synthesis of Case Series and Case Reports tool. Data statistics and analysis were performed using Stata 15.1 software (StataSE, USA). Results: Six studies involving 339 patients were included in the analysis. Meta-analysis revealed that simultaneous radical cystectomy and nephroureterectomy had 2-year and 5-year overall survival rates of 68% (95% CI 60%-76%, I² = 12.4%, P < 0.001) and 44% (95% CI 36%-53%, I² = 0, P < 0.001), respectively. The 2-year and 5-year progression-free survival rates were 91% (95% CI 86%-95%, I² = 95%, P < 0.001) and 65% (95% CI 58%-73%, I² = 91.5%, P < 0.001), respectively. The 2-year and 5-year cancer-specific survival rates were 73% (95% CI 66%-81%, I² = 16.7%, P < 0.001) and 57% (95% CI 49%-66%, I² = 0, P < 0.001), respectively. Additionally, the incidence of minor complications was 19% (95% CI 15%-23%, P < 0.01), the incidence of major complications was 49% (95% CI 34%-63%, P < 0.01), and the intraoperative blood transfusion rate was 53% (95% CI 44%-61%, P < 0.01). Conclusions: Simultaneous radical cystectomy and nephroureterectomy represent feasible approaches for the treatment of panurothelial carcinoma. Nonetheless, a comprehensive assessment of the surgical risks and benefits is imperative, and larger-scale prospective cohort studies are required to validate therapeutic efficacy. Systematic review registration: https://www.crd.york.ac.uk/PROSPERO, identifier CRD42023426401.

    Is first pregnancy age associated with hypertension in the Chinese rural women population?

    Introduction: The purpose of this study was to investigate the relationship between first pregnancy age and hypertension later in the life of women from Chinese rural areas. Methods: In total, 13,493 women were enrolled in the Henan Rural Cohort study. Logistic regression and linear regression were used to evaluate the association between first pregnancy age and hypertension and blood pressure indicators [including systolic blood pressure (SBP), diastolic blood pressure (DBP), and mean arterial pressure (MAP)]. The restricted cubic spline was used to examine the dose–response relationship between the first pregnancy age and hypertension or blood pressure indicators. Results: After adjusting for potential confounders, each 1-year increase in first pregnancy age was associated with a 0.221 mmHg increase in SBP values, a 0.153 mmHg increase in DBP values, and a 0.176 mmHg decrease in MAP values (all P < 0.05). The β of SBP, DBP, and MAP showed a trend of first increasing and then decreasing with increasing first pregnancy age, and the associations with SBP, DBP, and MAP were no longer statistically significant for first pregnancy ages beyond 33 years. A 1-year increment in first pregnancy age was associated with 2.9% [OR (95% CI): 1.029 (1.010, 1.048)] higher odds of prevalent hypertension. The odds of hypertension increased sharply and then eventually leveled off with an increment of first pregnancy age after adjusting for potential confounders. Conclusion: First pregnancy age might increase the risk of hypertension later in life and might be an independent risk factor for hypertension in women.
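    The percentages quoted alongside odds ratios in these abstracts follow from simple arithmetic, sketched below. The helper name `odds_ratio_to_percent_change` is hypothetical; the input values are the ones reported in the abstracts above.

```python
def odds_ratio_to_percent_change(odds_ratio: float) -> float:
    """Return the percent change in odds implied by an odds ratio.

    An odds ratio of 1.029 corresponds to 2.9% higher odds; a value
    below 1 gives a negative number, i.e. lower odds.
    """
    return (odds_ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Point estimate and 95% CI bounds for hypertension per 1-year
    # increase in first pregnancy age, as reported in the abstract.
    for odds_ratio in (1.029, 1.010, 1.048):
        print(odds_ratio, round(odds_ratio_to_percent_change(odds_ratio), 1))
```

    This conversion describes a change in odds, not in probability; the two are close only when the outcome is uncommon.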