
    DQ-Det: Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation

    Full text link
    Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We empirically find that random convex combinations of the learned queries still work well in the corresponding models. We therefore propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. The generated dynamic queries, named modulated queries, better capture the priors over object locations and categories in different images. Equipped with our modulated queries, a wide range of DETR-based models achieve consistent and superior performance across multiple tasks, including object detection, instance segmentation, panoptic segmentation, and video instance segmentation. Comment: 12 pages, 4 figures, ICML 202
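    A rough sketch of the idea: build each query as a convex combination of a learned base set, with coefficients predicted from image-level features. All module and parameter names here (ModulatedQueries, coef_head, the pooled image_feat input) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ModulatedQueries(nn.Module):
    """Sketch: dynamic convex combinations of learned detection queries."""

    def __init__(self, num_base=300, num_out=300, d_model=256):
        super().__init__()
        # Static learned queries, as in standard DETR-style decoders.
        self.base_queries = nn.Parameter(torch.randn(num_base, d_model))
        # Predicts one coefficient vector per output query from image semantics.
        self.coef_head = nn.Linear(d_model, num_out * num_base)
        self.num_out, self.num_base = num_out, num_base

    def forward(self, image_feat):  # image_feat: (B, d_model) pooled backbone features
        logits = self.coef_head(image_feat).view(-1, self.num_out, self.num_base)
        # Softmax gives non-negative weights summing to 1: a convex combination.
        coefs = logits.softmax(dim=-1)
        return coefs @ self.base_queries  # (B, num_out, d_model) modulated queries
```

    The resulting queries would simply replace the static ones fed to the decoder, leaving the rest of the DETR pipeline unchanged.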

    The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering

    Full text link
    The quality of pre-training data plays a critical role in the performance of foundation models. Popular foundation models often design their own recipes for data filtering, which makes it hard to analyze and compare different data filtering approaches. DataComp is a new benchmark dedicated to evaluating different methods for data filtering. This paper describes what we learned and the solution we built while participating in the DataComp challenge. Our filtering strategy includes three stages: single-modality filtering, cross-modality filtering, and data distribution alignment. We integrate existing methods and propose new solutions, such as computing the CLIP score on horizontally flipped images to mitigate the interference of scene text, using vision and language models to retrieve training samples for target downstream tasks, and rebalancing the data distribution to improve the efficiency of allocating the computational budget. We slice and dice our design choices, provide in-depth analysis, and discuss open questions. Our approach outperforms the best method from the DataComp paper by over 4% on the average performance of 38 tasks and by over 2% on ImageNet. Comment: 12 pages, 10 figures
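    As a concrete sketch of the flipped-image trick: horizontally flipping an image mirrors any scene text, so an image-caption match that relies on reading the text is weakened, while genuine visual-semantic alignment mostly survives. The snippet below uses the public OpenAI CLIP package; the function name and model choice are illustrative assumptions, not the challenge submission's code.

```python
import torch
import clip  # OpenAI CLIP package: pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from torchvision.transforms.functional import hflip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def flipped_clip_score(image: Image.Image, caption: str) -> float:
    """CLIP score computed on the horizontally flipped image.

    Scene text is mirrored by the flip, so pairs whose similarity is driven
    mostly by text inside the image score lower than genuinely aligned pairs.
    """
    img = preprocess(hflip(image)).unsqueeze(0).to(device)
    txt = clip.tokenize([caption], truncate=True).to(device)
    with torch.no_grad():
        img_f = model.encode_image(img)
        txt_f = model.encode_text(txt)
        img_f = img_f / img_f.norm(dim=-1, keepdim=True)
        txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
        return (img_f @ txt_f.T).item()  # cosine similarity in [-1, 1]
```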

    Rumor Detection with Diverse Counterfactual Evidence

    Full text link
    The growth of social media has exacerbated the threat of fake news to individuals and communities, drawing increasing attention to the development of efficient and timely rumor detection methods. The prevailing approaches resort to graph neural networks (GNNs) to exploit the propagation patterns of posts during the rumor-spreading process. However, these methods lack inherent interpretability due to the black-box nature of GNNs, and they yield less robust results because they indiscriminately employ all propagation patterns for rumor detection. In this paper, we address these issues with the proposed Diverse Counterfactual Evidence framework for Rumor Detection (DCE-RD). Our intuition is to exploit diverse counterfactual evidence from an event graph to serve as multi-view interpretations, which are further aggregated for robust rumor detection. Specifically, our method first designs a subgraph generation strategy to efficiently generate different subgraphs of the event graph. We constrain the subgraphs so that removing one changes the rumor detection result; the subgraphs therefore naturally serve as counterfactual evidence. To achieve multi-view interpretation, we design a diversity loss inspired by Determinantal Point Processes (DPPs) to encourage diversity among the counterfactual evidence. A GNN-based rumor detection model then aggregates the diverse counterfactual evidence discovered by DCE-RD to achieve interpretable and robust rumor detection. Extensive experiments on two real-world datasets show the superior performance of our method. Our code is available at https://github.com/Vicinity111/DCE-RD
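    The DPP-inspired diversity loss can be sketched as follows: a determinantal point process assigns high probability to diverse item sets, so maximizing the log-determinant of a similarity kernel over the K subgraph embeddings pushes them apart. This is a generic formulation under that assumption, not necessarily the exact loss used in DCE-RD.

```python
import torch
import torch.nn.functional as F

def dpp_diversity_loss(evidence_emb: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Diversity loss over K counterfactual-evidence (subgraph) embeddings.

    evidence_emb: (K, d) tensor, one embedding per generated subgraph.
    Minimizing the negative log-determinant of the cosine-similarity kernel
    maximizes the volume spanned by the embeddings, i.e. their diversity.
    """
    emb = F.normalize(evidence_emb, dim=-1)
    kernel = emb @ emb.T  # (K, K) cosine-similarity kernel, positive semi-definite
    kernel = kernel + eps * torch.eye(emb.size(0), device=emb.device)  # numerical jitter
    return -torch.logdet(kernel)
```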

    Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection

    Full text link
    Camouflaged objects that blend into natural scenes pose significant challenges for deep-learning models to detect and synthesize. While camouflaged object detection is a crucial computer vision task with diverse real-world applications, research on it has been constrained by limited data availability. We propose a framework for synthesizing camouflage data to enhance the detection of camouflaged objects in natural scenes. Our approach employs a generative model to produce realistic camouflage images that can be used to train existing object detection models. Specifically, we use a camouflage environment generator, supervised by a camouflage distribution classifier, to synthesize camouflage images, which are then used to expand the training dataset. Our framework outperforms the current state-of-the-art method on three datasets (COD10k, CAMO, and CHAMELEON), demonstrating its effectiveness in improving camouflaged object detection. The approach can serve as a plug-and-play data generation and augmentation module for existing camouflaged object detection tasks and provides a novel way to introduce more diversity and distributions into current camouflage datasets.
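    A minimal adversarial training step matching this description might look like the sketch below, where the camouflage distribution classifier plays the role of a discriminator supervising the environment generator. All modules, tensors, and losses here are assumptions for illustration; the paper's actual objectives may differ.

```python
import torch
import torch.nn.functional as F

def train_step(generator, camo_classifier, g_opt, c_opt, real_camo, background):
    """One adversarial update: the classifier learns the camouflage
    distribution, and the generator learns to synthesize images inside it."""
    # 1) Classifier update: real camouflage images vs. generated ones.
    fake = generator(background).detach()
    real_logits = camo_classifier(real_camo)
    fake_logits = camo_classifier(fake)
    c_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    c_opt.zero_grad(); c_loss.backward(); c_opt.step()

    # 2) Generator update: make synthesized images score as camouflage.
    fake_logits = camo_classifier(generator(background))
    g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return c_loss.item(), g_loss.item()
```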

    Effect of density and total weight on flow depth, velocity, and stresses in loess debris flows

    Get PDF
    Debris flows that involve loess material cause significant damage around the world, yet the kinematics of such flows are poorly understood. To better understand these kinematics, we ran flume experiments on debris flows with different mixture densities and total weights. We used sensors to measure pore fluid pressure and total normal stress, and we measured flow patterns, velocities, and depths with a high-speed camera and a laser range finder to identify the temporal evolution of the flow behavior and the corresponding peaks. We then constructed fitting functions for the relationships between the maximum values of the experimental parameters. The hydrographs of the debris flows could be divided into four phases: a rise to a first minor peak, a subsequent smooth increase to a second peak, fluctuation until a third major peak, and a final continuous decrease. The flow depth, velocity, total normal stress, and pore fluid pressure were strongly related to the mixture density and total mixture weight, and we quantified the corresponding relationships between the flow parameters and mixture properties. The maximum flow depth was related linearly to the mixture weight and exponentially to the mixture density. The flow velocity was linearly related to both the weight and the density. The pore fluid pressure and total normal stress were linearly related to the weight, but logarithmically related to the density. The regression goodness of fit for all functions was >0.93, so these functions are accurate and could be used to predict the consequences of loess debris flows. Our results provide an improved understanding of the effects of mixture density and weight on the kinematics of debris flows in loess areas, and can help landscape managers design improved engineering and prevention measures. Peer reviewed. Postprint (published version).
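    Fits of this kind can be reproduced in spirit with standard regression, as in the sketch below. The functional forms (linear, exponential, logarithmic) come from the abstract; the data in the example are synthetic stand-ins, not the authors' measurements, and the density range is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

# Functional forms named in the abstract; coefficients a, b are fitted.
def linear(x, a, b):       return a * x + b
def exponential(x, a, b):  return a * np.exp(b * x)
def logarithmic(x, a, b):  return a * np.log(x) + b

def fit_and_score(form, x, y):
    """Fit one relationship and report R^2 (the paper's goodness of fit)."""
    params, _ = curve_fit(form, x, y, maxfev=10000)
    residuals = y - form(x, *params)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return params, r2

# Example with synthetic stand-in data (not the authors' measurements):
density = np.linspace(1.6, 2.0, 20)  # assumed mixture-density range, g/cm^3
max_depth = 0.05 * np.exp(2.0 * density) + np.random.normal(0, 0.01, 20)
params, r2 = fit_and_score(exponential, density, max_depth)
print(f"depth ~ {params[0]:.3f} * exp({params[1]:.3f} * density), R^2 = {r2:.3f}")
```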

    Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification

    Full text link
    Visible-infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance. Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared images, which leaves two problems underexplored: information redundancy and modality complementarity. Properly eliminating identity-irrelevant information while compensating for modality-specific information is therefore critical but remains challenging. To tackle these problems, we present a novel mutual information and modality consensus network, CMInfoNet, which extracts modality-invariant identity features that carry the most representative information while reducing redundancy. The key insight of our method is to find an optimal representation that captures more identity-relevant information and compresses the irrelevant parts by optimizing a mutual information bottleneck trade-off. Besides, we propose an automatic search strategy to find the most prominent parts that identify pedestrians. To eliminate cross- and intra-modality variations, we also devise a modality consensus module that aligns the visible and infrared modalities for task-specific guidance. Global-local feature representations can also be acquired for key-part discrimination. Experimental results on six benchmark datasets, i.e., SYSU-MM01, RegDB, Occluded-DukeMTMC, Occluded-REID, Partial-REID, and Partial_iLIDS, demonstrate the effectiveness of CMInfoNet.
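    A mutual information bottleneck trade-off is commonly optimized with a variational bound; the sketch below shows that generic form, with a classification loss as the relevance term and a KL divergence to a standard-normal prior as the compression term. The exact objective, encoder, and weight beta in CMInfoNet are not specified here, so treat every name as an assumption.

```python
import torch
import torch.nn.functional as F

def ib_loss(mu, logvar, logits, labels, beta=1e-3):
    """Generic variational information-bottleneck objective (a sketch, not
    the authors' exact loss).

    mu, logvar parameterize q(z|x); the representation z should retain
    identity-relevant information (low classification loss) while
    compressing the input (small KL to the standard-normal prior).
    """
    # Relevance term: lower bound on I(z; identity) via identity classification.
    relevance = F.cross_entropy(logits, labels)
    # Compression term: closed-form KL( N(mu, sigma^2) || N(0, I) ).
    compression = -0.5 * torch.mean(
        torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    )
    return relevance + beta * compression
```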