194 research outputs found
DQ-Det: Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation
Transformer-based detection and segmentation methods use a list of learned
detection queries to retrieve information from the transformer network and
learn to predict the location and category of one specific object from each
query. We empirically find that random convex combinations of the learned
queries still perform well in the corresponding models. We then propose to learn a
convex combination with dynamic coefficients based on the high-level semantics
of the image. The generated dynamic queries, named modulated queries, better
capture the prior of object locations and categories in the different images.
Equipped with our modulated queries, a wide range of DETR-based models achieve
consistent and superior performance across multiple tasks including object
detection, instance segmentation, panoptic segmentation, and video instance
segmentation.
Comment: 12 pages, 4 figures, ICML 202
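The core idea of convex query mixing can be sketched in a few lines. The shapes, the softmax parameterization, and the pooled image feature below are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

rng = np.random.default_rng(0)
num_queries, dim = 300, 256
# Learned detection queries, as in DETR-style models (hypothetical shapes).
base_queries = rng.standard_normal((num_queries, dim))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Static variant: a random convex combination of the learned queries.
# Softmax makes each row non-negative and sum to 1 (convexity).
mix = softmax(rng.standard_normal((num_queries, num_queries)))
static_queries = mix @ base_queries            # (num_queries, dim)

# Dynamic variant: coefficients predicted from a pooled image feature
# (W is a stand-in for a learned linear head over image semantics).
image_feat = rng.standard_normal(dim)
W = rng.standard_normal((dim, num_queries * num_queries)) * 0.01
coeffs = softmax((image_feat @ W).reshape(num_queries, num_queries))
modulated_queries = coeffs @ base_queries      # image-conditioned queries
```

Because the softmax rows are convex weights, each modulated query stays inside the convex hull of the learned queries, which is the property the empirical finding above motivates.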
The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering
The quality of pre-training data plays a critical role in the performance of
foundation models. Popular foundation models often design their own recipe for
data filtering, which makes it hard to analyze and compare different data
filtering approaches. DataComp is a new benchmark dedicated to evaluating
different methods for data filtering. This paper describes the lessons learned
and our solution from participating in the DataComp challenge. Our filtering strategy
includes three stages: single-modality filtering, cross-modality filtering, and
data distribution alignment. We integrate existing methods and propose new
solutions, such as computing CLIP score on horizontally flipped images to
mitigate the interference of scene text, using vision and language models to
retrieve training samples for target downstream tasks, rebalancing the data
distribution to improve the efficiency of allocating the computational budget,
etc. We slice and dice our design choices, provide in-depth analysis, and
discuss open questions. Our approach outperforms the best method from the
DataComp paper by over 4% on the average performance of 38 tasks and by over 2%
on ImageNet.
Comment: 12 pages, 10 figures
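The flipped-image CLIP-score trick can be illustrated with a toy encoder; the encoder, projection, and threshold logic below are stand-ins for CLIP, not the actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image, proj):
    """Toy image encoder: flatten and project to a unit vector (CLIP stand-in)."""
    v = image.reshape(-1) @ proj
    return v / np.linalg.norm(v)

def clip_score(image, text_emb, proj):
    """Cosine similarity between image and text embeddings."""
    return float(encode(image, proj) @ text_emb)

def flipped_clip_score(image, text_emb, proj):
    # Horizontally flipping the image destroys legible scene text while
    # keeping the overall visual content, so captions that merely transcribe
    # on-image text score lower after the flip.
    return clip_score(image[:, ::-1], text_emb, proj)

image = rng.random((32, 32, 3))                 # (H, W, C) in [0, 1)
proj = rng.standard_normal((32 * 32 * 3, 64))
text_emb = rng.standard_normal(64)
text_emb /= np.linalg.norm(text_emb)

original = clip_score(image, text_emb, proj)
flipped = flipped_clip_score(image, text_emb, proj)
```

Filtering on `flipped` rather than `original` keeps image-text pairs whose alignment comes from visual content rather than from rendered text.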
Rumor Detection with Diverse Counterfactual Evidence
The growth in social media has exacerbated the threat of fake news to
individuals and communities. This draws increasing attention to developing
efficient and timely rumor detection methods. The prevailing approaches resort
to graph neural networks (GNNs) to exploit the post-propagation patterns of the
rumor-spreading process. However, these methods lack inherent interpretability
due to the black-box nature of GNNs. Moreover, they produce less robust results
because they employ all the propagation patterns for rumor detection. In this
paper, we address the above issues with the proposed
Diverse Counterfactual Evidence framework for Rumor Detection (DCE-RD). Our
intuition is to exploit the diverse counterfactual evidence of an event graph
to serve as multi-view interpretations, which are further aggregated for robust
rumor detection results. Specifically, our method first designs a subgraph
generation strategy to efficiently generate different subgraphs of the event
graph. We constrain the removal of these subgraphs so that it changes the rumor
detection result. Thus, these subgraphs naturally serve as counterfactual
evidence for rumor detection. To achieve multi-view interpretation, we design a
diversity loss inspired by Determinantal Point Processes (DPP) to encourage
diversity among the counterfactual evidence. A GNN-based rumor detection model
further aggregates the diverse counterfactual evidence discovered by the
proposed DCE-RD to achieve interpretable and robust rumor detection results.
Extensive experiments on two real-world datasets show the superior performance
of our method. Our code is available at https://github.com/Vicinity111/DCE-RD
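The DPP-inspired diversity idea can be sketched with a log-determinant penalty over evidence embeddings; the kernel choice and shapes below are our illustrative assumptions, not DCE-RD's exact loss:

```python
import numpy as np

def dpp_diversity_loss(evidence, eps=1e-6):
    """DPP-inspired diversity penalty over K evidence embeddings of shape (K, d).

    The log-determinant of the similarity (kernel) matrix grows as the
    embeddings become more mutually dissimilar, so minimizing its negative
    encourages diverse counterfactual evidence.
    """
    z = evidence / np.linalg.norm(evidence, axis=1, keepdims=True)
    L = z @ z.T                                   # cosine-similarity kernel
    sign, logdet = np.linalg.slogdet(L + eps * np.eye(len(z)))
    return -logdet

rng = np.random.default_rng(0)
# Random high-dimensional vectors are near-orthogonal, hence diverse.
diverse = rng.standard_normal((4, 16))
# Nearly identical vectors model redundant, overlapping evidence.
redundant = np.tile(rng.standard_normal((1, 16)), (4, 1))
redundant = redundant + 0.01 * rng.standard_normal((4, 16))
```

Redundant evidence makes the kernel close to rank-one, so its determinant collapses and the loss is large; diverse evidence keeps the kernel near the identity and the loss near zero.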
Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection
Camouflaged objects that blend into natural scenes pose significant
challenges for deep-learning models to detect and synthesize. While camouflaged
object detection is a crucial task in computer vision with diverse real-world
applications, this research topic has been constrained by limited data
availability. We propose a framework for synthesizing camouflage data to
enhance the detection of camouflaged objects in natural scenes. Our approach
employs a generative model to produce realistic camouflage images, which can be
used to train existing object detection models. Specifically, we use a
camouflage environment generator supervised by a camouflage distribution
classifier to synthesize the camouflage images, which are then fed into our
generator to expand the dataset. Our framework outperforms the current
state-of-the-art method on three datasets (COD10k, CAMO, and CHAMELEON),
demonstrating its effectiveness in improving camouflaged object detection. This
approach can serve as a plug-and-play data generation and augmentation module
for existing camouflaged object detection tasks and provides a novel way to
introduce more diverse distributions into current camouflage datasets.
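As a concrete stand-in for such a data-generation module, a statistics-matched alpha blend can paste an object into a background so it adopts the scene's appearance. This is only an illustrative augmentation under our own assumptions, not the paper's generative model:

```python
import numpy as np

def blend_object(background, obj, mask, alpha=0.7):
    """Alpha-blend an object into a background to mimic camouflage.

    Toy augmentation: match the object's per-channel color statistics to the
    background's, then blend inside the mask so the object shares the scene's
    appearance statistics (the camouflage property the paper targets).
    """
    obj = (obj - obj.mean((0, 1))) / (obj.std((0, 1)) + 1e-6)
    obj = obj * background.std((0, 1)) + background.mean((0, 1))
    m = mask[..., None] * alpha                  # (H, W, 1) blend weights
    out = background * (1 - m) + obj * m
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
bg = rng.random((64, 64, 3))
obj = rng.random((64, 64, 3))
mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0                         # object footprint
aug = blend_object(bg, obj, mask)
```

Synthetic pairs like `(aug, mask)` can then augment the training set of an existing camouflaged object detector.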
Effect of density and total weight on flow depth, velocity, and stresses in loess debris flows
Debris flows that involve loess material cause significant damage worldwide. However, the kinematics of such processes are poorly understood. To better understand these kinematics, we used a flume to measure the kinematics of debris flows with different mixture densities and weights. We used sensors to measure pore fluid pressure and total normal stress. We measured flow patterns, velocities, and depths using a high-speed camera and a laser range finder to identify the temporal evolution of the flow behavior and the corresponding peaks. We constructed fitting functions for the relationships between the maximum values of the experimental parameters. The hydrographs of the debris flows could be divided into four phases: an increase to a first minor peak, a subsequent smooth increase to a second peak, fluctuation until a third major peak, and a final continuous decrease. The flow depth, velocity, total normal stress, and pore fluid pressure were strongly related to the mixture density and total mixture weight. We defined the corresponding relationships between the flow parameters and mixture kinematics. Linear and exponential functions described the relationships of the maximum flow depth to the mixture weight and density, respectively. The flow velocity was linearly related to the weight and density. The pore fluid pressure and total normal stress were linearly related to the weight, but logarithmically related to the density. The regression goodness of fit for all functions was >0.93, so these functions are accurate and could be used to predict the consequences of loess debris flows. Our results provide an improved understanding of the effects of mixture density and weight on the kinematics of debris flows in loess areas, and can help landscape managers prevent such flows and design improved engineering solutions.
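The fitting approach described above can be illustrated with ordinary least squares. The measurements below are invented placeholder values, not the experimental data:

```python
import numpy as np

# Hypothetical measurements: maximum flow depth vs. mixture weight and density
# (illustrative values only).
weight = np.array([40.0, 60.0, 80.0, 100.0, 120.0])   # kg
density = np.array([1.4, 1.5, 1.6, 1.7, 1.8])         # g/cm^3
depth = np.array([3.1, 4.0, 5.2, 6.1, 7.0])           # cm

# Linear fit: depth = a * weight + b
a, b = np.polyfit(weight, depth, 1)

# Exponential fit: depth = c * exp(k * density), fitted in log space
k, log_c = np.polyfit(density, np.log(depth), 1)
c = np.exp(log_c)

def r_squared(y, y_hat):
    """Regression goodness of fit."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

r2_linear = r_squared(depth, a * weight + b)
r2_exp = r_squared(depth, c * np.exp(k * density))
```

Once such functions are fitted on real flume data with R² > 0.93, they can be evaluated at new mixture weights and densities to predict peak flow parameters.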
Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification
Visible-Infrared person re-identification (VI-ReID) is an important and
challenging task in intelligent video surveillance. Existing methods mainly
focus on learning a shared feature space to reduce the modality discrepancy
between visible and infrared modalities, which leaves two problems
underexplored: information redundancy and modality complementarity. To this
end, properly eliminating identity-irrelevant information while compensating
for modality-specific information is critical, yet remains a challenging
endeavor. To tackle the above problems, we present a novel mutual information
and modality consensus network, namely CMInfoNet, to extract modality-invariant
identity features with the most representative information and reduce the
redundancies. The key insight of our method is to find an optimal
representation to capture more identity-relevant information and compress the
irrelevant parts by optimizing a mutual information bottleneck trade-off.
Besides, we propose an automatic search strategy to find the most prominent
parts that identify the pedestrians. To eliminate the cross- and intra-modality
variations, we also devise a modality consensus module to align the visible and
infrared modalities for task-specific guidance. Moreover, the global-local
feature representations can also be acquired for key parts discrimination.
Experimental results on six benchmarks, i.e., SYSU-MM01, RegDB,
Occluded-DukeMTMC, Occluded-REID, Partial-REID, and Partial_iLIDS, have
demonstrated the effectiveness of CMInfoNet.
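A minimal discrete illustration of the information-bottleneck trade-off follows. The variable names and toy distributions are ours; CMInfoNet optimizes this objective variationally over deep features:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in nats from a joint distribution table p_xy."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0                      # skip zero-probability cells
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def ib_objective(p_zy, p_zx, beta):
    """Information-bottleneck trade-off for a representation z.

    Minimizing this keeps identity-relevant information I(z; y) while
    compressing input information I(z; x); beta sets the compression strength.
    """
    return -mutual_information(p_zy) + beta * mutual_information(p_zx)

# Toy joints: z perfectly predicts the identity y, and is independent of x.
p_zy = np.array([[0.5, 0.0], [0.0, 0.5]])
p_zx = np.array([[0.25, 0.25], [0.25, 0.25]])
objective = ib_objective(p_zy, p_zx, beta=1.0)
```

In this idealized case the objective reaches its minimum of -log 2: all identity information is kept and no extra input information is carried.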