382 research outputs found
Pixel-wise Orthogonal Decomposition for Color Illumination Invariant and Shadow-free Image
In this paper, we propose a novel, effective and fast method to obtain a
color illumination invariant and shadow-free image from a single outdoor image.
Unlike state-of-the-art methods for shadow-free images, which require either
shadow detection or statistical learning, we set up a linear equation set for
each pixel value vector based on physically-based shadow invariants, deduce a
pixel-wise orthogonal decomposition for its solutions, and then get an
illumination invariant vector for each pixel value vector on an image. The
illumination invariant vector is the unique particular solution of the linear
equation set, which is orthogonal to its free solutions. With this illumination
invariant vector and Lab color space, we propose an algorithm to generate a
shadow-free image which well preserves the texture and color information of the
original image. A series of experiments on a diverse set of outdoor images and
the comparisons with state-of-the-art methods validate our method.
Comment: This paper has been published in Optics Express, Vol. 23, Issue 3,
pp. 2220-2239. The final version is available at
http://dx.doi.org/10.1364/OE.23.002220. Please refer to that version when
citing this paper.
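The per-pixel decomposition above can be illustrated with a small sketch. This is not the paper's exact formulation: it simply assumes the shadow-induced changes of a pixel value span a known subspace (the "free solutions"), and recovers the invariant as the component orthogonal to that subspace. The uniform-darkening direction used in the toy check is a made-up stand-in for the physically-derived shadow invariants.

```python
import numpy as np

def illumination_invariant(pixel, shadow_dirs):
    """Project a pixel value vector onto the orthogonal complement of the
    subspace spanned by the shadow (free-solution) directions.

    pixel: (3,) RGB value vector.
    shadow_dirs: (k, 3) directions along which illumination/shadow changes
        move the pixel value (the free solutions of the per-pixel system).
    Returns the component of `pixel` orthogonal to every shadow direction --
    a simple stand-in for the pixel-wise orthogonal decomposition.
    """
    # Orthonormal basis of the shadow subspace via QR decomposition.
    Q, _ = np.linalg.qr(np.asarray(shadow_dirs, dtype=float).T)
    # Subtract the projection onto the shadow subspace.
    return pixel - Q @ (Q.T @ pixel)

# Toy check: with shadow direction (1,1,1) (uniform brightness change),
# a lit pixel and its uniformly darkened version share the same invariant.
u = np.array([[1.0, 1.0, 1.0]])
lit = np.array([0.8, 0.6, 0.4])
shadowed = lit - 0.3 * np.array([1.0, 1.0, 1.0])
inv_lit = illumination_invariant(lit, u)
inv_sh = illumination_invariant(shadowed, u)
print(np.allclose(inv_lit, inv_sh))  # True
```

Because the invariant is the unique particular solution orthogonal to the free solutions, any pixel change along a shadow direction leaves it unchanged.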
Teaching Practice and Research on BIM-Based Assembled Building Measurement and Valuation
As the construction industry and information technology are being deeply integrated, it is of great practical significance to rapidly develop the practical training course Measurement and Valuation of Prefabricated Construction based on building information modeling (BIM). In this paper, we comprehensively analyze the status quo of teaching of the course Measurement and Valuation of Prefabricated Construction in China as well as the existing problems about the BIM-integrated teaching. Further, we take into account four aspects as per the talent training program: course system setup, syllabus development, teacher team building, and construction of a BIM-based construction cost practice base. Then, we put forward a teaching reform mode of the course Measurement and Valuation of Prefabricated Construction, namely a new "five-in-one" teaching mode, which matches new technology and adapts to the transformation and upgrading of China's construction industry. We perform a teaching reform in the BIM-based practical training course Measurement and Valuation of Prefabricated Construction for construction cost majors in School of Construction and Engineering at a Guangxi university. This reform is fruitful and provides a theoretical reference for carrying out the similar reform across China. Keywords: prefabricated construction; measurement and valuation course; BIM; teaching practice and research DOI: 10.7176/JEP/10-27-01 Publication date:September 30th 201
DST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Adapting large pre-trained image models to few-shot action recognition has
proven to be an effective and efficient strategy for learning robust feature
extractors, which is essential for few-shot learning. The typical
fine-tuning-based adaptation paradigm, however, is prone to overfitting in
few-shot scenarios
and offers little modeling flexibility for learning temporal features in video
data. In this work we present the Disentangled-and-Deformable Spatio-Temporal
Adapter (DST-Adapter), a novel adapter tuning framework well-suited for
few-shot action recognition owing to its lightweight design and low
parameter-learning overhead. It adopts a dual-pathway architecture to
encode spatial and temporal features in a disentangled manner. In particular,
we devise the anisotropic Deformable Spatio-Temporal Attention module as the
core component of DST-Adapter, which can be tailored with anisotropic
sampling densities along spatial and temporal domains to learn spatial and
temporal features specifically in corresponding pathways, allowing our
DST-Adapter to encode features in a global view in 3D spatio-temporal space
while maintaining a lightweight design. Extensive experiments with
instantiations of our method on both pre-trained ResNet and ViT demonstrate the
superiority of our method over state-of-the-art methods for few-shot action
recognition. Our method is particularly well-suited to challenging scenarios
where temporal dynamics are critical for action recognition.
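The dual-pathway, disentangled design can be sketched in a few lines of NumPy. This is a heavily simplified stand-in: the paper's deformable spatio-temporal attention with anisotropic sampling densities is not reproduced; only the idea of separate residual bottleneck pathways, one mixing spatial tokens per frame and one mixing frames per location, is shown, with hypothetical parameter names.

```python
import numpy as np

rng = np.random.default_rng(0)

def bottleneck(x, w_down, w_up):
    """Token-mixing bottleneck: down-project, ReLU, up-project (few params)."""
    return np.maximum(x @ w_down, 0.0) @ w_up

def dual_pathway_adapter(feats, p):
    """Simplified sketch of disentangled spatial/temporal adaptation.

    feats: (T, S, C) feature from a frozen backbone -- T frames, S spatial
    tokens, C channels. The spatial pathway mixes spatial tokens within
    each frame; the temporal pathway mixes frames at each spatial location;
    both are added residually so the frozen features are only adapted, not
    replaced. Deformable attention is intentionally omitted.
    """
    # Spatial pathway: move S last so the bottleneck mixes spatial tokens.
    spatial = bottleneck(np.swapaxes(feats, 1, 2), p["s_down"], p["s_up"])
    spatial = np.swapaxes(spatial, 1, 2)
    # Temporal pathway: move T last so the bottleneck mixes frames.
    temporal = bottleneck(np.transpose(feats, (1, 2, 0)),
                          p["t_down"], p["t_up"])
    temporal = np.transpose(temporal, (2, 0, 1))
    return feats + spatial + temporal          # lightweight residual fusion

T, S, C = 4, 9, 16
p = {"s_down": rng.normal(scale=0.1, size=(S, 3)),
     "s_up": rng.normal(scale=0.1, size=(3, S)),
     "t_down": rng.normal(scale=0.1, size=(T, 2)),
     "t_up": rng.normal(scale=0.1, size=(2, T))}
out = dual_pathway_adapter(rng.normal(size=(T, S, C)), p)
print(out.shape)  # (4, 9, 16)
```

The bottleneck widths (3 and 2 here) are much smaller than the token counts, which is what keeps the parameter-learning overhead low.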
Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object Detection
Most existing methods for few-shot object detection follow the fine-tuning
paradigm, which potentially assumes that the class-agnostic generalizable
knowledge can be learned and transferred implicitly from base classes with
abundant samples to novel classes with limited samples via such a two-stage
training strategy. However, this assumption does not necessarily hold, since
the object detector can hardly distinguish class-agnostic from class-specific
knowledge automatically without explicit modeling. In this work
we propose to learn three types of class-agnostic commonalities between base
and novel classes explicitly: recognition-related semantic commonalities,
localization-related semantic commonalities and distribution commonalities. We
design a unified distillation framework based on a memory bank, which is able
to perform distillation of all three types of commonalities jointly and
efficiently. Extensive experiments demonstrate that our method can be readily
integrated into most existing fine-tuning-based methods and consistently
improves performance by a large margin.
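The shared memory-bank mechanism behind the joint distillation can be sketched as follows. This is an illustrative assumption, not the paper's loss: it keeps a bank of base-class prototypes and pulls a novel-class feature toward a similarity-weighted combination of them, which is one plausible way to transfer base-novel commonality; the three specific commonality types are not differentiated here.

```python
import numpy as np

def commonality_distill(feat, memory, tau=0.1):
    """Distill base-novel commonality from a memory bank (simplified sketch).

    feat: (C,) feature of a novel-class proposal.
    memory: (K, C) bank of base-class prototypes (assumed L2-normalised).
    Builds a softmax-similarity-weighted target from the bank and a simple
    squared-error loss pulling `feat` toward it; `tau` is a made-up
    temperature, not a value from the paper.
    """
    f = feat / np.linalg.norm(feat)
    sims = memory @ f                      # cosine similarity to each prototype
    w = np.exp(sims / tau)
    w /= w.sum()                           # soft assignment over the bank
    target = w @ memory                    # commonality target for this feature
    loss = np.mean((f - target) ** 2)
    return target, loss

rng = np.random.default_rng(1)
bank = rng.normal(size=(8, 32))
bank /= np.linalg.norm(bank, axis=1, keepdims=True)
target, loss = commonality_distill(rng.normal(size=32), bank)
print(target.shape)  # (32,)
```

In the paper all three commonality types are distilled jointly through one such bank, which is what makes the framework efficient.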
Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection
Human-Object Interaction (HOI) detection plays a vital role in scene
understanding, which aims to predict the HOI triplet in the form of <human,
object, action>. Existing methods mainly extract multi-modal features (e.g.,
appearance, object semantics, human pose) and then fuse them together to
directly predict HOI triplets. However, most of these methods focus on
self-triplet aggregation while ignoring potential cross-triplet dependencies,
resulting in ambiguous action predictions. In this work, we
propose to explore Self- and Cross-Triplet Correlations (SCTC) for HOI
detection. Specifically, we regard each triplet proposal as a graph, where the
human and object serve as nodes and the action as the edge, to aggregate
self-triplet correlations. We also explore cross-triplet dependencies by
jointly considering instance-level, semantic-level, and layout-level relations.
In addition, we leverage the CLIP model to help SCTC obtain interaction-aware
features via knowledge distillation, which provides useful action clues for
HOI detection. Extensive experiments on the HICO-DET and V-COCO datasets
verify the effectiveness of our proposed SCTC.
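The triplet-as-graph view can be made concrete with a minimal sketch. Only the node-to-edge half of self-triplet aggregation is shown: the action (edge) feature is refined from its two endpoint nodes plus itself through a hypothetical fusion weight; cross-triplet relations and the CLIP distillation are not modelled.

```python
import numpy as np

rng = np.random.default_rng(2)

def self_triplet_aggregate(human, obj, action, w_edge):
    """One aggregation step on a single <human, object, action> graph.

    human, obj: (C,) node features; action: (C,) edge feature.
    w_edge: (3*C, C) hypothetical fusion weight (not from the paper).
    The edge -- the action -- is refined from both endpoint nodes plus
    itself, mirroring the self-triplet aggregation described above.
    """
    msg = np.concatenate([human, obj, action])   # gather the whole triplet
    return np.tanh(msg @ w_edge)                 # refined action feature

C = 8
refined = self_triplet_aggregate(rng.normal(size=C), rng.normal(size=C),
                                 rng.normal(size=C),
                                 rng.normal(size=(3 * C, C)))
print(refined.shape)  # (8,)
```

Cross-triplet correlation would then relate the refined edges of different proposals through instance-, semantic-, and layout-level affinities.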
Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection
Single-source domain generalization (SDG) for object detection is a
challenging yet essential task as the distribution bias of the unseen domain
degrades algorithm performance significantly. Existing methods attempt to
extract domain-invariant features, but neglect that biased data leads the
network to learn biased, non-causal features that generalize poorly. To this
end, we propose an Unbiased Faster R-CNN (UFR) for
generalizable feature learning. Specifically, we formulate SDG in object
detection from a causal perspective and construct a Structural Causal Model
(SCM) to analyze the data bias and feature bias in the task, which are caused
by scene confounders and object attribute confounders. Based on the SCM, we
design a Global-Local Transformation module for data augmentation, which
effectively simulates domain diversity and mitigates the data bias.
Additionally, we introduce a Causal Attention Learning module that incorporates
a designed attention invariance loss to learn image-level features that are
robust to scene confounders. Moreover, we develop a Causal Prototype Learning
module with an explicit instance constraint and an implicit prototype
constraint, which further alleviates the negative impact of object attribute
confounders. Experimental results on five scenes demonstrate the prominent
generalization ability of our method, with an improvement of 3.9% mAP on the
Night-Clear scene.
Comment: CVPR 202
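The global-local idea behind the data-augmentation module can be illustrated with a simple, assumed photometric scheme. This is not the paper's learned transformation: a global gamma/brightness change stands in for a domain-level (scene) shift, and a perturbation on one random patch stands in for object-attribute variation.

```python
import numpy as np

def global_local_transform(img, rng):
    """Global + local photometric augmentation (illustrative sketch only).

    img: (H, W, 3) float image in [0, 1]. A global gamma and brightness
    change mimics a domain-level shift; an extra perturbation on a random
    patch mimics local object-attribute variation. The paper's module is
    learned; this only shows the global-local split it is built on.
    """
    # Global transform: random gamma and brightness over the whole image.
    out = np.clip(img ** rng.uniform(0.5, 2.0) * rng.uniform(0.8, 1.2), 0, 1)
    # Local transform: perturb one random patch (a quarter of the image).
    h, w = img.shape[:2]
    y, x = rng.integers(0, h // 2), rng.integers(0, w // 2)
    patch = out[y:y + h // 2, x:x + w // 2]
    out[y:y + h // 2, x:x + w // 2] = np.clip(
        patch + rng.normal(scale=0.05, size=patch.shape), 0, 1)
    return out

rng = np.random.default_rng(3)
aug = global_local_transform(rng.uniform(size=(8, 8, 3)), rng)
print(aug.shape)  # (8, 8, 3)
```

Simulating both levels of variation during training is what lets the detector see stand-ins for scene and object-attribute confounders it never observes in the single source domain.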
Motif-aware temporal GCN for fraud detection in signed cryptocurrency trust networks
Graph convolutional networks (GCNs) are a class of artificial neural networks
for processing data that can be represented as graphs. Since financial
transactions can naturally be constructed as graphs, GCNs are widely applied in
the financial industry, especially for financial fraud detection. In this
paper, we focus on fraud detection on cryptocurrency trust networks. Most
works in the literature focus on static networks, whereas in this study we
consider the evolving nature of cryptocurrency networks and use local
structural information as well as balance theory to guide training. More
specifically, we compute motif matrices to capture the local topological
information, then use them in the GCN aggregation process. The generated
embedding at each snapshot is a weighted average of embeddings within a time
window, where the weights are learnable parameters. Since each edge of the
trust network is signed, balance theory is further incorporated during training.
Experimental results on bitcoin-alpha and bitcoin-otc datasets show that the
proposed model outperforms existing methods in the literature.
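The two mechanisms described above, motif-weighted aggregation and the learnable weighted average over a time window, can be sketched as follows. The sketch makes its own simplifying choices (unsigned adjacency, triangle motifs, softmax weights); the balance-theory guidance and the exact aggregation rule of the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

def motif_gcn_layer(A, M, X, W):
    """One GCN layer whose neighbour aggregation is reweighted by motifs.

    A: (N, N) adjacency of one snapshot; M: (N, N) motif matrix, e.g. how
    many triangles each edge closes; X: (N, C) node features; W: (C, D)
    layer weight. Edges taking part in many motifs contribute more,
    injecting local topology into the aggregation. Edge signs and balance
    theory are not modelled in this sketch.
    """
    A_hat = A * (1.0 + M) + np.eye(len(A))   # motif weights + self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.maximum((A_hat / deg) @ X @ W, 0.0)

def temporal_pool(snapshot_embs, logits):
    """Weighted average of per-snapshot embeddings over a time window; the
    weights would be learnable (here: softmax of hypothetical `logits`)."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return np.tensordot(w, np.stack(snapshot_embs), axes=1)

N, C, D = 5, 6, 4
A = (rng.uniform(size=(N, N)) > 0.5).astype(float)
A = np.triu(A, 1); A += A.T                  # symmetric, no self-loops
M = (A @ A) * A                              # triangle count per edge
X = rng.normal(size=(N, C)); W = rng.normal(size=(C, D))
embs = [motif_gcn_layer(A, M, X, W) for _ in range(3)]
pooled = temporal_pool(embs, np.zeros(3))
print(pooled.shape)  # (5, 4)
```

With equal logits the pooling reduces to a plain mean; training the logits lets the model emphasise the most informative snapshots in the window.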
Cast Shadow Detection for Surveillance System Based on Tricolor Attenuation Model
Shadows cause undesirable problems in computer vision, such as object detection in outdoor scenes. In this paper, we propose a novel method for detecting the cast shadows of moving targets in surveillance systems. The method is based on the tricolor attenuation model, which describes the relationship among the attenuations of an image's three color channels when a shadow occurs. Using this relationship, the cast shadow is removed from the detected moving area so that only the target area remains. Experimental results validate the performance of our method.
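The detection idea can be sketched with a small, assumed rule. The tricolor attenuation model derives the expected per-channel attenuation proportions from daylight/skylight spectra; here those proportions are just an input parameter with a made-up value, and a pixel is called shadow when every channel darkens and the attenuation split matches them.

```python
import numpy as np

def cast_shadow_mask(frame, background, expected_ratio, tol=0.15):
    """Separate cast shadow from target inside a moving region (sketch).

    frame, background: (H, W, 3) float images. expected_ratio: (3,) the
    per-channel share of the total attenuation predicted for a shadow --
    supplied by the tricolor attenuation model in the paper, hard-coded to
    a hypothetical value below. A pixel is labelled shadow when every
    channel darkens and the attenuation split matches within `tol`.
    """
    atten = background - frame                   # per-channel attenuation
    darker = np.all(atten > 0, axis=-1)          # shadows only darken
    total = atten.sum(axis=-1, keepdims=True)
    ratio = np.divide(atten, total, out=np.zeros_like(atten),
                      where=total > 0)
    return darker & np.all(np.abs(ratio - expected_ratio) < tol, axis=-1)

# Toy moving region: left half attenuated in the expected proportions
# (shadow-like); right half darkened in one channel only (target-like).
r = np.array([0.30, 0.33, 0.37])                 # hypothetical model ratios
bg = np.full((4, 4, 3), 0.6)
frame = bg.copy()
frame[:, :2] -= 0.3 * r                          # shadow-like attenuation
frame[:, 2:, 0] -= 0.3                           # colour change, not shadow
mask = cast_shadow_mask(frame, bg, r)
print(mask[:, :2].all(), mask[:, 2:].any())  # True False
```

Pixels passing the test are dropped from the foreground mask, leaving only the moving target.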