382 research outputs found

    Pixel-wise Orthogonal Decomposition for Color Illumination Invariant and Shadow-free Image

    In this paper, we propose a novel, effective and fast method to obtain a color-illumination-invariant and shadow-free image from a single outdoor image. Unlike state-of-the-art shadow-free image methods, which require either shadow detection or statistical learning, we set up a linear equation set for each pixel value vector based on physically-based shadow invariants, deduce a pixel-wise orthogonal decomposition of its solutions, and then obtain an illumination invariant vector for each pixel value vector in an image. The illumination invariant vector is the unique particular solution of the linear equation set that is orthogonal to its free solutions. With this illumination invariant vector and the Lab color space, we propose an algorithm to generate a shadow-free image that well preserves the texture and color information of the original image. A series of experiments on a diverse set of outdoor images and comparisons with state-of-the-art methods validate our method. Comment: This paper has been published in Optics Express, Vol. 23, Issue 3, pp. 2220-2239. The final version is available at http://dx.doi.org/10.1364/OE.23.002220. Please refer to that version when citing this paper.
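The key algebraic fact the abstract relies on can be illustrated with a toy case. For a single linear constraint a . x = b, the minimum-norm particular solution is orthogonal to every free (homogeneous) solution; the coefficients below are illustrative, not the paper's shadow-invariant model.

```python
import numpy as np

def min_norm_solution(a, b):
    """Minimum-norm particular solution of a . x = b (pseudoinverse solution)."""
    a = np.asarray(a, dtype=float)
    return b * a / np.dot(a, a)

a = np.array([1.0, 2.0, 2.0])        # illustrative coefficients only
x_star = min_norm_solution(a, 3.0)   # unique particular solution
free = np.array([2.0, -1.0, 0.0])    # a free solution: a . free == 0

print(np.dot(a, x_star))     # satisfies the equation: 3.0
print(np.dot(x_star, free))  # orthogonal to the free solutions: 0.0
```

This orthogonality is what makes the particular solution unique, and hence usable as a per-pixel invariant.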

    Teaching Practice and Research on BIM-Based Assembled Building Measurement and Valuation

    As the construction industry and information technology become deeply integrated, it is of great practical significance to rapidly develop the practical training course Measurement and Valuation of Prefabricated Construction based on building information modeling (BIM). In this paper, we comprehensively analyze the current state of teaching of the course Measurement and Valuation of Prefabricated Construction in China, as well as the existing problems with BIM-integrated teaching. Further, we consider four aspects of the talent training program: course system setup, syllabus development, teacher team building, and construction of a BIM-based construction cost practice base. We then put forward a teaching reform mode for the course, namely a new "five-in-one" teaching mode, which matches new technology and adapts to the transformation and upgrading of China's construction industry. We carried out this reform in the BIM-based practical training course Measurement and Valuation of Prefabricated Construction for construction cost majors in the School of Construction and Engineering at a Guangxi university. The reform has been fruitful and provides a theoretical reference for carrying out similar reforms across China. Keywords: prefabricated construction; measurement and valuation course; BIM; teaching practice and research. DOI: 10.7176/JEP/10-27-01. Publication date: September 30th 201

    D^2ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition

    Adapting large pre-trained image models to few-shot action recognition has proven to be an effective and efficient strategy for learning robust feature extractors, which is essential for few-shot learning. The typical fine-tuning-based adaptation paradigm is prone to overfitting in few-shot learning scenarios and offers little modeling flexibility for learning temporal features in video data. In this work we present the Disentangled-and-Deformable Spatio-Temporal Adapter (D^2ST-Adapter), a novel adapter tuning framework well suited for few-shot action recognition due to its lightweight design and low parameter-learning overhead. It uses a dual-pathway architecture to encode spatial and temporal features in a disentangled manner. In particular, we devise an anisotropic Deformable Spatio-Temporal Attention module as the core component of D^2ST-Adapter; its anisotropic sampling densities along the spatial and temporal domains let each pathway learn spatial or temporal features specifically, allowing our D^2ST-Adapter to encode features with a global view of the 3D spatio-temporal space while maintaining a lightweight design. Extensive experiments with instantiations of our method on both pre-trained ResNet and ViT demonstrate the superiority of our method over state-of-the-art methods for few-shot action recognition. Our method is particularly well suited to challenging scenarios where temporal dynamics are critical for action recognition.
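The general adapter-tuning mechanism the abstract builds on can be sketched minimally: a low-rank bottleneck on a residual path, zero-initialized so training starts from the frozen backbone. The dimensions and GELU choice below are illustrative assumptions, not the paper's D^2ST-Adapter design.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8                        # feature dim, bottleneck dim (r << d)
W_down = rng.normal(0, 0.02, (d, r))
W_up = np.zeros((r, d))             # zero-init: the adapter starts as an identity map

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def adapter(x):
    # residual path: frozen features plus a cheap tuned correction
    return x + gelu(x @ W_down) @ W_up

x = rng.normal(size=(4, d))
print(np.allclose(adapter(x), x))   # True at init: adapter is a no-op
```

Only W_down and W_up would be trained, which is why the parameter-learning overhead stays low.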

    Learning and Vision-Based Obstacle Avoidance and Navigation


    Specular Reflection Separation With Color-Lines Constraint


    Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object Detection

    Most existing methods for few-shot object detection follow the fine-tuning paradigm, which implicitly assumes that class-agnostic, generalizable knowledge can be learned and transferred from base classes with abundant samples to novel classes with limited samples via such a two-stage training strategy. However, this is not necessarily true, since the object detector can hardly distinguish class-agnostic knowledge from class-specific knowledge automatically without explicit modeling. In this work we propose to explicitly learn three types of class-agnostic commonalities between base and novel classes: recognition-related semantic commonalities, localization-related semantic commonalities and distribution commonalities. We design a unified distillation framework based on a memory bank, which is able to perform distillation of all three types of commonalities jointly and efficiently. Extensive experiments demonstrate that our method can be readily integrated into most existing fine-tuning-based methods and consistently improves the performance by a large margin.
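A memory bank of class prototypes with momentum updates is the general mechanism such distillation frameworks typically rest on; the toy sketch below shows only that mechanism, under assumed names and a momentum value, and does not model the paper's three commonality types.

```python
import numpy as np

class MemoryBank:
    """Toy per-class prototype memory with exponential-moving-average updates."""
    def __init__(self, num_classes, dim, momentum=0.9):
        self.protos = np.zeros((num_classes, dim))
        self.m = momentum

    def update(self, cls, feat):
        # EMA keeps prototypes stable across training iterations
        self.protos[cls] = self.m * self.protos[cls] + (1 - self.m) * feat

    def distill_target(self, cls):
        # the stored prototype serves as the distillation target
        return self.protos[cls]

bank = MemoryBank(num_classes=3, dim=4)
bank.update(0, np.ones(4))
print(bank.distill_target(0))   # -> [0.1 0.1 0.1 0.1]
```

Distillation would then pull a detector's features toward these slowly-moving targets rather than toward raw per-sample features.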

    Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection

    Human-Object Interaction (HOI) detection plays a vital role in scene understanding; it aims to predict HOI triplets of the form <human, object, action>. Existing methods mainly extract multi-modal features (e.g., appearance, object semantics, human pose) and then fuse them together to directly predict HOI triplets. However, most of these methods focus on self-triplet aggregation and ignore potential cross-triplet dependencies, resulting in ambiguous action predictions. In this work, we propose to explore Self- and Cross-Triplet Correlations (SCTC) for HOI detection. Specifically, we regard each triplet proposal as a graph in which the human and object are nodes and the action is an edge, in order to aggregate self-triplet correlations. We also explore cross-triplet dependencies by jointly considering instance-level, semantic-level, and layout-level relations. Besides, we leverage the CLIP model to help SCTC obtain interaction-aware features via knowledge distillation, which provides useful action clues for HOI detection. Extensive experiments on the HICO-DET and V-COCO datasets verify the effectiveness of our proposed SCTC.

    Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection

    Single-source domain generalization (SDG) for object detection is a challenging yet essential task, as the distribution bias of the unseen domain degrades algorithm performance significantly. Existing methods attempt to extract domain-invariant features, neglecting that the biased data leads the network to learn biased features that are non-causal and poorly generalizable. To this end, we propose an Unbiased Faster R-CNN (UFR) for generalizable feature learning. Specifically, we formulate SDG in object detection from a causal perspective and construct a Structural Causal Model (SCM) to analyze the data bias and feature bias in the task, which are caused by scene confounders and object attribute confounders. Based on the SCM, we design a Global-Local Transformation module for data augmentation, which effectively simulates domain diversity and mitigates the data bias. Additionally, we introduce a Causal Attention Learning module that incorporates a designed attention invariance loss to learn image-level features that are robust to scene confounders. Moreover, we develop a Causal Prototype Learning module with an explicit instance constraint and an implicit prototype constraint, which further alleviates the negative impact of object attribute confounders. Experimental results on five scenes demonstrate the prominent generalization ability of our method, with an improvement of 3.9% mAP on the Night-Clear scene. Comment: CVPR 202

    Motif-aware temporal GCN for fraud detection in signed cryptocurrency trust networks

    Graph convolutional networks (GCNs) are a class of artificial neural networks for processing data that can be represented as graphs. Since financial transactions can naturally be modeled as graphs, GCNs are widely applied in the financial industry, especially for financial fraud detection. In this paper, we focus on fraud detection on cryptocurrency trust networks. Most works in the literature focus on static networks; in this study, we instead consider the evolving nature of cryptocurrency networks and use local structural information as well as balance theory to guide the training process. More specifically, we compute motif matrices to capture local topological information, then use them in the GCN aggregation process. The generated embedding at each snapshot is a weighted average of embeddings within a time window, where the weights are learnable parameters. Since the trust network carries a sign on each edge, balance theory is used to guide the training process. Experimental results on the bitcoin-alpha and bitcoin-otc datasets show that the proposed model outperforms those in the literature.
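Motif-weighted aggregation can be sketched for the simplest motif, the triangle: entry (i, j) of (A @ A) * A counts the triangles containing edge (i, j), so adding it to the adjacency emphasizes locally dense structure during aggregation. The specific motifs, signs, and weighting in the paper may differ; this is only the general idea on a toy undirected graph.

```python
import numpy as np

# toy undirected graph: edges 0-1, 0-2, 1-2 (a triangle) and 1-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

M = (A @ A) * A            # M[i, j] = number of triangles through edge (i, j)
A_motif = A + M            # motif-aware adjacency for GCN aggregation
deg = A_motif.sum(axis=1)

H = np.eye(4)              # one-hot node features
H_agg = (A_motif / deg[:, None]) @ H   # one row-normalized aggregation step
print(M)
```

Edges inside the triangle {0, 1, 2} get weight boosted by 1, while the pendant edge 1-3 does not, which is exactly the local-topology signal motif matrices inject.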

    Cast Shadow Detection for Surveillance System Based on Tricolor Attenuation Model

    Abstract. Shadows cause undesirable problems in computer vision, such as in object detection in outdoor scenes. In this paper, we propose a novel method for detecting the cast shadow of a moving target in a surveillance system. The method is based on the tricolor attenuation model, which describes the relationship among the attenuations of the three color channels in an image when a shadow occurs. According to this relationship, the cast shadow is removed from the detected moving area, leaving only the target area. Experimental results validate the performance of our method.
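The underlying idea admits a rough illustration: when direct sunlight is blocked, the channels attenuate unequally (roughly more in red than green than blue under a daylight model), so a candidate pixel can be screened by comparing per-channel attenuation against the background. The inequality and the example pixel values below are placeholder assumptions, not the paper's calibrated tricolor attenuation model.

```python
import numpy as np

def is_cast_shadow(bg_pixel, fg_pixel):
    """Rough shadow test: darker in every channel, with R >= G >= B attenuation."""
    att = np.asarray(bg_pixel, float) - np.asarray(fg_pixel, float)
    darker = np.all(att > 0)                  # a shadow darkens all channels
    ordered = att[0] >= att[1] >= att[2]      # attenuation ordering assumption
    return bool(darker and ordered)

print(is_cast_shadow([120, 110, 100], [80, 78, 76]))    # ordered attenuation -> True
print(is_cast_shadow([120, 110, 100], [100, 60, 50]))   # attenuation reversed -> False
```

Pixels in the moving region that pass such a test would be discarded as shadow, leaving only the target area.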