Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes
Object detection under inaccurate bounding-box supervision has attracted
broad interest because high-quality annotation is expensive and low
annotation quality is occasionally inevitable (e.g., tiny objects).
Previous works usually rely on multiple instance learning (MIL), which depends
heavily on category information, to select and refine a low-quality box. Because
they do not exploit spatial information, those methods suffer from object drift,
group prediction, and part domination. In this paper, we heuristically propose
a Spatial Self-Distillation based Object Detector (SSD-Det) to mine
spatial information to refine the inaccurate box in a self-distillation
fashion. SSD-Det utilizes a Spatial Position Self-Distillation (SPSD)
module to exploit spatial information and an interactive structure to combine
spatial information and category information, thus constructing a high-quality
proposal bag. To further improve the selection procedure, a Spatial Identity
Self-Distillation (SISD) module is introduced in SSD-Det to obtain
spatial confidence to help select the best proposals. Experiments on the MS-COCO
and VOC datasets with noisy box annotations verify the method's effectiveness
and show state-of-the-art performance. The code is available at
https://github.com/ucas-vg/PointTinyBenchmark/tree/SSD-Det.
Comment: accepted by ICCV 202
Scene Graph Generation with External Knowledge and Image Reconstruction
Scene graph generation has received growing attention with the advancements
in image understanding tasks such as object detection, attributes and
relationship prediction, etc. However, existing datasets are biased in terms
of object and relationship labels, or often come with noisy and missing
annotations, which makes the development of a reliable scene graph prediction
model very challenging. In this paper, we propose a novel scene graph
generation algorithm with external knowledge and image reconstruction loss to
overcome these dataset issues. In particular, we extract commonsense knowledge
from the external knowledge base to refine object and phrase features for
improving generalizability in scene graph generation. To address the bias of
noisy object annotations, we introduce an auxiliary image reconstruction path
to regularize the scene graph generation network. Extensive experiments show
that our framework can generate better scene graphs, achieving
state-of-the-art performance on two benchmark datasets: Visual Relationship
Detection and Visual Genome.
Comment: 10 pages, 5 figures, Accepted in CVPR 201
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
Over the last decade, Convolutional Neural Network (CNN) models have been
highly successful in solving complex vision problems. However, these deep
models are perceived as "black box" methods considering the lack of
understanding of their internal functioning. There has been a significant
recent interest in developing explainable deep learning models, and this paper
is an effort in this direction. Building on a recently proposed method called
Grad-CAM, we propose a generalized method called Grad-CAM++ that can provide
better visual explanations of CNN model predictions, in terms of better object
localization as well as explaining occurrences of multiple object instances in
a single image, compared to the state of the art. We provide a mathematical
derivation for the proposed method, which uses a weighted combination of the
positive partial derivatives of the last convolutional layer feature maps with
respect to a specific class score as weights to generate a visual explanation
for the corresponding class label. Our extensive experiments and evaluations,
both subjective and objective, on standard datasets showed that Grad-CAM++
provides promising human-interpretable visual explanations for a given CNN
architecture across multiple tasks including classification, image caption
generation and 3D action recognition; as well as in new settings such as
knowledge distillation.
Comment: 17 Pages, 15 Figures, 11 Tables. Accepted in the proceedings of IEEE
Winter Conf. on Applications of Computer Vision (WACV2018). Extended version
is under review at IEEE Transactions on Pattern Analysis and Machine
Intelligenc
Building information modelling (BIM) implementation and remote construction projects: issues, challenges, and critiques.
The construction industry has been facing a paradigm shift to (i) increase productivity, efficiency, infrastructure value, quality and sustainability, and (ii) reduce lifecycle costs, lead times and duplications via effective collaboration and communication among stakeholders in construction projects. This paradigm shift is becoming more critical with remote construction projects, which pose unique and even more complicated communication and management challenges because of the remoteness of the construction sites. On the other hand, Building Information Modelling (BIM) is offered by some as the panacea for the interdisciplinary inefficiencies in construction projects. Although the adoption of BIM has numerous potential benefits in many cases, it also raises interesting challenges with regard to how BIM integrates the business processes of individual practices. This paper aims to show how BIM adoption by an architectural company helps to mitigate the management and communication problems in remote construction projects. The paper adopts a case study methodology, based on a UK Knowledge Transfer Partnership (KTP) project of BIM adoption between the University of Salford, UK and John McCall Architects (JMA), in which the BIM use between the architectural company and the main contractor for a remote construction project is elaborated and justified. The research showed that key management and communication problems, such as poor quality of construction works, unavailability of materials, and ineffective planning and scheduling, can largely be mitigated by adopting BIM at the design stage.
Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
Affordance detection presents intricate challenges and has a wide range of
robotic applications. Previous works have faced limitations such as the
complexities of 3D object shapes, the wide range of potential affordances on
real-world objects, and the lack of open-vocabulary support for affordance
understanding. In this paper, we introduce a new open-vocabulary affordance
detection method in 3D point clouds, leveraging knowledge distillation and
text-point correlation. Our approach employs pre-trained 3D models through
knowledge distillation to enhance feature extraction and semantic understanding
in 3D point clouds. We further introduce a new text-point correlation method to
learn the semantic links between point cloud features and open-vocabulary
labels. Extensive experiments show that our approach outperforms previous
works and adapts to new affordance labels and unseen objects. Notably, our
method improves the mIoU score by 7.96% over the baselines.
Furthermore, it offers real-time inference, which makes it well suited to robotic
manipulation applications.
Comment: 8 page
3D-PreMise: Can Large Language Models Generate 3D Shapes with Sharp Features and Parametric Control?
Recent advancements in implicit 3D representations and generative models have
markedly propelled the field of 3D object generation forward. However, it
remains a significant challenge to accurately model geometries with defined
sharp features under parametric controls, which is crucial in fields like
industrial design and manufacturing. To bridge this gap, we introduce a
framework that employs Large Language Models (LLMs) to generate text-driven 3D
shapes, manipulating 3D software via program synthesis. We present 3D-PreMise,
a dataset specifically tailored for 3D parametric modeling of industrial
shapes, designed to explore state-of-the-art LLMs within our proposed pipeline.
Our work reveals effective generation strategies and delves into the
self-correction capabilities of LLMs using a visual interface. Our work
highlights both the potential and limitations of LLMs in 3D parametric modeling
for industrial applications.
Comment: 10 pages, 6 figure