134 research outputs found
Study of object recognition and identification based on shape and texture analysis
The objective of object recognition is to enable computers to recognize image patterns without human intervention. Depending on the application, the problem is broadly divided into two parts: recognition of object categories and detection/identification of individual objects.
My thesis studied techniques for object feature analysis and identification strategies, which address the object recognition problem by employing effective and perceptually important object features. Shape information is of particular interest, and a review of shape representation and description is presented, together with the latest research on object recognition. In the second chapter of the thesis, a novel content-based approach is proposed for efficient shape classification and retrieval of 2D objects.
Two object detection approaches, designed around the characteristics of the shape context and SIFT descriptors respectively, are analyzed and compared. It is found that an identification strategy built on a single type of object feature can only recognize the target object under the specific conditions to which the identifier is adapted. Such identifiers are usually designed to detect target objects that are rich in the feature type captured by the identifier; in addition, that feature type must often distinguish the target object from the complex scene.
To overcome this constraint, a novel prototype-based object identification method is presented that detects the target object in a complex scene by employing different types of descriptors to capture heterogeneous features. All descriptor types are modified to meet the requirements of the detection framework, so this new method is able to describe and identify various kinds of objects whose dominant features differ considerably. The identification system employs cosine similarity to evaluate the resemblance between the prototype image and image windows in the complex scene. A 'resemblance map' is then established, in which the value at each patch represents the likelihood of the target object's presence. Simulations confirmed that this object detection strategy is efficient, robust, and invariant to scale and rotation.
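The resemblance-map idea can be illustrated with a minimal sketch: compare a prototype descriptor against per-window descriptors of a scene using cosine similarity. The array shapes and function names below are assumptions for illustration, not the thesis's actual implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    # Resemblance between two feature vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def resemblance_map(scene_features, prototype_feature):
    # scene_features: H x W x D array of descriptors, one per image window.
    # Returns an H x W map whose value at each patch scores the likelihood
    # of the target object's presence at that location.
    H, W, _ = scene_features.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = cosine_similarity(scene_features[i, j], prototype_feature)
    return out
```

In practice the peak of the map (or all patches above a threshold) would be taken as candidate detections.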
Ultrafast laser-induced spin/electron dynamics in advanced materials
I have commissioned the instrumentation of a one-colour time-resolved optical pump-probe experimental system and studied carrier dynamics in a monolayer MoSe2 sample. The system was first tested and optimized by performing time-resolved measurements at room temperature on an intrinsic GaAs wafer. Different modulation methods were extensively investigated in order to increase the signal-to-noise ratio (SNR). Comparing the experimental data with simulation results, double-chopping modulation was found to significantly enhance the SNR. Transient reflectivity and Kerr rotation were obtained in the GaAs test sample under various pump fluences, polarisations, and wavelengths. The carrier dynamics in the GaAs sample can be reproduced, and three distinct time scales are revealed: τ_1 ~ 1.5 ps, τ_2 ~ 10 ps, and τ_3 ~ 100 ps, consistent with previously reported values. The same experimental methods were then used to investigate the carrier dynamics of a monolayer of MoSe2. From the wavelength-dependent transient reflectivity data, the band gap of the MoSe2 sample is confirmed to be below 820 nm at room temperature, greatly reduced compared with its reported value at zero temperature. The reflectivity is found to fall negative after the initial positive peak at wavelengths longer than 790 nm; we suggest this phenomenon arises from trion binding and non-radiative recombination of excitons. All the reflectivity data are fitted by a bi-exponential decay function, and the fitted weight constants of the two dynamics show a clear dependence on laser wavelength. Transient Kerr rotation is also observed over the wavelength range from 790 nm to 820 nm. The Kerr rotation decays within the first picosecond and shows no obvious wavelength dependence, which suggests an ultrafast spin relaxation time in MoSe2.
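The bi-exponential fitting step mentioned above can be sketched with a standard least-squares fit; the decay constants and amplitudes below are synthetic stand-ins for the measured transient reflectivity, not data from the experiment.

```python
import numpy as np
from scipy.optimize import curve_fit

def biexp(t, a1, tau1, a2, tau2):
    # Bi-exponential decay: two dynamics with weight constants a1, a2
    # and time constants tau1, tau2 (in ps).
    return a1 * np.exp(-t / tau1) + a2 * np.exp(-t / tau2)

t = np.linspace(0, 50, 200)                  # pump-probe delay in ps
signal = biexp(t, 0.7, 1.5, 0.3, 10.0)       # synthetic transient reflectivity
popt, pcov = curve_fit(biexp, t, signal, p0=[1, 1, 1, 5])
```

Fitting each wavelength's trace this way yields the per-wavelength weight constants a1, a2 whose trend with laser wavelength the abstract describes.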
Leveraging Work-Related Stressors for Employee Innovation: The Moderating Role of Enterprise Social Networking Use
Enterprise social networking (ESN) techniques have been widely adopted by firms to provide a platform for public communication among employees. This study investigates how the relationships between work stressors (i.e., challenge and hindrance stressors) and employee innovation are moderated by task-oriented and relationship-oriented ESN use. Since challenge and hindrance stressors and employee innovation are individual-level variables while task-oriented and relationship-oriented ESN use are team-level variables, we use a hierarchical linear model to test this cross-level model. The results of a survey of 191 employees in 50 groups indicate that the two ESN use types differentially moderate the relationship between stressors and employee innovation. Specifically, task-oriented ESN use positively moderates the effects of the two stressors on employee innovation, while relationship-oriented ESN use negatively moderates them. In addition, we find that challenge stressors significantly improve employee innovation. Theoretical and practical implications are discussed.
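A cross-level moderation test of this kind can be sketched with a random-intercept mixed model. The data below are simulated for illustration (coefficients and group structure are assumptions, not the study's survey data); the point is the shape of the model: an individual-level predictor, a team-level moderator constant within groups, and their interaction.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_per = 50, 4
df = pd.DataFrame({
    "group": np.repeat(np.arange(n_groups), n_per),
    "challenge": rng.normal(size=n_groups * n_per),  # individual-level stressor
})
# Team-level moderator: constant within each group
task_esn = rng.normal(size=n_groups)
df["task_esn"] = task_esn[df["group"]]
# Simulated outcome: challenge stressors raise innovation, more so when
# team-level task-oriented ESN use is high (a positive moderation effect),
# plus a group-level random intercept and individual noise.
group_eff = rng.normal(scale=0.3, size=n_groups)
df["innovation"] = (0.4 * df["challenge"]
                    + 0.3 * df["challenge"] * df["task_esn"]
                    + group_eff[df["group"]]
                    + rng.normal(scale=0.5, size=len(df)))

# Random-intercept model with the cross-level interaction term
model = smf.mixedlm("innovation ~ challenge * task_esn",
                    df, groups=df["group"]).fit()
```

The sign of the `challenge:task_esn` coefficient is what carries the moderation claim: positive here, matching the simulated effect.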
What Does Stable Diffusion Know about the 3D Scene?
Recent advances in generative models like Stable Diffusion enable the
generation of highly photo-realistic images. Our objective in this paper is to
probe the diffusion network to determine to what extent it 'understands'
different properties of the 3D scene depicted in an image. To this end, we make
the following contributions: (i) We introduce a protocol to evaluate whether a
network models a number of physical 'properties' of the 3D scene by probing for
explicit features that represent these properties. The probes are applied on
datasets of real images with annotations for the property. (ii) We apply this
protocol to properties covering scene geometry, scene material, support
relations, lighting, and view dependent measures. (iii) We find that Stable
Diffusion is good at a number of properties including scene geometry, support
relations, shadows and depth, but less performant for occlusion. (iv) We also
apply the probes to other models trained at large-scale, including DINO and
CLIP, and find their performance inferior to that of Stable Diffusion.
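Probing for an explicit property typically means training a lightweight classifier on frozen features. The sketch below uses random arrays as stand-ins for the network's intermediate activations and binary property labels; the real protocol would extract features from Stable Diffusion, DINO, or CLIP on annotated images.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-ins for frozen features (one row per image) and labels for a
# physical property, e.g. "is region A in shadow" (hypothetical).
features = rng.normal(size=(200, 64))
labels = (features[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

# A linear probe: if it separates the labels well, the property is
# linearly decodable from the frozen representation.
probe = LogisticRegression(max_iter=1000).fit(features[:150], labels[:150])
accuracy = probe.score(features[150:], labels[150:])
```

Held-out probe accuracy, compared across models, is the kind of measure that supports claims like "good at scene geometry, less performant for occlusion."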
Evaluating Explanation Methods for Vision-and-Language Navigation
The ability to navigate robots with natural language instructions in an
unknown environment is a crucial step for achieving embodied artificial
intelligence (AI). With the improving performance of deep neural models
proposed in the field of vision-and-language navigation (VLN), it is equally
interesting to know what information the models utilize for their
decision-making in the navigation tasks. To understand the inner workings of
deep neural models, various explanation methods have been developed for
promoting explainable AI (XAI). But they are mostly applied to deep neural
models for image or text classification tasks and little work has been done in
explaining deep neural models for VLN tasks. In this paper, we address these
problems by building quantitative benchmarks to evaluate explanation methods
for VLN models in terms of faithfulness. We propose a new erasure-based
evaluation pipeline to measure the step-wise textual explanation in the
sequential decision-making setting. We evaluate several explanation methods for
two representative VLN models on two popular VLN datasets and reveal valuable
findings through our experiments.
Comment: Accepted by ECAI 202
Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions
Perceiving and manipulating 3D articulated objects in diverse environments is
essential for home-assistant robots. Recent studies have shown that point-level
affordance provides actionable priors for downstream manipulation tasks.
However, existing works primarily focus on single-object scenarios with
homogeneous agents, overlooking the realistic constraints imposed by the
environment and the agent's morphology, e.g., occlusions and physical
limitations. In this paper, we propose an environment-aware affordance
framework that incorporates both object-level actionable priors and environment
constraints. Unlike object-centric affordance approaches, learning
environment-aware affordance faces the challenge of combinatorial explosion due
to the complexity of various occlusions, characterized by their quantities,
geometries, positions and poses. To address this and enhance data efficiency,
we introduce a novel contrastive affordance learning framework capable of
training on scenes containing a single occluder and generalizing to scenes with
complex occluder combinations. Experiments demonstrate the effectiveness of our
proposed approach in learning affordance considering environment constraints.
Project page at https://chengkaiacademycity.github.io/EnvAwareAfford/
Comment: In 37th Conference on Neural Information Processing Systems (NeurIPS 2023). Website at https://chengkaiacademycity.github.io/EnvAwareAfford
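The contrastive objective behind such a framework can be sketched with a generic InfoNCE-style loss: pull an affordance embedding toward a matching (positive) scene embedding and away from mismatched (negative) ones. This is a standard contrastive loss, not the paper's specific formulation.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    # Contrastive loss: low when the anchor is most similar to the
    # positive, high when a negative is more similar.
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = [cos(anchor, positive)] + [cos(anchor, n) for n in negatives]
    logits = np.array(sims) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive is index 0
```

Training on single-occluder scenes with such pairs, then evaluating on multi-occluder scenes, is how the abstract's generalization claim would be exercised.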
Image-based Visual Servo Control for Aerial Manipulation Using a Fully-Actuated UAV
Using Unmanned Aerial Vehicles (UAVs) to perform high-altitude manipulation
tasks beyond just passive visual application can reduce the time, cost, and
risk of human workers. Prior research on aerial manipulation has relied on
either ground truth state estimate or GPS/total station with some Simultaneous
Localization and Mapping (SLAM) algorithms, which may not be practical for many
applications close to infrastructure with degraded GPS signal or featureless
environments. Visual servoing can avoid the need to estimate the robot pose. Existing
works on visual servoing for aerial manipulation either address solely
end-effector position control or rely on precise velocity measurement and a
pre-defined visual marker with a known pattern. Furthermore, most previous
work used under-actuated UAVs, resulting in complicated mechanical and
hence control design for the end-effector. This paper develops an image-based
visual servo control strategy for bridge maintenance using a fully-actuated
UAV. The main components are (1) a visual line detection and tracking system,
and (2) a hybrid impedance force and motion control system. Our approach does not
rely on either robot pose/velocity estimation from an external localization
system or pre-defined visual markers. The complexity of the mechanical system
and controller architecture is also minimized due to the fully-actuated nature.
Experiments show that the system can effectively execute motion tracking and
force holding using only the visual guidance for the bridge painting. To the
best of our knowledge, this is one of the first studies on aerial manipulation
using visual servo that is capable of achieving both motion and force control
without the need of external pose/velocity information or pre-defined visual
guidance.
Comment: Accepted by 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Score-PA: Score-based 3D Part Assembly
Autonomous 3D part assembly is a challenging task in the areas of robotics
and 3D computer vision. This task aims to assemble individual components into a
complete shape without relying on predefined instructions. In this paper, we
formulate this task from a novel generative perspective, introducing the
Score-based 3D Part Assembly framework (Score-PA) for 3D part assembly. However,
score-based methods are typically time-consuming during the inference
stage. To address this issue, we introduce a novel algorithm called the Fast
Predictor-Corrector Sampler (FPC) that accelerates the sampling process within
the framework. We employ various metrics to assess assembly quality and
diversity, and our evaluation results demonstrate that our algorithm
outperforms existing state-of-the-art approaches. We release our code at
https://github.com/J-F-Cheng/Score-PA_Score-based-3D-Part-Assembly.
Comment: BMVC 202
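The predictor-corrector idea behind score-based sampling can be illustrated in one dimension: a deterministic predictor step follows the score, and a Langevin corrector step refines the sample with noise. The Gaussian score below is a stand-in for a learned score network; this is a generic sketch, not the FPC sampler from the paper.

```python
import numpy as np

def gaussian_score(x, mu=0.0, sigma=1.0):
    # Score (gradient of the log-density) of a 1D Gaussian target,
    # standing in for a learned score network.
    return (mu - x) / sigma**2

def predictor_corrector_sample(score_fn, n_steps=500, step=0.05,
                               corrector_steps=1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=3.0)  # start from a broad prior
    for _ in range(n_steps):
        # Predictor: deterministic Euler step along the score
        x = x + step * score_fn(x)
        # Corrector: Langevin MCMC refinement with injected noise
        for _ in range(corrector_steps):
            x = x + step * score_fn(x) + np.sqrt(2 * step) * rng.normal()
    return x
```

Speeding up exactly this loop, so that fewer iterations yield comparable sample quality, is the role a fast predictor-corrector sampler plays in the framework.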