Physics-Informed Computer Vision: A Review and Perspectives
Incorporation of physical information in machine learning frameworks is
opening and transforming many application domains. Here the learning process is
augmented through the induction of fundamental knowledge and governing physical
laws. In this work we explore their utility for computer vision tasks in
interpreting and understanding visual data. We present a systematic literature
review of formulations and approaches to computer vision tasks guided by
physical laws. We begin by decomposing the popular computer vision pipeline
into a taxonomy of stages and investigate approaches to incorporate governing
physical equations in each stage. Existing approaches in each task are analyzed
with regard to which governing physical processes are modeled and formulated, and
how they are incorporated, i.e., by modifying data (observation bias), modifying
networks (inductive bias), or modifying losses (learning bias). The taxonomy offers a
unified view of the application of the physics-informed capability,
highlighting where physics-informed learning has been conducted and where the
gaps and opportunities are. Finally, we highlight open problems and challenges
to inform future research. While still in its early days, the study of
physics-informed computer vision has the promise to develop better computer
vision models that can improve physical plausibility, accuracy, data efficiency
and generalization in increasingly realistic applications.
EC^2: Emergent Communication for Embodied Control
Embodied control requires agents to leverage multi-modal pre-training to
quickly learn how to act in new environments, where video demonstrations
contain visual and motion details needed for low-level perception and control,
and language instructions support generalization with abstract, symbolic
structures. While recent approaches apply contrastive learning to force
alignment between the two modalities, we hypothesize that better modeling their
complementary differences can lead to more holistic representations for
downstream adaptation. To this end, we propose Emergent Communication for
Embodied Control (EC^2), a novel scheme to pre-train video-language
representations for few-shot embodied control. The key idea is to learn an
unsupervised "language" of videos via emergent communication, which bridges the
semantics of video details and structures of natural language. We learn
embodied representations of video trajectories, emergent language, and natural
language using a language model, which is then used to finetune a lightweight
policy network for downstream control. Through extensive experiments in
Metaworld and Franka Kitchen embodied benchmarks, EC^2 is shown to consistently
outperform previous contrastive learning methods for both videos and texts as
task inputs. Further ablations confirm the importance of the emergent language,
which is beneficial for both video and language learning, and significantly
superior to using pre-trained video captions. We also present a quantitative
and qualitative analysis of the emergent language and discuss future directions
toward better understanding and leveraging emergent communication in embodied
tasks.
Comment: Published in CVPR202
Advances in Object and Activity Detection in Remote Sensing Imagery
The recent revolution in deep learning has enabled considerable development in the fields of object and activity detection. Visual object detection tries to find objects of target classes with precise localisation in an image and assign each object instance a corresponding class label. At the same time, activity recognition aims to determine the actions or activities of an agent or group of agents based on sensor or video observation data. Detecting, identifying, tracking, and understanding the behaviour of objects through images and videos taken by various cameras is an important and challenging problem. Together, the recognition of objects and their activities in imaging data captured by remote sensing platforms is a highly dynamic and challenging research topic. During the last decade, there has been significant growth in the number of publications in the field of object and activity recognition. In particular, many researchers have proposed application domains to identify objects and their specific behaviours from airborne and spaceborne imagery. This Special Issue includes papers that explore novel and challenging topics for object and activity detection in remote sensing images and videos acquired by diverse platforms.