Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control
This paper provides an overview of the current state-of-the-art in selective
harvesting robots (SHRs) and their potential for addressing the challenges of
global food production. SHRs have the potential to increase productivity,
reduce labour costs, and minimise food waste by selectively harvesting only
ripe fruits and vegetables. The paper discusses the main components of SHRs,
including perception, grasping, cutting, motion planning, and control. It also
highlights the challenges in developing SHR technologies, particularly in the
areas of robot design, motion planning and control. The paper also discusses
the potential benefits of integrating AI, soft robots, and data-driven
methods to enhance the performance and robustness of SHR systems. Finally, the
paper identifies several open research questions in the field and highlights
the need for further research and development efforts to advance SHR
technologies to meet the challenges of global food production. Overall, this
paper provides a starting point for researchers and practitioners interested in
developing SHRs and highlights the need for more research in this field.
Comment: Preprint, to appear in Journal of Field Robotic
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive and allows users,
scholars, and entrepreneurs to get an in-depth understanding of the Metaverse
ecosystem and to find their opportunities and potential for contribution.
Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture
Classification of AI-manipulated content is receiving great attention as a
means of distinguishing different types of manipulations. Most of the methods
developed so far fail in the open-set scenario, that is, when the algorithm used for the
manipulation is not represented by the training set. In this paper, we focus on
the classification of synthetic face generation and manipulation in open-set
scenarios, and propose a method for classification with a rejection option. The
proposed method combines the use of Vision Transformers (ViT) with a hybrid
approach for simultaneous classification and localization. Feature map
correlation is exploited by the ViT module, while a localization branch is
employed as an attention mechanism to force the model to learn per-class
discriminative features associated with the forgery when the manipulation is
performed locally in the image. Rejection is performed by considering several
strategies and analyzing the model output layers. The effectiveness of the
proposed method is assessed for the task of classification of facial attribute
editing and GAN attribution.
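
The abstract above does not say which rejection strategies were used, so the following is only a minimal sketch of one common option, maximum-softmax-probability thresholding, and not the authors' method. The function names and the threshold value of 0.8 are hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_rejection(logits, threshold=0.8):
    """Return the predicted class index, or None to reject the sample.

    Assumption: a sample is treated as 'unknown manipulation' when the
    largest softmax probability falls below `threshold` (hypothetical value).
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # open-set rejection: no known class is confident enough
    return best
```

A confident score vector such as `[5.0, 0.0, 0.0]` is assigned to class 0, while a near-uniform one such as `[1.0, 0.9, 1.1]` is rejected.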
MaPLe: Multi-modal Prompt Learning
Pre-trained vision-language (V-L) models such as CLIP have shown excellent
generalization ability to downstream tasks. However, they are sensitive to the
choice of input text prompts and require careful selection of prompt templates
to perform well. Inspired by the Natural Language Processing (NLP) literature,
recent CLIP adaptation approaches learn prompts as the textual inputs to
fine-tune CLIP for downstream tasks. We note that using prompting to adapt
representations in a single branch of CLIP (language or vision) is sub-optimal
since it does not allow the flexibility to dynamically adjust both
representation spaces on a downstream task. In this work, we propose
Multi-modal Prompt Learning (MaPLe) for both vision and language branches to
improve alignment between the vision and language representations. Our design
promotes strong coupling between the vision-language prompts to ensure mutual
synergy and discourages learning independent uni-modal solutions. Further, we
learn separate prompts across different early stages to progressively model the
stage-wise feature relationships to allow rich context learning. We evaluate
the effectiveness of our approach on three representative tasks of
generalization to novel classes, new target datasets and unseen domain shifts.
Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable
performance and achieves an absolute gain of 3.45% on novel classes and 2.72%
on overall harmonic-mean, averaged over 11 diverse image recognition datasets.
Our code and pre-trained models are available at
https://github.com/muzairkhattak/multimodal-prompt-learning.
Comment: Accepted at CVPR202
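
The "overall harmonic-mean" reported above is usually computed from the accuracy on base classes and the accuracy on novel classes. A minimal sketch of that calculation follows; the function name and the example accuracies are illustrative assumptions, not figures from the paper.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of base-class and novel-class accuracy.

    Both inputs are expected on the same scale (for example, percentages).
    Returns 0.0 when both accuracies are zero to avoid division by zero.
    """
    if base_acc + novel_acc == 0:
        return 0.0
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


# Hypothetical accuracies, chosen only to show the calculation.
example = harmonic_mean(80.0, 60.0)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot score well on it by doing well on base classes alone.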
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
The robustness of 3D perception systems under natural corruptions from
environments and sensors is pivotal for safety-critical applications. Existing
large-scale 3D perception datasets often contain data that are meticulously
cleaned. Such configurations, however, cannot reflect the reliability of
perception models during the deployment stage. In this work, we present Robo3D,
the first comprehensive benchmark for probing the robustness of 3D
detectors and segmentors under out-of-distribution scenarios against natural
corruptions that occur in real-world environments. Specifically, we consider
eight corruption types stemming from adverse weather conditions, external
disturbances, and internal sensor failure. We uncover that, although promising
results have been progressively achieved on standard benchmarks,
state-of-the-art 3D perception models remain vulnerable to
corruptions. We draw key observations on the use of data representations,
augmentation schemes, and training strategies that could severely affect the
model's performance. To pursue better robustness, we propose a
density-insensitive training framework along with a simple flexible
voxelization strategy to enhance the model resiliency. We hope our benchmark
and approach could inspire future research in designing more robust and
reliable 3D perception models. Our robustness benchmark suite is publicly
available.
Comment: 33 pages, 26 figures, 26 tables; code at
https://github.com/ldkong1205/Robo3D; project page at
https://ldkong.com/Robo3
Deep Learning for Scene Flow Estimation on Point Clouds: A Survey and Prospective Trends
Scene flow estimation, which aims to recover the structural information and 3D motion of dynamic scenes, has long been a research interest in computer vision and computer graphics. It is also a fundamental task for various applications such as autonomous driving. In contrast to previous methods that rely on image representations, much recent research builds on deep learning and uses point cloud representations for 3D flow estimation. This paper comprehensively reviews the pioneering literature on point-cloud-based scene flow estimation. It also examines learning paradigms in detail and compares the state-of-the-art deep learning methods for scene flow estimation. Furthermore, this paper investigates various higher-level scene understanding tasks, including object tracking and motion segmentation, and concludes with an overview of foreseeable research trends for scene flow estimation.
Deep Transfer Learning Applications in Intrusion Detection Systems: A Comprehensive Review
Globally, contemporary industrial control systems are increasingly being
connected to the external Internet. As a result, there is an immediate need
to protect these networks from a range of threats. The key infrastructure of
industrial activity may be protected from harm by using an intrusion detection
system (IDS), a preventive mechanism, to recognize new kinds of
dangerous threats and hostile activities. The most recent artificial
intelligence (AI) techniques used to create IDS in many kinds of industrial
control networks are examined in this study, with a particular emphasis on
IDS-based deep transfer learning (DTL). The latter can be seen as a type of
information fusion that merges and/or adapts knowledge from multiple domains to
enhance the performance of the target task, particularly when the labeled data
in the target domain is scarce. Publications issued after 2015 were taken into
account. The selected publications were divided into three categories:
DTL-only and IDS-only papers inform the introduction and background, and
DTL-based IDS papers form the core of this review.
Researchers will be able to have a better grasp of the current state of DTL
approaches used in IDS in many different types of networks by reading this
review paper. Other useful information is also covered, such as the datasets
used, the type of DTL employed, the pre-trained network, IDS techniques, the
evaluation metrics (including accuracy/F-score and false alarm rate (FAR)), and
the improvement gained. The algorithms and methods that are used in several
studies, or that illustrate clearly and in depth the principle of any DTL-based
IDS subcategory, are presented to the reader.
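
The evaluation metrics named above (accuracy, F-score and false alarm rate) can all be derived from a binary confusion matrix. The sketch below uses the standard definitions and assumes that "attack" is the positive class. The function name and the example counts are hypothetical and are not taken from the review.

```python
def ids_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1-score and false alarm rate from confusion counts.

    tp: attacks correctly flagged      fp: benign traffic wrongly flagged
    tn: benign traffic correctly passed  fn: attacks missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # False alarm rate (FAR): share of benign traffic raised as an alert.
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "f1": f1, "far": far}


# Hypothetical counts, chosen only to show the calculation.
metrics = ids_metrics(tp=90, fp=10, tn=880, fn=20)
```

Reporting FAR alongside accuracy matters for intrusion detection because benign traffic usually dominates, so a detector can reach high accuracy while still raising many false alerts.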