12,291 research outputs found
Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control
This paper provides an overview of the current state-of-the-art in selective
harvesting robots (SHRs) and their potential for addressing the challenges of
global food production. SHRs have the potential to increase productivity,
reduce labour costs, and minimise food waste by selectively harvesting only
ripe fruits and vegetables. The paper discusses the main components of SHRs,
including perception, grasping, cutting, motion planning, and control. It also
highlights the challenges in developing SHR technologies, particularly in the
areas of robot design, motion planning and control. The paper also discusses
the potential benefits of integrating AI, soft robots, and data-driven
methods to enhance the performance and robustness of SHR systems. Finally, the
paper identifies several open research questions in the field and highlights
the need for further research and development efforts to advance SHR
technologies to meet the challenges of global food production. Overall, this
paper provides a starting point for researchers and practitioners interested in
developing SHRs and highlights the need for more research in this field. Comment: Preprint; to appear in the Journal of Field Robotics
Satellite Image Based Cross-view Localization for Autonomous Vehicle
Existing spatial localization techniques for autonomous vehicles mostly use a
pre-built 3D-HD map, often constructed using a survey-grade 3D mapping vehicle,
which is not only expensive but also laborious. This paper shows that by using
an off-the-shelf high-definition satellite image as a ready-to-use map, we are
able to achieve cross-view vehicle localization up to a satisfactory accuracy,
providing a cheaper and more practical way for localization. While the
utilization of satellite imagery for cross-view localization is an established
concept, the conventional methodology focuses primarily on image retrieval.
This paper introduces a novel approach to cross-view localization that departs
from the conventional image retrieval method. Specifically, our method develops
(1) a Geometric-align Feature Extractor (GaFE) that leverages measured 3D
points to bridge the geometric gap between ground and overhead views, (2) a
Pose Aware Branch (PAB) adopting a triplet loss to encourage pose-aware feature
extraction, and (3) a Recursive Pose Refine Branch (RPRB) using the
Levenberg-Marquardt (LM) algorithm to align the initial pose towards the true
vehicle pose iteratively. Our method is validated on KITTI and Ford Multi-AV
Seasonal datasets as ground view and Google Maps as the satellite view. The
results demonstrate the superiority of our method in cross-view localization
with median spatial and angular errors within meter and ,
respectively. Comment: Accepted by ICRA202
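
The triplet loss named in component (2) can be illustrated with a minimal sketch. The abstract does not say how anchors, positives, and negatives are built, so the pairing below (a positive feature taken at the correct pose, a negative at a wrong pose), the Euclidean distance, and the margin value are all assumptions made for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2-D features: the positive is close to the anchor, the negative far.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor
print(triplet_loss(anchor, positive, negative))  # 0.0
print(triplet_loss(anchor, negative, positive))  # 5.0
```

Swapping the positive and negative makes the loss positive, which is the signal that would push features for the correct pose closer to the anchor than features for an incorrect one.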
Loss minimization yields multicalibration for large neural networks
Multicalibration is a notion of fairness that aims to provide accurate
predictions across a large set of groups. Multicalibration is known to be a
different goal than loss minimization, even for simple predictors such as
linear functions. In this note, we show that for (almost all) large neural
network sizes, optimally minimizing squared error leads to multicalibration.
Our results are about representational aspects of neural networks, and not
about algorithmic or sample complexity considerations. Previous such results
were known only for predictors that were nearly Bayes-optimal and were
therefore representation independent. We emphasize that our results do not
apply to specific algorithms for optimizing neural networks, such as SGD, and
they should not be interpreted as "fairness comes for free from optimizing
neural networks"
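
To make the contrast between calibration and multicalibration concrete, the sketch below measures the worst calibration gap over every (group, prediction-bin) cell. It follows one common formalization and is not taken from the paper: the binning scheme, the unweighted maximum, and the function name `multicalibration_error` are assumptions.

```python
from collections import defaultdict

def multicalibration_error(preds, labels, groups, n_bins=10):
    """Largest |mean(label - pred)| over all (group, prediction-bin) cells.

    preds  : predicted probabilities in [0, 1]
    labels : observed outcomes (0 or 1)
    groups : dict mapping a group name to a list of membership flags
    """
    worst = 0.0
    for member in groups.values():
        cells = defaultdict(list)
        for p, y, is_member in zip(preds, labels, member):
            if is_member:
                bin_index = min(int(p * n_bins), n_bins - 1)
                cells[bin_index].append(y - p)
        for residuals in cells.values():
            worst = max(worst, abs(sum(residuals) / len(residuals)))
    return worst

preds = [0.5, 0.5, 0.5, 0.5]
labels = [1, 0, 1, 0]
overall = {"all": [True, True, True, True]}
with_subgroup = {"all": [True] * 4, "even": [True, False, True, False]}

print(multicalibration_error(preds, labels, overall))        # 0.0
print(multicalibration_error(preds, labels, with_subgroup))  # 0.5
```

The toy predictor is perfectly calibrated on the whole population yet off by 0.5 on the "even" subgroup, which is the kind of gap multicalibration is meant to rule out. Cells with few samples give noisy estimates, so a practical check would likely weight or filter cells by size.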
Security and Privacy Problems in Voice Assistant Applications: A Survey
Voice assistant applications have become ubiquitous. Two models that
provide the two most important functions for real-life applications (e.g.,
Google Home, Amazon Alexa, and Siri) are Automatic Speech Recognition (ASR)
models and Speaker Identification (SI) models. According to recent studies,
security and privacy threats have also emerged with the rapid development of
the Internet of Things (IoT). The security issues researched include attack
techniques toward machine learning models and other hardware components widely
used in voice assistant applications. The privacy issues include technical
information stealing and policy-level privacy breaches. Voice assistant
applications take a steadily growing market share every year, but their privacy
and security issues continue to cause huge economic losses and endanger
users' sensitive personal information. Thus, it is important to have a
comprehensive survey to outline the categorization of the current research
regarding the security and privacy problems of voice assistant applications.
This paper summarizes and assesses five kinds of security attacks and three
types of privacy threats in papers published in the top-tier conferences of
the cyber security and voice domains. Comment: 5 figures
Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification
Generalizable person re-identification (Re-ID) is an active research topic
in machine learning and computer vision, which plays a significant role in
realistic scenarios due to its various applications in public security and
video surveillance. However, previous methods mainly focus on visual
representation learning while neglecting the potential of semantic
features during training, which easily leads to poor generalization capability
when adapted to a new domain. In this paper, we propose a Multi-Modal
Equivalent Transformer called MMET for more robust visual-semantic embedding
learning on visual, textual and visual-textual tasks respectively. To further
enhance robust feature learning in the context of transformers, a dynamic
masking mechanism called the Masked Multimodal Modeling strategy (MMM) is
introduced to mask both the image patches and the text tokens; it works
jointly on multimodal or unimodal data and significantly boosts the
performance of generalizable person Re-ID. Extensive experiments on benchmark
datasets demonstrate the competitive performance of our method over previous
approaches. We hope this method could advance the research towards
visual-semantic representation learning. Our source code is also publicly
available at https://github.com/JeremyXSC/MMET
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using the virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive and allows users,
scholars, and entrepreneurs to get an in-depth understanding of the Metaverse
ecosystem to find their opportunities and potentials for contribution
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
We propose Conditional Adapter (CoDA), a parameter-efficient transfer
learning method that also improves inference efficiency. CoDA generalizes
beyond standard adapter approaches to enable a new way of balancing speed and
accuracy using conditional computation. Starting with an existing dense
pretrained model, CoDA adds sparse activation together with a small number of
new parameters and a light-weight training phase. Our experiments demonstrate
that the CoDA approach provides an unexpectedly efficient way to transfer
knowledge. Across a variety of language, vision, and speech tasks, CoDA
achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter
approach with moderate to no accuracy loss and the same parameter efficiency
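
The speed-accuracy trade-off from conditional computation can be sketched generically: a cheap branch runs on every token, while an expensive branch runs only on a selected subset. This is an illustration of the general idea, not CoDA's architecture. The top-k routing rule, the additive combination, and the names `conditional_layer`, `heavy`, and `light` are assumptions.

```python
def conditional_layer(tokens, scores, k, heavy_fn, light_fn):
    """Apply light_fn to every token and heavy_fn only to the k highest-scoring ones."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])
    outputs = []
    for i, tok in enumerate(tokens):
        y = light_fn(tok)          # cheap branch: always evaluated
        if i in selected:
            y += heavy_fn(tok)     # expensive branch: only for selected tokens
        outputs.append(y)
    return outputs

heavy_calls = []

def heavy(x):
    """Stand-in for an expensive pretrained layer (hypothetical)."""
    heavy_calls.append(x)
    return 10.0 * x

def light(x):
    """Stand-in for a small added module (hypothetical)."""
    return x + 1.0

result = conditional_layer([1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 0.2, 0.8],
                           k=2, heavy_fn=heavy, light_fn=light)
print(result)            # [2.0, 23.0, 4.0, 45.0]
print(len(heavy_calls))  # 2
```

Here the expensive function runs on two of four tokens, so lowering `k` trades accuracy for speed in this toy setting.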
MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning
We study how a principal can efficiently and effectively intervene on the
rewards of a previously unseen learning agent in order to induce desirable
outcomes. This is relevant to many real-world settings like auctions or
taxation, where the principal may not know the learning behavior nor the
rewards of real people. Moreover, the principal should be few-shot adaptable
and minimize the number of interventions, because interventions are often
costly. We introduce MERMAIDE, a model-based meta-learning framework to train a
principal that can quickly adapt to out-of-distribution agents with different
learning strategies and reward functions. We validate this approach
step-by-step. First, in a Stackelberg setting with a best-response agent, we
show that meta-learning enables quick convergence to the theoretically known
Stackelberg equilibrium at test time, although noisy observations severely
increase the sample complexity. We then show that our model-based meta-learning
approach is cost-effective in intervening on bandit agents with unseen
explore-exploit strategies. Finally, we outperform baselines that use either
meta-learning or agent behavior modeling, in both -shot and -shot
settings with partial agent information
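
For readers unfamiliar with the Stackelberg setting mentioned above, the sketch below computes a pure-strategy Stackelberg outcome in a small matrix game: the leader commits to an action, the follower best-responds, and the leader picks the commitment that maximizes its own payoff. The payoff matrices are made up, ties are broken by lowest index (an assumption), and nothing here models the paper's reward interventions or meta-learning.

```python
def stackelberg_pure(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action, leader_value) when the leader
    commits to a pure action and the follower best-responds to it."""
    best = None
    for a, follower_row in enumerate(follower_payoff):
        # Follower's best response to leader action a (lowest index wins ties).
        b = max(range(len(follower_row)), key=lambda j: follower_row[j])
        value = leader_payoff[a][b]
        if best is None or value > best[2]:
            best = (a, b, value)
    return best

# Rows index leader actions, columns index follower actions (hypothetical game).
leader_payoff = [[2, 4],
                 [1, 3]]
follower_payoff = [[1, 0],
                   [0, 1]]

print(stackelberg_pure(leader_payoff, follower_payoff))  # (1, 1, 3)
```

In this toy game, leader action 0 is strictly better against any fixed follower action, yet committing to action 1 yields a payoff of 3 instead of 2 because it changes the follower's best response.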
Testing the nomological network for the Personal Engagement Model
The study of employee engagement has been a key focus of management for over three decades. The academic literature on engagement has generated multiple definitions, but there are two primary models of engagement: the Personal Engagement Model (PEM) of Kahn (1990) and the Work Engagement Model (WEM) of Schaufeli et al. (2002). While the former is cited by most authors as the seminal work on engagement, research has tended to focus on individual elements of the model, and most theoretical work on engagement has used the WEM to consider the topic.
The purpose of this study was to test all the elements of the nomological network of the PEM to determine whether the complete model of personal engagement is viable. This was done using data from a large, complex public sector workforce. Survey questions were designed to test each element of the PEM and administered to a sample of the workforce (n = 3,103). The scales were tested and refined using confirmatory factor analysis, and then the model was tested to determine the structure of the nomological network. This was validated and the generalisability of the final model was tested across different work and organisational types.
The results showed that the PEM is viable but there were differences from what was originally proposed by Kahn (1990). Specifically, of the three psychological conditions deemed necessary for engagement to occur, meaningfulness, safety, and availability, only meaningfulness was found to contribute to employee engagement. The model demonstrated that employees experience meaningfulness through both the nature of the work that they do and the organisation within which they do their work. Finally, the findings were replicated across employees in different work types and different organisational types.
This thesis makes five contributions to the engagement paradigm. It advances engagement theory by testing the PEM and showing that it is an adequate representation of engagement. A model for testing the causal mechanism for engagement has been articulated, demonstrating that meaningfulness in work is a primary mechanism for engagement. The research has shown the key aspects of the workplace in which employees experience meaningfulness, the nature of the work that they do and the organisation within which they do it. It has demonstrated that this is consistent across organisations and the type of work. Finally, it has developed a reliable measure of the different elements of the PEM which will support future research in this area