16,226 research outputs found
Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control
This paper provides an overview of the current state-of-the-art in selective
harvesting robots (SHRs) and their potential for addressing the challenges of
global food production. SHRs have the potential to increase productivity,
reduce labour costs, and minimise food waste by selectively harvesting only
ripe fruits and vegetables. The paper discusses the main components of SHRs,
including perception, grasping, cutting, motion planning, and control. It also
highlights the challenges in developing SHR technologies, particularly in the
areas of robot design, motion planning and control. The paper also discusses
the potential benefits of integrating AI, soft robots, and data-driven
methods to enhance the performance and robustness of SHR systems. Finally, the
paper identifies several open research questions in the field and highlights
the need for further research and development efforts to advance SHR
technologies to meet the challenges of global food production. Overall, this
paper provides a starting point for researchers and practitioners interested in
developing SHRs and highlights the need for more research in this field. Comment: Preprint, to appear in the Journal of Field Robotics
Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification
Generalizable person re-identification (Re-ID) is a very hot research topic
in machine learning and computer vision, which plays a significant role in
realistic scenarios due to its various applications in public security and
video surveillance. However, previous methods mainly focus on visual
representation learning while neglecting the potential of semantic
features during training, which easily leads to poor generalization capability
when adapted to a new domain. In this paper, we propose a Multi-Modal
Equivalent Transformer called MMET for more robust visual-semantic embedding
learning on visual, textual and visual-textual tasks respectively. To further
enhance the robust feature learning in the context of transformer, a dynamic
masking mechanism called Masked Multimodal Modeling strategy (MMM) is
introduced to mask both the image patches and the text tokens, which can
jointly work on multimodal or unimodal data and significantly boost the
performance of generalizable person Re-ID. Extensive experiments on benchmark
datasets demonstrate the competitive performance of our method over previous
approaches. We hope this method could advance the research towards
visual-semantic representation learning. Our source code is also publicly
available at https://github.com/JeremyXSC/MMET
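As a rough illustration of the masking idea described in this abstract (not the authors' implementation), the sketch below randomly masks a fraction of image patches and text tokens in a single call. The mask ratio, the `MASK` placeholder, and the function names `mask_sequence` and `mask_multimodal` are all assumptions made for the example. Passing an empty list for one modality covers the unimodal case.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder; real models use a learned mask embedding


def mask_sequence(items, ratio, rng):
    """Replace round(ratio * len(items)) randomly chosen items with MASK."""
    n_mask = round(ratio * len(items))
    masked_idx = set(rng.sample(range(len(items)), n_mask))
    masked = [MASK if i in masked_idx else x for i, x in enumerate(items)]
    return masked, sorted(masked_idx)


def mask_multimodal(patches, tokens, ratio=0.3, seed=0):
    """Mask image patches and text tokens with the same (assumed) ratio."""
    rng = random.Random(seed)
    masked_patches, patch_idx = mask_sequence(patches, ratio, rng)
    masked_tokens, token_idx = mask_sequence(tokens, ratio, rng)
    return masked_patches, patch_idx, masked_tokens, token_idx


# Toy inputs: strings stand in for patch embeddings and word tokens.
patches = [f"patch_{i}" for i in range(10)]
tokens = "a man in a red jacket carrying a black backpack".split()
mp, pi, mt, ti = mask_multimodal(patches, tokens)
```

Changing `seed` on every training step would re-sample the masked positions each time, which is one simple way to make the masking dynamic.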
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive and allows users,
scholars, and entrepreneurs to get an in-depth understanding of the Metaverse
ecosystem to find their opportunities and potentials for contribution.
The Viability and Potential Consequences of IoT-Based Ransomware
With the increased threat of ransomware and the substantial growth of the Internet of Things (IoT) market, there is significant motivation for attackers to carry out IoT-based ransomware campaigns. In this thesis, the viability of such malware is tested.
As part of this work, various techniques that could be used by ransomware developers to attack commercial IoT devices were explored. First, methods that attackers could use to communicate with the victim were examined, such that a ransom note was able to be reliably sent to a victim. Next, the viability of using "bricking" as a method of ransom was evaluated, such that devices could be remotely disabled unless the victim makes a payment to the attacker. Research was then performed to ascertain whether it was possible to remotely gain persistence on IoT devices, which would improve the efficacy of existing ransomware methods, and provide opportunities for more advanced ransomware to be created. Finally, after successfully identifying a number of persistence techniques, the viability of privacy-invasion based ransomware was analysed.
For each assessed technique, proofs of concept were developed. A range of devices -- with various intended purposes, such as routers, cameras and phones -- were used to test the viability of these proofs of concept. To test communication hijacking, devices' "channels of communication" -- such as web services and embedded screens -- were identified, then hijacked to display custom ransom notes. During the analysis of bricking-based ransomware, a working proof of concept was created, which was then able to remotely brick five IoT devices. After analysing the storage design of an assortment of IoT devices, six different persistence techniques were identified, which were then successfully tested on four devices, such that malicious filesystem modifications would be retained after the device was rebooted. When researching privacy-invasion based ransomware, several methods were created to extract information from data sources that can be commonly found on IoT devices, such as nearby WiFi signals, images from cameras, or audio from microphones. These were successfully implemented in a test environment such that ransomable data could be extracted, processed, and stored for later use to blackmail the victim.
Overall, IoT-based ransomware has not only been shown to be viable but also highly damaging to both IoT devices and their users. While the use of IoT-ransomware is still very uncommon "in the wild", the techniques demonstrated within this work highlight an urgent need to improve the security of IoT devices to avoid the risk of IoT-based ransomware causing havoc in our society. Finally, during the development of these proofs of concept, a number of potential countermeasures were identified, which can be used to limit the effectiveness of the attacking techniques discovered in this PhD research.
Multi-Graph Convolution Network for Pose Forecasting
Recently, there has been a growing interest in predicting human motion, which
involves forecasting future body poses based on observed pose sequences. This
task is complex because it requires modeling both spatial and temporal relationships. The most
commonly used models for this task are autoregressive models, such as recurrent
neural networks (RNNs) or variants, and Transformer Networks. However, RNNs
have several drawbacks, such as vanishing or exploding gradients. Other
researchers have attempted to solve the communication problem in the spatial
dimension by integrating Graph Convolutional Networks (GCN) and Long Short-Term
Memory (LSTM) models. These works deal with temporal and spatial information
separately, which limits the effectiveness. To fix this problem, we propose a
novel approach called the multi-graph convolution network (MGCN) for 3D human
pose forecasting. This model simultaneously captures spatial and temporal
information by introducing an augmented graph for pose sequences. Multiple
frames give multiple parts, joined together in a single graph instance.
Furthermore, we also explore the influence of natural structure and
sequence-aware attention on our model. In our experimental evaluation on the
large-scale benchmark datasets Human3.6M, AMASS and 3DPW, MGCN outperforms the
state-of-the-art in pose prediction. Comment: arXiv admin note: text overlap with arXiv:2110.04573 by other authors
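One plausible reading of "multiple frames ... joined together in a single graph instance" is a graph whose nodes are (frame, joint) pairs. The sketch below builds such an adjacency matrix, with skeleton edges inside each frame and temporal edges linking the same joint in consecutive frames. The toy 4-joint skeleton, the temporal linking rule, and the name `build_sequence_graph` are assumptions for illustration; the augmented graph used in the paper may be constructed differently.

```python
def build_sequence_graph(num_frames, num_joints, bones):
    """Return a symmetric 0/1 adjacency matrix over (frame, joint) nodes."""
    n = num_frames * num_joints
    adj = [[0] * n for _ in range(n)]

    def node(t, j):
        # Flatten a (frame, joint) pair into a single node index.
        return t * num_joints + j

    def connect(a, b):
        adj[a][b] = adj[b][a] = 1

    for t in range(num_frames):
        for i, j in bones:  # spatial edges inside frame t
            connect(node(t, i), node(t, j))
        if t + 1 < num_frames:  # temporal edges to frame t + 1
            for j in range(num_joints):
                connect(node(t, j), node(t + 1, j))
    return adj


# Hypothetical toy skeleton: joint 1 is connected to joints 0, 2 and 3.
toy_bones = [(0, 1), (1, 2), (1, 3)]
adj = build_sequence_graph(num_frames=3, num_joints=4, bones=toy_bones)
num_edges = sum(map(sum, adj)) // 2
```

A graph convolution applied to this single matrix mixes information across joints and across time in one step, rather than handling the two dimensions in separate modules.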
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
We propose Conditional Adapter (CoDA), a parameter-efficient transfer
learning method that also improves inference efficiency. CoDA generalizes
beyond standard adapter approaches to enable a new way of balancing speed and
accuracy using conditional computation. Starting with an existing dense
pretrained model, CoDA adds sparse activation together with a small number of
new parameters and a light-weight training phase. Our experiments demonstrate
that the CoDA approach provides an unexpectedly efficient way to transfer
knowledge. Across a variety of language, vision, and speech tasks, CoDA
achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter
approach with moderate to no accuracy loss and the same parameter efficiency.
NF-Atlas: Multi-Volume Neural Feature Fields for Large Scale LiDAR Mapping
LiDAR Mapping has been a long-standing problem in robotics. Recent progress
in neural implicit representation has brought new opportunities to robotic
mapping. In this paper, we propose the multi-volume neural feature fields,
called NF-Atlas, which bridge the neural feature volumes with pose graph
optimization. By regarding the neural feature volume as pose graph nodes and
the relative pose between volumes as pose graph edges, the entire neural
feature field becomes both locally rigid and globally elastic. Locally, the
neural feature volume employs a sparse feature Octree and a small MLP to encode
the submap SDF with an option of semantics. Learning the map using this
structure allows for end-to-end solving of maximum a posteriori (MAP) based
probabilistic mapping. Globally, the map is built volume by volume
independently, avoiding catastrophic forgetting when mapping incrementally.
Furthermore, when a loop closure occurs, with the elastic pose graph based
representation, only updating the origin of neural volumes is required without
remapping. Finally, these functionalities of NF-Atlas are validated. Thanks to
the sparsity and the optimization based formulation, NF-Atlas shows competitive
performance in terms of accuracy, efficiency and memory usage on both
simulation and real-world datasets.
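The "locally rigid and globally elastic" property can be illustrated with a small 2D sketch, assuming each volume stores its content in a local frame and only its origin pose lives in the pose graph. The `Volume` class, the `apply_loop_closure` helper, and the hand-written correction values are hypothetical. In a real system the corrections would come from a pose graph optimizer, and the local content would be a neural feature volume rather than a list of points.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Volume:
    """A submap whose content is stored in its own local frame."""
    x: float
    y: float
    theta: float  # origin pose of the volume in the world frame
    local_points: list = field(default_factory=list)

    def to_world(self):
        # Rigidly transform the local content by the volume's origin pose.
        c, s = math.cos(self.theta), math.sin(self.theta)
        return [(self.x + c * px - s * py, self.y + s * px + c * py)
                for px, py in self.local_points]


def apply_loop_closure(volumes, corrections):
    """Shift volume origins only; local content is never touched."""
    for idx, (dx, dy, dtheta) in corrections.items():
        v = volumes[idx]
        v.x, v.y, v.theta = v.x + dx, v.y + dy, v.theta + dtheta


volumes = [Volume(0.0, 0.0, 0.0, [(1.0, 0.0)]),
           Volume(5.0, 0.0, 0.0, [(1.0, 0.0)])]
before = volumes[1].to_world()
apply_loop_closure(volumes, {1: (0.0, 0.5, math.pi / 2)})  # assumed correction
after = volumes[1].to_world()
```

The world-frame position of the second volume's content changes after the correction, while `local_points` stays exactly as it was mapped, so nothing has to be relearned.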
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is
demonstrated to be one small step for generative AI (GAI), but one giant leap
for artificial general intelligence (AGI). Since its official release in
November 2022, ChatGPT has quickly attracted numerous users with extensive
media coverage. Such unprecedented attention has also motivated numerous
researchers to investigate ChatGPT from various aspects. According to Google
Scholar, there are more than 500 articles with ChatGPT in their titles or
mentioning it in their abstracts. Considering this, a review is urgently
needed, and our work fills this gap. Overall, this work is the first to survey
ChatGPT with a comprehensive review of its underlying technology, applications,
and challenges. Moreover, we present an outlook on how ChatGPT might evolve to
realize general-purpose AIGC (a.k.a. AI-generated content), which will be a
significant milestone for the development of AGI. Comment: A Survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated
([email protected])
Testing the nomological network for the Personal Engagement Model
The study of employee engagement has been a key focus of management for over three decades. The academic literature on engagement has generated multiple definitions but there are two primary models of engagement: the Personal Engagement Model (PEM) of Kahn (1990), and the Work Engagement Model (WEM) of Schaufeli et al. (2002). While the former is cited by most authors as the seminal work on engagement, research has tended to focus on elements of the model and most theoretical work on engagement has predominantly used the WEM to consider the topic.
The purpose of this study was to test all the elements of the nomological network of the PEM to determine whether the complete model of personal engagement is viable. This was done using data from a large, complex public sector workforce. Survey questions were designed to test each element of the PEM and administered to a sample of the workforce (n = 3,103). The scales were tested and refined using confirmatory factor analysis and then the model was tested to determine the structure of the nomological network. This was validated and the generalisability of the final model was tested across different work and organisational types.
The results showed that the PEM is viable but there were differences from what was originally proposed by Kahn (1990). Specifically, of the three psychological conditions deemed necessary for engagement to occur, meaningfulness, safety, and availability, only meaningfulness was found to contribute to employee engagement. The model demonstrated that employees experience meaningfulness through both the nature of the work that they do and the organisation within which they do their work. Finally, the findings were replicated across employees in different work types and different organisational types.
This thesis makes five contributions to the engagement paradigm. It advances engagement theory by testing the PEM and showing that it is an adequate representation of engagement. A model for testing the causal mechanism for engagement has been articulated, demonstrating that meaningfulness in work is a primary mechanism for engagement. The research has shown the key aspects of the workplace in which employees experience meaningfulness: the nature of the work that they do and the organisation within which they do it. It has demonstrated that this is consistent across organisations and the type of work. Finally, it has developed a reliable measure of the different elements of the PEM which will support future research in this area.
Modularizing and Assembling Cognitive Map Learners via Hyperdimensional Computing
Biological organisms must learn how to control their own bodies to achieve
deliberate locomotion, that is, predict their next body position based on their
current position and selected action. Such learning is goal-agnostic with
respect to maximizing (minimizing) an environmental reward (penalty) signal. A
cognitive map learner (CML) is a collection of three separate yet
collaboratively trained artificial neural networks which learn to construct
representations for the node states and edge actions of an arbitrary
bidirectional graph. In so doing, a CML learns how to traverse the graph nodes;
however, the CML does not learn when and why to move from one node state to
another. This work created CMLs with node states expressed as high dimensional
vectors suitable for hyperdimensional computing (HDC), a form of symbolic
machine learning (ML). In so doing, graph knowledge (CML) was segregated from
target node selection (HDC), allowing each ML approach to be trained
independently. The first approach used HDC to engineer an arbitrary number of
hierarchical CMLs, where each graph node state specified target node states for
the next lower level CMLs to traverse to. Second, an HDC-based
stimulus-response experience model was demonstrated per CML. Because
hypervectors may be in superposition with each other, multiple experience
models were added together and run in parallel without any retraining. Lastly,
a CML-HDC ML unit was modularized: trained with proxy symbols such that
arbitrary, application-specific stimulus symbols could be operated upon without
retraining either CML or HDC model. These methods provide a template for
engineering heterogeneous ML systems.
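The claim that experience models can be "added together and run in parallel without any retraining" follows from how superposition works in hyperdimensional computing. The sketch below is a generic HDC illustration, not the paper's CML-HDC system: it assumes bipolar hypervectors, elementwise multiplication for binding, and elementwise addition for bundling. The dimensionality, symbol names, and helper functions are hypothetical.

```python
import random

DIM = 4096  # hypervector dimensionality (an assumed value)
rng = random.Random(0)


def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Elementwise product; binding is its own inverse for bipolar vectors."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Elementwise sum, i.e. superposition."""
    return [sum(col) for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


symbols = {name: random_hv() for name in
           ["s1", "s2", "s3", "s4", "left", "right", "up", "down"]}


def experience_model(pairs):
    """Superpose bound (stimulus, response) pairs into one hypervector."""
    return bundle(*(bind(symbols[s], symbols[r]) for s, r in pairs))


model_a = experience_model([("s1", "left"), ("s2", "right")])
model_b = experience_model([("s3", "up"), ("s4", "down")])
combined = bundle(model_a, model_b)  # added together, no retraining


def respond(model, stimulus, candidates=("left", "right", "up", "down")):
    """Unbind the stimulus, then return the most similar response symbol."""
    noisy = bind(model, symbols[stimulus])
    return max(candidates, key=lambda r: dot(noisy, symbols[r]))
```

Querying `combined` with a stimulus from either model recovers the paired response, because unbinding leaves the correct response vector plus terms that are nearly orthogonal to every candidate at this dimensionality.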