Key technologies for safe and autonomous drones
Drones/UAVs can perform air operations that are very difficult for manned aircraft to perform. In addition, the use of drones brings significant economic savings and environmental benefits, while reducing risks to human life. In this paper, we present key technologies that enable the development of drone systems. The technologies are identified based on the usages of drones (driven by the COMP4DRONES project use cases). These technologies are grouped into four categories: U-space capabilities, system functions, payloads, and tools. We also present the contributions of the COMP4DRONES project to improving existing technologies. These contributions aim to ease drones’ customization and enable their safe operation. This project has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 826610. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Spain, Austria, Belgium, Czech Republic, France, Italy, Latvia, Netherlands. The total project budget is 28,590,748.75 EUR (excluding ESIF partners), while the requested grant is 7,983,731.61 EUR to ECSEL JU, and 8,874,523.84 EUR of National and ESIF Funding. The project started on 1 October 2019
Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control
This paper provides an overview of the current state-of-the-art in selective
harvesting robots (SHRs) and their potential for addressing the challenges of
global food production. SHRs have the potential to increase productivity,
reduce labour costs, and minimise food waste by selectively harvesting only
ripe fruits and vegetables. The paper discusses the main components of SHRs,
including perception, grasping, cutting, motion planning, and control. It also
highlights the challenges in developing SHR technologies, particularly in the
areas of robot design, motion planning and control. The paper also discusses
the potential benefits of integrating AI, soft robots, and data-driven
methods to enhance the performance and robustness of SHR systems. Finally, the
paper identifies several open research questions in the field and highlights
the need for further research and development efforts to advance SHR
technologies to meet the challenges of global food production. Overall, this
paper provides a starting point for researchers and practitioners interested in
developing SHRs and highlights the need for more research in this field. Comment: Preprint; to appear in Journal of Field Robotics
Vision- and tactile-based continuous multimodal intention and attention recognition for safer physical human-robot interaction
Employing skin-like tactile sensors on robots enhances both the safety and
usability of collaborative robots by adding the capability to detect human
contact. Unfortunately, simple binary tactile sensors alone cannot determine
the context of the human contact -- whether it is a deliberate interaction or
an unintended collision that requires safety manoeuvres. Many published methods
classify discrete interactions using more advanced tactile sensors or by
analysing joint torques. Instead, we propose to augment the intention
recognition capabilities of simple binary tactile sensors by adding a
robot-mounted camera for human posture analysis. Different interaction
characteristics, including touch location, human pose, and gaze direction, are
used to train a supervised machine learning algorithm to classify whether a
touch is intentional or not with an F1-score of 86%. We demonstrate that
multimodal intention recognition is significantly more accurate than monomodal
analyses with the collaborative robot Baxter. Furthermore, our method can also
continuously monitor interactions that fluidly change between intentional and
unintentional by gauging the user's attention through gaze. If a user stops
paying attention mid-task, the proposed intention and attention recognition
algorithm can activate safety features to prevent unsafe interactions. We also
employ a feature reduction technique that reduces the number of inputs to five
to achieve a more generalized low-dimensional classifier. This simplification
both reduces the amount of training data required and improves real-world
classification accuracy. It also renders the method potentially agnostic to the
robot and touch sensor architectures while achieving a high degree of task
adaptability. Comment: 11 pages, 8 figures, preprint under review
Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification
Generalizable person re-identification (Re-ID) is a very hot research topic
in machine learning and computer vision, which plays a significant role in
realistic scenarios due to its various applications in public security and
video surveillance. However, previous methods mainly focus on visual
representation learning while neglecting to explore the potential of semantic
features during training, which easily leads to poor generalization capability
when adapted to a new domain. In this paper, we propose a Multi-Modal
Equivalent Transformer called MMET for more robust visual-semantic embedding
learning on visual, textual and visual-textual tasks respectively. To further
enhance robust feature learning in the transformer context, a dynamic
masking mechanism called the Masked Multimodal Modeling strategy (MMM) is
introduced to mask both the image patches and the text tokens, which can
work jointly on multimodal or unimodal data and significantly boosts the
performance of generalizable person Re-ID. Extensive experiments on benchmark
datasets demonstrate the competitive performance of our method over previous
approaches. We hope this method could advance the research towards
visual-semantic representation learning. Our source code is also publicly
available at https://github.com/JeremyXSC/MMET
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive and allows users,
scholars, and entrepreneurs to get an in-depth understanding of the Metaverse
ecosystem and to find their opportunities and potential for contribution
Continual Learning of Hand Gestures for Human-Robot Interaction
In this paper, we present an efficient method to incrementally learn to
classify static hand gestures. This method allows users to teach a robot to
recognize new symbols in an incremental manner. Contrary to other works which
use special sensors or external devices such as color or data gloves, our
proposed approach makes use of a single RGB camera to perform static hand
gesture recognition from 2D images. Furthermore, our system is able to
incrementally learn up to 38 new symbols using only 5 samples for each old
class, achieving a final average accuracy of over 90%. In addition,
the incremental training time can be reduced to 10% of the time required
when using all data available
Efficacy of Information Extraction from Bar, Line, Circular, Bubble and Radar Graphs
With the emergence of enormous amounts of data, numerous ways to visualize such data have come into use. The ubiquitous bar, circular, line, radar and bubble graphs were investigated for their effectiveness. Fourteen participants performed four types of evaluations: between categories (cities), within categories (transport modes within a city), all categories, and a direct reading within a category from a graph. The representations were presented in random order and participants were asked to respond to sixteen questions to the best of their ability after visually scanning the related graph. There were two trials on two separate days for each participant. Eye movements were recorded using an eye tracker. Bar and line graphs show superiority over circular and radial graphs in effectiveness, efficiency, and perceived ease of use, primarily due to eye saccades. The radar graph had the worst performance. The “vibration-type” fill pattern could be improved by adding colors and symbolic fills. Design guidelines are proposed for the effective representation of data so that the presentation and communication of information are effective
Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds
This paper presents a framework for semantic segmentation on sparse
sequential point clouds of millimeter-wave radar. Compared with cameras and
lidars, millimeter-wave radars have the advantages of preserving privacy,
strong anti-interference ability, and a long detection range.
The sparsity of mmWave data and the difficulty of capturing its
temporal-topological features remain problems. In particular, the issue of
capturing the temporal-topological coupling features under the human semantic
segmentation task prevents previous advanced segmentation methods (e.g.,
PointNet, PointCNN, Point Transformer) from
being well utilized in practical scenarios. To address the challenge caused by
the sparsity and temporal-topological feature of the data, we (i) introduce
graph structure and topological features to the point cloud, (ii) propose a
semantic segmentation framework including a global feature-extracting module
and a sequential feature-extracting module. In addition, we design an efficient
and more fitting loss function for a better training process and segmentation
results based on graph clustering. Experimentally, we deploy representative
semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset.
Experimental results indicate that our model achieves mean accuracy on the
custom dataset by and outperforms the state-of-the-art
algorithms. Moreover, to validate the model's robustness, we deploy our model
on the well-known S3DIS dataset. On the S3DIS dataset, our model achieves mean
accuracy by , outperforming baseline algorithms
Semantic Segmentation Enhanced Transformer Model for Human Attention Prediction
Saliency Prediction aims to predict the attention distribution of human eyes
given an RGB image. Most of the recent state-of-the-art methods are based on
deep image feature representations from traditional CNNs. However,
traditional convolution cannot capture the global features of the image well
due to its small kernel size. Besides, the high-level factors which closely
correlate to human visual perception, e.g., objects, color, light, etc., are
not considered. Motivated by these limitations, we propose a Transformer-based method with
semantic segmentation as another learning objective. More global cues of the
image could be captured by Transformer. In addition, simultaneously learning
the object segmentation simulates the human visual perception, which we would
verify in our investigation of human gaze control in cognitive science. We
build an extra decoder for the subtask and the multiple tasks share the same
Transformer encoder, forcing it to learn from multiple feature spaces. We find
that, in practice, simply adding the subtask might confuse the main task learning;
hence, a Multi-task Attention Module is proposed to deal with the feature
interaction between the multiple learning targets. Our method achieves
competitive performance compared to other state-of-the-art methods