19,099 research outputs found

    Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control

    This paper provides an overview of the current state of the art in selective harvesting robots (SHRs) and their potential for addressing the challenges of global food production. SHRs have the potential to increase productivity, reduce labour costs, and minimise food waste by selectively harvesting only ripe fruits and vegetables. The paper discusses the main components of SHRs, including perception, grasping, cutting, motion planning, and control. It also highlights the challenges in developing SHR technologies, particularly in the areas of robot design, motion planning and control. The paper also discusses the potential benefits of integrating AI, soft robots, and data-driven methods to enhance the performance and robustness of SHR systems. Finally, the paper identifies several open research questions in the field and highlights the need for further research and development efforts to advance SHR technologies to meet the challenges of global food production. Overall, this paper provides a starting point for researchers and practitioners interested in developing SHRs and highlights the need for more research in this field. Comment: Preprint: to appear in Journal of Field Robotics

    Modularizing and Assembling Cognitive Map Learners via Hyperdimensional Computing

    Biological organisms must learn how to control their own bodies to achieve deliberate locomotion, that is, predict their next body position based on their current position and selected action. Such learning is goal-agnostic with respect to maximizing (minimizing) an environmental reward (penalty) signal. A cognitive map learner (CML) is a collection of three separate yet collaboratively trained artificial neural networks which learn to construct representations for the node states and edge actions of an arbitrary bidirectional graph. In so doing, a CML learns how to traverse the graph nodes; however, the CML does not learn when and why to move from one node state to another. This work created CMLs with node states expressed as high dimensional vectors suitable for hyperdimensional computing (HDC), a form of symbolic machine learning (ML). In so doing, graph knowledge (CML) was segregated from target node selection (HDC), allowing each ML approach to be trained independently. The first approach used HDC to engineer an arbitrary number of hierarchical CMLs, where each graph node state specified target node states for the next lower level CMLs to traverse to. Second, an HDC-based stimulus-response experience model was demonstrated per CML. Because hypervectors may be in superposition with each other, multiple experience models were added together and run in parallel without any retraining. Lastly, a CML-HDC ML unit was modularized: trained with proxy symbols such that arbitrary, application-specific stimulus symbols could be operated upon without retraining either CML or HDC model. These methods provide a template for engineering heterogeneous ML systems.
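The superposition property mentioned in this abstract can be illustrated with a generic hyperdimensional-computing sketch. This is not the paper's model: the bipolar encoding, the dimension of 10,000, and the helper names `random_hypervector`, `bundle`, and `similarity` are all assumptions chosen for illustration.

```python
import numpy as np


def random_hypervector(rng, dim=10_000):
    """Draw a random bipolar hypervector with entries in {-1, +1} (assumed encoding)."""
    return rng.choice(np.array([-1, 1]), size=dim)


def bundle(vectors):
    """Superpose hypervectors by element-wise addition."""
    return np.sum(vectors, axis=0)


def similarity(a, b):
    """Dot product normalised by the dimension: near 1 for a bundled member, near 0 otherwise."""
    return float(a @ b) / len(a)


rng = np.random.default_rng(0)
a, b, c, other = (random_hypervector(rng) for _ in range(4))
memory = bundle([a, b, c])

# Members of the bundle stay recognisable; an unrelated vector does not.
print(similarity(memory, a), similarity(memory, other))
```

Because random high-dimensional vectors are nearly orthogonal, each bundled vector can still be detected in the sum, which is the general reason superposed models can coexist without interfering much.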

    CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities

    Sleep abnormalities can have severe health consequences. Automated sleep staging, i.e. labelling the sequence of sleep stages from the patient's physiological recordings, could simplify the diagnostic process. Previous work on automated sleep staging has achieved great results, mainly relying on the EEG signal. However, often multiple sources of information are available beyond EEG. This can be particularly beneficial when the EEG recordings are noisy or even missing completely. In this paper, we propose CoRe-Sleep, a Coordinated Representation multimodal fusion network that is particularly focused on improving the robustness of signal analysis on imperfect data. We demonstrate how appropriately handling multimodal information can be the key to achieving such robustness. CoRe-Sleep tolerates noisy or missing modality segments, allowing training on incomplete data. Additionally, it shows state-of-the-art performance when testing on both multimodal and unimodal data using a single model on SHHS-1, the largest publicly available study that includes sleep stage labels. The results indicate that training the model on multimodal data does positively influence performance when tested on unimodal data. This work aims at bridging the gap between automated analysis tools and their clinical utility. Comment: 10 pages, 4 figures, 2 tables, journal

    TransFusionOdom: Interpretable Transformer-based LiDAR-Inertial Fusion Odometry Estimation

    Multi-modal fusion of sensors is a commonly used approach to enhance the performance of odometry estimation, which is also a fundamental module for mobile robots. However, the question of how to perform fusion among different modalities in a supervised sensor fusion odometry estimation task remains a challenging issue. Some simple operations, such as element-wise summation and concatenation, are not capable of assigning adaptive attentional weights to incorporate different modalities efficiently, which makes it difficult to achieve competitive odometry results. Recently, the Transformer architecture has shown potential for multi-modal fusion tasks, particularly in the domains of vision with language. In this work, we propose an end-to-end supervised Transformer-based LiDAR-Inertial fusion framework (namely TransFusionOdom) for odometry estimation. The multi-attention fusion module demonstrates different fusion approaches for homogeneous and heterogeneous modalities to address the overfitting problem that can arise from blindly increasing the complexity of the model. Additionally, to interpret the learning process of the Transformer-based multi-modal interactions, a general visualization approach is introduced to illustrate the interactions between modalities. Moreover, exhaustive ablation studies evaluate different multi-modal fusion strategies to verify the performance of the proposed fusion strategy. A synthetic multi-modal dataset is made public to validate the generalization ability of the proposed fusion strategy, which also works for other combinations of different modalities. The quantitative and qualitative odometry evaluations on the KITTI dataset verify that the proposed TransFusionOdom can achieve superior performance compared with other related works. Comment: Submitted to IEEE Sensors Journal with some modifications. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    Copy-paste data augmentation for domain transfer on traffic signs

    City streets carry a lot of information that can be exploited to improve the quality of the services citizens receive. For example, autonomous vehicles need to react appropriately to all the elements near the vehicle itself, such as pedestrians, traffic signs and other vehicles. It is also possible to use such information for smart city applications, for example to predict and analyze traffic or pedestrian flows. Among all the objects that can be found in a street, traffic signs are very important because of the information they carry. This information can in fact be exploited both for autonomous driving and for smart city applications. Deep learning and, more generally, machine learning models however need huge quantities of data to learn. Even though modern models are very good at generalizing, the more samples the model has, the better it can generalize between different samples. Creating these datasets organically, namely with real pictures, is a very tedious task because of the wide variety of signs available in the whole world and especially because of all the possible lighting, orientation and other conditions in which they can appear. In addition, it may not be easy to collect enough samples for all the possible traffic signs available, because some of them may be very rare. Instead of collecting pictures manually, it is possible to exploit data augmentation techniques to create synthetic datasets containing the signs that are needed. Creating this data synthetically makes it possible to control the distribution and the conditions of the signs in the datasets, improving the quality and quantity of the training data that is going to be used. This thesis work is about using copy-paste data augmentation to create synthetic data for the traffic sign recognition task.
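The core idea of copy-paste augmentation can be sketched in a few lines. This is a minimal illustration, not the thesis's pipeline: the function name `paste_patch`, the boolean-mask representation of the sign, and the toy image sizes are all assumptions.

```python
import numpy as np


def paste_patch(background, patch, mask, top, left):
    """Paste `patch` onto a copy of `background` wherever `mask` is True.

    Returns the augmented image and the pasted object's bounding box as
    (x_min, y_min, x_max, y_max), which can serve as a detection label.
    """
    h, w = patch.shape[:2]
    if top < 0 or left < 0 or top + h > background.shape[0] or left + w > background.shape[1]:
        raise ValueError("patch does not fit inside the background")
    out = background.copy()
    region = out[top:top + h, left:left + w]  # a view into `out`
    region[mask] = patch[mask]                # copy only the sign's pixels
    return out, (left, top, left + w, top + h)


# Toy example: a white circular "sign" pasted at a random position on a black street image.
rng = np.random.default_rng(0)
background = np.zeros((64, 64, 3), dtype=np.uint8)
patch = np.full((16, 16, 3), 255, dtype=np.uint8)
yy, xx = np.mgrid[:16, :16]
mask = (yy - 7.5) ** 2 + (xx - 7.5) ** 2 <= 8 ** 2
top, left = (int(v) for v in rng.integers(0, 64 - 16 + 1, size=2))
image, box = paste_patch(background, patch, mask, top, left)
```

Because the paste position is chosen by the script, the label comes for free, which is what lets a synthetic dataset control the distribution of signs and conditions. A realistic pipeline would presumably also vary scale, rotation, and lighting before pasting.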

    Neural Architecture Search: Insights from 1000 Papers

    In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural architectures for a given task, is an inevitable next step in automating machine learning and has already outpaced the best human-designed architectures on many tasks. In the past few years, research in NAS has been progressing rapidly, with over 1000 papers released since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized and comprehensive guide to neural architecture search. We give a taxonomy of search spaces, algorithms, and speedup techniques, and we discuss resources such as benchmarks, best practices, other surveys, and open-source libraries

    Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture

    Classification of AI-manipulated content is receiving great attention, for distinguishing different types of manipulations. Most of the methods developed so far fail in the open-set scenario, that is when the algorithm used for the manipulation is not represented by the training set. In this paper, we focus on the classification of synthetic face generation and manipulation in open-set scenarios, and propose a method for classification with a rejection option. The proposed method combines the use of Vision Transformers (ViT) with a hybrid approach for simultaneous classification and localization. Feature map correlation is exploited by the ViT module, while a localization branch is employed as an attention mechanism to force the model to learn per-class discriminative features associated with the forgery when the manipulation is performed locally in the image. Rejection is performed by considering several strategies and analyzing the model output layers. The effectiveness of the proposed method is assessed for the task of classification of facial attribute editing and GAN attribution
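Classification with a rejection option can be sketched with one simple strategy: reject a sample when the maximum softmax probability falls below a threshold. The abstract says several strategies were considered without naming them, so this particular rule, the threshold value, and the names `classify_with_rejection` and `REJECT` are assumptions for illustration only.

```python
import numpy as np

REJECT = -1  # hypothetical label for "unknown manipulation"


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def classify_with_rejection(logits, threshold=0.7):
    """Return the predicted class per row, or REJECT when the classifier is not confident."""
    probs = softmax(np.asarray(logits, dtype=float))
    labels = probs.argmax(axis=1)
    labels[probs.max(axis=1) < threshold] = REJECT
    return labels


# First row is confidently class 0; second row is nearly uniform and gets rejected.
print(classify_with_rejection([[5.0, 0.0, 0.0], [0.1, 0.0, 0.05]]))
```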

    Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning

    Spurious correlations that degrade model generalization or lead the model to be right for the wrong reasons are one of the main robustness concerns for real-world deployments. However, mitigating these correlations during pre-training for large-scale models can be costly and impractical, particularly for those without access to high-performance computing resources. This paper proposes a novel approach to address spurious correlations during fine-tuning for a given domain of interest. With a focus on multi-modal models (e.g., CLIP), the proposed method leverages different modalities in these models to detect and explicitly set apart spurious attributes from the affected class, achieved through a multi-modal contrastive loss function that expresses spurious relationships through language. Our experimental results and in-depth visualizations on CLIP show that such an intervention can effectively i) improve the model's accuracy when spurious attributes are not present, and ii) direct the model's activation maps towards the actual class rather than the spurious attribute when present. In particular, on the Waterbirds dataset, our algorithm achieved a worst-group accuracy 23% higher than ERM on CLIP with a ResNet-50 backbone, and 32% higher on CLIP with a ViT backbone, while maintaining the same average accuracy as ERM.
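Worst-group accuracy, the metric quoted in this abstract, is the minimum accuracy over predefined groups (on Waterbirds, typically combinations of class and background). A minimal sketch follows; the function names and the toy labels are assumptions, not taken from the paper.

```python
import numpy as np


def group_accuracies(y_true, y_pred, groups):
    """Accuracy computed separately for each group id."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    correct = y_true == y_pred
    return {g.item(): float(correct[groups == g].mean()) for g in np.unique(groups)}


def worst_group_accuracy(y_true, y_pred, groups):
    """Minimum per-group accuracy; low values reveal groups the model fails on."""
    return min(group_accuracies(y_true, y_pred, groups).values())


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0]
groups = [0, 0, 1, 1, 2, 2]
print(group_accuracies(y_true, y_pred, groups))
print(worst_group_accuracy(y_true, y_pred, groups))
```

Average accuracy can hide a failing group, which is why a method can raise worst-group accuracy while leaving the average unchanged.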

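    INFERENSI KONTEKS BERDASARKAN ANALISIS RELASI MAKNA WEBTOON “SMILE BRUSH: MY OLD PICTURES” (Context Inference Based on an Analysis of Meaning Relations in the Webtoon “Smile Brush: My Old Pictures”)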

    This study analyzes and describes inferences about context and a comprehensive understanding of other linguistic variables in the text and the discourse within it. The research data are lexical units and phrases that show relations of synonymy and polysemy in the narrative text of the comic "Smile Brush: My Old Pictures" by Waroo, which can be accessed on the Webtoon platform. The data are processed using descriptive qualitative linguistic research combined with ethnoscience analysis. The data were analyzed with the distributional method, using the BUL/Direct Element Sharing technique and coding. The result states that this inference is the conclusion of cognition based on the context built by involving participants, awareness, and over-paradigmatic relations to syntagmatic other ties. This inference is the role of the association of meaning to other linguistic units in understanding the context in terminating inference. The process and conclusion of all these factors and variables show the stimulative, systemic, and holistic linguistic correlation of metafunctions and stratification of linguistic domains.

    GETT-QA: Graph Embedding based T2T Transformer for Knowledge Graph Question Answering

    In this work, we present an end-to-end Knowledge Graph Question Answering (KGQA) system named GETT-QA. GETT-QA uses T5, a popular text-to-text pre-trained language model. The model takes a question in natural language as input and produces a simpler form of the intended SPARQL query. In the simpler form, the model does not directly produce entity and relation IDs. Instead, it produces corresponding entity and relation labels. The labels are grounded to KG entity and relation IDs in a subsequent step. To further improve the results, we instruct the model to produce a truncated version of the KG embedding for each entity. The truncated KG embedding enables a finer search for disambiguation purposes. We find that T5 is able to learn the truncated KG embeddings without any change of loss function, improving KGQA performance. As a result, we report strong results for LC-QuAD 2.0 and SimpleQuestions-Wikidata datasets on end-to-end KGQA over Wikidata. Comment: 16 pages, single-column format, accepted at ESWC 2023 research track