11,870 research outputs found

    Modularizing and Assembling Cognitive Map Learners via Hyperdimensional Computing

    Biological organisms must learn how to control their own bodies to achieve deliberate locomotion, that is, predict their next body position based on their current position and selected action. Such learning is goal-agnostic with respect to maximizing (minimizing) an environmental reward (penalty) signal. A cognitive map learner (CML) is a collection of three separate yet collaboratively trained artificial neural networks which learn to construct representations for the node states and edge actions of an arbitrary bidirectional graph. In so doing, a CML learns how to traverse the graph nodes; however, the CML does not learn when and why to move from one node state to another. This work created CMLs with node states expressed as high-dimensional vectors suitable for hyperdimensional computing (HDC), a form of symbolic machine learning (ML). In so doing, graph knowledge (CML) was segregated from target node selection (HDC), allowing each ML approach to be trained independently. The first approach used HDC to engineer an arbitrary number of hierarchical CMLs, where each graph node state specified target node states for the next lower level CMLs to traverse to. Second, an HDC-based stimulus-response experience model was demonstrated per CML. Because hypervectors may be in superposition with each other, multiple experience models were added together and run in parallel without any retraining. Lastly, a CML-HDC ML unit was modularized: trained with proxy symbols such that arbitrary, application-specific stimulus symbols could be operated upon without retraining either CML or HDC model. These methods provide a template for engineering heterogeneous ML systems.

    ENABLING EFFICIENT FLEET COMPOSITION SELECTION THROUGH THE DEVELOPMENT OF A RANK HEURISTIC FOR A BRANCH AND BOUND METHOD

    In the foreseeable future, autonomous mobile robots (AMRs) will become a key enabler for increasing productivity and flexibility in material handling in warehousing facilities, distribution centers and manufacturing systems. The objective of this research is to develop and validate parametric models of AMRs, develop a ranking heuristic using a physics-based algorithm within the framework of the Branch and Bound method, integrate the ranking algorithm into a Fleet Composition Optimization (FCO) tool, and finally conduct simulations under various scenarios to verify the suitability and robustness of the developed tool in a factory equipped with AMRs. Kinematic-based equations are used for computing both energy and time consumption. Multivariate linear regression, a data-driven method, is used for designing the ranking heuristic. The results indicate that the unique physical structures and parameters of each robot are the main factors contributing to differences in energy and time consumption. An improvement in computation time was observed when comparing heuristic-based search with non-heuristic-based search. This research is expected to significantly improve the current nested fleet composition optimization tool by reducing computation time without sacrificing optimality. From a practical perspective, greater efficiency in reducing energy and time costs can be achieved. Ford Motor Company. No embargo. Academic Major: Aerospace Engineerin

    Kurcuma: a kitchen utensil recognition collection for unsupervised domain adaptation

    The use of deep learning makes it possible to achieve extraordinary results in all kinds of tasks related to computer vision. However, this performance is strongly related to the availability of training data and its relationship with the distribution in the eventual application scenario. This question is of vital importance in areas such as robotics, where the targeted environment data are barely available in advance. In this context, domain adaptation (DA) techniques are especially important to building models that deal with new data for which the corresponding label is not available. To promote further research in DA techniques applied to robotics, this work presents Kurcuma (Kitchen Utensil Recognition Collection for Unsupervised doMain Adaptation), an assortment of seven datasets for the classification of kitchen utensils, a task of relevance in home-assistance robotics and a suitable showcase for DA. Along with the data, we provide a broad description of the main characteristics of the dataset, as well as a baseline using the well-known domain-adversarial training of neural networks approach. The results show the challenge posed by DA on these types of tasks, pointing to the need for new approaches in future work. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the I+D+i project TED2021-132103A-I00 (DOREMI), funded by MCIN/AEI/10.13039/501100011033. Some of the computing resources were provided by the Generalitat Valenciana and the European Union through the FEDER funding program (IDIFEDER/2020/003). The second author is supported by grant APOSTD/2020/256 from “Programa I+D+i de la Generalitat Valenciana”.

    Neural Architecture Search: Insights from 1000 Papers

    In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural architectures for a given task, is an inevitable next step in automating machine learning and has already outpaced the best human-designed architectures on many tasks. In the past few years, research in NAS has been progressing rapidly, with over 1000 papers released since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized and comprehensive guide to neural architecture search. We give a taxonomy of search spaces, algorithms, and speedup techniques, and we discuss resources such as benchmarks, best practices, other surveys, and open-source libraries

    Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

    Humans have long been recorded in a variety of forms since antiquity. For example, sculptures and paintings were the primary media for depicting human beings before the invention of cameras. However, most current human-centric computer vision tasks like human pose estimation and human image generation focus exclusively on natural images in the real world. Artificial humans, such as those in sculptures, paintings, and cartoons, are commonly neglected, making existing models fail in these scenarios. As an abstraction of life, art incorporates humans in both natural and artificial scenes. We take advantage of it and introduce the Human-Art dataset to bridge related tasks in natural and artificial scenarios. Specifically, Human-Art contains 50k high-quality images with over 123k person instances from 5 natural and 15 artificial scenarios, which are annotated with bounding boxes, keypoints, self-contact points, and text information for humans represented in both 2D and 3D. It is, therefore, comprehensive and versatile for various downstream tasks. We also provide a rich set of baseline results and detailed analyses for related tasks, including human detection, 2D and 3D human pose estimation, image generation, and motion transfer. As a challenging dataset, we hope Human-Art can provide insights for relevant research and open up new research questions. Comment: CVPR202

    Procedure-Aware Pretraining for Instructional Video Understanding

    Our goal is to learn a video representation that is useful for downstream procedure understanding tasks in instructional videos. Due to the small amount of available annotations, a key challenge in procedure understanding is to be able to extract from unlabeled videos the procedural knowledge such as the identity of the task (e.g., 'make latte'), its steps (e.g., 'pour milk'), or the potential next steps given partial progress in its execution. Our main insight is that instructional videos depict sequences of steps that repeat between instances of the same or different tasks, and that this structure can be well represented by a Procedural Knowledge Graph (PKG), where nodes are discrete steps and edges connect steps that occur sequentially in the instructional activities. This graph can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form to generalize to multiple procedure understanding tasks. We build a PKG by combining information from a text-based procedural knowledge database and an unlabeled instructional video corpus and then use it to generate training pseudo labels with four novel pre-training objectives. We call this PKG-based pre-training procedure and the resulting model Paprika, Procedure-Aware PRe-training for Instructional Knowledge Acquisition. We evaluate Paprika on COIN and CrossTask for procedure understanding tasks such as task recognition, step recognition, and step forecasting. Paprika yields a video representation that improves over the state of the art: up to 11.23% gains in accuracy in 12 evaluation settings. Implementation is available at https://github.com/salesforce/paprika. Comment: CVPR 202

    Deep Transfer Learning Applications in Intrusion Detection Systems: A Comprehensive Review

    Globally, the external Internet is increasingly being connected to the contemporary industrial control system. As a result, there is an immediate need to protect the network from several threats. The key infrastructure of industrial activity may be protected from harm by using an intrusion detection system (IDS), a preventive mechanism, to recognize new kinds of dangerous threats and hostile activities. The most recent artificial intelligence (AI) techniques used to create IDS in many kinds of industrial control networks are examined in this study, with a particular emphasis on IDS-based deep transfer learning (DTL). The latter can be seen as a type of information fusion that merges and/or adapts knowledge from multiple domains to enhance the performance of the target task, particularly when the labeled data in the target domain is scarce. Publications issued after 2015 were taken into account. These selected publications were divided into three categories: DTL-only and IDS-only papers are covered in the introduction and background, and DTL-based IDS papers form the core of this review. Researchers will be able to have a better grasp of the current state of DTL approaches used in IDS in many different types of networks by reading this review paper. Other useful information, such as the datasets used, the sort of DTL employed, the pre-trained network, IDS techniques, the evaluation metrics including accuracy/F-score and false alarm rate (FAR), and the improvement gained, is also covered. The algorithms and methods that are used in several studies, or that illustrate deeply and clearly the principle of any DTL-based IDS subcategory, are presented to the reader.

    Saliency-aware Stereoscopic Video Retargeting

    Stereo video retargeting aims to resize an image to a desired aspect ratio. The quality of retargeted videos can be significantly impacted by the stereo video's spatial, temporal, and disparity coherence, all of which can be impacted by the retargeting process. Due to the lack of a publicly accessible annotated dataset, there is little research on deep learning-based methods for stereo video retargeting. This paper proposes an unsupervised deep learning-based stereo video retargeting network. Our model first detects the salient objects and then shifts and warps all objects so as to minimize the distortion of the salient parts of the stereo frames. We use 1D convolution for shifting the salient objects and design a stereo video Transformer to assist the retargeting process. To train the network, we use the parallax attention mechanism to fuse the left and right views and feed the retargeted frames to a reconstruction module that reverses the retargeted frames to the input frames. Therefore, the network is trained in an unsupervised manner. Extensive qualitative and quantitative experiments and ablation studies on KITTI stereo 2012 and 2015 datasets demonstrate the efficiency of the proposed method over the existing state-of-the-art methods. The code is available at https://github.com/z65451/SVR/. Comment: 8 pages excluding references. CVPRW conferenc

    Modelling uncertainties for measurements of the H → γγ Channel with the ATLAS Detector at the LHC

    The Higgs boson to diphoton (H → γγ) branching ratio is only 0.227 %, but this final state has yielded some of the most precise measurements of the particle. As measurements of the Higgs boson become increasingly precise, greater import is placed on the factors that constitute the uncertainty. Reducing the effects of these uncertainties requires an understanding of their causes. The research presented in this thesis aims to illuminate how uncertainties on simulation modelling are determined and proffers novel techniques for deriving them. The upgrade of the FastCaloSim tool, used for simulating events in the ATLAS calorimeter at a rate far exceeding that of the nominal detector simulation, Geant4, is described. The integration of a method that allows the toolbox to emulate the accordion geometry of the liquid argon calorimeters is detailed. This tool allows for the production of larger samples while using significantly fewer computing resources. A measurement of the total Higgs boson production cross-section multiplied by the diphoton branching ratio (σ × Bγγ) is presented, where this value was determined to be (σ × Bγγ)obs = 127 ± 7 (stat.) ± 7 (syst.) fb, in agreement with the Standard Model prediction. The signal and background shape modelling is described, and the contribution of the background modelling uncertainty to the total uncertainty ranges from 18–2.4 %, depending on the Higgs boson production mechanism. A method for estimating the number of events in a Monte Carlo background sample required to model the shape is detailed. It was found that the size of the nominal γγ background events sample required a multiplicative increase by a factor of 3.60 to adequately model the background with a confidence level of 68 %, or a factor of 7.20 for a confidence level of 95 %. Based on this estimate, 0.5 billion additional simulated events were produced, substantially reducing the background modelling uncertainty.
    A technique is detailed for emulating the effects of Monte Carlo event generator differences using multivariate reweighting. The technique is used to estimate the event generator uncertainty on the signal modelling of tHqb events, improving the reliability of estimating the tHqb production cross-section. This multivariate reweighting technique is then used to estimate the generator modelling uncertainties on background V γγ samples for the first time. The estimated uncertainties were found to be covered by the currently assumed background modelling uncertainty.