
    Tuning Modular Networks with Weighted Losses for Hand-Eye Coordination

    This paper introduces an end-to-end fine-tuning method to improve hand-eye coordination in modular deep visuo-motor policies (modular networks), where each module is trained independently. Benefiting from weighted losses, the fine-tuning method significantly improves the performance of the policies on a robotic planar reaching task. Comment: 2 pages, to appear in the Deep Learning for Robotic Vision (DLRV) Workshop at CVPR 2017
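    As a rough illustration of the weighted-loss fine-tuning idea, the sketch below (PyTorch) jointly optimises an independently pretrained perception module and controller with a weighted sum of per-module losses. The module shapes, loss terms, and weights are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

perception = nn.Sequential(                     # vision module, assumed pretrained on its own
    nn.Conv2d(3, 16, kernel_size=5), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 64),
)
controller = nn.Sequential(                     # control module, assumed pretrained on its own
    nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2),
)

opt = torch.optim.Adam(
    list(perception.parameters()) + list(controller.parameters()), lr=1e-4)
w_perc, w_ctrl = 0.1, 1.0                       # assumed loss weights

def finetune_step(image, target_feat, target_action):
    feat = perception(image)                    # intermediate scene representation
    action = controller(feat)                   # velocity command for planar reaching
    # Weighted sum: keep the perception module close to its pretrained output
    # while optimising the whole pipeline end-to-end for the task.
    loss = (w_perc * nn.functional.mse_loss(feat, target_feat)
            + w_ctrl * nn.functional.mse_loss(action, target_action))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```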

    Adversarial Discriminative Sim-to-real Transfer of Visuo-motor Policies

    Various approaches have been proposed to learn visuo-motor policies for real-world robotic applications. One solution is to first learn in simulation and then transfer to the real world. For the transfer, most existing approaches require real-world images with labels. However, the labelling process is often expensive or even impractical in many robotic applications. In this paper, we propose an adversarial discriminative sim-to-real transfer approach to reduce the cost of labelling real data. The effectiveness of the approach is demonstrated with modular networks in a table-top object reaching task where a 7 DoF arm is controlled in velocity mode to reach a blue cuboid in clutter through visual observations. The adversarial transfer approach reduced the labelled real data requirement by 50%: policies can be transferred to real environments with only 93 labelled and 186 unlabelled real images. The transferred visuo-motor policies are robust to novel (not seen in training) objects in clutter and even to a moving target, achieving a 97.8% success rate and 1.8 cm control accuracy. Comment: Under review for the International Journal of Robotics Research
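    The sketch below shows one common form of adversarial discriminative adaptation consistent with the abstract's description: a discriminator learns to separate simulated features from real ones, while the real-domain encoder is updated to fool it, so unlabelled real images help align the two domains. The encoder and discriminator architectures here are illustrative assumptions, not the authors' networks.

```python
import torch
import torch.nn as nn

feat_dim = 64
sim_encoder = nn.Sequential(nn.Linear(128, feat_dim), nn.ReLU())   # frozen, trained in simulation
real_encoder = nn.Sequential(nn.Linear(128, feat_dim), nn.ReLU())  # adapted on real images
discriminator = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))

bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(real_encoder.parameters(), lr=1e-4)

def adapt_step(sim_batch, real_batch):
    # 1) Discriminator learns to tell simulated features from real ones.
    with torch.no_grad():
        f_sim, f_real = sim_encoder(sim_batch), real_encoder(real_batch)
    d_loss = (bce(discriminator(f_sim), torch.ones(len(f_sim), 1))
              + bce(discriminator(f_real), torch.zeros(len(f_real), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Real-domain encoder is updated to fool the discriminator, so
    #    unlabelled real images align with the simulated feature space.
    g_loss = bce(discriminator(real_encoder(real_batch)),
                 torch.ones(len(real_batch), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```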

    Concept-Centric Transformers: Enhancing Model Interpretability through Object-Centric Concept Learning within a Shared Global Workspace

    To explain "black-box" properties of AI models, many approaches, such as post hoc and intrinsically interpretable models, have been proposed to provide plausible explanations that identify human-understandable features/concepts that a trained model uses to make predictions, and attention mechanisms have been widely used to aid in model interpretability by visualizing that information. However, the problem of configuring an interpretable model that effectively communicates and coordinates among computational modules has received less attention. A recently proposed shared global workspace theory demonstrated that networks of distributed modules can benefit from sharing information with a bandwidth-limited working memory because the communication constraints encourage specialization, compositionality, and synchronization among the modules. Inspired by this, we consider how such shared working memories can be realized to build intrinsically interpretable models with better interpretability and performance. Toward this end, we propose Concept-Centric Transformers, a simple yet effective configuration of the shared global workspace for interpretability consisting of: i) an object-centric-based architecture for extracting semantic concepts from input features, ii) a cross-attention mechanism between the learned concept and input embeddings, and iii) standard classification and additional explanation losses to allow human analysts to directly assess an explanation for the model's classification reasoning. We test our approach against other existing concept-based methods on classification tasks for various datasets, including CIFAR100 (super-classes), CUB-200-2011 (bird species), and ImageNet, and we show that our model achieves better classification accuracy than all selected methods across all problems but also generates more consistent concept-based explanations of classification output.Comment: 21 pages, 9 tables, 13 figure

    Label-efficient learning of LiDAR-based perception models for autonomous driving

    Deep learning on 3D LiDAR point clouds is still in its infancy, with room to grow and improve, especially in the context of automated driving systems. A considerable amount of recent research has been directed at this application as a means to boost the performance and reliability of self-driving cars. However, the quantity of data needed to supervise point cloud-based perception models is extremely large and costly to annotate. This thesis studies, evaluates and compares state-of-the-art detection networks and label-efficient learning techniques, shedding some light on how to train perception models on point clouds with less annotated data.

    Visual and linguistic processes in deep neural networks: A cognitive perspective

    When people describe an image, there are complex visual and linguistic processes at work. For instance, speakers tend to look at an object right before mentioning it, but not every time. Similarly, during a conversation, speakers can refer to an entity multiple times, using expressions that evolve in the common ground. In this thesis, I develop computational models of such visual and linguistic processes, drawing inspiration from theories and findings from cognitive science and psycholinguistics. This work, where I aim to capture the intricate relationship between non-linguistic modalities and language within deep artificial neural networks, contributes to the line of research into multimodal Natural Language Processing. This thesis consists of two parts: (1) modeling human gaze in language use (production and comprehension), and (2) modeling communication strategies in referential tasks in visually grounded dialogue. In the first part, I delve into enhancing image description generation models using eye-tracking data; evaluating the variation in human signals while describing images; and predicting human reading behavior in the form of eye movements. In the second part, I build models quantifying, generating, resolving, and adapting utterances in referential tasks situated within visual and conversational contexts. The outcomes advance our understanding of human visuo-linguistic processes by revealing intricate strategies at play in such processes, and point to the importance of accounting for them when developing and utilizing multimodal models. The findings shed light on how advancements in artificial intelligence could contribute to advancing research on crossmodal processes in humans, and vice versa.
