PlaNet-ClothPick: Effective Fabric Flattening Based on Latent Dynamic Planning
Why do Recurrent State Space Models such as PlaNet fail at cloth manipulation
tasks? Recent work has attributed this to the blurry prediction of the
observation, which makes it difficult to plan directly in the latent space.
This paper explores the reasons behind this by applying PlaNet in the
pick-and-place fabric-flattening domain. We find that the sharp discontinuity
of the transition function on the contour of the fabric makes it difficult to
learn an accurate latent dynamic model, causing the MPC planner to produce pick
actions slightly outside of the article. By limiting picking space on the cloth
mask and training on specially engineered trajectories, our mesh-free
PlaNet-ClothPick surpasses visual planning and policy learning methods on
principal metrics in simulation, achieving similar performance as
state-of-the-art mesh-based planning approaches. Notably, our model exhibits a
faster action inference and requires fewer transitional model parameters than
the state-of-the-art robotic systems in this domain. Other supplementary
materials are available at: https://sites.google.com/view/planet-clothpick.
Comment: 12 pages, 2 tables, and 14 figures. It has been accepted to The 2024 16th IEEE/SICE International Symposium on System Integration, Ha Long, Vietnam, 8-11th January, 202
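The key fix described above, limiting the picking space to the cloth mask, can be sketched as rejection sampling over candidate pick pixels (the function name and binary-mask representation are illustrative assumptions, not the paper's code):

```python
import random

def sample_masked_picks(cloth_mask, n_samples, seed=0):
    """Rejection-sample candidate pick pixels that lie on the cloth.

    cloth_mask: 2D list of 0/1 values (1 = fabric); a hypothetical
    stand-in for a segmentation mask. Restricting candidates this way
    keeps a sampling-based MPC planner from proposing pick actions
    just outside the article's contour.
    """
    rng = random.Random(seed)
    h, w = len(cloth_mask), len(cloth_mask[0])
    picks = []
    while len(picks) < n_samples:
        r, c = rng.randrange(h), rng.randrange(w)
        if cloth_mask[r][c]:  # keep only picks that land on the fabric
            picks.append((r, c))
    return picks
```

Each surviving candidate would then be scored by the planner in latent space as usual; only the proposal distribution changes.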
Supervisor recommendation tool for Computer Science projects
In most Computer Science programmes, students are required to undertake an individual project under the guidance of a supervisor during their studies. With increasing student numbers, matching students to suitable supervisors is becoming a growing challenge. This paper presents a software tool which assists Computer Science students in identifying the most suitable supervisor for their final year project. It does this by matching a list of keywords or a project proposal provided by the student to a list of keywords automatically extracted from freely available data for each potential supervisor. The tool was evaluated using both manual and user testing, with generally positive results and user feedback. 83% of respondents agree that the current implementation of the tool is accurate, with 67% saying it would be a useful tool to have when looking for a supervisor. The tool is currently being adapted for wider use in the School.
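The keyword-matching step could be sketched as a simple set-overlap ranking. Jaccard similarity is an illustrative choice here; the abstract does not specify the tool's actual scoring function:

```python
def rank_supervisors(student_keywords, supervisor_keywords):
    """Rank supervisors by Jaccard overlap between keyword sets.

    supervisor_keywords: dict mapping supervisor name to the keyword
    list extracted from their freely available data (hypothetical
    data layout for illustration).
    """
    s = {k.lower() for k in student_keywords}
    scores = {}
    for name, kws in supervisor_keywords.items():
        t = {k.lower() for k in kws}
        scores[name] = len(s & t) / len(s | t) if (s | t) else 0.0
    # best-matching supervisors first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A project proposal, rather than a keyword list, would first need a keyword-extraction pass before the same ranking applies.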
Texture features for object salience
Although texture is important for many vision-related tasks, it is not used in most salience models. As a consequence, there are images where all existing salience algorithms fail. We introduce a novel set of texture features built on top of a fast model of complex cells in striate cortex, i.e., visual area V1. The texture at each position is characterised by the two-dimensional local power spectrum obtained from Gabor filters which are tuned to many scales and orientations. We then apply a parametric model and describe the local spectrum by the combination of two one-dimensional Gaussian approximations: the scale and orientation distributions. The scale distribution indicates whether the texture has a dominant frequency and what frequency it is. Likewise, the orientation distribution attests to the degree of anisotropy. We evaluate the features in combination with the state-of-the-art VOCUS2 salience algorithm. We found that using our novel texture features in addition to colour improves AUC by 3.8% on the PASCAL-S dataset when compared to the colour-only baseline, and by 62% on a novel texture-based dataset.
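The parametric step, summarising the local Gabor power spectrum by two one-dimensional Gaussians, can be sketched by taking moment-based Gaussian fits of the scale and orientation marginals (a minimal sketch assuming a precomputed energy grid; names are illustrative):

```python
import math

def gaussian_moments(values, energies):
    """Approximate a 1D energy distribution by its Gaussian moments
    (energy-weighted mean and standard deviation)."""
    total = sum(energies)
    if total == 0:
        return 0.0, 0.0
    mean = sum(v * e for v, e in zip(values, energies)) / total
    var = sum(e * (v - mean) ** 2 for v, e in zip(values, energies)) / total
    return mean, math.sqrt(var)

def texture_descriptor(energy):
    """energy[s][o]: Gabor energy at scale index s, orientation index o.

    Returns (scale mean, scale spread, orientation mean, orientation
    spread): the scale mean locates a dominant frequency, and the
    orientation spread reflects the degree of anisotropy.
    """
    scales = [sum(row) for row in energy]         # marginal over orientations
    orients = [sum(col) for col in zip(*energy)]  # marginal over scales
    return gaussian_moments(range(len(scales)), scales) + \
           gaussian_moments(range(len(orients)), orients)
```

A narrow scale spread signals a strongly periodic texture; a flat orientation marginal (large spread) signals isotropy.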
Fast and accurate multi-scale keypoints based on end-stopped cells
Increasingly, applications in computer vision employ interest points. Algorithms like SIFT and
SURF are all based on partial derivatives of images smoothed with Gaussian filter kernels. These
algorithms are fast and therefore very popular.
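The Gaussian-derivative building block that SIFT- and SURF-style detectors rely on can be sketched in one dimension (a generic illustration, not either algorithm's implementation):

```python
import math

def gaussian_derivative_kernel(sigma, radius=None):
    """First derivative of a 1D Gaussian: the basic filter behind
    detectors built on smoothed image derivatives."""
    r = radius if radius is not None else int(3 * sigma)
    return [-(x / sigma**2) * math.exp(-x * x / (2 * sigma**2))
            for x in range(-r, r + 1)]

def convolve1d(signal, kernel):
    """Cross-correlate signal with kernel, clamping at the borders."""
    k = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - k, 0), len(signal) - 1)
            acc += signal[idx] * w
        out.append(acc)
    return out
```

Applied to a step edge, the response magnitude peaks at the edge position, which is why derivative-of-Gaussian responses localise image structure well.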
Fast cortical keypoints for real-time object recognition
Best-performing object recognition algorithms employ a large number of features extracted on a dense grid, so they are too slow for real-time and active vision. In this paper we present
a fast cortical keypoint detector for extracting meaningful points from images. It is competitive with state-of-the-art
detectors and particularly well-suited for tasks such as object recognition. We show that by using these points we can
achieve state-of-the-art categorization results in a fraction of the time required by competing algorithms.
Phase-differencing in stereo vision: solving the localisation problem
Complex Gabor filters with phases in quadrature are often used to model even- and odd-symmetric simple cells in the primary visual cortex. In stereo vision, the phase difference between the responses of the left and right views can be used to construct a disparity or depth map. Various constraints can be applied in order to construct smooth maps, but this leads to very imprecise depth transitions. In this theoretical paper we show, by using lines and edges as image primitives, the origin of the localisation problem. We also argue that disparity should be attributed to lines and edges, rather than trying to construct a 3D surface map in cortical area V1. We derive allowable translation ranges which yield correct disparity estimates, both for left-view centered vision and for cyclopean vision.
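The core phase-differencing estimate can be sketched in a few lines: for complex Gabor responses at angular frequency omega, the disparity is approximately the left-right phase difference divided by omega, and the estimate is only valid inside an allowable range where the phase difference stays below pi (a generic sketch of the standard method, not the paper's derivation):

```python
import cmath

def phase_disparity(resp_left, resp_right, omega):
    """Estimate disparity d ~= delta_phi / omega from the phase
    difference of complex Gabor responses at angular frequency omega.

    Multiplying by the conjugate yields the wrapped phase difference
    directly; the estimate holds only while |delta_phi| < pi, which
    bounds the allowable translation range.
    """
    dphi = cmath.phase(resp_right * resp_left.conjugate())
    return dphi / omega
```

Outside that range the phase wraps around and the estimate aliases to a wrong disparity, which is one face of the localisation problem at depth transitions.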
A biological and real-time framework for hand gestures and head poses
Human-robot interaction is an interdisciplinary research area that aims at the development of social robots. Since social robots are expected to interact with humans and understand their behavior through gestures and body movements, cognitive psychology and robot technology must be integrated. In this paper we present a biological and real-time framework for detecting and tracking hands and heads. This framework is based on keypoints extracted by means of cortical V1 end-stopped cells. Detected keypoints and the cells’ responses are used to classify the junction type. Through the combination of annotated keypoints in a hierarchical, multi-scale tree structure, moving and deformable hands can be segregated and tracked over time. By using hand templates with lines and edges at only a few scales, a hand’s gestures can be recognized. Head tracking and pose detection are also implemented, which can be integrated with detection of facial expressions in the future. Through combinations of head poses and hand gestures, a large number of commands can be given to a robot.
Multi-scale cortical keypoints for realtime hand tracking and gesture recognition
Human-robot interaction is an interdisciplinary
research area which aims at integrating human factors, cognitive
psychology and robot technology. The ultimate goal is
the development of social robots. These robots are expected to
work in human environments, and to understand behavior of
persons through gestures and body movements. In this paper
we present a biological and realtime framework for detecting
and tracking hands. This framework is based on keypoints
extracted from cortical V1 end-stopped cells. Detected keypoints
and the cells’ responses are used to classify the junction type.
By combining annotated keypoints in a hierarchical, multi-scale
tree structure, moving and deformable hands can be segregated,
their movements can be obtained, and they can be tracked over
time. By using hand templates with keypoints at only two scales,
a hand’s gestures can be recognized.
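The template-based recognition step can be sketched as counting how many template keypoints find a detected keypoint nearby (a hypothetical greedy matcher for illustration; the framework's actual matching over annotated, multi-scale keypoints is richer):

```python
def match_template(points, template, tol=3.0):
    """Fraction of template keypoints with an unused detected keypoint
    within distance tol. Points are (x, y) tuples; each detected
    keypoint is consumed by at most one template keypoint.
    """
    used = set()
    hits = 0
    for tx, ty in template:
        best, best_d = None, tol
        for i, (px, py) in enumerate(points):
            if i in used:
                continue
            d = ((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
            if d <= best_d:  # closest unused keypoint within tolerance
                best, best_d = i, d
        if best is not None:
            used.add(best)
            hits += 1
    return hits / len(template)
```

The gesture whose template scores highest against the tracked hand's keypoints would be reported.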
A disparity energy model improved by line, edge and keypoint correspondences
Disparity energy models (DEMs) estimate local depth information on the basis of V1 complex cells. Our
recent DEM (Martins et al., 2011, ISSPIT, 261-266) employs a population code. Once the population's
cells have been trained with random-dot stereograms, it is applied at all retinotopic positions in the visual
field. Despite producing good results in textured regions, the model needs to be made more precise,
especially at depth transitions.
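The classic disparity-energy computation underlying such models can be sketched as follows: a binocular unit sums left and position-shifted right simple-cell responses in quadrature and squares them, so its energy peaks when its preferred shift matches the stimulus disparity (a minimal generic sketch, not the trained population code of the paper):

```python
import math

def binocular_energy(signal_l, signal_r, omega, pref_shift):
    """Energy of a binocular unit tuned, via a position shift of its
    right-eye receptive field, to disparity pref_shift."""
    def quad_resp(signal, shift):
        # even/odd (quadrature) Gabor-like simple-cell responses
        even = sum(s * math.cos(omega * (x - shift)) for x, s in enumerate(signal))
        odd = sum(s * math.sin(omega * (x - shift)) for x, s in enumerate(signal))
        return even, odd
    le, lo = quad_resp(signal_l, 0)
    re, ro = quad_resp(signal_r, pref_shift)
    # binocular energy: squared sums of the quadrature pair
    return (le + re) ** 2 + (lo + ro) ** 2
```

Reading out the preferred shift of the most active unit in a population of such cells yields the local disparity estimate; the imprecision at depth transitions arises because the receptive fields straddle two disparities there.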