Children, Humanoid Robots and Caregivers
This paper presents developmental learning on a humanoid robot from human-robot interactions. In particular, we consider teaching humanoids as children during the child's Separation and Individuation developmental phase (Mahler, 1979). Cognitive development during this phase is characterized both by the child's dependence on her mother for learning while she becomes aware of her own individuality, and by self-exploration of her physical surroundings. We propose a learning framework for a humanoid robot inspired by such cognitive development.
Single View Human Pose Tracking
Recovery of human pose from videos has become a highly active research area in the last decade because of many attractive potential applications, such as surveillance, non-intrusive motion analysis and natural human-machine interaction. Video-based full body pose estimation is a very challenging task because of the high degree of articulation of the human body, the large variety of possible human motions, and the diversity of human appearances.
Methods for tackling this problem can be roughly categorized as either discriminative or generative. Discriminative methods can work on single images, and are able to recover the human poses efficiently. However, the accuracy and generality largely depend on the training data. Generative approaches usually formulate the problem as a tracking problem and adopt an explicit human model. Although arbitrary motions can be tracked, such systems usually have difficulties in adapting to different subjects and in dealing with tracking failures.
In this thesis, an accurate, efficient and robust human pose tracking system from a single view camera is developed, mainly following a generative approach. A novel discriminative feature is also proposed and integrated into the tracking framework to improve the tracking performance.
The human pose tracking system is built within a particle filtering framework. A reconfigurable skeleton model is constructed based on the Acclaim Skeleton File convention. A basic particle filter is first implemented for upper body tracking; it fuses time-efficient cues from monocular sequences and achieves real-time tracking for constrained motions. Next, a 3D surface model is added to the skeleton model, and a full body tracking system is developed for more general and complex motions, assuming a stereo camera input. Partitioned sampling is adopted to deal with the high dimensionality of the problem, and the system is capable of running in near real-time. Multiple visual cues are investigated and compared, including a newly developed explicit depth cue.
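The predict, weight, and resample cycle at the core of a basic particle filter can be sketched in Python with NumPy as follows. This is a minimal illustration, not the thesis implementation: the random-walk motion model, the single Gaussian likelihood standing in for the fused image cues, the two-joint toy pose, and the function name `particle_filter_step` are all assumptions, and partitioned sampling over body parts is not shown.

```python
import numpy as np


def particle_filter_step(particles, weights, observation, rng,
                         motion_std=0.05, obs_std=0.1):
    """One predict-weight-resample cycle of a basic SIR particle filter.

    particles: (N, D) array of pose hypotheses, e.g. joint angles (hypothetical).
    observation: (D,) measured pose; a single Gaussian cue stands in for the
    fused image cues, which is an assumption made for illustration.
    """
    n = len(particles)
    # Predict: random-walk motion model (assumed; no dynamics are specified)
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: Gaussian log-likelihood, shifted by its maximum for numerical stability
    sq_err = np.sum((particles - observation) ** 2, axis=1)
    log_w = np.log(weights) - 0.5 * sq_err / obs_std ** 2
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    # Systematic resampling concentrates particles on likely poses
    positions = (rng.random() + np.arange(n)) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx], np.full(n, 1.0 / n)


rng = np.random.default_rng(0)
true_pose = np.array([0.3, -0.2])                 # hypothetical two-joint pose
particles = rng.uniform(-1.0, 1.0, size=(500, 2))
weights = np.full(500, 1.0 / 500)
for _ in range(20):
    observation = true_pose + rng.normal(0.0, 0.05, 2)
    particles, weights = particle_filter_step(particles, weights, observation, rng)
estimate = particles.mean(axis=0)                 # posterior-mean pose estimate
```

Systematic resampling is used here because it is cheap (one random draw per step) and keeps the particle set stable; a real tracker would replace the Gaussian likelihood with image-based cue scores.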
Based on the comparative analysis of cues, which reveals the importance of depth and good bottom-up features, a novel algorithm for detecting and identifying endpoint body parts from depth images is proposed. Inspired by the shape context concept, this thesis proposes a novel Local Shape Context (LSC) descriptor specifically for describing the shape features of body parts in depth images. This descriptor describes the local shape of different body parts with respect to a given reference point on a human silhouette, and is shown to be effective at detecting and classifying endpoint body parts. A new type of interest point is defined based on the LSC descriptor, and a hierarchical interest point selection algorithm is designed to further conserve computational resources. The detected endpoint body parts are then classified according to learned models based on the LSC feature. The algorithm is tested on a public dataset and achieves good accuracy at a processing speed of 100 Hz on a standard PC.
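A shape-context-style descriptor bins the positions of silhouette points, relative to a reference point, into a log-polar histogram. The sketch below shows that general idea only; the bin counts, radius limits, mean-distance normalisation, and the name `log_polar_histogram` are assumptions, and the depth-specific design of the thesis's LSC descriptor is not reproduced.

```python
import numpy as np


def log_polar_histogram(points, reference, n_radial=5, n_angular=12,
                        r_inner=0.125, r_outer=2.0):
    """Shape-context-style log-polar histogram of `points` around `reference`.

    Distances are divided by their mean so the descriptor is scale invariant.
    Returns an (n_radial, n_angular) histogram normalised to sum to 1.
    """
    offsets = np.asarray(points, float) - np.asarray(reference, float)
    dist = np.hypot(offsets[:, 0], offsets[:, 1])
    keep = dist > 0                              # skip the reference point itself
    offsets, dist = offsets[keep], dist[keep]
    dist = dist / dist.mean()
    angle = np.mod(np.arctan2(offsets[:, 1], offsets[:, 0]), 2 * np.pi)
    # Log-spaced radial edges give finer resolution close to the reference point
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_radial + 1)
    a_edges = np.linspace(0.0, 2 * np.pi, n_angular + 1)
    hist, _, _ = np.histogram2d(dist, angle, bins=[r_edges, a_edges])
    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy silhouette: 120 contour points on a unit circle around the reference point
angles = np.linspace(0.0, 2 * np.pi, 120, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
desc = log_polar_histogram(circle, (0.0, 0.0))
```

For the circle, every point is equidistant from the reference, so all the mass falls into a single radial ring spread across the angular bins; a hand or foot tip would instead concentrate mass in a few angular sectors.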
Finally, the LSC descriptor is generalized so that both the endpoint body parts and the limbs are detected simultaneously. The generalized algorithm is integrated into the tracking framework, where it provides a very strong cue and enables recovery from tracking failures. The skeleton model is also simplified to further increase the system's efficiency. To evaluate the system quantitatively on arbitrary motions, a new dataset is designed and collected using a synchronized Kinect sensor and a marker-based motion capture system, covering 22 different motions from 5 human subjects. The system is capable of tracking full body motions accurately using a simple skeleton-only model in near real-time on a laptop PC, before any optimization.
Image and interpretation using artificial intelligence to read ancient Roman texts
The ink and stylus tablets discovered at the Roman Fort of Vindolanda are a unique resource for scholars of ancient history. However, the stylus tablets have proved particularly difficult to read. This paper describes a system that assists expert papyrologists in the interpretation of the Vindolanda writing tablets. A model-based approach is taken that relies on models of the written form of characters, and on statistical modelling of language, to produce plausible interpretations of the documents. The contributions of the language, character, and image feature models are fused using the GRAVA agent architecture, which uses Minimum Description Length as the basis for information fusion across semantic levels. The resulting system reads in image data and outputs plausible interpretations of the Vindolanda tablets.
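The Minimum Description Length principle used for fusion can be illustrated in its simplest form: each model contributes a code length of −log₂ P bits, and the reading with the shortest total description wins. The sketch below is not GRAVA; the candidate readings, the probabilities, and the names `char_prob`, `lang_prob` and `best_interpretation` are hypothetical.

```python
import math


def description_length(prob):
    """Code length in bits of an event with probability `prob`."""
    return -math.log2(prob)


def best_interpretation(candidates, char_prob, lang_prob):
    """Return the candidate reading with the shortest total description length.

    char_prob: reading -> P(image evidence | characters) (hypothetical character model)
    lang_prob: reading -> P(reading) under a language model (hypothetical)
    """
    def total_bits(reading):
        return (description_length(char_prob[reading])
                + description_length(lang_prob[reading]))
    return min(candidates, key=total_bits)


candidates = ["uale", "nale"]
char_prob = {"uale": 0.30, "nale": 0.40}     # the strokes slightly favour "nale"
lang_prob = {"uale": 0.05, "nale": 0.001}    # the language model strongly favours "uale"
best = best_interpretation(candidates, char_prob, lang_prob)
# "uale" costs about 6.06 bits in total versus about 11.29 bits for "nale"
```

Summing code lengths is equivalent to multiplying probabilities, which is why MDL offers a common currency for evidence coming from different semantic levels.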
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them (by ±1 degree) along imaginary spokes emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in and retrieved from a pre-attentional store during this task.
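As an arithmetic check on the reported statistic: with one numerator degree of freedom, an F statistic is the square of a t statistic, so the p-value for F(1,4)=2.565 equals the two-tailed p-value of t = √2.565 with 4 degrees of freedom. The standard-library sketch below integrates the t density numerically with Simpson's rule; the helper names are illustrative.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def p_value_f1(f_stat, df_error, steps=10_000):
    """Upper-tail p-value of an F(1, df_error) statistic.

    With one numerator degree of freedom, F = t**2, so the p-value equals
    1 - 2 * P(0 < T < sqrt(F)). The integral uses composite Simpson's rule
    (`steps` must be even).
    """
    t = math.sqrt(f_stat)
    h = t / steps
    area = t_density(0.0, df_error) + t_density(t, df_error)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df_error)
    area *= h / 3
    return 1 - 2 * area


p = p_value_f1(2.565, 4)   # about 0.18, in line with the reported p = 0.185
```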
The development of local solar irradiance for outdoor computer graphics rendering
Atmospheric effects are approximated by solving the light transfer equation (LTE) along a given viewing path. The resulting accumulated spectral energy (in its visible band) arriving at the observer's eyes defines the colour of the object currently on the line of sight. Because a single rendering equation can conveniently solve the LTE for both the daylight sky and distant objects (aerial perspective), recent methods have opted for a similar approach. However, the computational burden of real-time calculation has forced these methods to make simplifications that are not in line with real-world observation, and consequently their results are laden with visual errors. The two most common simplifications are: (i) treating the atmosphere as a purely scattering medium, and (ii) assuming a single-density atmospheric profile. This research explored the possibility of replacing the real-time calculation involved in solving the LTE with an analytical approach, so that both simplifications can be avoided. The model was implemented on top of a flight simulator prototype, since the requirements of such a system match the objectives of this study. Results were verified against real images of daylight skies and compared with the results of previous methods to showcase the proposed model's strengths and advantages over its peers.
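Simplification (ii) can be made concrete through the extinction part of the LTE alone. Under a single-density atmosphere the optical depth grows linearly with path length, whereas with an exponentially decaying density profile an upward path crosses progressively thinner air. The sketch below covers only Beer-Lambert extinction (no in-scattering, so it is not a full LTE solver); the flat-Earth geometry, the 8 km scale height, the sea-level extinction coefficient, and the function names are illustrative assumptions.

```python
import math


def optical_depth(h0, elevation, length, sigma0, scale_height=None, steps=1000):
    """Optical depth along a straight viewing path (flat-Earth approximation).

    h0: observer altitude (m); elevation: path angle above the horizon (rad);
    length: path length (m); sigma0: extinction coefficient at sea level (1/m).
    scale_height=None models a single-density atmosphere (constant sigma0);
    otherwise density decays as exp(-h / scale_height).
    """
    if scale_height is None:
        return sigma0 * length
    ds = length / steps
    tau = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds                    # midpoint rule along the path
        h = h0 + s * math.sin(elevation)
        tau += sigma0 * math.exp(-h / scale_height) * ds
    return tau


def transmittance(tau):
    """Beer-Lambert attenuation: the fraction of light surviving the path."""
    return math.exp(-tau)


# Illustrative values: 30 degree upward path of 20 km, sea-level extinction 1.2e-5 per metre
single = transmittance(optical_depth(0.0, math.radians(30), 20_000, 1.2e-5))
layered = transmittance(optical_depth(0.0, math.radians(30), 20_000, 1.2e-5, 8_000))
```

For this upward path the single-density model overestimates extinction, so distant objects and the sky come out too hazy; this is the kind of visual error the abstract attributes to the simplification.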
Automatic Pipeline Surveillance Air-Vehicle
This thesis presents the development of a vision-based system for
aerial pipeline Right-of-Way surveillance using optical/infrared sensors mounted
on Unmanned Aerial Vehicles (UAVs). The aim of the research is to develop a highly
automated, on-board system for detecting and following pipelines while
simultaneously detecting any third-party interference. The proposed approach
of using a UAV platform could potentially reduce the cost of monitoring and
surveying pipelines compared with manned aircraft. The main contributions
of this thesis are the development of the image-analysis algorithms, the overall
system architecture, and validation in hardware using a scaled-down test
environment.
To evaluate the performance of the system, the algorithms were coded in the
Python programming language. A small-scale test rig of the pipeline structure,
together with the expected third-party interference, was set up to simulate the
operational environment and to capture and record data for algorithm testing
and validation.
The pipeline endpoints are identified by transforming the 16-bit depth data of
the explored environment into a 3D point cloud in world coordinates. Then, using
the Random Sample Consensus (RANSAC) approach, the foreground and
background are separated in the transformed 3D point cloud by extracting
the plane that corresponds to the ground. Simultaneously, the boundaries of the
explored environment are detected in the 16-bit depth data using a
Canny edge detector. These boundaries are then transformed into a 3D point cloud
and filtered according to the real height of the pipeline, using the Euclidean
distance of each boundary point from the previously extracted ground plane,
which allows fast and accurate measurement. The filtered boundaries, once
transformed back into 16-bit depth data, are used to detect the straight lines of
the object boundary (Hough lines) with a Hough transform. The pipeline is
verified by estimating a centre line segment from the 3D point cloud of each pair
of Hough line segments (transformed into 3D). Next, the pipeline points along
this centre line are filtered to lie within the width of the pipeline, using the
Euclidean distance in the foreground point cloud. The detected centre line
segment is then extended along the filtered pipeline point cloud so that its
length matches the actual pipeline segment.
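The first stages of this pipeline (back-projecting depth into a point cloud, fitting the ground plane with RANSAC, and keeping only points whose Euclidean distance from that plane matches the pipeline height) can be sketched with NumPy as follows. This is an illustrative reconstruction, not the thesis code: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the millimetre depth units, the thresholds, and all function names are assumptions, and the Canny and Hough stages are omitted.

```python
import numpy as np


def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Back-project a 16-bit depth image (assumed millimetres) into camera-frame 3D points."""
    v, u = np.indices(depth_mm.shape)            # pixel row and column indices
    z = depth_mm.astype(float) / 1000.0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                    # drop pixels with no depth reading


def fit_ground_plane_ransac(points, threshold=0.01, iterations=200, seed=0):
    """RANSAC plane fit returning (normal, d) with normal . p + d = 0 and |normal| = 1."""
    rng = np.random.default_rng(seed)
    best_inliers, best_plane = -1, None
    for _ in range(iterations):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                         # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ p1
        inliers = np.count_nonzero(np.abs(points @ normal + d) < threshold)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane


def filter_by_height(points, plane, pipe_height, tolerance=0.02):
    """Keep points whose distance from the ground plane is close to the pipeline height."""
    normal, d = plane
    height = np.abs(points @ normal + d)         # Euclidean point-to-plane distance
    return points[np.abs(height - pipe_height) < tolerance]
```

RANSAC suits this step because the ground dominates the scene: random three-point samples land on it most of the time, and points belonging to the pipe or to interference simply become outliers.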
Third-party interference is detected based on four parameters, namely:
foreground depth data; pipeline depth data; pipeline endpoints location in the
3D point cloud; and Right-of-Way distance. The techniques include detection,
classification, and localization algorithms.
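One way the endpoint locations and the Right-of-Way distance could be combined is a corridor test: any foreground point that is not part of the pipeline and lies within the Right-of-Way distance of the segment joining the pipeline endpoints is flagged as potential interference. The sketch below is a hypothetical illustration of that idea only; the function names, the boolean `pipeline_mask`, and the corridor rule are assumptions, and it does not cover the classification or localization algorithms.

```python
import numpy as np


def distance_to_segment(points, a, b):
    """Euclidean distance from each 3D point to the line segment a-b."""
    ab = b - a
    # Projection parameter clamped to [0, 1] so distances past the endpoints use the endpoint
    t = np.clip(((points - a) @ ab) / (ab @ ab), 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.linalg.norm(points - closest, axis=1)


def flag_interference(foreground, pipeline_mask, endpoint_a, endpoint_b, row_distance):
    """Return non-pipeline foreground points lying inside the Right-of-Way corridor."""
    candidates = foreground[~pipeline_mask]
    dist = distance_to_segment(candidates, endpoint_a, endpoint_b)
    return candidates[dist <= row_distance]
```

For example, with endpoints at (0, 0, 0) and (10, 0, 0) and a 3 m corridor, a non-pipeline point at (5, 2, 0) is flagged while one at (5, 8, 0) is ignored.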
Finally, a waypoint-based navigation system was implemented so that the air
vehicle flies over course waypoints generated online from a heading-angle
demand, following the pipeline structure in real time based on the
online identification of the pipeline endpoints relative to the camera frame.
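The geometry behind a heading-angle demand and online waypoint generation can be sketched in a few lines. The sketch assumes a level, body-aligned camera whose detections are expressed as (forward, right) offsets in metres, a local north-east frame, and a fixed waypoint spacing; the function names are hypothetical and no flight-control interface is implied.

```python
import math


def camera_to_world(offset, uav_pos, uav_yaw):
    """Rotate a level camera-frame offset (forward, right) by yaw and add the UAV position."""
    c, s = math.cos(uav_yaw), math.sin(uav_yaw)
    f, r = offset
    return (uav_pos[0] + c * f - s * r, uav_pos[1] + s * f + c * r)


def heading_to(p_from, p_to):
    """Heading (rad) from p_from to p_to, measured from north (x) towards east (y)."""
    return math.atan2(p_to[1] - p_from[1], p_to[0] - p_from[0])


def waypoints_along(start, end, spacing):
    """Evenly spaced (north, east) waypoints from start to end, at most `spacing` apart."""
    n = max(1, math.ceil(math.dist(start, end) / spacing))
    return [(start[0] + (end[0] - start[0]) * i / n,
             start[1] + (end[1] - start[1]) * i / n) for i in range(1, n + 1)]


# Hypothetical detected pipeline endpoints, already mapped to world coordinates
start, end = (0.0, 0.0), (30.0, 40.0)
heading_demand = heading_to(start, end)
course = waypoints_along(start, end, spacing=10.0)
```

Regenerating the course each time new endpoints are detected keeps the waypoints locked to the pipeline as the vehicle moves along it.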
Industrial Segment Anything -- a Case Study in Aircraft Manufacturing, Intralogistics, Maintenance, Repair, and Overhaul
Deploying deep learning-based applications in specialized domains like the
aircraft production industry typically suffers from the training data
availability problem: only a few datasets represent non-everyday objects,
situations, and tasks. Recent advances in research around Vision Foundation
Models (VFM) have opened a new area of tasks and models with high generalization
capabilities in non-semantic and semantic predictions. As recently demonstrated
by the Segment Anything Project, exploiting VFM's zero-shot capabilities is a
promising direction for overcoming the limits imposed by data, context, and
sensor variety. However, their application within specific domains is still the
subject of ongoing research. This paper contributes here by surveying
applications of the Segment Anything Model (SAM) in aircraft production-specific
use cases. We cover manufacturing, intralogistics, and maintenance, repair, and
overhaul processes, which also represent a variety of other neighboring
industrial domains. Besides presenting the various use cases, we further discuss
the injection of domain knowledge.
Internet of Underwater Things and Big Marine Data Analytics -- A Comprehensive Survey
The Internet of Underwater Things (IoUT) is an emerging communication
ecosystem developed for connecting underwater objects in maritime and
underwater environments. The IoUT technology is intricately linked with
intelligent boats and ships, smart shores and oceans, automatic marine
transportation, positioning and navigation, underwater exploration, disaster
prediction and prevention, as well as with intelligent monitoring and security.
The IoUT has an influence at various scales, ranging from a small scientific
observatory to a midsized harbor and to global oceanic trade. The
network architecture of IoUT is intrinsically heterogeneous and should be
sufficiently resilient to operate in harsh environments. This creates major
challenges in terms of underwater communications, whilst relying on limited
energy resources. Additionally, the volume, velocity, and variety of data
produced by sensors, hydrophones, and cameras in IoUT is enormous, giving rise
to the concept of Big Marine Data (BMD), which has its own processing
challenges. Hence, conventional data processing techniques will falter, and
bespoke Machine Learning (ML) solutions have to be employed for automatically
learning the specific BMD behavior and features facilitating knowledge
extraction and decision support. The motivation of this paper is to
comprehensively survey the IoUT, BMD, and their synthesis. It also aims to
explore the nexus of BMD with ML. We set out from underwater data collection
and then discuss the family of IoUT data communication techniques with an
emphasis on the state-of-the-art research challenges. We then review the suite
of ML solutions suitable for BMD handling and analytics. We treat the subject
deductively from an educational perspective, critically appraising the material
surveyed.
Comment: 54 pages, 11 figures, 19 tables, IEEE Communications Surveys &
Tutorials, peer-reviewed academic journal
Learning Cooperative Dynamic Manipulation Skills from Human Demonstration Videos
This article proposes a method for learning and robotic replication of
dynamic collaborative tasks from offline videos. The objective is to extend the
concept of learning from demonstration (LfD) to dynamic scenarios, benefiting
from widely available or easily producible offline videos. To achieve this
goal, we decode important dynamic information, such as the Configuration
Dependent Stiffness (CDS), which reveals the contribution of arm pose to the
arm endpoint stiffness, from a three-dimensional human skeleton model. Next,
through encoding of the CDS via Gaussian Mixture Model (GMM) and decoding via
Gaussian Mixture Regression (GMR), the robot's Cartesian impedance profile is
estimated and replicated. We demonstrate the proposed method in a collaborative
sawing task with a leader-follower structure, considering environmental
constraints and dynamic uncertainties. The experimental setup includes two
Panda robots, which replicate the leader-follower roles and the impedance
profiles extracted from a two-person sawing video.
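The decoding step, Gaussian Mixture Regression, computes the expected output (for example a stiffness profile) given an input (for example time or arm pose) from a GMM fitted to the joint input-output data. The NumPy sketch below shows standard GMR for given mixture parameters; fitting the GMM, for instance with EM via scikit-learn's `GaussianMixture` with full covariances, is left out. The variable layout and the function name `gmr` are assumptions, not the article's implementation.

```python
import numpy as np


def gmr(x, priors, means, covs, in_dims):
    """Gaussian Mixture Regression: E[y | x] under a joint GMM over (x, y).

    priors: (K,), means: (K, D), covs: (K, D, D). The first `in_dims`
    dimensions are inputs (e.g. time or arm pose); the rest are outputs
    (e.g. stiffness values).
    """
    x = np.atleast_1d(np.asarray(x, float))
    i, o = slice(0, in_dims), slice(in_dims, None)
    resp, cond_means = [], []
    for pi_k, mu, sigma in zip(priors, means, covs):
        s_xx, s_yx = sigma[i, i], sigma[o, i]
        inv_sxx = np.linalg.inv(s_xx)
        diff = x - mu[i]
        # Weight of component k: prior times Gaussian density of the input part
        norm = np.sqrt((2 * np.pi) ** in_dims * np.linalg.det(s_xx))
        resp.append(pi_k * np.exp(-0.5 * diff @ inv_sxx @ diff) / norm)
        # Conditional mean of the outputs given x for component k
        cond_means.append(mu[o] + s_yx @ inv_sxx @ diff)
    resp = np.array(resp) / np.sum(resp)
    return resp @ np.array(cond_means)
```

With a single component this reduces to the familiar Gaussian conditional mean μ_y + Σ_yx Σ_xx⁻¹ (x − μ_x); with several components it blends these local linear predictions, which lets a smooth, pose-dependent impedance profile be reproduced.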