8,098 research outputs found

    3D Face tracking and gaze estimation using a monocular camera

    Estimating a user’s gaze direction, one of the main novel user interaction technologies, will eventually be used for numerous applications where current methods are becoming less effective. In this paper, a new method is presented for estimating the gaze direction using Canonical Correlation Analysis (CCA), which finds a linear relationship between two datasets defining the face pose and the corresponding facial appearance changes. Afterwards, iris tracking is performed by blob detection using a 4-connected component labeling algorithm. Finally, a gaze vector is calculated based on the gathered eye properties. Results obtained from datasets and real-time input confirm the robustness of this method.
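    The blob-detection step mentioned in this abstract can be illustrated with a minimal sketch of 4-connected component labeling. It is not the authors' implementation: the binary `mask` input, the function names and the choice of the largest blob as the iris candidate are all assumptions made for illustration.

```python
from collections import deque


def label_4_connected(mask):
    """Label truthy pixels of a 2D grid using 4-connectivity.

    Returns (labels, count): labels holds 0 for background and
    1..count for each connected blob.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            # New blob: flood-fill through up/down/left/right neighbours only.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def largest_blob_centroid(labels, count):
    """Return the (row, col) centroid of the largest blob, or None."""
    if count == 0:
        return None
    sums = {k: [0, 0, 0] for k in range(1, count + 1)}  # row sum, col sum, size
    for r, row in enumerate(labels):
        for c, k in enumerate(row):
            if k:
                sums[k][0] += r
                sums[k][1] += c
                sums[k][2] += 1
    best = max(sums.values(), key=lambda s: s[2])
    return best[0] / best[2], best[1] / best[2]
```

    With 4-connectivity, pixels that touch only diagonally end up in different blobs, so thin diagonal noise is not merged into the candidate region.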

    F-formation Detection: Individuating Free-standing Conversational Groups in Images

    Detection of groups of interacting people is a very interesting and useful task in many modern technologies, with application fields spanning from video-surveillance to social robotics. In this paper we first furnish a rigorous definition of group considering the background of the social sciences: this allows us to specify many kinds of group, so far neglected in the Computer Vision literature. On top of this taxonomy, we present a detailed state of the art on the group detection algorithms. Then, as a main contribution, we present a brand new method for the automatic detection of groups in still images, which is based on a graph-cuts framework for clustering individuals; in particular, we are able to codify in a computational sense the sociological definition of F-formation, which is very useful for encoding a group using only proxemic information: the position and orientation of people. We call the proposed method Graph-Cuts for F-formation (GCFF). We show how GCFF definitely outperforms all the state-of-the-art methods in terms of different accuracy measures (some of them brand new), also demonstrating strong robustness to noise and versatility in recognizing groups of various cardinality. Comment: 32 pages, submitted to PLOS ONE.
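    To make concrete how position and orientation alone can indicate a group, here is a deliberately simplified sketch. It is not GCFF and does not use graph cuts: each person votes for a point a fixed distance ahead of them, and people whose votes fall close together are linked with a union-find. The `stride` and `radius` parameters and all function names are hypothetical.

```python
import math


def o_space_votes(people, stride):
    """people: list of (x, y, theta). Each person votes for a centre
    located `stride` units ahead along their orientation theta (radians)."""
    return [(x + stride * math.cos(t), y + stride * math.sin(t))
            for x, y, t in people]


def group_by_votes(people, stride, radius):
    """Link people whose voted centres lie within `radius` of each other.

    Returns a sorted list of groups, each a list of indices into `people`.
    """
    votes = o_space_votes(people, stride)
    parent = list(range(len(people)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(people)):
        for j in range(i + 1, len(people)):
            dx = votes[i][0] - votes[j][0]
            dy = votes[i][1] - votes[j][1]
            if math.hypot(dx, dy) <= radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(people)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

    Two people standing 2 units apart and facing each other vote for the same midpoint with `stride=1`, so they are grouped, while someone nearby who faces away votes elsewhere and stays separate.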

    SIGS: Synthetic Imagery Generating Software for the development and evaluation of vision-based sense-and-avoid systems

    Unmanned Aerial Systems (UASs) have recently become a versatile platform for many civilian applications including inspection, surveillance and mapping. Sense-and-Avoid systems are essential for the autonomous safe operation of these systems in non-segregated airspaces. Vision-based Sense-and-Avoid systems are preferred to other alternatives as their price, physical dimensions and weight are more suitable for small and medium-sized UASs, but obtaining real flight imagery of potential collision scenarios is hard and dangerous, which complicates the development of vision-based detection and tracking algorithms. For this purpose, user-friendly software for synthetic imagery generation has been developed, which allows users to blend user-defined flight imagery of a simulated aircraft with real flight scenario images to produce realistic images with ground truth annotations. These are extremely useful for the development and benchmarking of vision-based detection and tracking algorithms at a much lower cost and risk. An image processing algorithm has also been developed for automatic detection of the occlusions caused by certain parts of the UAV which carries the camera. The detected occlusions can later be used by our software to simulate the occlusions due to the UAV that would appear in a real flight with the same camera setup. Additionally, this algorithm could be used to mask out pixels which do not contain relevant information about the scene for the visual detection, making the image search process more efficient. Finally, an application example of the imagery obtained with our software for the benchmarking of a state-of-the-art visual tracker is presented.

    Vision-based techniques for gait recognition

    Global security concerns have driven a proliferation of video surveillance devices. Intelligent surveillance systems seek to discover possible threats automatically and raise alerts. Being able to identify the surveyed object can help determine its threat level. The current generation of devices provides digital video data to be analysed for time-varying features to assist in the identification process. Commonly, people queue up to access a facility and approach a video camera in full frontal view. In this environment, a variety of biometrics are available, for example gait, which includes temporal features like stride period. Gait can be measured unobtrusively at a distance. The video data will also include face features, which are short-range biometrics. In this way, one can combine biometrics naturally using one set of data. In this paper we survey current techniques of gait recognition and modelling, together with the environment in which the research was conducted. We also discuss in detail the issues arising from deriving gait data, such as perspective and occlusion effects, together with the associated computer vision challenges of reliable tracking of human movement. Then, after highlighting these issues and challenges related to gait processing, we proceed to discuss the frameworks combining gait with other biometrics. We then provide motivations for a novel paradigm in biometrics-based human recognition, i.e. the use of the fronto-normal view of gait as a far-range biometric combined with biometrics operating at a near distance.

    How Does the Cerebral Cortex Work? Development, Learning, Attention, and 3D Vision by Laminar Circuits of Visual Cortex

    A key goal of behavioral and cognitive neuroscience is to link brain mechanisms to behavioral functions. The present article describes recent progress towards explaining how the visual cortex sees. Visual cortex, like many parts of perceptual and cognitive neocortex, is organized into six main layers of cells, as well as characteristic sub-laminae. Here it is proposed how these layered circuits help to realize the processes of development, learning, perceptual grouping, attention, and 3D vision through a combination of bottom-up, horizontal, and top-down interactions. A key theme is that the mechanisms which enable development and learning to occur in a stable way imply properties of adult behavior. These results thus begin to unify three fields: infant cortical development, adult cortical neurophysiology and anatomy, and adult visual perception. The identified cortical mechanisms promise to generalize to explain how other perceptual and cognitive processes work. Funding: Air Force Office of Scientific Research (F49620-01-1-0397); Office of Naval Research (N00014-01-1-0624).

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest that Gestalt grouping is not used as a strategy in these tasks, and it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
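    The reported statistic, F(1,4)=2.565 with p=0.185, can be sanity-checked without a statistics library. The sketch below relies on the standard identity that an F variable with (1, ν) degrees of freedom is the square of a Student t variable with ν degrees of freedom, and integrates the t density with Simpson's rule. The function names and the step count are illustrative choices, not part of the study.

```python
import math


def t_pdf(u, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + u * u / dof) ** (-(dof + 1) / 2)


def f_test_p_value_df1_is_1(f_stat, dof2, steps=2000):
    """Upper-tail p-value of an F(1, dof2) statistic.

    Uses P(F > f) = P(|T| > sqrt(f)) for T ~ t(dof2) and composite
    Simpson integration of the t density on [0, sqrt(f)].
    `steps` must be even.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, dof2) + t_pdf(t, dof2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof2)
    central_mass = 2 * total * h / 3  # P(-t < T < t) by symmetry
    return 1 - central_mass


p = f_test_p_value_df1_is_1(2.565, 4)
print(round(p, 3))
```

    The printed value should agree with the reported p to three decimals, which is well above the conventional 0.05 threshold and consistent with the "no significant difference" reading.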

    Cooperative Virtual Sensor for Fault Detection and Identification in Multi-UAV Applications

    This paper considers the problem of fault detection and identification (FDI) in applications carried out by a group of unmanned aerial vehicles (UAVs) with visual cameras. In many cases, the UAVs have cameras mounted onboard for other applications, and these cameras can be used as bearing-only sensors to estimate the relative orientation of another UAV. The idea is to exploit the redundant information provided by these sensors onboard each of the UAVs to increase safety and reliability, detecting faults on UAV internal sensors that cannot be detected by the UAVs themselves. Fault detection is based on the generation of residuals which compare the expected position of a UAV, considered as target, with the measurements taken by one or more UAVs acting as observers that are tracking the target UAV with their cameras. Depending on the available number of observers and the way they are used, a set of strategies and policies for fault detection are defined. When the target UAV is being visually tracked by two or more observers, it is possible to obtain an estimation of its 3D position that could replace damaged sensors. Accuracy and reliability of this vision-based cooperative virtual sensor (CVS) have been evaluated experimentally in a multivehicle indoor testbed with quadrotors, injecting faults on data to validate the proposed fault detection methods. Funding: Comisión Europea H2020 644271; Comisión Europea FP7 288082; Ministerio de Economia, Industria y Competitividad DPI2015-71524-R; Ministerio de Economia, Industria y Competitividad DPI2014-5983-C2-1-R; Ministerio de Educación, Cultura y Deporte FP

    Learning the dynamics and time-recursive boundary detection of deformable objects

    We propose a principled framework for recursively segmenting deformable objects across a sequence of frames. We demonstrate the usefulness of this method on left ventricular segmentation across a cardiac cycle. The approach involves a technique for learning the system dynamics together with methods of particle-based smoothing as well as non-parametric belief propagation on a loopy graphical model capturing the temporal periodicity of the heart. The dynamic system state is a low-dimensional representation of the boundary, and the boundary estimation involves incorporating curve evolution into recursive state estimation. By formulating the problem as one of state estimation, the segmentation at each particular time is based not only on the data observed at that instant, but also on predictions based on past and future boundary estimates. Although the paper focuses on left ventricle segmentation, the method generalizes to temporally segmenting any deformable object

    Accuracy assessment of Tri-plane B-mode ultrasound for non-invasive 3D kinematic analysis of knee joints

    BACKGROUND: Currently the clinical standard for measuring the motion of the bones in knee joints with sufficient precision involves implanting tantalum beads into the bones. These beads appear as high-intensity features in radiographs and can be used for precise kinematic measurements. This procedure imposes a strong coupling between accuracy and invasiveness. In this paper, a tri-plane B-mode ultrasound (US) based non-invasive approach is proposed for use in kinematic analysis of knee joints in 3D space. METHODS: The 3D analysis is performed using image processing procedures on the 2D US slices. The novelty of the proposed procedure and its applicability to the unconstrained 3D kinematic analysis of knee joints are outlined. An error analysis for establishing the method's feasibility is included for different artificial compositions of a knee joint phantom. Some in-vivo and in-vitro scans are presented to demonstrate that US scans reveal enough anatomical detail, which further supports the experimental setup based on knee bone phantoms. RESULTS: The error between the displacements measured by the registration of the US image slices and the true displacements of the respective slices, measured using the precision mechanical stages on the experimental apparatus, is evaluated for translation and rotation in two simulated environments. The mean and standard deviation of errors are shown in tabular form. This method provides an average measurement precision of less than 0.1 mm for translation and 0.1 degrees for rotation. CONCLUSION: In this paper, we have presented a novel non-invasive approach to measuring the motion of the bones in a knee using tri-plane B-mode ultrasound and image registration. In our study, the image registration method determines the position of bony landmarks relative to a B-mode ultrasound sensor array with sub-pixel accuracy. The advantages of our proposed system over previous techniques are that it is non-invasive, does not require the use of ionizing radiation and can be used conveniently if miniaturized. This work has been supported by the School of Engineering & IT, UNSW Canberra, under a Research Publication Fellowship.

    A system for recognizing human emotions based on speech analysis and facial feature extraction: applications to Human-Robot Interaction

    With advances in Artificial Intelligence, humanoid robots are starting to interact with ordinary people, drawing on a growing understanding of psychological processes. Accumulating evidence in Human-Robot Interaction (HRI) suggests that research is focusing on establishing emotional communication between human and robot in order to create social perception, cognition, desired interaction and sensation. Furthermore, robots need to perceive human emotion and optimize their behavior to help and interact with a human being in various environments. The most natural way to recognize basic emotions is to extract sets of features from human speech, facial expression and body gesture. A system for recognition of emotions based on speech analysis and facial feature extraction can have interesting applications in Human-Robot Interaction. Thus, the Human-Robot Interaction ontology explains how the knowledge of these fundamental sciences is applied in physics (sound analyses), mathematics (face detection and perception), philosophy theory (behavior) and robotic science context. In this project, we carry out a study to recognize basic emotions (sadness, surprise, happiness, anger, fear and disgust). We also propose a methodology and a software program for classification of emotions based on speech analysis and facial feature extraction. The speech analysis phase investigated the appropriateness of using acoustic (pitch value, pitch peak, pitch range, intensity and formant) and phonetic (speech rate) properties of emotive speech with the freeware program PRAAT, and consists of generating and analyzing a graph of speech signals. The proposed architecture investigated the appropriateness of analyzing emotive speech with minimal use of signal processing algorithms. Thirty participants in the experiment had to repeat five sentences in English (with durations typically between 0.40 s and 2.5 s) in order to extract data relating to pitch (value, range and peak) and rising-falling intonation. Pitch alignments (peak, value and range) were evaluated and the results were compared with intensity and speech rate. The facial feature extraction phase uses a mathematical formulation (Bézier curves) and geometric analysis of the facial image, based on measurements of a set of Action Units (AUs) for classifying the emotion. The proposed technique consists of three steps: (i) detecting the facial region within the image, (ii) extracting and classifying the facial features, (iii) recognizing the emotion. The new data were then merged with reference data in order to recognize the basic emotion. Finally, we combined the two proposed algorithms (speech analysis and facial expression) in order to design a hybrid technique for emotion recognition. This technique has been implemented in a software program, which can be employed in Human-Robot Interaction. The efficiency of the methodology was evaluated by experimental tests on 30 individuals (15 female and 15 male, 20 to 48 years old) from different ethnic groups, namely: (i) ten European adults, (ii) ten Asian (Middle Eastern) adults and (iii) ten American adults. Ultimately, the proposed technique made it possible to recognize the basic emotion in most cases.