
    Visual servoing by partitioning degrees of freedom

    There are many design factors and choices when mounting a vision system for robot control. Such factors include the kinematic and dynamic characteristics of the robot's degrees of freedom (DOF), which determine the velocities and fields of view a camera can achieve. Another factor is that additional motion components (such as pan-tilt units) are often mounted on a robot and introduce synchronization problems. When a task does not require visually servoing every robot DOF, the designer must choose which ones to servo. Questions then arise as to what roles, if any, the remaining DOF play in the task. Without an analytical framework, the designer resorts to intuition and trial-and-error implementations. This paper presents a frequency-based framework that identifies the parameters that factor into tracking. This framework gives design insight, which was then used to synthesize a control law that exploits the kinematic and dynamic attributes of each DOF. The resulting multi-input multi-output control law, which we call partitioning, defines an underlying joint coupling to servo camera motions. The net effect is that by employing both visual and kinematic feedback loops, a robot can quickly position and orient a camera in a large assembly workcell. Real-time experiments tracking people and robot hands are presented using a 5-DOF hybrid (3-DOF Cartesian gantry plus 2-DOF pan-tilt unit) robot.
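
    To make the partitioning idea concrete, here is a minimal sketch of one control step, assuming a hypothetical setup in the spirit of the abstract: the fast, low-inertia pan-tilt DOF are driven by visual feedback, while the slow gantry DOF are driven by kinematic feedback that re-centers the pan-tilt joints. Gains, names, and coupling terms are illustrative, not the paper's exact control law.

    ```python
    import numpy as np

    def partitioned_step(image_error, pt_angles, k_vis=2.0, k_kin=0.5):
        """One step of a partitioned control law (illustrative only).

        image_error : (2,) target offset from the image center
        pt_angles   : (2,) current pan and tilt joint angles [rad]
        Returns velocity commands for the pan-tilt and gantry DOF.
        """
        # Fast DOF (pan-tilt): visual feedback drives the image error to zero.
        pt_vel = -k_vis * image_error

        # Slow DOF (Cartesian gantry): kinematic feedback translates the
        # camera so the pan-tilt joints drift back toward mid-range,
        # preserving field of view and bandwidth for future target motion.
        gantry_vel = -k_kin * np.array([np.sin(pt_angles[0]),   # x follows pan
                                        0.0,                    # y held here
                                        np.sin(pt_angles[1])])  # z follows tilt
        return pt_vel, gantry_vel
    ```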

    A Magnetic Actuated Fully Insertable Robotic Camera System for Single Incision Laparoscopic Surgery

    Minimally Invasive Surgery (MIS) is a common surgical approach in which tiny incisions are made in the patient's anatomy, surgical instruments are inserted, and laparoscopic cameras guide the procedure. Compared with traditional open surgery, MIS allows surgeons to perform complex surgeries with reduced trauma to the muscles and soft tissues, less intraoperative hemorrhaging and postoperative pain, and faster recovery times. Surgeons rely heavily on laparoscopic cameras for hand-eye coordination and control during a procedure. However, a standard laparoscopic camera, pushed into the body as a long rigid stick through a dedicated small opening, requires additional incisions for the surgical instruments. Recently, single incision laparoscopic surgery (SILS) and natural orifice translumenal endoscopic surgery (NOTES) have been introduced to reduce or even eliminate these incisions. However, the shared use of a single incision or a natural orifice for both surgical instruments and laparoscopic cameras further reduces dexterity in manipulating them and degrades visual feedback. In this dissertation, an innovative actuation mechanism is proposed for laparoscopic cameras that can be navigated, anchored, and oriented wirelessly as a single rigid body to improve surgical procedures, especially SILS. This design eliminates the need for an articulated mechanism and integrated motors, significantly reducing the size of the camera. The design features a unified mechanism for anchoring, navigating, and rotating a fully insertable camera using an externally generated rotational magnetic field. The key component and innovation of the robotic camera is the magnetic driving unit, referred to as a rotor, driven externally by a specially designed magnetic stator. The rotor, with permanent magnets (PMs) embedded in an encapsulated camera, is magnetically coupled to a stator placed externally against or close to the dermal surface. The external stator, which consists of PMs and coils, generates a 3D rotational magnetic field that produces torque to rotate the rotor into the desired camera orientation, and force that anchors the camera and keeps it steady during a surgical procedure. Experimental assessments were conducted to evaluate the performance of the camera system.
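
    The stator-rotor coupling described above is governed by standard magnetic dipole relations; the short sketch below evaluates them for a rotor modeled as a single dipole. The function, variable names, and numbers are illustrative assumptions, not the dissertation's model.

    ```python
    import numpy as np

    def dipole_torque_and_force(m, B, grad_B):
        """Torque and force on the rotor's permanent magnets, modeled as one
        dipole m [A*m^2] in the stator field B [T] with gradient grad_B [T/m]
        (a 3x3 Jacobian with entries dB_i/dx_j).

        tau = m x B        rotates the rotor to follow the spinning field
        F   = grad_B^T m   pulls the rotor toward the stator (anchoring)
        """
        tau = np.cross(m, B)
        force = grad_B.T @ m
        return tau, force

    # Hypothetical values: 0.1 A*m^2 dipole in a 10 mT in-plane field that
    # decays with distance from the externally placed stator.
    m = np.array([0.1, 0.0, 0.0])
    B = np.array([0.0, 0.01, 0.0])
    grad_B = np.diag([0.0, 0.0, -0.5])
    print(dipole_torque_and_force(m, B, grad_B))
    ```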

    A robust and efficient video representation for action recognition

    This paper introduces a state-of-the-art video representation and applies it to efficient action recognition and detection. We first propose to improve the popular dense trajectory features by explicit camera motion estimation. More specifically, we extract feature point matches between frames using SURF descriptors and dense optical flow. The matches are used to estimate a homography with RANSAC. To improve the robustness of homography estimation, a human detector is employed to remove matches on the human body, since human motion is generally independent of camera motion. Trajectories consistent with the homography are considered to be due to camera motion and are thus removed. We also use the homography to cancel out camera motion from the optical flow. This results in significant improvements for the motion-based HOF and MBH descriptors. We further explore the recent Fisher vector as an alternative feature encoding to the standard bag-of-words histogram, and consider different ways to include spatial layout information in these encodings. We present a large and varied set of evaluations, considering (i) classification of short basic actions on six datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that our improved trajectory features significantly outperform the original dense trajectories, and that Fisher vectors are superior to bag-of-words encodings for video recognition tasks. In all three tasks, we show substantial improvements over state-of-the-art results.
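
    The camera motion estimation step lends itself to a compact sketch. The version below, written against OpenCV, assumes point matches and human detections are already available; names and thresholds are ours, not the authors'.

    ```python
    import cv2
    import numpy as np

    def estimate_camera_motion(pts_prev, pts_cur, human_boxes):
        """Fit a frame-to-frame homography with RANSAC, first discarding
        matches inside detected human boxes, whose motion is independent
        of the camera's (a sketch of the pipeline described above).

        pts_prev, pts_cur : (N, 2) matched point coordinates
        human_boxes       : list of (x0, y0, x1, y1) detections
        """
        def outside_humans(p):
            return not any(x0 <= p[0] <= x1 and y0 <= p[1] <= y1
                           for (x0, y0, x1, y1) in human_boxes)

        keep = np.array([outside_humans(p) for p in pts_cur])
        # RANSAC handles the mismatches that survive the human mask.
        H, _ = cv2.findHomography(pts_prev[keep].astype(np.float32),
                                  pts_cur[keep].astype(np.float32),
                                  cv2.RANSAC, 3.0)
        return H
    ```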

    Towards A Self-calibrating Video Camera Network For Content Analysis And Forensics

    Due to growing security concerns, video surveillance and monitoring have received immense attention from both federal agencies and private firms. The main concern is that a single camera, even if allowed to rotate or translate, is not sufficient to cover a large area for video surveillance. A more general solution, with a wide range of applications, is to allow the deployed cameras to have non-overlapping fields of view (FoV) and, if possible, to move freely in 3D space. This thesis addresses how the cameras in such a network can be calibrated, both individually and as a whole, so that each camera in the network is aware of its orientation with respect to all the others. Different types of cameras may be present in a multi-camera network, and novel techniques are presented for their efficient calibration. Specifically: (i) for a stationary camera, we derive new constraints on the Image of the Absolute Conic (IAC), which are shown to be intrinsic to the IAC; (ii) for a scene where object shadows are cast on a ground plane, we track the shadows cast by at least two unknown stationary points and use the tracked shadow positions to compute the horizon line and hence the camera's intrinsic and extrinsic parameters; (iii) a novel solution is presented for the scenario where a camera observes pedestrians, whose uniqueness lies in recognizing two harmonic homologies present in the resulting geometry; (iv) for a freely moving camera, a novel practical self-calibration method is proposed that even allows the camera to change its internal parameters by zooming; and (v) given the increasing deployment of pan-tilt-zoom (PTZ) cameras, a technique is presented that uses only two images to estimate five camera parameters. For an automatically configurable multi-camera network with non-overlapping fields of view, possibly containing moving cameras, a practical framework is proposed that determines the geometry of such a dynamic camera network. It is shown that one automatically computed vanishing point and a line lying on any plane orthogonal to the vertical direction are sufficient to infer the geometry of a dynamic network. Our method generalizes previous work, which considers only restricted camera motions. Using minimal assumptions, we demonstrate promising results on synthetic as well as real data. Applications to path modeling, GPS coordinate estimation, and configuring mixed-reality environments are explored.
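
    For intuition about the kind of constraint item (i) builds on, the classical orthogonal-vanishing-point relation on the IAC already fixes the focal length once the principal point is assumed known. The sketch below implements only this textbook special case, not the thesis's new constraints.

    ```python
    import numpy as np

    def focal_from_vanishing_points(v1, v2, p):
        """Focal length from two vanishing points v1, v2 of orthogonal
        scene directions: with zero skew, unit aspect ratio, and known
        principal point p, the IAC constraint v1^T w v2 = 0 reduces to
            (v1 - p) . (v2 - p) + f^2 = 0.
        """
        d = np.dot(np.asarray(v1, float) - p, np.asarray(v2, float) - p)
        if d >= 0:
            raise ValueError("points inconsistent with orthogonal directions")
        return np.sqrt(-d)

    # Hypothetical 640x480 image with the principal point at the center.
    print(focal_from_vanishing_points([900, 240], [-150, 240],
                                      np.array([320.0, 240.0])))
    ```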

    Action Recognition with Improved Trajectories

    Recently, dense trajectories were shown to be an efficient video representation for action recognition, achieving state-of-the-art results on a variety of datasets. This paper improves their performance by taking camera motion into account to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are then used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches; to improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimate to cancel out camera motion from the optical flow, which significantly improves motion-based descriptors such as HOF and MBH. Experimental results on four challenging action datasets (Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
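
    The flow compensation step mentioned above can be sketched directly: the homography predicts the motion each pixel would undergo if only the camera moved, and subtracting that prediction leaves (approximately) the scene motion. A numpy sketch, with our own variable names, follows.

    ```python
    import numpy as np

    def compensate_flow(flow, H):
        """Subtract camera-induced motion, given by homography H, from a
        dense optical flow field (illustrative version of the step above).

        flow : (h, w, 2) forward flow from frame t to frame t+1
        H    : (3, 3) homography mapping frame t pixels into frame t+1
        """
        h, w = flow.shape[:2]
        xs, ys = np.meshgrid(np.arange(w, dtype=float),
                             np.arange(h, dtype=float))
        pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
        warped = H @ pts
        warped = (warped[:2] / warped[2]).T.reshape(h, w, 2)
        camera_flow = warped - np.stack([xs, ys], axis=-1)
        # The residual is the input to HOF/MBH descriptor computation.
        return flow - camera_flow
    ```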

    Neuromorphic Visual Odometry with Resonator Networks

    Autonomous agents require self-localization to navigate in unknown environments. They can use Visual Odometry (VO) to estimate self-motion and localize themselves from visual sensors. Unlike inertial sensors, this motion-estimation strategy does not suffer from drift, and unlike wheel encoders it is not compromised by slippage. However, VO with conventional cameras is computationally demanding, limiting its application in systems with strict low-latency, low-memory, and low-energy requirements. Event-based cameras and neuromorphic computing hardware offer a promising low-power solution to the VO problem. However, conventional VO algorithms are not readily convertible to neuromorphic hardware. In this work, we present a VO algorithm built entirely of neuronal building blocks suitable for neuromorphic implementation. The building blocks are groups of neurons representing vectors in the computational framework of Vector Symbolic Architecture (VSA), which was proposed as an abstraction layer for programming neuromorphic hardware. The VO network we propose generates and stores a working memory of the presented visual environment, and updates this working memory while simultaneously estimating the changing location and orientation of the camera. We demonstrate how VSA can be leveraged as a computing paradigm for neuromorphic robotics. Moreover, our results represent an important step towards using neuromorphic computing hardware for fast and power-efficient VO and the related task of simultaneous localization and mapping (SLAM). We validate this approach experimentally on a simple robotic task and on an event-based dataset, demonstrating state-of-the-art performance in these settings.
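
    As a flavor of the vector operations such a network is built from, the snippet below shows binding and unbinding in a Holographic Reduced Representation, a common VSA; it illustrates how a "landmark at position" pair can be stored in and queried from a single working-memory vector. The paper's Fourier-domain VSA and resonator networks are more involved; this is only the underlying algebra.

    ```python
    import numpy as np

    def bind(a, b):
        """Bind two VSA vectors by circular convolution (via the FFT)."""
        return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

    def unbind(c, a):
        """Approximately recover b from c = bind(a, b) by correlation."""
        return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

    rng = np.random.default_rng(0)
    n = 1024
    position, landmark = rng.normal(0.0, 1.0 / np.sqrt(n), (2, n))

    memory = bind(position, landmark)     # store "landmark at position"
    recovered = unbind(memory, position)  # query the working memory
    print(np.dot(recovered, landmark))    # close to 1: stored item recovered
    ```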

    Development of An In Vivo Robotic Camera for Dexterous Manipulation and Clear Imaging

    Minimally invasive surgery (MIS) techniques are becoming popular replacements for traditional open surgeries. These methods benefit patients by lowering blood loss and postoperative pain, reducing the recovery period and hospital stay, decreasing scarring and cosmetic issues at the surgical site, and lessening treatment costs, leading to greater patient satisfaction. Manipulating surgical instruments from outside the abdomen requires precise hand-eye coordination, which is provided by insertable cameras. Traditional insertable MIS cameras suffer from port complexity and reduced manipulation dexterity, which degrades hand-eye coordination and surgical flow. Fully insertable robotic camera systems have emerged as a promising solution for MIS. Implementing robotic camera systems faces multiple challenges in fixation, manipulation, orientation control, tool-tissue interaction, in vivo illumination, and clear imaging. In this dissertation, a novel actuation and control mechanism is developed and validated for an insertable laparoscopic camera. This design uses permanent magnets and coils as force/torque generators in an external control unit to manipulate an in vivo camera capsule. The motorless design of the capsule reduces the weight, size, and power consumption of the driven unit. To guarantee smooth motion of the camera inside the abdominal cavity, an interaction force control method was proposed and validated. The system's design was further optimized, minimizing the control unit's size and power consumption and extending the maneuverability of the insertable camera, through a novel transformable design that uses a single permanent magnet in the control unit. This camera robot uses a permanent magnet as the fixation and translation unit, and two embedded motors for tilt actuation as well as illumination actuation. The transformable design provides superior imaging quality through an optimized illumination unit and a cleaning module. The illumination module uses freeform optical lenses to shape the light beams from the LEDs for optimized illumination over the surgical zone. The cleaning module prevents lens contamination through a pump-actuated debris prevention system, and mechanically wipes the lens in case of contamination. The performance of the transformable design and its modules has been assessed experimentally.
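
    As an illustration of what an interaction force controller of this kind might look like, here is a one-step admittance law that adjusts the commanded capsule velocity so the measured tissue contact force tracks a safe setpoint. It is a generic stand-in under assumed parameters, not the dissertation's controller.

    ```python
    def admittance_step(f_measured, f_desired, v_cmd, dt, m=0.05, d=2.0):
        """One step of a simple admittance law,
            m * dv/dt + d * v = f_measured - f_desired,
        with virtual mass m and damping d chosen here arbitrarily."""
        accel = ((f_measured - f_desired) - d * v_cmd) / m
        return v_cmd + accel * dt
    ```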

    WATCHING PEOPLE: ALGORITHMS TO STUDY HUMAN MOTION AND ACTIVITIES

    Nowadays, human motion analysis is one of the most active research topics in Computer Vision, receiving increasing attention from both the industrial and scientific communities. The growing interest in human motion analysis is motivated by a rising number of promising applications, ranging from surveillance, human–computer interaction, and virtual reality to healthcare, sports, computer games, and video conferencing, to name a few. The aim of this thesis is to give an overview of the various tasks involved in visual motion analysis of the human body and to present the related issues and possible solutions. In this thesis, visual motion analysis is categorized into three major areas related to the interpretation of human motion: tracking of human motion using a virtual pan-tilt-zoom (vPTZ) camera, recognition of human motions, and segmentation of human behaviors. In the field of human motion tracking, a virtual environment for PTZ cameras (vPTZ) is presented to overcome the mechanical limitations of PTZ cameras. The vPTZ is built on equirectangular images acquired by 360° cameras and allows not only the development of pedestrian tracking algorithms but also the comparison of their performance. On the basis of this virtual environment, three novel pedestrian tracking algorithms for 360° cameras were developed: two adopt a tracking-by-detection approach, while the third adopts a Bayesian approach. The action recognition problem is addressed by an algorithm that represents actions in terms of multinomial distributions of frequent sequential patterns of different lengths. Frequent sequential patterns are series of data descriptors that occur many times in the data. The proposed method learns a codebook of frequent sequential patterns by means of an Apriori-like algorithm; an action is then represented with a Bag-of-Frequent-Sequential-Patterns approach. In the last part of this thesis, a methodology is presented to semi-automatically annotate behavioral data given a small set of manually annotated data. The resulting methodology is not only effective for semi-automated annotation but can also be used in the presence of abnormal behaviors, as demonstrated empirically by testing the system on data collected from children affected by neuro-developmental disorders.
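
    The vPTZ idea is straightforward to prototype: a virtual perspective camera is rendered from the equirectangular panorama by casting a ray for each output pixel, rotating it by the requested pan and tilt, and sampling the panorama at the corresponding longitude and latitude. The sketch below uses nearest-neighbor sampling; names and conventions are illustrative, not the thesis's implementation.

    ```python
    import numpy as np

    def vptz_view(equi, pan, tilt, fov, out_w=640, out_h=480):
        """Render a virtual PTZ view from an equirectangular panorama.

        equi : (H, W, 3) 360-degree image; pan, tilt, fov in radians.
        """
        H, W = equi.shape[:2]
        f = (out_w / 2) / np.tan(fov / 2)        # virtual focal length
        xs, ys = np.meshgrid(np.arange(out_w) - out_w / 2,
                             np.arange(out_h) - out_h / 2)
        # One ray per output pixel, in the virtual camera frame.
        rays = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
        rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
        # Rotate by tilt (about x), then pan (about y).
        ct, st = np.cos(tilt), np.sin(tilt)
        cp, sp = np.cos(pan), np.sin(pan)
        Rx = np.array([[1, 0, 0], [0, ct, -st], [0, st, ct]])
        Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
        rays = rays @ (Ry @ Rx).T
        # Rays -> longitude/latitude -> equirectangular pixel coordinates.
        lon = np.arctan2(rays[..., 0], rays[..., 2])
        lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))
        u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
        v = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
        return equi[v, u]
    ```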