
    RGB-D datasets using Microsoft Kinect or similar sensors: a survey

    RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It takes advantage of the color image, which provides appearance information about an object, and of the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and they are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
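
    As a concrete illustration of how the color and depth channels combine, the sketch below back-projects a Kinect-style depth image into a colored 3D point cloud using pinhole intrinsics. This is a minimal example, not code from the survey; the function name and the assumption that the RGB image is already registered to the depth image are illustrative.

    ```python
    import numpy as np

    def depth_to_pointcloud(depth, rgb, fx, fy, cx, cy):
        """Back-project an aligned RGB-D pair into a colored 3D point cloud.

        depth : (H, W) array of depths in meters (0 marks missing pixels).
        rgb   : (H, W, 3) color image registered to the depth image.
        fx, fy, cx, cy : pinhole intrinsics of the depth camera.
        """
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
        valid = depth > 0
        z = depth[valid]
        x = (u[valid] - cx) * z / fx  # pinhole back-projection
        y = (v[valid] - cy) * z / fy
        return np.stack([x, y, z], axis=-1), rgb[valid]  # (N, 3) points, colors
    ```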

    Estimating animal pose using deep learning: a trained deep learning model outperforms morphological analysis

    INTRODUCTION: Analyzing animal behavior helps researchers understand the animals' decision-making processes, and helper tools are rapidly becoming an indispensable part of many interdisciplinary studies. However, researchers are often challenged to estimate animal pose because of the limitations of these tools and their vulnerability to specific environments. Over the years, deep learning has been introduced as an alternative solution to overcome these challenges. OBJECTIVES: This study investigates how deep learning models can be applied to the accurate prediction of animal pose, compared with traditional morphological analysis based on image pixels. METHODS: A Transparent Omnidirectional Locomotion Compensator (TOLC), a tracking device, is used to record videos covering a wide range of animal behavior. The recorded videos contain two insects: a walking red imported fire ant (Solenopsis invicta) and a walking fruit fly (Drosophila melanogaster). Body parts such as the head, legs, and thorax are estimated using an open-source deep-learning toolbox. A deep learning model, ResNet-50, is trained to predict the body parts of the fire ant and the fruit fly, respectively. 500 image frames for each insect were annotated by humans and then compared with the predictions of the deep learning model as well as the points generated by the morphological analysis. RESULTS: The experimental results show that the average distance between the deep learning-predicted centroids and the human-annotated centroids is 2.54, while the average distance between the morphological analysis-generated centroids and the human-annotated centroids is 6.41 over the 500 frames of the fire ant. For the fruit fly, the average distance between the deep learning-predicted and the human-annotated centroids is 2.43, while the average distance between the morphological analysis-generated and the human-annotated centroids is 5.06 over the 477 image frames. CONCLUSION: In this paper, we demonstrate that the deep learning model outperforms traditional morphological analysis in estimating animal pose across a series of video frames.
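
    A minimal sketch of the evaluation metric described above: the mean Euclidean distance between predicted and human-annotated centroids over a sequence of frames. The function and array names are illustrative, not from the paper.

    ```python
    import numpy as np

    def mean_centroid_error(predicted, annotated):
        """Mean Euclidean distance between per-frame centroids.

        predicted, annotated : (F, 2) arrays of (x, y) centroids,
        one row per video frame.
        """
        predicted = np.asarray(predicted, dtype=float)
        annotated = np.asarray(annotated, dtype=float)
        return np.linalg.norm(predicted - annotated, axis=1).mean()

    # For the fire ant, this metric comes out around 2.54 for the deep
    # learning predictions and around 6.41 for morphological analysis
    # over the 500 annotated frames.
    ```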

    Wing and body motion during flight initiation in Drosophila revealed by automated visual tracking

    The fruit fly Drosophila melanogaster is a widely used model organism in studies of genetics, developmental biology and biomechanics. One limitation for exploiting Drosophila as a model system for behavioral neurobiology is that measuring body kinematics during behavior is labor-intensive and subjective. In order to quantify flight kinematics during different types of maneuvers, we have developed a visual tracking system that estimates the posture of the fly from multiple calibrated cameras. An accurate geometric fly model is designed using unit quaternions to capture complex body and wing rotations, which are automatically fitted to the images in each time frame. Our approach works across a range of flight behaviors, while also being robust to common environmental clutter. The tracking system is used in this paper to compare wing and body motion during both voluntary and escape take-offs. Using our automated algorithms, we are able to measure stroke amplitude, geometric angle of attack and other parameters important to a mechanistic understanding of flapping flight. When compared with manual tracking methods, the algorithm estimates body position within 4.4±1.3% of the body length, while body orientation is measured within 6.5±1.9 deg. (roll), 3.2±1.3 deg. (pitch) and 3.4±1.6 deg. (yaw) on average across six videos. Similarly, stroke amplitude and deviation are estimated within 3.3 deg. and 2.1 deg., while angle of attack is typically measured within 8.8 deg. when compared against a human digitizer. Using our automated tracker, we analyzed a total of eight voluntary and two escape take-offs. These sequences show that Drosophila melanogaster do not utilize clap and fling during take-off and are able to modify their wing kinematics from one wingstroke to the next. Our approach should enable biomechanists and ethologists to process much larger datasets than is possible at present and, therefore, accelerate insight into the mechanisms of free-flight maneuvers of flying insects.
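
    The geometric fly model above represents body and wing rotations with unit quaternions. As a hedged illustration of that representation (not the authors' code), the sketch below converts a unit quaternion to a rotation matrix and uses it to pose a rigid model; the function names are assumptions.

    ```python
    import numpy as np

    def quat_to_rotmat(q):
        """Convert a unit quaternion q = (w, x, y, z) to a 3x3 rotation matrix."""
        w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
        return np.array([
            [1 - 2*(y*y + z*z),     2*(x*y - w*z),     2*(x*z + w*y)],
            [    2*(x*y + w*z), 1 - 2*(x*x + z*z),     2*(y*z - w*x)],
            [    2*(x*z - w*y),     2*(y*z + w*x), 1 - 2*(x*x + y*y)],
        ])

    def pose_model(vertices, q, t):
        """Rotate (N, 3) model vertices by quaternion q, then translate by t."""
        return vertices @ quat_to_rotmat(q).T + np.asarray(t)
    ```

    Unit quaternions avoid the gimbal-lock singularities of Euler angles, which makes them a natural choice for fitting the large body and wing rotations that occur from one frame to the next.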

    3D pose estimation of flying animals in multi-view video datasets

    Flying animals such as bats, birds, and moths are actively studied by researchers wanting to better understand these animals' behavior and flight characteristics. Towards this goal, multi-view videos of flying animals have been recorded both in laboratory conditions and natural habitats. The analysis of these videos has shifted over time from manual inspection by scientists to more automated and quantitative approaches based on computer vision algorithms. This thesis describes a study on the largely unexplored problem of 3D pose estimation of flying animals in multi-view video data. This problem has received little attention in the computer vision community, where few flying animal datasets exist. Additionally, published solutions from researchers in the natural sciences have not taken full advantage of advancements in computer vision research. This thesis addresses this gap by proposing three different approaches for 3D pose estimation of flying animals in multi-view video datasets, which evolve from successful pose estimation paradigms used in computer vision. The first approach models the appearance of a flying animal with a synthetic 3D graphics model and then uses a Markov Random Field to model 3D pose estimation over time as a single optimization problem. The second approach builds on the success of Pictorial Structures models and further improves them for the case where only a sparse set of landmarks is annotated in training data. The proposed approach first discovers parts from regions of the training images that are not annotated. The discovered parts are then used to generate more accurate appearance likelihood terms, which in turn produce more accurate landmark localizations. The third approach takes advantage of the success of deep learning models and adapts existing deep architectures to perform landmark localization. Both the second and third approaches perform 3D pose estimation by first obtaining accurate localization of key landmarks in individual views, and then using calibrated cameras and camera geometry to reconstruct the 3D position of key landmarks. This thesis shows that the proposed algorithms generate first-of-a-kind and leading results on real world datasets of bats and moths, respectively. Furthermore, a variety of resources are made freely available to the public to further strengthen the connection between research communities.
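
    The second and third approaches share a final step: landmarks localized in each calibrated view are lifted to 3D. A minimal sketch of that reconstruction using standard linear (DLT) triangulation, rather than the thesis' exact implementation, is shown below; the function name is illustrative.

    ```python
    import numpy as np

    def triangulate(projections, points2d):
        """Linear (DLT) triangulation of one landmark from multiple views.

        projections : list of 3x4 camera projection matrices, one per view.
        points2d    : list of (x, y) landmark detections, one per view.
        """
        rows = []
        for P, (x, y) in zip(projections, points2d):
            rows.append(x * P[2] - P[0])  # x * (p3 . X) = p1 . X
            rows.append(y * P[2] - P[1])  # y * (p3 . X) = p2 . X
        _, _, vt = np.linalg.svd(np.stack(rows))
        X = vt[-1]               # null-space vector of the stacked system
        return X[:3] / X[3]      # de-homogenize to a 3D point
    ```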

    Systems and Methods for Behavior Detection Using 3D Tracking and Machine Learning

    Systems and methods for performing behavioral detection using three-dimensional tracking and machine learning in accordance with various embodiments of the invention are disclosed. One embodiment of the invention involves a classification application that directs a microprocessor to: identify at least a primary subject interacting with a secondary subject within a sequence of frames of image data including depth information; determine poses of the subjects; extract a set of parameters describing the poses and movement of at least the primary and secondary subjects; and detect a social behavior performed by at least the primary subject and involving at least the secondary subject using a classifier trained to discriminate between a plurality of social behaviors based upon the set of parameters describing poses and movement.
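
    A hedged sketch of the pipeline the claim describes: per-frame parameters summarizing the poses and relative movement of the two subjects feed a supervised classifier. The features, the scikit-learn classifier, and all names below are illustrative assumptions, not the patent's specification.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def social_features(primary_xy, secondary_xy):
        """Toy pose/movement parameters from (F, 2) per-frame centroids."""
        sep = np.linalg.norm(primary_xy - secondary_xy, axis=1)    # separation
        vel = np.linalg.norm(np.diff(primary_xy, axis=0), axis=1)  # primary speed
        vel = np.append(vel, vel[-1])                              # pad to F frames
        return np.stack([sep, vel], axis=1)

    # Hypothetical training data: X holds pose/movement parameter vectors,
    # y holds social-behavior labels for each frame or window.
    clf = RandomForestClassifier(n_estimators=200)
    # clf.fit(X_train, y_train)
    # labels = clf.predict(social_features(primary_xy, secondary_xy))
    ```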