58 research outputs found

    3D pose estimation of flying animals in multi-view video datasets

    Flying animals such as bats, birds, and moths are actively studied by researchers wanting to better understand these animals’ behavior and flight characteristics. Towards this goal, multi-view videos of flying animals have been recorded both in laboratory conditions and natural habitats. The analysis of these videos has shifted over time from manual inspection by scientists to more automated and quantitative approaches based on computer vision algorithms. This thesis describes a study on the largely unexplored problem of 3D pose estimation of flying animals in multi-view video data. This problem has received little attention in the computer vision community, where few flying animal datasets exist. Additionally, published solutions from researchers in the natural sciences have not taken full advantage of advancements in computer vision research. This thesis addresses this gap by proposing three different approaches for 3D pose estimation of flying animals in multi-view video datasets, which evolve from successful pose estimation paradigms used in computer vision. The first approach models the appearance of a flying animal with a synthetic 3D graphics model and then uses a Markov Random Field to model 3D pose estimation over time as a single optimization problem. The second approach builds on the success of Pictorial Structures models and further improves them for the case where only a sparse set of landmarks is annotated in training data. The proposed approach first discovers parts from regions of the training images that are not annotated. The discovered parts are then used to generate more accurate appearance likelihood terms, which in turn produce more accurate landmark localizations. The third approach takes advantage of the success of deep learning models and adapts existing deep architectures to perform landmark localization.
Both the second and third approaches perform 3D pose estimation by first obtaining accurate localization of key landmarks in individual views, and then using calibrated cameras and camera geometry to reconstruct the 3D position of key landmarks. This thesis shows that the proposed algorithms generate first-of-a-kind and leading results on real world datasets of bats and moths, respectively. Furthermore, a variety of resources are made freely available to the public to further strengthen the connection between research communities.
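The final step shared by the second and third approaches, reconstructing 3D landmark positions from per-view detections with calibrated cameras, is classical two-view triangulation. A minimal sketch using the Direct Linear Transform; the camera matrices and point below are toy values for illustration, not from the thesis datasets:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point from its 2D projections in two calibrated
    views using the Direct Linear Transform (DLT).
    P1, P2: 3x4 camera projection matrices; x1, x2: (u, v) pixel coordinates."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous 3D point is the right singular vector associated
    # with the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy cameras: an identity view and a camera translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0)
x2 = P2 @ np.append(X_true, 1.0)
x1, x2 = x1[:2] / x1[2], x2[:2] / x2[2]   # perspective division to pixels
X_hat = triangulate_dlt(P1, P2, x1, x2)
```

With noise-free detections the DLT recovers the point exactly; in practice each landmark is triangulated from however many views detect it, and noisy detections make the SVD a least-squares estimate.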

    Detecting irregularity in videos using spatiotemporal volumes.

    Li, Yun. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 68-72). Abstracts in English and Chinese (摘要). Contents: Chapter 1, Introduction (Visual Detection; Irregularity Detection); Chapter 2, System Overview (Definition of Irregularity; Contributions; Review of Previous Work: Model-based Methods, Statistical Methods; System Outline); Chapter 3, Background Subtraction (Related Work; Adaptive Mixture Model: Online Model Update, Background Model Estimation, Foreground Segmentation); Chapter 4, Feature Extraction (Various Feature Descriptors; Histogram of Oriented Gradients: Feature Descriptor, Feature Merits; Subspace Analysis: Principal Component Analysis, Subspace Projection); Chapter 5, Bayesian Probabilistic Inference (Estimation of PDFs: K-Means Clustering, Kernel Density Estimation; MAP Estimation: ML Estimation and MAP Estimation, Detection through MAP; Efficient Implementation: K-D Trees, Nearest Neighbor (NN) Algorithm); Chapter 6, Experiments and Conclusion (Experiments: Outdoor Video Surveillance Exps. 1-3, Classroom Monitoring Exp. 4; Algorithm Evaluation; Conclusion); Bibliography.
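The chapter sequence describes a pipeline: extract features from spatiotemporal volumes, estimate the density of "regular" activity with kernel density estimation, and flag low-likelihood volumes as irregular. A minimal sketch of the detection step; the Gaussian kernel, bandwidth, threshold, and synthetic features below are illustrative assumptions rather than the thesis's settings:

```python
import numpy as np

def kde_log_likelihood(x, train_feats, bandwidth=1.0):
    """Log-likelihood of a feature vector under a Gaussian kernel density
    estimate built from features of 'regular' training clips."""
    d2 = np.sum((train_feats - x) ** 2, axis=1)   # squared distances to samples
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))      # Gaussian kernel responses
    return np.log(np.mean(k) + 1e-12)             # epsilon avoids log(0)

def is_irregular(x, train_feats, threshold=-5.0, bandwidth=1.0):
    """Flag a spatiotemporal volume as irregular when its likelihood under
    the model of regular activity falls below a threshold."""
    return kde_log_likelihood(x, train_feats, bandwidth) < threshold

rng = np.random.default_rng(0)
regular = rng.normal(0.0, 1.0, size=(500, 8))   # features of normal clips
normal_query = np.zeros(8)                      # close to the training mass
odd_query = np.full(8, 6.0)                     # far from all training data
```

A k-d tree over `train_feats`, as in the thesis's efficient implementation chapter, would replace the exhaustive distance computation for large training sets.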

    Chasing control in male blowflies : behavioural performance and neuronal responses

    Trischler C. Chasing control in male blowflies : behavioural performance and neuronal responses. Bielefeld (Germany): Bielefeld University; 2008

    Cognitive-developmental learning for a humanoid robot : a caregiver's gift

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 319-341). Building an artificial humanoid robot's brain, even at an infant's cognitive level, has been a long quest which still lies only in the realm of our imagination. Our efforts towards such a dimly imaginable task are developed according to two alternate and complementary views: cognitive and developmental. The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself. This thesis addresses a broad spectrum of machine learning problems across several categorization levels. Actions by embodied agents are used to automatically generate training data for the learning mechanisms, so that the robot develops categorization autonomously. Taking inspiration from the human brain, a framework of algorithms and methodologies was implemented to emulate different cognitive capabilities on the humanoid robot Cog. This framework is effectively applied to a collection of AI, computer vision, and signal processing problems. Cognitive capabilities of the humanoid robot are developmentally created, starting from infant-like abilities for detecting, segmenting, and recognizing percepts over multiple sensing modalities. Human caregivers provide a helping hand for communicating such information to the robot. This is done by actions that create meaningful events (by changing the world in which the robot is situated), thus inducing the "compliant perception" of objects from these human-robot interactions. Self-exploration of the world extends the robot's knowledge concerning object properties.
This thesis argues for enculturating humanoid robots using infant development as a metaphor for building a humanoid robot's cognitive abilities. A human caregiver redesigns a humanoid's brain by teaching the humanoid robot as she would teach a child, using children's learning aids such as books, drawing boards, or other cognitive artifacts. Multi-modal object properties are learned using these tools and inserted into several recognition schemes, which are then applied to developmentally acquire new object representations. The humanoid robot therefore sees the world through the caregiver's eyes. by Artur Miguel Do Amaral Arsenio. Ph.D.

    NASA Tech Briefs, December 2007

    Topics include: Ka-Band TWT High-Efficiency Power Combiner for High-Rate Data Transmission; Reusable, Extensible High-Level Data-Distribution Concept; Processing Satellite Imagery To Detect Waste Tire Piles; Monitoring by Use of Clusters of Sensor-Data Vectors; Circuit and Method for Communication Over DC Power Line; Switched Band-Pass Filters for Adaptive Transceivers; Noncoherent DTTLs for Symbol Synchronization; High-Voltage Power Supply With Fast Rise and Fall Times; Waveguide Calibrator for Multi-Element Probe Calibration; Four-Way Ka-Band Power Combiner; Loss-of-Control-Inhibitor Systems for Aircraft; Improved Underwater Excitation-Emission Matrix Fluorometer; Metrology Camera System Using Two-Color Interferometry; Design and Fabrication of High-Efficiency CMOS/CCD Imagers; Foam Core Shielding for Spacecraft; CHEM-Based Self-Deploying Planetary Storage Tanks; Sequestration of Single-Walled Carbon Nanotubes in a Polymer; PPC750 Performance Monitor; Application-Program-Installer Builder; Using Visual Odometry to Estimate Position and Attitude; Design and Data Management System; Simple, Script-Based Science Processing Archive; Automated Rocket Propulsion Test Management; Online Remote Sensing Interface; Fusing Image Data for Calculating Position of an Object; Implementation of a Point Algorithm for Real-Time Convex Optimization; Handling Input and Output for COAMPS; Modeling and Grid Generation of Iced Airfoils; Automated Identification of Nucleotide Sequences; Balloon Design Software; Rocket Science 101 Interactive Educational Program; Creep Forming of Carbon-Reinforced Ceramic-Matrix Composites; Dog-Bone Horns for Piezoelectric Ultrasonic/Sonic Actuators; Benchtop Detection of Proteins; Recombinant Collagenlike Proteins; Remote Sensing of Parasitic Nematodes in Plants; Direct Coupling From WGM Resonator Disks to Photodetectors; Using Digital Radiography To Image Liquid Nitrogen in Voids; Multiple-Parameter, Low-False-Alarm Fire-Detection Systems; Mosaic-Detector-Based Fluorescence Spectral Imager; Plasmoid Thruster for High Specific-Impulse Propulsion; Analysis Method for Quantifying Vehicle Design Goals; Improved Tracking of Targets by Cameras on a Mars Rover; Sample Caching Subsystem; Multistage Passive Cooler for Spaceborne Instruments; GVIPS Models and Software; Stowable Energy-Absorbing Rocker-Bogie Suspension.

    Single-View 3D Reconstruction of Animals

    Humans have a remarkable ability to infer the 3D shape of objects from just a single image. Even for complex and non-rigid objects like people and animals, from just a single picture we can say much about their 3D shape, configuration, and even the viewpoint that the photo was taken from. Today, the same cannot be said for computers: the existing solutions are limited, particularly for highly articulated and deformable objects. Hence, the purpose of this thesis is to develop methods for single-view 3D reconstruction of non-rigid objects, specifically for people and animals. Our goal is to recover a full 3D surface model of these objects from a single unconstrained image. The ability to do so, even with some user interaction, will have a profound impact in AR/VR and the entertainment industry. Immediate applications are virtual avatars and pets, virtual clothes fitting, immersive games, as well as applications in biology, neuroscience, ecology, and farming. However, this is a challenging problem because these objects can appear in many different forms. This thesis begins by providing the first fully automatic solution for recovering a 3D mesh of a human body from a single image. Our solution follows the classical paradigm of bottom-up estimation followed by top-down verification. The key is to solve for the most likely 3D model that explains the image observations by using powerful priors. The rest of the thesis explores how to extend a similar approach to other animals. Doing so reveals novel challenges whose common thread is the lack of specialized data. For solving the bottom-up estimation problem well, current methods rely on the availability of human supervision in the form of 2D part annotations. However, these annotations do not exist at the same scale for animals. We deal with this problem by means of data synthesis for the case of fine-grained categories such as bird species.
There is also little work that systematically addresses the 3D scanning of animals, which almost all prior works require for learning a deformable 3D model. We propose a solution to learn a 3D deformable model from a set of annotated 2D images with a template 3D mesh and from a small set of 3D toy figurine scans. We show results on birds, house cats, horses, cows, dogs, big cats, and even hippos. This thesis makes steps towards a fully automatic system for single-view 3D reconstruction of animals. We hope this work inspires more future research in this direction.
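The top-down idea of solving for the most likely 3D model that explains the image observations can be illustrated, at toy scale, with a linear deformable shape model fit to 2D keypoints under an orthographic camera. Everything below (the random basis, the camera, the ridge regularizer standing in for a shape prior) is an illustrative assumption, not the models used in the thesis:

```python
import numpy as np

def fit_deformation(keypoints_2d, mean_shape, basis, cam, reg=1e-3):
    """Least-squares fit of linear deformable-model coefficients so the
    projected 3D shape matches observed 2D keypoints.
    mean_shape: (N, 3); basis: (K, N, 3); cam: (2, 3) orthographic camera."""
    K = basis.shape[0]
    # Each basis direction projects to a 2D displacement field over keypoints.
    A = np.stack([(b @ cam.T).ravel() for b in basis], axis=1)   # (2N, K)
    b = (keypoints_2d - mean_shape @ cam.T).ravel()              # residual
    # Ridge-regularized normal equations: the penalty acts as a crude
    # stand-in for a learned shape prior on deformation magnitude.
    return np.linalg.solve(A.T @ A + reg * np.eye(K), A.T @ b)

rng = np.random.default_rng(1)
N, K = 12, 3
mean_shape = rng.normal(size=(N, 3))
basis = rng.normal(size=(K, N, 3))
cam = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # orthographic projection
true = np.array([0.5, -0.3, 0.8])
kps = (mean_shape + np.tensordot(true, basis, axes=1)) @ cam.T
est = fit_deformation(kps, mean_shape, basis, cam, reg=1e-8)
```

Real systems minimize the same kind of reprojection error, but over articulated, non-linear models with pose parameters and stronger learned priors, which requires iterative optimization rather than one linear solve.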

    Visual Recognition and Synthesis of Human-Object Interactions

    The ability to perceive and understand people's actions enables humans to efficiently communicate and collaborate in society. Endowing machines with such ability is an important step for building assistive and socially-aware robots. Despite such significance, the problem poses a great challenge and the current state of the art is still nowhere close to human-level performance. This dissertation drives progress on visual action understanding in the scope of human-object interactions (HOI), a major branch of human actions that dominates our everyday life. Specifically, we address the challenges of two important tasks: visual recognition and visual synthesis. The first part of this dissertation considers the recognition task. The main bottleneck of current research is a lack of a proper benchmark, since existing action datasets contain only a small number of categories with limited diversity. To this end, we set out to construct a large-scale benchmark for HOI recognition. We first tackle the problem of establishing the vocabulary for human-object interactions, by investigating a variety of automatic approaches as well as a crowdsourcing approach that collects human labeled categories. Given the vocabulary, we then construct a large-scale image dataset of human-object interactions by annotating web images through online crowdsourcing. The new "HICO" dataset surpasses prior datasets in terms of both the number of images and action categories by one order of magnitude. The introduction of HICO enables us to benchmark state-of-the-art recognition approaches and also shed light on new challenges in the realm of large-scale HOI recognition. We further discover that visual features of humans, objects, as well as their spatial relations play a central role in the representation of interaction, and the combination of the three can improve the recognition outcome.
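The observation that human appearance, object appearance, and their spatial relation each carry signal, and that combining the three improves recognition, is often realized as late score fusion across per-stream classifiers. A toy sketch; the category names, scores, and equal weights are invented for illustration:

```python
import numpy as np

def fuse_streams(human_score, object_score, spatial_score,
                 weights=(1.0, 1.0, 1.0)):
    """Late fusion of per-category scores from three streams: human
    appearance, object appearance, and human-object spatial relation.
    Scores are summed (log-space fusion) and normalized via softmax."""
    fused = (weights[0] * human_score
             + weights[1] * object_score
             + weights[2] * spatial_score)
    e = np.exp(fused - fused.max())   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical scores over three HOI categories, e.g.
# ("ride bicycle", "carry bicycle", "repair bicycle").
human = np.array([2.0, 0.5, 0.1])
obj = np.array([1.5, 1.4, 1.3])
spatial = np.array([2.5, 0.2, 0.0])
probs = fuse_streams(human, obj, spatial)
```

In a learned system the per-stream scores come from trained networks and the fusion weights (or a fusion layer) are learned jointly rather than fixed.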
The second part of this dissertation considers the synthesis task, and focuses particularly on the synthesis of body motion. The central goal is: given an image of a scene, synthesize the course of an action conditioned on the observed scene. Such capability can predict possible actions afforded by the scene, and will facilitate efficient reactions in human-robot interactions. We investigate two types of synthesis tasks: semantic-driven synthesis and goal-driven synthesis. For semantic-driven synthesis, we study the forecasting of human dynamics from a static image. We propose a novel deep neural network architecture that extracts semantic information from the image and uses it to predict future body movement. For goal-driven synthesis, we study the synthesis of motion defined by human-object interactions. We focus on one particular class of interactions—a person sitting onto a chair. To ensure realistic motion from physical interactions, we leverage a physics simulated environment that contains a humanoid and chair model. We propose a novel reinforcement learning framework, and show that the synthesized motion can generalize to different initial human-chair configurations. At the end of this dissertation, we also contribute a new approach to temporal action localization, an essential task in video action understanding. We address the shortcomings of prior Faster R-CNN based approaches, and show state-of-the-art performance on standard benchmarks. Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/150045/1/ywchao_1.pd

    Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions

    This dissertation addresses the problem of learning video representations, which is defined here as transforming the video so that its essential structure is made more visible or accessible for action recognition and quantification. In the literature, a video can be represented by a set of images, by modeling motion or temporal dynamics, and by a 3D graph with pixels as nodes. This dissertation contributes a set of models to localize, track, segment, recognize and assess actions: (1) image-set models via aggregating subset features given by regularizing normalized CNNs; (2) image-set models via inter-frame principal recovery and sparsely coding residual actions; (3) temporally local models with spatially global motion estimated by robust feature matching and local motion estimated by action detection with a motion model added; (4) spatiotemporal models (a 3D graph and a 3D CNN) that treat time as a space dimension; and (5) supervised hashing by jointly learning embedding and quantization. State-of-the-art performances are achieved for tasks such as quantifying facial pain and human diving.
Primary conclusions of this dissertation are categorized as follows: (i) image sets can capture facial actions that are about collective representation; (ii) sparse and low-rank representations can untangle expression, identity and pose cues, and can be learned via an image-set model and also a linear model; (iii) norm is related to recognizability, and similarity metrics and loss functions matter; (iv) combining the MIL-based boosting tracker with the Particle Filter motion model induces a good trade-off between appearance similarity and motion consistency; (v) segmenting an object locally makes it possible to assign shape priors, and it is feasible to learn knowledge such as shape priors online from Web data with weak supervision; (vi) representing videos as 3D graphs works locally in both space and time, and 3D CNNs work effectively when given temporally meaningful clips; (vii) richly labeled images or videos help to learn better hash functions than random projections after learning binary embedding codes. In addition, models proposed for videos can be adapted to other sequential images, such as volumetric medical images, which are not included in this dissertation.
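The conclusion on supervised hashing builds on the basic sign-of-projection scheme: binary codes come from thresholding a linear embedding, and retrieval compares codes by Hamming distance. A minimal sketch in which a random projection matrix stands in for the jointly learned embedding and quantization:

```python
import numpy as np

def hash_codes(feats, W):
    """Binary codes from the sign of a linear embedding. Here W is a
    random stand-in for a projection learned from labeled data."""
    return (feats @ W > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(42)
W = rng.normal(size=(16, 8))                 # 16-D features -> 8-bit codes
query = rng.normal(size=16)
near = query + 0.01 * rng.normal(size=16)    # nearly identical feature
far = -query                                 # feature on the opposite side
cq, cn, cf = (hash_codes(v, W) for v in (query, near, far))
```

Negating a feature flips every projection sign, so `far` maps to the bitwise complement of the query's code, while a small perturbation leaves the code essentially unchanged; supervised learning of `W` aims to make such code distances track semantic similarity rather than raw geometry.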

    Developmentally deep perceptual system for a humanoid robot

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003.Includes bibliographical references (p. 139-152).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. 
The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy, and study these situations to find correlated features that can be observed more generally. by Paul Michael Fitzpatrick. Ph.D.
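At its simplest, the "poke and watch" route to figure/ground separation is motion segmentation between the frames just before and just after contact. A toy sketch on synthetic grayscale frames; the frame sizes, intensities, and threshold are illustrative assumptions, not the robot's actual vision pipeline:

```python
import numpy as np

def segment_by_motion(frame_before, frame_after, threshold=10):
    """Segment the poked object as the set of pixels whose intensity
    changes between the frames around the moment of contact."""
    diff = np.abs(frame_after.astype(int) - frame_before.astype(int))
    return diff > threshold   # boolean foreground mask

# Synthetic 8x8 frames: a bright 2x2 'object' shifts right by one pixel.
before = np.zeros((8, 8), dtype=np.uint8)
after = np.zeros((8, 8), dtype=np.uint8)
before[3:5, 2:4] = 200
after[3:5, 3:5] = 200
mask = segment_by_motion(before, after)
```

The mask marks only the pixels the object vacated and newly covered; the thesis's full system additionally distinguishes arm motion from object motion and aggregates masks over time into reliable segmented object views for learning.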
