    Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation

    We propose a new method to analyze the impact of errors in algorithms for multi-instance pose estimation and a principled benchmark that can be used to compare them. We define and characterize three classes of errors (localization, scoring, and background) and study how they are influenced by instance attributes and how they affect an algorithm's performance. Our technique is applied to compare the two leading methods for human pose estimation on the COCO Dataset and to measure the sensitivity of pose estimation with respect to instance size, type and number of visible keypoints, clutter due to multiple instances, and the relative score of instances. The performance of algorithms, and the types of error they make, are highly dependent on all these variables, but mostly on the number of keypoints and the clutter. The analysis and software tools we propose offer a novel and insightful approach for understanding the behavior of pose estimation algorithms and an effective method for measuring their strengths and weaknesses.
    Comment: Project page available at http://www.vision.caltech.edu/~mronchi/projects/PoseErrorDiagnosis/; code available at https://github.com/matteorr/coco-analyze; published at ICCV 1

    Describing Common Human Visual Actions in Images

    Which common human actions and interactions are recognizable in monocular still images? Which involve objects and/or other people? How many actions is a person performing at a time? We address these questions by exploring the actions and interactions that are detectable in the images of the MS COCO dataset. We make two main contributions. First, a list of 140 common 'visual actions', obtained by analyzing the largest on-line verb lexicon currently available for English (VerbNet) and human sentences used to describe images in MS COCO. Second, a complete set of annotations for those 'visual actions', composed of subject-object and associated verb, which we call COCO-a (a for 'actions'). COCO-a is larger than existing action datasets in terms of number of actions and instances of these actions, and is unique because it is data-driven, rather than experimenter-biased. Other unique features are that it is exhaustive, and that all subjects and objects are localized. A statistical analysis of the accuracy of our annotations and of each action, interaction and subject-object combination is provided.

    It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data

    We address the problem of 3D human pose estimation from 2D input images using only weakly supervised training data. Despite showing considerable success for 2D pose estimation, the application of supervised machine learning to 3D pose estimation in real world images is currently hampered by the lack of varied training images with corresponding 3D poses. Most existing 3D pose estimation algorithms train on data that has either been collected in carefully controlled studio settings or has been generated synthetically. Instead, we take a different approach, and propose a 3D human pose estimation algorithm that only requires relative estimates of depth at training time. Such training signal, although noisy, can be easily collected from crowd annotators, and is of sufficient quality for enabling successful training and evaluation of 3D pose algorithms. Our results are competitive with fully supervised regression based approaches on the Human3.6M dataset, despite using significantly weaker training data. Our proposed algorithm opens the door to using existing widespread 2D datasets for 3D pose estimation by allowing fine-tuning with noisy relative constraints, resulting in more accurate 3D poses.
    Comment: BMVC 2018. Project page available at http://www.vision.caltech.edu/~mronchi/projects/RelativePos

    It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data

    We address the problem of 3D human pose estimation from 2D input images using only weakly supervised training data. Despite showing considerable success for 2D pose estimation, the application of supervised machine learning to 3D pose estimation in real world images is currently hampered by the lack of varied training images with associated 3D poses. Existing 3D pose estimation algorithms train on data that has either been collected in carefully controlled studio settings or has been generated synthetically. Instead, we take a different approach, and propose a 3D human pose estimation algorithm that only requires relative estimates of depth at training time. Such training signal, although noisy, can be easily collected from crowd annotators, and is of sufficient quality for enabling successful training and evaluation of 3D pose. Our results are competitive with fully supervised regression based approaches on the Human3.6M dataset, despite using significantly weaker training data. Our proposed approach opens the door to using existing widespread 2D datasets for 3D pose estimation by allowing fine-tuning with noisy relative constraints, resulting in more accurate 3D poses.

    Velocity-space sensitivity of the time-of-flight neutron spectrometer at JET

    The velocity-space sensitivities of fast-ion diagnostics are often described by so-called weight functions. Recently, we formulated weight functions showing the velocity-space sensitivity of the often dominant beam-target part of neutron energy spectra. These weight functions for neutron emission spectrometry (NES) are independent of the particular NES diagnostic. Here we apply these NES weight functions to the time-of-flight spectrometer TOFOR at JET. By taking the instrumental response function of TOFOR into account, we calculate time-of-flight NES weight functions that enable us to directly determine the velocity-space sensitivity of a given part of a measured time-of-flight spectrum from TOFOR.

    Relationship of edge localized mode burst times with divertor flux loop signal phase in JET

    A phase relationship is identified between the occurrence times of sequential edge localized modes (ELMs) in a set of H-mode tokamak plasmas and the voltage measured in full flux azimuthal loops in the divertor region. We focus on plasmas in the Joint European Torus where a steady H-mode is sustained over several seconds, during which ELMs are observed in the Be II emission at the divertor. The ELMs analysed arise from intrinsic ELMing, in that there is no deliberate intent to control the ELMing process by external means. We use ELM timings derived from the Be II signal to perform direct time domain analysis of the full flux loop VLD2 and VLD3 signals, which provide a high cadence global measurement proportional to the voltage induced by changes in poloidal magnetic flux. Specifically, we examine how the time interval between pairs of successive ELMs is linked to the time-evolving phase of the full flux loop signals. Each ELM produces a clear early pulse in the full flux loop signals, whose peak time is used to condition our analysis. The arrival time of the following ELM, relative to this pulse, is found to fall into one of two categories: (i) prompt ELMs, which are directly paced by the initial response seen in the flux loop signals; and (ii) all other ELMs, which occur after the initial response of the full flux loop signals has decayed in amplitude. The times at which ELMs in category (ii) occur, relative to the first ELM of the pair, are clustered at times when the instantaneous phase of the full flux loop signal is close to its value at the time of the first ELM.

    COCO-a

    A large and well-annotated dataset of actions on the current best image dataset for visual recognition, with rich annotations including all the actions performed by each person in the dataset, the people and objects that are involved in each action, the subject's posture and emotion, and high level visual cues such as mutual position and distance.
    Related Publication: Describing Common Human Visual Actions in Images. Ronchi, Matteo Ruggero (Caltech); Perona, Pietro (Caltech). Proceedings of the British Machine Vision Conference (BMVC), 2015-06-07. https://doi.org/10.48550/arXiv.1506.02203

    Parametric transitions between bare and vegetated states in water-driven patterns

    Conditions for vegetation spreading and pattern formation are mathematically framed through an analysis encompassing three fundamental processes: flow stochasticity, vegetation dynamics, and sediment transport. Flow unsteadiness is included through Poisson stochastic processes whereby vegetation dynamics appears as a secondary instability, which is addressed by Floquet theory. Results show that the model captures the physical conditions heralding the transition between bare and vegetated fluvial states where the nonlinear formation and growth of finite alternate bars are accounted for by Center Manifold Projection. This paves the way to understand changes in biogeomorphological styles induced by man in the Anthropocene and of natural origin since the Paleozoic (Devonian plant hypothesis).

    Caltech Multi-Distance Portraits (CMDP)

    A dataset of frontal portraits of 53 individuals spanning a number of attributes (sex, age, race, hair), each photographed from seven distances.
    Related Publication: Distance Estimation of an Unknown Person from a Portrait. Burgos-Artizzu, Xavier; Ronchi, Matteo Ruggero; Perona, Pietro. Proceedings of the European Conference on Computer Vision (ECCV), 2014-09-02. https://doi.org/10.1007/978-3-319-10590-1_21