Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation
We propose a new method to analyze the impact of errors in algorithms for
multi-instance pose estimation and a principled benchmark that can be used to
compare them. We define and characterize three classes of errors -
localization, scoring, and background - study how they are influenced by
instance attributes and their impact on an algorithm's performance. Our
technique is applied to compare the two leading methods for human pose
estimation on the COCO Dataset, measure the sensitivity of pose estimation with
respect to instance size, type and number of visible keypoints, clutter due to
multiple instances, and the relative score of instances. The performance of
algorithms, and the types of error they make, are highly dependent on all these
variables, but mostly on the number of keypoints and the clutter. The analysis
and software tools we propose offer a novel and insightful approach for
understanding the behavior of pose estimation algorithms and an effective
method for measuring their strengths and weaknesses.
Comment: Project page available at
http://www.vision.caltech.edu/~mronchi/projects/PoseErrorDiagnosis/; code
available at https://github.com/matteorr/coco-analyze; published at ICCV 1
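As a rough illustration of the error taxonomy above, a detection can be bucketed by its keypoint similarity to the ground truth. The simplified OKS formula and the thresholds below are assumptions for demonstration, not the paper's exact definitions.

```python
import numpy as np

def oks(gt, dt, scale, kappa=0.1):
    """Simplified Object Keypoint Similarity between (K, 2) keypoint arrays."""
    d2 = np.sum((gt - dt) ** 2, axis=1)
    return float(np.mean(np.exp(-d2 / (2.0 * (scale * kappa) ** 2))))

def classify(gt, dt, scale, hi=0.85, lo=0.5):
    """Bucket a detection as correct, a localization error, or background."""
    s = oks(gt, dt, scale)
    if s >= hi:
        return "correct"
    return "localization" if s >= lo else "background"
```

Scoring errors, by contrast, concern the relative ranking of detections rather than any single match, so they are not captured by a per-detection threshold like this.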
Describing Common Human Visual Actions in Images
Which common human actions and interactions are recognizable in monocular
still images? Which involve objects and/or other people? How many actions is a person
performing at a time? We address these questions by exploring the actions and
interactions that are detectable in the images of the MS COCO dataset. We make
two main contributions. First, a list of 140 common `visual actions', obtained
by analyzing the largest on-line verb lexicon currently available for English
(VerbNet) and human sentences used to describe images in MS COCO. Second, a
complete set of annotations for those `visual actions', composed of
subject-object and associated verb, which we call COCO-a (a for `actions').
COCO-a is larger than existing action datasets in terms of number of actions
and instances of these actions, and is unique because it is data-driven, rather
than experimenter-biased. Other unique features are that it is exhaustive, and
that all subjects and objects are localized. A statistical analysis of the
accuracy of our annotations and of each action, interaction and subject-object
combination is provided.
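Concretely, each COCO-a annotation can be thought of as a localized subject, a visual action, and an optional object. The field names below are illustrative assumptions, not the dataset's released schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VisualAction:
    subject_id: int                   # localized person performing the action
    verb: str                         # one of the ~140 'visual actions'
    object_id: Optional[int] = None   # localized object/person acted on, if any

ann = VisualAction(subject_id=7, verb="hold", object_id=12)
solo = VisualAction(subject_id=7, verb="walk")  # actions need not have an object
```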
It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data
We address the problem of 3D human pose estimation from 2D input images using
only weakly supervised training data. Despite showing considerable success for
2D pose estimation, the application of supervised machine learning to 3D pose
estimation in real world images is currently hampered by the lack of varied
training images with corresponding 3D poses. Most existing 3D pose estimation
algorithms train on data that has either been collected in carefully controlled
studio settings or has been generated synthetically. Instead, we take a
different approach, and propose a 3D human pose estimation algorithm that only
requires relative estimates of depth at training time. Such a training signal,
although noisy, can be easily collected from crowd annotators, and is of
sufficient quality for enabling successful training and evaluation of 3D pose
algorithms. Our results are competitive with fully supervised regression based
approaches on the Human3.6M dataset, despite using significantly weaker
training data. Our proposed algorithm opens the door to using existing
widespread 2D datasets for 3D pose estimation by allowing fine-tuning with
noisy relative constraints, resulting in more accurate 3D poses.
Comment: BMVC 2018. Project page available at
http://www.vision.caltech.edu/~mronchi/projects/RelativePos
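One way to train from such relative depth labels is a pairwise ranking loss over predicted joint depths. The formulation below is the standard ordinal-ranking sketch, stated as an assumption rather than the paper's exact loss.

```python
import numpy as np

def relative_depth_loss(z, pairs):
    """z: predicted per-joint depths; pairs: (i, j, r) annotations with
    r = +1 if joint i is labeled farther than joint j, -1 if closer,
    and 0 if the two are labeled as roughly the same depth."""
    total = 0.0
    for i, j, r in pairs:
        diff = z[i] - z[j]
        if r == 0:
            total += diff ** 2                   # ties: pull depths together
        else:
            total += np.log1p(np.exp(-r * diff)) # logistic ranking term
    return total / len(pairs)
```

Because annotators only supply orderings, a network trained this way recovers depth up to a monotone ambiguity, which is consistent with the "noisy relative constraints" the abstract describes.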
Velocity-space sensitivity of the time-of-flight neutron spectrometer at JET
The velocity-space sensitivities of fast-ion diagnostics are often described by so-called weight functions. Recently, we formulated weight functions showing the velocity-space sensitivity of the often dominant beam-target part of neutron energy spectra. These weight functions for neutron emission spectrometry (NES) are independent of the particular NES diagnostic. Here we apply these NES weight functions to the time-of-flight spectrometer TOFOR at JET. By taking the instrumental response function of TOFOR into account, we calculate time-of-flight NES weight functions that enable us to directly determine the velocity-space sensitivity of a given part of a measured time-of-flight spectrum from TOFOR.
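The step of folding an energy-space weight function through the instrument to obtain a time-of-flight weight function can be sketched as a matrix product. The toy energy-to-time mapping and Gaussian response below are placeholders, not TOFOR's actual calibration.

```python
import numpy as np

def tof_weight(w_energy, response):
    """Map energy-space weight values into time-of-flight space.
    response[i, j] = probability that a neutron in energy bin j
    is recorded in time-of-flight bin i."""
    return response @ w_energy

# Toy grids and response (placeholders for the real instrumental response).
E = np.linspace(1.0, 3.0, 50)        # neutron energy grid, MeV (assumed)
t = np.linspace(40.0, 80.0, 60)      # time-of-flight grid, ns (assumed)
t_of_E = 80.0 - 10.0 * E             # assumed monotone energy -> tof relation
R = np.exp(-0.5 * ((t[:, None] - t_of_E[None, :]) / 1.5) ** 2)
R /= R.sum(axis=0, keepdims=True)    # each energy bin yields a unit tof PDF
w_t = tof_weight(np.ones(E.size), R)
```

Because each response column is normalized, total weight is conserved in the mapping; a row of `R` then tells which energies contribute to a given part of the measured spectrum.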
Relationship of edge localized mode burst times with divertor flux loop signal phase in JET
A phase relationship is identified between the occurrence times of sequential edge localized modes (ELMs) in a set of H-mode tokamak plasmas and the voltage measured in full flux azimuthal loops in the divertor region. We focus on plasmas in the Joint European Torus where a steady H-mode is sustained over several seconds, during which ELMs are observed in the Be II emission at the divertor. The ELMs analysed arise from intrinsic ELMing, in that there is no deliberate intent to control the ELMing process by external means. We use ELM timings derived from the Be II signal to perform direct time domain analysis of the full flux loop VLD2 and VLD3 signals, which provide a high-cadence global measurement proportional to the voltage induced by changes in poloidal magnetic flux. Specifically, we examine how the time interval between pairs of successive ELMs is linked to the time-evolving phase of the full flux loop signals. Each ELM produces a clear early pulse in the full flux loop signals, whose peak time is used to condition our analysis. The arrival time of the following ELM, relative to this pulse, is found to fall into one of two categories: (i) prompt ELMs, which are directly paced by the initial response seen in the flux loop signals; and (ii) all other ELMs, which occur after the initial response of the full flux loop signals has decayed in amplitude. The times at which ELMs in category (ii) occur, relative to the first ELM of the pair, are clustered at times when the instantaneous phase of the full flux loop signal is close to its value at the time of the first ELM.
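The instantaneous phase used to condition the ELM arrival times can be obtained from the analytic signal of the flux-loop trace. Whether the JET analysis used exactly this estimator is an assumption here; the FFT construction below mirrors the standard Hilbert-transform approach.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (the standard Hilbert-transform construction)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def phase_at_events(signal, times, event_times):
    """Instantaneous phase (radians) of `signal` at each event time."""
    phase = np.angle(analytic_signal(signal))
    idx = np.clip(np.searchsorted(times, event_times), 0, len(times) - 1)
    return phase[idx]

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
x = np.cos(2.0 * np.pi * 5.0 * t)     # idealized stand-in for a VLD2/VLD3 trace
p = phase_at_events(x, t, np.array([0.05]))
```

Clustering of inter-ELM intervals can then be examined by comparing the phase at each ELM time with the phase at the preceding ELM, modulo 2*pi.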
COCO-a
A large and well-annotated dataset of actions built on the current best image dataset for visual recognition, with rich annotations including all the actions performed by each person in the dataset, the people and objects involved in each action, the subject's posture and emotion, and high-level visual cues such as mutual position and distance.
Related Publication:
Describing Common Human Visual Actions in Images
Ronchi, Matteo Ruggero Caltech
Perona, Pietro Caltech
Proceedings of the British Machine Vision Conference (BMVC)
2015-06-07
https://doi.org/10.48550/arXiv.1506.02203
Parametric transitions between bare and vegetated states in water-driven patterns
Conditions for vegetation spreading and pattern formation are mathematically framed through an analysis encompassing three fundamental processes: flow stochasticity, vegetation dynamics, and sediment transport. Flow unsteadiness is included through Poisson stochastic processes, whereby vegetation dynamics appears as a secondary instability, which is addressed by Floquet theory. Results show that the model captures the physical conditions heralding the transition between bare and vegetated fluvial states, where the nonlinear formation and growth of finite alternate bars are accounted for by Center Manifold Projection. This paves the way to understanding changes in biogeomorphological styles induced by humans in the Anthropocene and of natural origin since the Paleozoic (the Devonian plant hypothesis).
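The Floquet step mentioned above amounts to integrating the linearized dynamics over one forcing period and inspecting the eigenvalues (Floquet multipliers) of the resulting monodromy matrix. The damped Mathieu-type example below is a generic stand-in, not the paper's vegetation model, and uses an idealized sinusoidal rather than Poisson forcing.

```python
import numpy as np

def monodromy(A, T, n=2000):
    """Integrate X' = A(t) X over one period T (classical RK4) from X(0) = I."""
    m = A(0.0).shape[0]
    X = np.eye(m)
    h = T / n
    for k in range(n):
        t = k * h
        k1 = A(t) @ X
        k2 = A(t + h / 2) @ (X + h / 2 * k1)
        k3 = A(t + h / 2) @ (X + h / 2 * k2)
        k4 = A(t + h) @ (X + h * k3)
        X = X + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

# Example: damped Mathieu-type oscillator with periodic forcing of period 2*pi.
# The state is stable when every Floquet multiplier has modulus below one.
A = lambda t: np.array([[0.0, 1.0], [-(1.0 + 0.2 * np.cos(t)), -0.1]])
mults = np.linalg.eigvals(monodromy(A, 2.0 * np.pi))
```

By Liouville's formula the product of the multiplier moduli equals exp of the integrated trace of A over one period, which gives a useful numerical check on the integration.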
Caltech Multi-Distance Portraits (CMDP)
A dataset of frontal portraits of 53 individuals spanning a number of attributes (sex, age, race, hair), each photographed from seven distances.
Related Publication:
Distance Estimation of an Unknown Person from a Portrait
Burgos-Artizzu, Xavier
Ronchi, Matteo Ruggero
Perona, Pietro
Proceedings of the European Conference on Computer Vision (ECCV)
2014-09-02
https://doi.org/10.1007/978-3-319-10590-1_21