On Pairwise Costs for Network Flow Multi-Object Tracking

Authors: Chari Visesh; Lacoste-Julien Simon; Laptev Ivan; Sivic Josef
Publication date: 05/05/2015

Multi-object tracking has recently been approached with min-cost network flow optimization techniques. Such methods simultaneously resolve multiple object tracks in a video and enable modeling of dependencies among tracks. Min-cost network flow methods also fit well within the "tracking-by-detection" paradigm, where object trajectories are obtained by connecting per-frame outputs of an object detector. Object detectors, however, often fail due to occlusions and clutter in the video. To cope with such situations, we propose to add pairwise costs to the min-cost network flow framework. While finding integer solutions to such a problem becomes NP-hard, we design a convex relaxation solution with an efficient rounding heuristic which empirically gives certificates of small suboptimality. We evaluate two particular types of pairwise costs and demonstrate improvements over recent tracking methods in real-world video sequences.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Multiperson Tracking by Online Learned Grouping Model With Nonlinear Motion Context

Authors: An Le; Bhanu Bir; Chen Xiaojing; Qin Zhen
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Crossref

eScholarship - University of California

Augmenting Sensorimotor Control Using “Goal-Aware” Vibrotactile Stimulation during Reaching and Manipulation Behaviors

Authors: Murphey Todd D.; Scheidt Robert A.; Tzorakoleftherakis Emmanouil
Publication venue: e-Publications@Marquette
Publication date: 01/08/2016

We describe two sets of experiments that examine the ability of vibrotactile encoding of simple position error and combined object states (calculated from an optimal controller) to enhance performance of reaching and manipulation tasks in healthy human adults. The goal of the first experiment (tracking) was to follow a moving target with a cursor on a computer screen. Visual and/or vibrotactile cues were provided in this experiment, and vibrotactile feedback was redundant with visual feedback in that it did not encode any information above and beyond what was already available via vision. After only 10 minutes of practice using vibrotactile feedback to guide performance, subjects tracked the moving target with response latency and movement accuracy values approaching those observed under visually guided reaching. Unlike previous reports on multisensory enhancement, combining vibrotactile and visual feedback of performance errors conferred neither positive nor negative effects on task performance. In the second experiment (balancing), vibrotactile feedback encoded a corrective motor command as a linear combination of object states (derived from a linear-quadratic regulator implementing a trade-off between kinematic and energetic performance) to teach subjects how to balance a simulated inverted pendulum. Here, the tactile feedback signal differed from visual feedback in that it provided information that was not readily available from visual feedback alone. Immediately after applying this novel “goal-aware” vibrotactile feedback, time to failure was improved by a factor of three. Additionally, the effect of vibrotactile training persisted after the feedback was removed. These results suggest that vibrotactile encoding of appropriate combinations of state information may be an effective form of augmented sensory feedback that can be applied, among other purposes, to compensate for lost or compromised proprioception as commonly observed, for example, in stroke survivors.

epublications@Marquette

Time-Contrastive Networks: Self-Supervised Learning from Video

Authors: Chebotar Yevgen; Hsu Jasmine; Jang Eric; Levine Sergey; Lynch Corey; Schaal Stefan; Sermanet Pierre
Publication date: 19/03/2018

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitat
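
The attract/repel idea in the abstract above can be illustrated with a generic triplet hinge loss. This is only a minimal sketch and not the authors' implementation: the abstract does not state the exact loss form, so the squared Euclidean distance, the `margin` value, and the function names `squared_distance` and `time_contrastive_triplet_loss` are all assumptions made for illustration. Here the anchor and positive stand for embeddings of the same moment seen from two viewpoints, and the negative stands for the embedding of a temporally nearby frame.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def time_contrastive_triplet_loss(anchors, positives, negatives, margin=0.2):
    """Mean triplet hinge loss over a batch of embeddings.

    anchors[i]   : embedding of a frame from one viewpoint (hypothetical input)
    positives[i] : embedding of the same moment from another viewpoint
    negatives[i] : embedding of a temporally nearby frame
    margin       : assumed value; the abstract does not specify one
    """
    losses = []
    for a, p, n in zip(anchors, positives, negatives):
        # Penalize triplets where the positive is not closer to the anchor
        # than the negative by at least `margin`.
        losses.append(max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin))
    return sum(losses) / len(losses)


# A well-separated triplet contributes no loss; a poorly separated one does.
easy = time_contrastive_triplet_loss([[0.0, 0.0]], [[0.1, 0.0]], [[1.0, 0.0]])
hard = time_contrastive_triplet_loss([[0.0, 0.0]], [[0.1, 0.0]], [[0.1, 0.1]])
print(easy, hard)
```

Under these assumptions, minimizing the loss pulls simultaneous views together and pushes temporal neighbors apart, which matches the behavior the abstract describes in words.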

arXiv.org e-Print Archive

Crossref