10,348 research outputs found

    Geodesic Distance Histogram Feature for Video Segmentation

    Full text link
    This paper proposes a geodesic-distance-based feature that encodes global information for improved video segmentation algorithms. The feature is a joint histogram of intensity and geodesic distances, where the geodesic distances are computed as the shortest paths between superpixels via their boundaries. We also incorporate adaptive voting weights and spatial pyramid configurations to include spatial information into the geodesic histogram feature and show that this further improves results. The feature is generic and can be used as part of various algorithms. In experiments, we test the geodesic histogram feature by incorporating it into two existing video segmentation frameworks. This leads to significantly better performance in 3D video segmentation benchmarks on two datasets

    A Survey on Metric Learning for Feature Vectors and Structured Data

    Full text link
    The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years. This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come.Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new method

    Multi-Sensor Event Detection using Shape Histograms

    Full text link
    Vehicular sensor data consists of multiple time-series arising from a number of sensors. Using such multi-sensor data we would like to detect occurrences of specific events that vehicles encounter, e.g., corresponding to particular maneuvers that a vehicle makes or conditions that it encounters. Events are characterized by similar waveform patterns re-appearing within one or more sensors. Further such patterns can be of variable duration. In this work, we propose a method for detecting such events in time-series data using a novel feature descriptor motivated by similar ideas in image processing. We define the shape histogram: a constant dimension descriptor that nevertheless captures patterns of variable duration. We demonstrate the efficacy of using shape histograms as features to detect events in an SVM-based, multi-sensor, supervised learning scenario, i.e., multiple time-series are used to detect an event. We present results on real-life vehicular sensor data and show that our technique performs better than available pattern detection implementations on our data, and that it can also be used to combine features from multiple sensors resulting in better accuracy than using any single sensor. Since previous work on pattern detection in time-series has been in the single series context, we also present results using our technique on multiple standard time-series datasets and show that it is the most versatile in terms of how it ranks compared to other published results

    A 3D descriptor to detect task-oriented grasping points in clothing

    Get PDF
    © 2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/Manipulating textile objects with a robot is a challenging task, especially because the garment perception is difficult due to the endless configurations it can adopt, coupled with a large variety of colors and designs. Most current approaches follow a multiple re-grasp strategy, in which clothes are sequentially grasped from different points until one of them yields a recognizable configuration. In this work we propose a method that combines 3D and appearance information to directly select a suitable grasping point for the task at hand, which in our case consists of hanging a shirt or a polo shirt from a hook. Our method follows a coarse-to-fine approach in which, first, the collar of the garment is detected and, next, a grasping point on the lapel is chosen using a novel 3D descriptor. In contrast to current 3D descriptors, ours can run in real time, even when it needs to be densely computed over the input image. Our central idea is to take advantage of the structured nature of range images that most depth sensors provide and, by exploiting integral imaging, achieve speed-ups of two orders of magnitude with respect to competing approaches, while maintaining performance. This makes it especially adequate for robotic applications as we thoroughly demonstrate in the experimental section.Peer ReviewedPostprint (author's final draft

    Learning Temporal Alignment Uncertainty for Efficient Event Detection

    Full text link
    In this paper we tackle the problem of efficient video event detection. We argue that linear detection functions should be preferred in this regard due to their scalability and efficiency during estimation and evaluation. A popular approach in this regard is to represent a sequence using a bag of words (BOW) representation due to its: (i) fixed dimensionality irrespective of the sequence length, and (ii) its ability to compactly model the statistics in the sequence. A drawback to the BOW representation, however, is the intrinsic destruction of the temporal ordering information. In this paper we propose a new representation that leverages the uncertainty in relative temporal alignments between pairs of sequences while not destroying temporal ordering. Our representation, like BOW, is of a fixed dimensionality making it easily integrated with a linear detection function. Extensive experiments on CK+, 6DMG, and UvA-NEMO databases show significant performance improvements across both isolated and continuous event detection tasks.Comment: Appeared in DICTA 2015, 8 page
    • …
    corecore