
    Visual Object Tracking: The Initialisation Problem

    Model initialisation is an important component of object tracking. Tracking algorithms are generally provided with the first frame of a sequence and a bounding box (BB) indicating the location of the object. This BB may contain a large number of background pixels in addition to the object, which can lead parts-based tracking algorithms to initialise their object models in background regions of the BB. In this paper, we tackle this as a missing-labels problem: pixels sufficiently far from the BB are marked as background, and the labels of the unknown pixels are learned. Three techniques, One-Class SVM (OC-SVM), Sampled-Based Background Model (SBBM; a novel background model based on pixel samples), and Learning Based Digital Matting (LBDM), are adapted to the problem. These are evaluated with leave-one-video-out cross-validation on the VOT2016 tracking benchmark. Our evaluation shows that both OC-SVM and SBBM can provide a good level of segmentation accuracy but are too parameter-dependent to be used in real-world scenarios. We show that LBDM achieves significantly increased performance with parameters selected by cross-validation, and that it is robust to parameter variation. Comment: 15th Conference on Computer and Robot Vision (CRV 2018). Source code available at https://github.com/georgedeath/initialisation-proble
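
    A minimal sketch of the missing-labels idea using a one-class SVM (one of the three techniques named above): pixels well outside the BB are taken as known background, a background model is fitted to their colours, and pixels inside the BB that the model still accepts are relabelled as background. The feature choice (raw RGB), the 10-pixel margin used to select "sufficiently far" pixels, and the parameter values below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: one-class-SVM background labelling inside a bounding box.
# Raw RGB features, the 10-pixel margin and nu/gamma values are illustrative
# assumptions, not the paper's exact configuration.
import numpy as np
from sklearn.svm import OneClassSVM

def label_bb_pixels(frame, bb, margin=10, nu=0.1):
    """frame: HxWx3 array, bb: (x, y, w, h). Returns a boolean h x w mask over
    the BB where True marks pixels the background model accepts (likely background)."""
    x, y, w, h = bb
    H, W = frame.shape[:2]

    # Known background: pixels at least `margin` pixels outside the BB.
    mask_bg = np.ones((H, W), dtype=bool)
    mask_bg[max(0, y - margin):min(H, y + h + margin),
            max(0, x - margin):min(W, x + w + margin)] = False
    bg_pixels = frame[mask_bg].astype(float)

    # Subsample the background pixels to keep training fast.
    rng = np.random.default_rng(0)
    if len(bg_pixels) > 5000:
        bg_pixels = bg_pixels[rng.choice(len(bg_pixels), 5000, replace=False)]

    # Fit a one-class model to the background colours.
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    model.fit(bg_pixels)

    # Unknown pixels: everything inside the BB; +1 means consistent with background.
    bb_pixels = frame[y:y + h, x:x + w].reshape(-1, 3).astype(float)
    return (model.predict(bb_pixels) == 1).reshape(h, w)
```

    In the paper's framing, SBBM and LBDM play the same role as the one-class model here, i.e. they decide which of the unknown BB pixels should be relabelled as background.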

    Learning by correlation for computer vision applications: from Kernel methods to deep learning

    Learning to spot analogies and differences within and across visual categories is arguably a powerful approach in machine learning and pattern recognition, and one directly inspired by human cognition. In this thesis, we investigate a variety of approaches that are primarily driven by correlation and tackle several computer vision applications.

    Cyclist Detection, Tracking, and Trajectory Analysis in Urban Traffic Video Data

    The major objective of this thesis is to examine computer vision and machine learning methods for detecting, tracking, and analysing the trajectories of cyclists in traffic video data, and to develop an efficient system for cyclist counting. Due to the growing number of cyclist accidents on urban roads, methods for collecting information on cyclists are of significant importance to the Department of Transportation. The collected information provides insights into solving critical problems related to transportation planning, implementing safety countermeasures, and managing traffic flow efficiently. Intelligent Transportation Systems (ITS) employ automated tools to collect traffic information from traffic video data. In comparison to other road users, such as cars and pedestrians, automated cyclist data collection is a relatively new research area. In this work, a vision-based method for gathering cyclist count data at intersections and road segments is developed. First, we develop a methodology for efficient detection and tracking of cyclists. A combination of classification features and motion-based properties is evaluated to detect cyclists in the test video data. A Convolutional Neural Network (CNN) based detector, You Only Look Once (YOLO), is implemented to increase detection accuracy. In the next step, the detection results are fed into a tracker based on Kernelized Correlation Filters (KCF), which, in combination with a bipartite graph matching algorithm, allows multiple cyclists to be tracked concurrently. Then, a trajectory rebuilding method and a trajectory comparison model are applied to refine the accuracy of tracking and counting. The trajectory comparison is performed using a semantic similarity approach. The proposed counting method is the first cyclist counting method able to count cyclists under different movement patterns. The trajectory data obtained can be further utilized for cyclist behavioral modeling and safety analysis.
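
    One concrete piece of the pipeline above is the data-association step: matching new detections to existing KCF tracks by solving a bipartite matching problem. The sketch below uses intersection-over-union (IoU) as the affinity and the Hungarian algorithm from SciPy; the IoU threshold and IoU itself as the affinity are assumptions for illustration, not necessarily the thesis's exact formulation.

```python
# Hedged sketch: associate detections with tracks via bipartite matching on IoU.
# The 0.3 IoU threshold and IoU as the affinity are illustrative choices.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter + 1e-9)

def match_detections_to_tracks(track_boxes, det_boxes, iou_thresh=0.3):
    """Returns (matches, unmatched_track_ids, unmatched_detection_ids)."""
    if len(track_boxes) == 0 or len(det_boxes) == 0:
        return [], list(range(len(track_boxes))), list(range(len(det_boxes)))
    cost = np.array([[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)          # Hungarian algorithm
    matches = [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_thresh]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(track_boxes)) if i not in matched_t]
    unmatched_dets = [j for j in range(len(det_boxes)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

    In such a scheme, matched detections would refresh the corresponding KCF tracker, unmatched detections would spawn new tracks, and tracks left unmatched for several frames would be terminated.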

    Local learning by partitioning

    In many machine learning applications, data is assumed to be locally simple, meaning that examples near each other have similar characteristics such as class labels or regression responses. Our goal is to exploit this assumption to construct locally simple yet globally complex systems that improve performance or reduce the cost of common machine learning tasks. To this end, we address three main problems: discovering and separating local non-linear structure in high-dimensional data, learning low-complexity local systems to improve performance of risk-based learning tasks, and exploiting local similarity to reduce the test-time cost of learning algorithms. First, we develop a structure-based similarity metric, where low-dimensional non-linear structure is captured by solving a non-linear, low-rank representation problem. We show that this problem can be kernelized, has a closed-form solution, naturally separates independent manifolds, and is robust to noise. Experimental results indicate that incorporating this structural similarity in well-studied problems such as clustering, anomaly detection, and classification improves performance. Next, we address the problem of local learning, where a partitioning function divides the feature space into regions in which independent functions are applied. We focus on local linear classification using linear partitioning and local decision functions. Under an alternating minimization scheme, learning the partitioning functions can be reduced to solving a weighted supervised learning problem. We then present a novel reformulation that yields a globally convex surrogate, allowing for efficient, joint training of the partitioning functions and local classifiers. We then examine the problem of learning under test-time budgets, where acquiring sensors (features) for each example during test time has a cost. Our goal is to partition the space into regions, with only a small subset of sensors needed in each region, reducing the average number of sensors required per example. Starting with a cascade structure and expanding to binary trees, we formulate this problem as an empirical risk minimization and construct an upper-bounding surrogate that allows sequential decision functions to be trained jointly by solving a linear program. Finally, we present preliminary work extending the notion of test-time budgets to the problem of adaptive privacy.
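
    The local linear classification idea described above can be illustrated with a bare-bones alternating scheme: hold the partition fixed and fit a linear classifier per region, then reassign each training example to the region whose classifier fits it best, and repeat. This is only a toy version of the approach (the thesis also gives a convex joint reformulation and learns an explicit partitioning function); the KMeans initialisation, logistic-regression local models, and log-loss reassignment rule below are all assumptions for illustration.

```python
# Hedged sketch: alternating minimisation for local linear classification.
# KMeans initialisation, logistic-regression local models and log-loss
# reassignment are illustrative choices, not the thesis's formulation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def fit_local_linear(X, y, n_regions=4, n_iters=10, seed=0):
    region = KMeans(n_clusters=n_regions, random_state=seed).fit_predict(X)
    models = [None] * n_regions
    for _ in range(n_iters):
        # Step 1: fit one linear classifier per region (skip degenerate regions).
        for r in range(n_regions):
            idx = np.where(region == r)[0]
            if len(idx) and len(np.unique(y[idx])) > 1:
                models[r] = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        # Step 2: reassign each example to the region whose classifier
        # gives it the lowest log-loss on its true label.
        losses = np.full((len(X), n_regions), np.inf)
        for r, m in enumerate(models):
            if m is None:
                continue
            proba = m.predict_proba(X)
            col = np.searchsorted(m.classes_, y)  # column of the true label
            ok = (col < len(m.classes_)) & \
                 (m.classes_[np.minimum(col, len(m.classes_) - 1)] == y)
            losses[ok, r] = -np.log(proba[np.arange(len(X))[ok], col[ok]] + 1e-12)
        region = losses.argmin(axis=1)
    return region, models
```

    At test time one still needs a partitioning function that routes new examples to regions; in the thesis this gating is learned jointly, whereas in a toy version one could simply fit a classifier on the final region assignments.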

    Approximation and Relaxation Approaches for Parallel and Distributed Machine Learning

    Large-scale machine learning requires tradeoffs. Commonly this tradeoff has led practitioners to choose simpler, less powerful models, e.g. linear models, in order to process more training examples in a limited time. In this work, we introduce parallelism to the training of non-linear models by leveraging a different tradeoff: approximation. We demonstrate various techniques by which non-linear models can be made amenable to larger data sets and significantly more training parallelism by strategically introducing approximation into certain optimization steps. For gradient boosted regression tree ensembles, we replace precise selection of tree splits with coarse-grained, approximate split selection, yielding both faster sequential training and a significant increase in parallelism, particularly in the distributed setting. For metric learning with nearest neighbor classification, rather than explicitly training a neighborhood structure, we leverage the implicit neighborhood structure induced by task-specific random forest classifiers, yielding a highly parallel method for metric learning. For support vector machines, we follow existing work to learn a reduced basis set with extremely high parallelism, particularly on GPUs, via existing linear algebra libraries. We believe these optimization tradeoffs are widely applicable wherever machine learning is put into practice in large-scale settings. By carefully introducing approximation, we also introduce significantly higher parallelism and consequently can process more training examples for more iterations than competing exact methods. While seemingly learning the model with less precision, this tradeoff often yields noticeably higher accuracy under a restricted training time budget.
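
    The coarse-grained split selection mentioned above for gradient-boosted trees can be illustrated with a histogram-style approximation: instead of evaluating every sorted threshold of a feature exactly, values are bucketed into a fixed number of bins and only the bin edges are considered as candidate splits. The sketch below scores candidates by reduction in squared error at a regression-tree node; the bin count and the use of quantile bin edges are illustrative assumptions, not this work's exact scheme.

```python
# Hedged sketch: approximate (binned) split selection for one feature at a
# regression-tree node. Quantile binning and 32 bins are illustrative choices.
import numpy as np

def best_approx_split(x, y, n_bins=32):
    """x: feature values, y: regression targets at this node.
    Returns (threshold, gain) for the best binned split, or (None, 0.0)."""
    # Candidate thresholds: quantile bin edges instead of every sorted value.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1]))
    total_sse = np.sum((y - y.mean()) ** 2)
    best_thr, best_gain = None, 0.0
    for thr in edges:
        left = x <= thr
        if left.all() or not left.any():
            continue
        sse = (np.sum((y[left] - y[left].mean()) ** 2)
               + np.sum((y[~left] - y[~left].mean()) ** 2))
        gain = total_sse - sse
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain
```

    Because only per-bin statistics (sums and counts) are needed to score candidates, this style of split search parallelises naturally across features and, in a distributed setting, across machines.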

    Kernelized Supervised Dictionary Learning

    The representation of a signal using a learned dictionary instead of predefined operators, such as wavelets, has led to state-of-the-art results in various applications such as denoising, texture analysis, and face recognition. The area of dictionary learning is closely associated with sparse representation, in which the signal is represented using few atoms of the dictionary. Despite recent advances in the computation of a dictionary using fast algorithms such as K-SVD, online learning, and cyclic coordinate descent, which make computing a dictionary from millions of data samples computationally feasible, the dictionary is mainly computed using unsupervised approaches such as k-means. These approaches learn the dictionary by minimizing the reconstruction error without taking the category information into account, which is not optimal for classification tasks. In this thesis, we propose a supervised dictionary learning (SDL) approach that incorporates class-label information into the learning of the dictionary. To this end, we propose to learn the dictionary in a space where the dependency between the signals and their corresponding labels is maximized. To maximize this dependency, the recently introduced Hilbert-Schmidt independence criterion (HSIC) is used. The learned dictionary is compact and has a closed-form solution, so the proposed approach is fast. We show that it outperforms other unsupervised and supervised dictionary learning approaches in the literature on real-world data. Moreover, the main advantage of the proposed SDL approach is that it can easily be kernelized, in particular by incorporating a data-driven kernel, such as a compression-based kernel, into the formulation. In this thesis, we propose a novel compression-based (dis)similarity measure. The proposed measure utilizes a 2D MPEG-1 encoder, which takes into consideration the spatial locality and connectivity of pixels in the images. The formulation is carefully designed around the MPEG encoder's functionality and, by design, uses only P-frame coding to compute the (dis)similarity between patches/images. We show that the proposed measure works properly for both small and large patch sizes on textures. Experimental results show that incorporating the proposed measure as a kernel into our SDL significantly improves the performance of supervised pixel-based texture classification on Brodatz and outdoor images compared to other compression-based dissimilarity measures, as well as state-of-the-art SDL methods. It also improves computation speed by about 40% compared to its closest rival. Finally, we extend the proposed SDL to multiview learning, where more than one representation is available for a dataset. We propose two different multiview approaches: one fuses the feature sets in the original space and then learns the dictionary and sparse coefficients on the fused set; the other learns one dictionary and the corresponding coefficients in each view separately, and then fuses the representations in the space of the learned dictionaries. We show that the proposed multiview approaches benefit from the complementary information in multiple views, and we investigate their relative performance in the application of emotion recognition.
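
    The dependency measure used above, the Hilbert-Schmidt independence criterion, has a simple empirical estimator: with a kernel matrix K on the signals, a kernel matrix L on the labels, and the centring matrix H = I - (1/n)11^T, the (biased) estimate is HSIC = tr(KHLH) / (n - 1)^2. The sketch below computes this with an RBF kernel on the signals and a label-agreement kernel on the classes; those kernel choices and the bandwidth are assumptions for illustration, not the thesis's exact setup.

```python
# Hedged sketch: empirical (biased) HSIC between signals X and class labels y.
# The RBF kernel on X, its bandwidth, and the label-agreement kernel on y
# are illustrative choices.
import numpy as np

def hsic(X, y, gamma=None):
    n = len(X)
    # RBF kernel on the signals; gamma defaults to 1/d as a simple heuristic.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    K = np.exp(-gamma * d2)
    # Kernel on labels: L[i, j] = 1 iff examples i and j share a class.
    L = (y[:, None] == y[None, :]).astype(float)
    H = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

    In the SDL formulation described in this abstract, the dictionary is learned in a space chosen to maximize exactly this dependency between the signals and their labels.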

    Learning vector quantization for proximity data

    Hofmann D. Learning vector quantization for proximity data. Bielefeld: Universität Bielefeld; 2016.
    Prototype-based classifiers such as learning vector quantization (LVQ) often display intuitive and flexible classification and learning rules. However, classical techniques are restricted to vectorial data only, and hence are not suited to more complex data structures. Therefore, a few extensions of diverse LVQ variants to more general data, characterized only by pairwise similarities or dissimilarities, have recently been proposed in the literature. In this contribution, we propose a novel extension of LVQ to similarity data based on the kernelization of an underlying probabilistic model: kernel robust soft LVQ (KRSLVQ). Relying on the notion of a pseudo-Euclidean embedding of proximity data, we put this specific approach, as well as existing alternatives, into a general framework that characterizes the fundamental possibilities for extending LVQ towards proximity data: the main characteristics are given by the choice of the cost function, the interface to the data in terms of similarities or dissimilarities, and the way in which optimization takes place. In particular, the latter highlights the difference between popular kernel approaches and so-called relational approaches. While KRSLVQ and its alternatives lead to state-of-the-art results, these extensions have two drawbacks compared to their vectorial counterparts: (i) a quadratic training complexity is encountered due to the dependency of the methods on the full proximity matrix; (ii) prototypes are no longer given by vectors but are represented as an implicit linear combination of data, i.e. interpretability of the prototypes is lost. We investigate different techniques to deal with these challenges: We consider a speed-up of training by means of low-rank approximations of the Gram matrix via its Nyström approximation. In benchmarks, this strategy is successful if the considered data are intrinsically low-dimensional, and we propose a quick check to efficiently test this property prior to training. We extend KRSLVQ with sparse approximations of the prototypes: instead of full coefficient vectors, a few exemplars representing the prototypes can be inspected directly by practitioners, in the same way as data. We compare different paradigms on which to base a sparse approximation: sparsity priors while training, geometric approaches including orthogonal matching pursuit and core techniques, and heuristic approximations based on the coefficients or proximities. We demonstrate the performance of these LVQ techniques on benchmark data, reaching state-of-the-art results, discuss the behavior of the methods with respect to quality, sparsity, and representativity, and propose measures to quantitatively evaluate the performance of the approaches. Our findings have been presented in international publication venues, including three journal articles [6, 9, 2], four conference papers [8, 5, 7, 1], and two workshop contributions [4, 3].
    References:
    [1] A. Gisbrecht, D. Hofmann, and B. Hammer. Discriminative dimensionality reduction mappings. Advances in Intelligent Data Analysis, 7619: 126–138, 2012.
    [2] B. Hammer, D. Hofmann, F.-M. Schleif, and X. Zhu. Learning vector quantization for (dis-)similarities. Neurocomputing, 131: 43–51, 2014.
    [3] D. Hofmann. Sparse approximations for kernel robust soft LVQ. Mittweida Workshop on Computational Intelligence, 2013.
    [4] D. Hofmann, A. Gisbrecht, and B. Hammer. Discriminative probabilistic prototype based models in kernel space. New Challenges in Neural Computation, TR Machine Learning Reports, 2012.
    [5] D. Hofmann, A. Gisbrecht, and B. Hammer. Efficient approximations of kernel robust soft LVQ. Workshop on Self-Organizing Maps, 198: 183–192, 2012.
    [6] D. Hofmann, A. Gisbrecht, and B. Hammer. Efficient approximations of robust soft learning vector quantization for non-vectorial data. Neurocomputing, 147: 96–106, 2015.
    [7] D. Hofmann and B. Hammer. Kernel robust soft learning vector quantization. Artificial Neural Networks in Pattern Recognition, 7477: 14–23, 2012.
    [8] D. Hofmann and B. Hammer. Sparse approximations for kernel learning vector quantization. European Symposium on Artificial Neural Networks, 549–554, 2013.
    [9] D. Hofmann, F.-M. Schleif, B. Paaßen, and B. Hammer. Learning interpretable kernelized prototype-based models. Neurocomputing, 141: 84–96, 2014.
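
    The Nyström speed-up mentioned in this abstract approximates the full n x n Gram matrix from a small set of m landmark columns: with C the n x m block of kernel evaluations against the landmarks and W the m x m block among the landmarks, K is approximated by C W^+ C^T. The sketch below picks landmarks uniformly at random and uses an eigendecomposition-based pseudo-inverse; random landmark selection and the RBF kernel are assumptions for illustration.

```python
# Hedged sketch: Nystroem low-rank approximation of an RBF Gram matrix.
# Uniform random landmark selection and the RBF kernel are illustrative choices.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * d2)

def nystroem_factor(X, m=100, gamma=1.0, seed=0):
    """Returns F with F @ F.T ~= K, so downstream code never needs to form
    the full n x n Gram matrix."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    C = rbf_kernel(X, X[idx], gamma)          # n x m block against landmarks
    W = C[idx]                                # m x m block among landmarks
    # Symmetric pseudo-inverse square root of W.
    vals, vecs = np.linalg.eigh(W)
    vals = np.clip(vals, 1e-10, None)
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return C @ W_inv_sqrt                     # F: n x m, K ~= F @ F.T
```

    Training on such a factor (or on features derived from it) avoids the quadratic dependence on the full proximity matrix; as the abstract notes, this works well when the data are intrinsically low-dimensional.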