14 research outputs found

    Unsupervised Multiple Person Tracking using AutoEncoder-Based Lifted Multicuts

    Multiple Object Tracking (MOT) is a long-standing task in computer vision. Current approaches based on the tracking-by-detection paradigm require either some sort of domain knowledge or supervision to associate data correctly into tracks. In this work, we present an unsupervised multiple object tracking approach based on visual features and minimum cost lifted multicuts. Our method is based on straightforward spatio-temporal cues that can be extracted from neighboring frames in an image sequence without supervision. Clustering based on these cues enables us to learn the required appearance invariances for the tracking task at hand and train an autoencoder to generate suitable latent representations. Thus, the resulting latent representations can serve as robust appearance cues for tracking even over large temporal distances where no reliable spatio-temporal features can be extracted. We show that, despite being trained without using the provided annotations, our model provides competitive results on the challenging MOT Benchmark for pedestrian tracking.

    Learning Embeddings for Image Clustering: An Empirical Study of Triplet Loss Approaches

    In this work, we evaluate two different image clustering objectives, k-means clustering and correlation clustering, in the context of feature space embeddings induced by the Triplet Loss. Specifically, we train a convolutional neural network to learn discriminative features by optimizing two popular versions of the Triplet Loss in order to study their clustering properties under the assumption of noisy labels. Additionally, we propose a new, simple Triplet Loss formulation, which shows desirable properties with respect to formal clustering objectives and outperforms the existing methods. We evaluate all three Triplet Loss formulations for k-means and correlation clustering on the CIFAR-10 image classification dataset.
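
For readers unfamiliar with the term, the following is a minimal sketch of the widely used margin-based Triplet Loss on squared Euclidean distances. The abstract does not say which two existing versions were studied or how the proposed formulation is defined, so this is neither of those; the function name `triplet_loss`, the default margin, and the toy embeddings are assumptions made purely for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss, averaged over a batch.

    Each argument is an array of shape (batch, dim) holding embeddings.
    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive (in squared Euclidean distance).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor-negative distance
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy 2-D embeddings (hypothetical values).
a = np.array([[0.0, 0.0]])
p = np.array([[0.0, 1.0]])   # close to the anchor
n = np.array([[3.0, 0.0]])   # far from the anchor

loss_easy = triplet_loss(a, p, n)  # triplet already satisfied
loss_hard = triplet_loss(a, n, p)  # roles swapped: triplet violated
```

In the first call the negative is already far enough away, so the hinge clips the loss to zero; swapping the roles produces a positive loss that would push the embeddings apart during training.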

    Learning Deep Visual Features for the Minimum Cost Multicut Problem (Lernen tiefer visueller Merkmale für Minimum Cost Multicut Problem)

    Image clustering is one of the most important tasks of unsupervised learning in computer vision. Deep learning approaches allow models to be trained on large datasets. In this thesis, image clustering objectives are evaluated in the context of embedding spaces induced by the Triplet Loss. Specifically, a simplification of the well-known Triplet Loss is proposed for learning an embedding space from data. The proposed loss function is designed for the Minimum Cost Multicut Problem. Furthermore, we highlight one key aspect of the Minimum Cost Multicut Problem, its scalability, and propose a novel approach to overcome this issue. We show empirically that the proposed algorithm achieves a significant speedup while preserving the clustering accuracy. The algorithm is able to cluster a dataset of approximately 100,000 images in under one minute using 40 computing threads, where the embedding space is trained with the simplified Triplet Loss. We then apply our proposed loss function to multiple person tracking. This problem is treated as a clustering problem on an imbalanced dataset, where each unique person in the scene is considered one cluster. We compare the tracking performance of two different approaches: the proposed Triplet Loss and an AutoEncoder architecture with reconstruction loss. Experiments show the effectiveness of the clustering task in a tracking application. Finally, we provide an empirical study of the embedding spaces of trained classification models. Various state-of-the-art models are evaluated against image corruptions. Our key finding suggests using clustering as a predictor of model robustness.

    MSM: Multi-stage Multicuts for Scalable Image Clustering

    Correlation Clustering, also called the minimum cost Multicut problem, is the process of grouping data by pairwise similarities. It has proven to be effective on clustering problems where the number of classes is unknown. However, not only is the Multicut problem NP-hard, but an undirected graph G with n vertices representing single images also has at most n(n-1)/2 edges, which makes it challenging to implement correlation clustering for large datasets. In this work, we propose Multi-Stage Multicuts (MSM) as a scalable approach for image clustering. Specifically, we solve minimum cost Multicut problems across multiple distributed compute units. Our approach not only makes it possible to solve problem instances that are too large to fit into the shared memory of a single compute node, but also achieves significant speedups while preserving the clustering accuracy. We evaluate our proposed method on the CIFAR10
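
To make the scalability concern concrete, here is a small sketch of the edge-count arithmetic and of the correlation clustering objective on a toy graph. It is not the MSM algorithm, which the abstract does not detail: the brute-force search below only works for a handful of nodes. It assumes the common sign convention in which a positive edge cost favors joining two nodes and a negative cost favors separating them. The names `max_edges` and `multicut_cost`, the edge costs, and the choice of n = 10,000 are hypothetical.

```python
from itertools import product

def max_edges(n):
    """Number of edges in a complete undirected graph on n vertices."""
    return n * (n - 1) // 2

def multicut_cost(labels, costs):
    """Sum of the costs of all edges whose endpoints lie in different clusters."""
    return sum(c for (u, v), c in costs.items() if labels[u] != labels[v])

# Edge count for a hypothetical dataset of 10,000 images.
edges_10k = max_edges(10_000)

# Toy complete graph on 4 nodes with hypothetical pairwise costs
# (positive: the nodes look similar, negative: they look different).
costs = {
    (0, 1): 2.0, (2, 3): 1.5,
    (0, 2): -1.0, (0, 3): -1.0,
    (1, 2): -0.5, (1, 3): -2.0,
}

# Exhaustive search over all node labelings; feasible only for tiny graphs.
best = min(product(range(4), repeat=4), key=lambda lab: multicut_cost(lab, costs))
best_cost = multicut_cost(best, costs)
```

The minimizer groups nodes 0 and 1 together and nodes 2 and 3 together without the number of clusters being specified in advance, while the edge count shows why even a moderately sized image set leads to tens of millions of pairwise terms.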