1,784 research outputs found

    Revisiting Image Aesthetic Assessment via Self-Supervised Feature Learning

    Full text link
    Visual aesthetic assessment has been an active research field for decades. Although latest methods have achieved promising performance on benchmark datasets, they typically rely on a large number of manual annotations including both aesthetic labels and related image attributes. In this paper, we revisit the problem of image aesthetic assessment from the self-supervised feature learning perspective. Our motivation is that a suitable feature representation for image aesthetic assessment should be able to distinguish different expert-designed image manipulations, which have close relationships with negative aesthetic effects. To this end, we design two novel pretext tasks to identify the types and parameters of editing operations applied to synthetic instances. The features from our pretext tasks are then adapted for a one-layer linear classifier to evaluate the performance in terms of binary aesthetic classification. We conduct extensive quantitative experiments on three benchmark datasets and demonstrate that our approach can faithfully extract aesthetics-aware features and outperform alternative pretext schemes. Moreover, we achieve comparable results to state-of-the-art supervised methods that use 10 million labels from ImageNet.Comment: AAAI Conference on Artificial Intelligence, 2020, accepte

    Deep Metric Learning Assisted by Intra-variance in A Semi-supervised View of Learning

    Full text link
    Deep metric learning aims to construct an embedding space where samples of the same class are close to each other, while samples of different classes are far away from each other. Most existing deep metric learning methods attempt to maximize the difference of inter-class features. And semantic related information is obtained by increasing the distance between samples of different classes in the embedding space. However, compressing all positive samples together while creating large margins between different classes unconsciously destroys the local structure between similar samples. Ignoring the intra-class variance contained in the local structure between similar samples, the embedding space obtained from training receives lower generalizability over unseen classes, which would lead to the network overfitting the training set and crashing on the test set. To address these considerations, this paper designs a self-supervised generative assisted ranking framework that provides a semi-supervised view of intra-class variance learning scheme for typical supervised deep metric learning. Specifically, this paper performs sample synthesis with different intensities and diversity for samples satisfying certain conditions to simulate the complex transformation of intra-class samples. And an intra-class ranking loss function is designed using the idea of self-supervised learning to constrain the network to maintain the intra-class distribution during the training process to capture the subtle intra-class variance. With this approach, a more realistic embedding space can be obtained in which global and local structures of samples are well preserved, thus enhancing the effectiveness of downstream tasks. Extensive experiments on four benchmarks have shown that this approach surpasses state-of-the-art method

    Pre-Trained Driving in Localized Surroundings with Semantic Radar Information and Machine Learning

    Get PDF
    Entlang der Signalverarbeitungskette von Radar Detektionen bis zur Fahrzeugansteuerung, diskutiert diese Arbeit eine semantischen Radar Segmentierung, einen darauf aufbauenden Radar SLAM, sowie eine im Verbund realisierte autonome Parkfunktion. Die Radarsegmentierung der (statischen) Umgebung wird durch ein Radar-spezifisches neuronales Netzwerk RadarNet erreicht. Diese Segmentierung ermöglicht die Entwicklung des semantischen Radar Graph-SLAM SERALOC. Auf der Grundlage der semantischen Radar SLAM Karte wird eine beispielhafte autonome ParkfunktionalitĂ€t in einem realen VersuchstrĂ€ger umgesetzt. Entlang eines aufgezeichneten Referenzfades parkt die Funktion ausschließlich auf Basis der Radar Wahrnehmung mit bisher unerreichter Positioniergenauigkeit. Im ersten Schritt wird ein Datensatz von 8.2 · 10^6 punktweise semantisch gelabelten Radarpunktwolken ĂŒber eine Strecke von 2507.35m generiert. Es sind keine vergleichbaren DatensĂ€tze dieser Annotationsebene und Radarspezifikation öffentlich verfĂŒgbar. Das ĂŒberwachte Training der semantischen Segmentierung RadarNet erreicht 28.97% mIoU auf sechs Klassen. Außerdem wird ein automatisiertes Radar-Labeling-Framework SeRaLF vorgestellt, welches das Radarlabeling multimodal mittels Referenzkameras und LiDAR unterstĂŒtzt. FĂŒr die kohĂ€rente Kartierung wird ein Radarsignal-Vorfilter auf der Grundlage einer Aktivierungskarte entworfen, welcher Rauschen und andere dynamische Mehrwegreflektionen unterdrĂŒckt. Ein speziell fĂŒr Radar angepasstes Graph-SLAM-Frontend mit Radar-Odometrie Kanten zwischen Teil-Karten und semantisch separater NDT Registrierung setzt die vorgefilterten semantischen Radarscans zu einer konsistenten metrischen Karte zusammen. Die Kartierungsgenauigkeit und die Datenassoziation werden somit erhöht und der erste semantische Radar Graph-SLAM fĂŒr beliebige statische Umgebungen realisiert. Integriert in ein reales Testfahrzeug, wird das Zusammenspiel der live RadarNet Segmentierung und des semantischen Radar Graph-SLAM anhand einer rein Radar-basierten autonomen ParkfunktionalitĂ€t evaluiert. Im Durchschnitt ĂŒber 42 autonome Parkmanöver (∅3.73 km/h) bei durchschnittlicher ManöverlĂ€nge von ∅172.75m wird ein Median absoluter Posenfehler von 0.235m und End-Posenfehler von 0.2443m erreicht, der vergleichbare Radar-Lokalisierungsergebnisse um ≈ 50% ĂŒbertrifft. Die Kartengenauigkeit von verĂ€nderlichen, neukartierten Orten ĂŒber eine Kartierungsdistanz von ∅165m ergibt eine ≈ 56%-ige Kartenkonsistenz bei einer Abweichung von ∅0.163m. FĂŒr das autonome Parken wurde ein gegebener Trajektorienplaner und Regleransatz verwendet

    Proceedings of the 2020 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    Get PDF
    In 2020 fand der jĂ€hrliche Workshop des Faunhofer IOSB und the Lehrstuhls fĂŒr interaktive Echtzeitsysteme statt. Vom 27. bis zum 31. Juli trugen die Doktorranden der beiden Institute ĂŒber den Stand ihrer Forschung vor in Themen wie KI, maschinellen Lernen, computer vision, usage control, Metrologie vor. Die Ergebnisse dieser VortrĂ€ge sind in diesem Band als technische Berichte gesammelt

    Gesture passwords: concepts, methods and challenges

    Full text link
    Biometrics are a convenient alternative to traditional forms of access control such as passwords and pass-cards since they rely solely on user-specific traits. Unlike alphanumeric passwords, biometrics cannot be given or told to another person, and unlike pass-cards, are always “on-hand.” Perhaps the most well-known biometrics with these properties are: face, speech, iris, and gait. This dissertation proposes a new biometric modality: gestures. A gesture is a short body motion that contains static anatomical information and changing behavioral (dynamic) information. This work considers both full-body gestures such as a large wave of the arms, and hand gestures such as a subtle curl of the fingers and palm. For access control, a specific gesture can be selected as a “password” and used for identification and authentication of a user. If this particular motion were somehow compromised, a user could readily select a new motion as a “password,” effectively changing and renewing the behavioral aspect of the biometric. This thesis describes a novel framework for acquiring, representing, and evaluating gesture passwords for the purpose of general access control. The framework uses depth sensors, such as the Kinect, to record gesture information from which depth maps or pose features are estimated. First, various distance measures, such as the log-euclidean distance between feature covariance matrices and distances based on feature sequence alignment via dynamic time warping, are used to compare two gestures, and train a classifier to either authenticate or identify a user. In authentication, this framework yields an equal error rate on the order of 1-2% for body and hand gestures in non-adversarial scenarios. Next, through a novel decomposition of gestures into posture, build, and dynamic components, the relative importance of each component is studied. The dynamic portion of a gesture is shown to have the largest impact on biometric performance with its removal causing a significant increase in error. In addition, the effects of two types of threats are investigated: one due to self-induced degradations (personal effects and the passage of time) and the other due to spoof attacks. For body gestures, both spoof attacks (with only the dynamic component) and self-induced degradations increase the equal error rate as expected. Further, the benefits of adding additional sensor viewpoints to this modality are empirically evaluated. Finally, a novel framework that leverages deep convolutional neural networks for learning a user-specific “style” representation from a set of known gestures is proposed and compared to a similar representation for gesture recognition. This deep convolutional neural network yields significantly improved performance over prior methods. A byproduct of this work is the creation and release of multiple publicly available, user-centric (as opposed to gesture-centric) datasets based on both body and hand gestures

    Minimising Human Annotation for Scalable Person Re-Identification

    Get PDF
    PhDAmong the diverse tasks performed by an intelligent distributed multi-camera surveillance system, person re-identification (re-id) is one of the most essential. Re-id refers to associating an individual or a group of people across non-overlapping cameras at different times and locations, and forms the foundation of a variety of applications ranging from security and forensic search to quotidian retail and health care. Though attracted rapidly increasing academic interests over the past decade, it still remains a non-trivial and unsolved problem for launching a practical reid system in real-world environments, due to the ambiguous and noisy feature of surveillance data and the potentially dramatic visual appearance changes caused by uncontrolled variations in human poses and divergent viewing conditions across distributed camera views. To mitigate such visual ambiguity and appearance variations, most existing re-id approaches rely on constructing fully supervised machine learning models with extensively labelled training datasets which is unscalable for practical applications in the real-world. Particularly, human annotators must exhaustively search over a vast quantity of offline collected data, manually label cross-view matched images of a large population between every possible camera pair. Nonetheless, having the prohibitively expensive human efforts dissipated, a trained re-id model is often not easily generalisable and transferable, due to the elastic and dynamic operating conditions of a surveillance system. With such motivations, this thesis proposes several scalable re-id approaches with significantly reduced human supervision, readily applied to practical applications. More specifically, this thesis has developed and investigated four new approaches for reducing human labelling effort in real-world re-id as follows: Chapter 3 The first approach is affinity mining from unlabelled data. Different from most existing supervised approaches, this work aims to model the discriminative information for reid without exploiting human annotations, but from the vast amount of unlabelled person image data, thus applicable to both semi-supervised and unsupervised re-id. It is non-trivial since the human annotated identity matching correspondence is often the key to discriminative re-id modelling. In this chapter, an alternative strategy is explored by specifically mining two types of affinity relationships among unlabelled data: (1) inter-view data affinity and (2) intra-view data affinity. In particular, with such affinity information encoded as constraints, a Regularised Kernel Subspace Learning model is developed to explicitly reduce inter-view appearance variations and meanwhile enhance intra-view appearance disparity for more discriminative re-id matching. Consequently, annotation costs can be immensely alleviated and a scalable re-id model is readily to be leveraged to plenty of unlabelled data which is inexpensive to collect. Chapter 4 The second approach is saliency discovery from unlabelled data. This chapter continues to investigate the problem of what can be learned in unlabelled images without identity labels annotated by human. Other than affinity mining as proposed by Chapter 3, a different solution is proposed. That is, to discover localised visual appearance saliency of person appearances. Intuitively, salient and atypical appearances of human are able to uniquely and representatively describe and identify an individual, whilst also often robust to view changes and detection variances. Motivated by this, an unsupervised Generative Topic Saliency model is proposed to jointly perform foreground extraction, saliency detection, as well as discriminative re-id matching. This approach completely avoids the exhaustive annotation effort for model training, and thus better scales to real-world applications. Moreover, its automatically discovered re-id saliency representations are shown to be semantically interpretable, suitable for generating useful visual analysis for deployable user-oriented software tools. Chapter 5 The third approach is incremental learning from actively labelled data. Since learning from unlabelled data alone yields less discriminative matching results, and in some cases there will be limited human labelling resources available for re-id modelling, this chapter thus investigate the problem of how to maximise a model’s discriminative capability with minimised labelling efforts. The challenges are to (1) automatically select the most representative data from a vast number of noisy/ambiguous unlabelled data in order to maximise model discrimination capacity; and (2) incrementally update the model parameters to accelerate machine responses and reduce human waiting time. To that end, this thesis proposes a regression based re-id model, characterised by its very fast and efficient incremental model updates. Furthermore, an effective active data sampling algorithm with three novel joint exploration-exploitation criteria is designed, to make automatic data selection feasible with notably reduced human labelling costs. Such an approach ensures annotations to be spent only on very few data samples which are most critical to model’s generalisation capability, instead of being exhausted by blindly labelling many noisy and redundant training samples. Chapter 6 The last technical area of this thesis is human-in-the-loop learning from relevance feedback. Whilst former chapters mainly investigate techniques to reduce human supervision for model training, this chapter motivates a novel research area to further minimise human efforts spent in the re-id deployment stage. In real-world applications where camera network and potential gallery size increases dramatically, even the state-of-the-art re-id models generate much inferior re-id performances and human involvements at deployment stage is inevitable. To minimise such human efforts and maximise re-id performance, this thesis explores an alternative approach to re-id by formulating a hybrid human-computer learning paradigm with humans in the model matching loop. Specifically, a Human Verification Incremental Learning model is formulated which does not require any pre-labelled training data, therefore scalable to new camera pairs; Moreover, the proposed model learns cumulatively from human feedback to provide an instant improvement to re-id ranking of each probe on-the-fly, thus scalable to large gallery sizes. It has been demonstrated that the proposed re-id model achieves significantly superior re-id results whilst only consumes much less human supervision effort. For facilitating a holistic understanding about this thesis, the main studies are summarised and framed into a graphical abstract as shown in Figur
    • 

    corecore