10,989 research outputs found

    Early Active Learning with Pairwise Constraint for Person Re-identification

    © 2017, Springer International Publishing AG. Research on person re-identification (re-id) has attracted much attention in the machine learning field in recent years. With sufficient labeled training data, supervised re-id algorithms can obtain promising performance. However, producing labeled data for training supervised re-id models is an extremely challenging and time-consuming task because it requires every pair of images across non-overlapping camera views to be labeled. Moreover, in the early stage of experiments, when labor resources are limited, only a small amount of data can be labeled. Thus, it is essential to design an effective algorithm to select the most representative samples. This is referred to as the early active learning or early-stage experimental design problem. The pairwise relationship plays a vital role in the re-id problem, but most existing early active learning algorithms fail to consider this relationship. To overcome this limitation, we propose a novel and efficient early active learning algorithm with a pairwise constraint for person re-identification. By introducing the pairwise constraint, the closeness of similar representations of instances is enforced in active learning, which benefits the performance of active learning for re-id. Extensive experimental results on four benchmark datasets confirm the superiority of the proposed algorithm.
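The pairwise-aware selection idea can be pictured with a small, self-contained sketch. This is not the paper's optimisation; it is a toy greedy procedure (names such as `select_representatives` and the weight `alpha` are invented here) that couples a representativeness score with a pairwise penalty so that near-duplicate instances are not selected together for annotation.

```python
import numpy as np

def select_representatives(X, k, alpha=1.0):
    """Greedily pick k representative rows of X (n_samples x n_features).

    Each candidate is scored by how strongly it covers the whole pool
    (sum of similarities) minus a pairwise penalty for being similar to
    samples that were already selected.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # cosine similarity as a stand-in metric
    S = Xn @ Xn.T
    coverage = S.sum(axis=1)                           # how well each sample represents the pool
    selected = []
    for _ in range(k):
        scores = coverage.copy()
        if selected:
            scores -= alpha * S[:, selected].sum(axis=1)  # pairwise penalty vs. already chosen
            scores[selected] = -np.inf                    # never re-pick a chosen sample
        selected.append(int(np.argmax(scores)))
    return selected

# usage on random toy features
rng = np.random.default_rng(0)
print(select_representatives(rng.normal(size=(200, 64)), k=10))
```

In the actual algorithm the pairwise constraint acts on the learned representations rather than on a greedy score, but the sketch conveys why pairwise information changes which samples are worth annotating first.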

    Visual analysis with limited supervision

    University of Technology Sydney. Faculty of Engineering and Information Technology. Visual analysis is an attractive research topic in the field of computer vision. Within visual analysis, there are two critical directions: visual retrieval and visual classification. In recent years, visual retrieval has been investigated and developed in many real-world applications, for instance person re-identification. On the other hand, visual classification is also widely studied, for example image classification. Typical visual analysis methods are supervised learning algorithms, which demand extensive labeled data for training in order to achieve acceptable performance. However, it is difficult to collect and annotate data in the real world due to limited resources, such as human labor for annotation. Therefore, it is urgent to develop methods that accomplish the visual analysis mission with limited supervision. In this thesis, we propose to address the visual analysis problem with limited supervision. Specifically, we treat the limited-supervision problem in three scenarios according to the amount of labeled data. In the first scenario, no labeled data are provided and only limited human labor for annotation is available. In the second scenario, scarce labeled data and abundant unlabeled data are accessible. In the third scenario, only a few instances in the target dataset are labeled, but multiple sources of labeled data from different domains are available. In Chapter 2 and Chapter 3, we discuss the first scenario, where no labeled data are provided and only limited human labor for annotation is available. We propose to solve the problem via active learning. Unlike conventional active learning, which usually starts with a set of labeled data as the reference, in this thesis we adopt active learning algorithms with no pre-given labeled data. We refer to these algorithms as Early Active Learning. First, we attempt to select the most contributive instances for annotation, which are later utilized for training supervised models. We demonstrate that even by annotating only a few selected instances, the proposed method can achieve comparable performance in visual retrieval. Second, we extend instance-based early active learning to pair-based early active learning. Rather than selecting individual instances for annotation, pair-based early active learning selects the most informative pairs for annotation, which is essential in visual retrieval. In Chapter 4, addressing the second scenario, we study the visual retrieval problem when there are scarce labeled data and abundant unlabeled data. We propose to utilize both the labeled and the unlabeled data in a semi-supervised attribute learning scheme. The proposed method jointly learns latent attributes with appropriate dimensions and estimates the pairwise probability of the data. In Chapter 5 and Chapter 6, addressing the third scenario, we focus on visual classification with few or no labels when labeled data from other domains are available. To improve performance in the target domain, we adopt transfer learning algorithms to transfer helpful knowledge from the labeled source domains. First, in Chapter 5, the few-shot visual classification problem is considered. We have access to multiple source datasets with well-labeled data but can only access a limited set of labeled data in the target dataset.
An Analogical Transfer Learning scheme is proposed for this problem. It attempts to transfer knowledge from the source domains to enhance the target-domain models. Within the algorithm, an analogy-revision scheme is designed to select only the helpful source instances. Second, in Chapter 6, we tackle a more difficult setting of the visual retrieval problem in which no labeled data are available in the target domain. A Domain-aware Unsupervised Cross-dataset Transfer Learning algorithm is proposed to address this problem. The importance of universal and domain-unique appearances is weighed simultaneously, and both jointly contribute to the representation learning. The algorithm leverages the common and domain-unique representations across datasets for unsupervised visual retrieval.
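The abstract does not spell out the model behind the common and domain-unique representations, so the following is only a minimal alternating-least-squares sketch of that general idea: every dataset is factorised over a shared ("universal") dictionary plus a per-domain dictionary. All array shapes, names, and the toy data below are assumptions for illustration, not the thesis's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical datasets (feature_dim x num_samples); the real datasets differ.
datasets = {
    "dataset_a": rng.normal(size=(128, 300)),
    "dataset_b": rng.normal(size=(128, 250)),
}

k_shared, k_unique = 20, 10
D_shared = rng.normal(size=(128, k_shared))                         # universal dictionary
D_unique = {d: rng.normal(size=(128, k_unique)) for d in datasets}  # per-domain dictionaries

for _ in range(30):  # simple alternating least-squares updates
    # 1) codes for every sample under the current combined dictionary
    codes = {}
    for d, X in datasets.items():
        D = np.hstack([D_shared, D_unique[d]])
        codes[d], *_ = np.linalg.lstsq(D, X, rcond=None)

    # 2) refit the shared dictionary on the residuals of all domains jointly
    A = np.hstack([codes[d][:k_shared] for d in datasets])
    R = np.hstack([datasets[d] - D_unique[d] @ codes[d][k_shared:] for d in datasets])
    D_shared = np.linalg.lstsq(A.T, R.T, rcond=None)[0].T

    # 3) refit each domain-unique dictionary on its own residual
    for d, X in datasets.items():
        B = codes[d][k_shared:]
        R_d = X - D_shared @ codes[d][:k_shared]
        D_unique[d] = np.linalg.lstsq(B.T, R_d.T, rcond=None)[0].T
```

After fitting, the shared codes give a representation that is comparable across datasets, which is the property unsupervised cross-dataset retrieval needs.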

    Activity understanding and unusual event detection in surveillance videos

    PhD thesis. Computer scientists have made ceaseless efforts to replicate the cognitive video-understanding abilities of the human brain in autonomous vision systems. As video surveillance cameras become ubiquitous, there is a surge in studies on automated activity understanding and unusual event detection in surveillance videos. Nevertheless, video content analysis in public scenes remains a formidable challenge due to intrinsic difficulties such as severe inter-object occlusion in crowded scenes and the poor quality of recorded surveillance footage. Moreover, it is nontrivial to achieve robust detection of unusual events, which are rare, ambiguous, and easily confused with noise. This thesis proposes solutions for resolving ambiguous visual observations and overcoming the unreliability of conventional activity analysis methods by exploiting multi-camera visual context and human feedback. The thesis first demonstrates the importance of learning visual context for establishing reliable reasoning about observed activity in a camera network. In the proposed approach, a new Cross Canonical Correlation Analysis (xCCA) is formulated to discover and quantify time-delayed pairwise correlations of regional activities observed within and across multiple camera views. This thesis shows that learning time-delayed pairwise activity correlations offers valuable contextual information for (1) spatial and temporal topology inference of a camera network, (2) robust person re-identification, and (3) accurate activity-based video temporal segmentation. Crucially, in contrast to conventional methods, the proposed approach does not rely on either intra-camera or inter-camera object tracking; it can thus be applied to low-quality surveillance videos featuring severe inter-object occlusions. Second, to detect global unusual events across multiple disjoint cameras, this thesis extends visual context learning from pairwise relationships to global time-delayed dependencies between regional activities. Specifically, a Time Delayed Probabilistic Graphical Model (TD-PGM) is proposed to model the multi-camera activities and their dependencies. Subtle global unusual events are detected and localised using the model as context-incoherent patterns across multiple camera views. In the model, different nodes represent activities in different decomposed regions from different camera views, and the directed links between nodes encode time-delayed dependencies between activities observed within and across camera views. In order to learn optimised time-delayed dependencies in a TD-PGM, a novel two-stage structure learning approach is formulated by combining constraint-based and score-based search structure learning methods. Third, to cope with visual context changes over time, this two-stage structure learning approach is extended to permit tractable incremental update of both the TD-PGM parameters and its structure. As opposed to most existing studies that assume a static model once learned, the proposed incremental learning allows a model to adapt itself to reflect changes in the current visual context, such as subtle behaviour drift over time or the removal/addition of cameras. Importantly, the incremental structure learning is achieved without either exhaustive search in a large graph-structure space or storing all past observations in memory, making the proposed solution memory- and time-efficient. Fourth, an active learning approach is presented to incorporate human feedback for on-line unusual event detection.
Contrary to most existing unsupervised methods that perform passive mining for unusual events, the proposed approach automatically requests supervision for critical points to resolve ambiguities of interest, leading to more robust detection of subtle unusual events. The active learning strategy is formulated as a stream-based solution, i.e. it decides on the fly whether to request a label for each unlabelled sample observed in sequence. It adaptively selects between two active learning criteria, namely a likelihood criterion and an uncertainty criterion, to achieve (1) discovery of unknown event classes and (2) refinement of the classification boundary. The effectiveness of the proposed approaches is validated using videos captured from busy public scenes such as underground stations and traffic intersections.
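The time-delayed correlation idea behind the xCCA component of this thesis can be approximated, very roughly, by scanning lags and measuring how well two regions' activity profiles align. The sketch below uses plain lagged Pearson correlation rather than the thesis's canonical-correlation formulation, and its function and variable names are made up for illustration.

```python
import numpy as np

def best_time_delay(activity_a, activity_b, max_lag=50):
    """Scan time shifts and return (lag, correlation) where shifting
    activity_b forward by `lag` frames best aligns it with activity_a.
    A positive lag means region B's activity tends to precede region A's."""
    best_lag, best_corr = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = activity_a[lag:], activity_b[:len(activity_b) - lag]
        else:
            x, y = activity_a[:lag], activity_b[-lag:]
        if len(x) < 2:
            continue
        corr = np.corrcoef(x, y)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# toy example: region B's activity shows up in region A roughly 12 frames later
rng = np.random.default_rng(1)
activity_b = rng.random(500)
activity_a = np.roll(activity_b, 12) + 0.1 * rng.normal(size=500)
print(best_time_delay(activity_a, activity_b))   # lag should be close to 12
```

Applied to every pair of decomposed regions, the recovered lags and correlation strengths are the kind of contextual signal the thesis uses for topology inference and re-identification.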
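The stream-based query strategy at the end of this abstract can be pictured as a per-sample decision rule: ask for a label when a sample is unlikely under every known event class (likelihood criterion) or when its top two classes are nearly tied (uncertainty criterion). The thresholds, interface, and names below are illustrative assumptions, not the thesis's formulation.

```python
import numpy as np

def should_query(class_likelihoods, likelihood_threshold=1e-3, margin_threshold=0.1):
    """Decide, for one incoming sample, whether to request a human label.

    class_likelihoods holds p(sample | class) for every currently known
    event class (at least two classes assumed).
    """
    likelihoods = np.asarray(class_likelihoods, dtype=float)

    # likelihood criterion: the sample fits none of the known classes well,
    # so it may belong to an unknown event class
    if likelihoods.max() < likelihood_threshold:
        return True

    # uncertainty criterion: the two most probable classes are nearly tied,
    # so the sample lies close to the classification boundary
    posterior = likelihoods / likelihoods.sum()
    top_two = np.sort(posterior)[-2:]
    return bool(top_two[1] - top_two[0] < margin_threshold)

# an ambiguous sample: plausible under two classes with almost equal support
print(should_query([0.040, 0.045, 0.001]))   # True via the uncertainty criterion
```

The first branch favours discovering new event classes, the second refines the boundary between known ones, which mirrors the two goals the abstract assigns to its adaptive criterion selection.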