165 research outputs found

    Incremental hashing with sample selection using dominant sets

    In the world of big data, large amounts of images are available in social media, corporate and even personal collections. A collection may grow quickly as new images are generated at high rates. The new images may cause changes in the distribution of existing classes or the emergence of new classes, resulting in the collection being dynamic and having concept drift. For efficient image retrieval from an image collection using a query, a hash table consisting of a set of hash functions is needed to transform images into binary hash codes, which are used as the basis to find images similar to the query. If the image collection is dynamic, the hash table built at one time step may not work well at the next due to changes in the collection as a result of new images being added. Therefore, the hash table needs to be rebuilt or updated at successive time steps. Incremental hashing (ICH) is the first effective method to deal with the concept drift problem in image retrieval from dynamic collections. In ICH, a new hash table is learned based on newly emerging images only, which represent the data distribution of the current data environment. The new hash table is used to generate hash codes for all images, including old and new ones. Due to the dynamic nature of the collection, new images of one class may not be similar to old images of the same class. In order to learn a new hash table that preserves within-class similarity in both old and new images, incremental hashing with sample selection using dominant sets (ICHDS) is proposed in this paper, which selects representative samples from each class for training the new hash table. Experimental results show that ICHDS yields better retrieval performance than existing dynamic and static hashing methods.
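    The retrieval step described in this abstract (hash functions map images to binary codes, and similar images are found by comparing codes) can be illustrated with a minimal sketch. It is not ICH or ICHDS: the hash functions here are random hyperplane projections used as a stand-in for a learned hash table, the image features are synthetic vectors, and the names `make_hash_table`, `encode` and `hamming_rank` are hypothetical.

```python
import numpy as np


def make_hash_table(dim, n_bits, seed=0):
    """Return n_bits random hyperplanes (a stand-in for learned hash functions)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))


def encode(features, projections):
    """Map feature vectors to binary codes: one bit per hash function."""
    return (features @ projections > 0).astype(np.uint8)


def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    distances = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances


# Synthetic "image features" (assumption: 100 images, 32-dimensional features).
rng = np.random.default_rng(1)
database = rng.standard_normal((100, 32))

projections = make_hash_table(dim=32, n_bits=16)
db_codes = encode(database, projections)

# Use database item 7 as the query; its own code is at Hamming distance 0.
query_code = encode(database[7:8], projections)[0]
order, distances = hamming_rank(query_code, db_codes)
print("closest items:", order[:5], "distances:", distances[order[:5]])
```

    In a dynamic collection, the abstract's point is that `projections` would have to be relearned as new images arrive, and all stored codes regenerated with the new table; the sketch only shows the fixed-table lookup.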

    Concept Preserving Hashing for Semantic Image Retrieval with Concept Drift


    Visual Data Association: Tracking, Re-identification and Retrieval

    With the rapid development of the information society, large amounts of multimedia data are generated, shared and transferred on various electronic devices and the Internet every minute. Hence, building intelligent systems capable of associating these visual data at diverse locations and different times is essential and will significantly facilitate understanding and identifying where an object came from and where it is going. The estimated traces of motions or changes increasingly make it feasible to apply advanced algorithms to real-world applications, including human-computer interaction, robotic navigation, security in surveillance, biological characteristics association and civil structure vibration detection. However, due to inherent challenges such as ambiguity, heterogeneity, noisy data, large scale and unknown variations, visual data association is currently far from being established. Therefore, this thesis focuses on associating visual data at diverse locations and different times for the tasks of tracking, re-identification and retrieval. More specifically, three situations (single camera, across multiple cameras and across multiple modalities) have been investigated, and four algorithms have been developed at different levels. Chapter 3: The first algorithm explores an ensemble system for robust object tracking, primarily considering the independence of classifier members. An empirical analysis is first given to show that object tracking is a non-i.i.d. sampling, under-sample and incomplete-dataset problem. Then, a set of independent classifiers trained sequentially on different small datasets is dynamically maintained to overcome this particular machine learning problem. Thus, for every challenge, an optimal classifier can be approximated in a subspace spanned by the selected competitive classifiers. 
Chapter 4: The second method improves object tracking by exploiting a winner-take-all strategy to select the most suitable trackers. This topic naturally extends the concept of ensemble in the first topic to a more general idea: a multi-expert system, in which members come from different function spaces. Thus, the diversity of the system is more likely to be amplified. Based on a large public dataset, a prediction model of performance for different trackers on various challenges can be obtained off-line. Then, the learned structural regression model can be directly used to efficiently select the winner tracker online. Chapter 5: The third one learns cross-view identities for fast person re-identification in a cross-camera setting, which significantly differs from the single-view object tracking in the first two topics. Two sets of discriminative hash functions for two different views are learned by simultaneously minimising their distance in the Hamming space, and maximising the cross-covariance and margin. Thus, similar binary codes can be found for images of the same person captured at different views by embedding the images into the Hamming space. Chapter 6: The fourth model develops a novel Hetero-manifold regularisation framework for efficient cross-modal retrieval. Compared with the first two settings, this is a more general and complex topic, in which the samples can be relaxed to images captured at a very far distance or over a very long time, and even to text, voice and other formats. Taking advantage of the hetero-manifold, the similarity between each pair of heterogeneous data could be naturally measured by three-order random walks on this hetero-manifold. It is concluded that, by fully exploiting the algorithms for solving the problems in the three situations, an integrated trace for an object moving anywhere can be definitely discovered.

    Towards Practicality of Sketch-Based Visual Understanding

    Sketches have been used to conceptualise and depict visual objects from pre-historic times. Sketch research has flourished in the past decade, particularly with the proliferation of touchscreen devices. Much of the utilisation of sketch has been anchored around the fact that it can be used to delineate visual concepts universally irrespective of age, race, language, or demography. The fine-grained interactive nature of sketches facilitates the application of sketches to various visual understanding tasks, like image retrieval, image-generation or editing, segmentation, 3D-shape modelling etc. However, sketches are highly abstract and subjective based on the perception of individuals. Although most agree that sketches provide fine-grained control to the user to depict a visual object, many consider sketching a tedious process due to their limited sketching skills compared to other query/support modalities like text/tags. Furthermore, collecting fine-grained sketch-photo association is a significant bottleneck to commercialising sketch applications. Therefore, this thesis aims to progress sketch-based visual understanding towards more practicality.
    Comment: PhD thesis successfully defended by Ayan Kumar Bhunia, Supervisor: Prof. Yi-Zhe Song, Thesis Examiners: Prof Stella Yu and Prof Adrian Hilto

    Exploring deep learning powered person re-identification

    With increased security demands, more and more video surveillance systems are installed in public places, such as schools, stations, and shopping malls. Such large-scale monitoring requires 24/7 video analytics, which cannot be achieved purely by manual operations. Thanks to recent advances in artificial intelligence (AI), deep learning algorithms enable automatic video analytics via smart devices, which interpret people/vehicle behaviours in real time to avoid anomalies effectively. Among various video analytical tasks, people search is one of the most critical use cases due to its wide application scenarios, such as searching for missing people, detecting intruders, and tracking suspects. However, current AI-powered people search is generally built upon facial recognition techniques, which are effective yet may be privacy-invasive. To address the problem, person re-identification (ReID), which aims to identify a person of interest without facial information, has become an effective panacea. Despite considerable achievements in recent years, person ReID still faces some tough challenges, such as 1) the strong reliance on identity labels during feature learning, 2) the tradeoff between searching speed and identification accuracy, and 3) the huge modality discrepancy between data from different sources, e.g., RGB images and infrared (IR) images. Therefore, this thesis focuses on the above challenges in person ReID, analyzes the advantages and limitations of existing solutions, and proposes improved solutions for each challenge. Specifically, to alleviate the identity label reliance during feature learning, an improved unsupervised person ReID framework is proposed in Chapter 3, which refines not only imperfect cluster results but also the optimisation directions of samples. Based on the unsupervised setting, we further focus on the tradeoff between searching speed and identification accuracy. 
To this end, an improved unsupervised binary feature learning scheme for person ReID is proposed in Chapter 4, which derives binary identity representations that not only are robust to transformations but also have low bit correlations. Apart from person ReID conducted within a single modality where both query and gallery are RGB images, cross-modality retrieval is more challenging yet more common in real-world scenarios. To handle the problem, a two-stream framework, facilitating person ReID with on-the-fly keypoint-aware features, is proposed in Chapter 5. Furthermore, the thesis spots several promising research topics in Chapter 6, which are instructive for future works in person ReI

    User-centric Music Information Retrieval

    The rapid growth of the Internet and the advancements of Web technologies have made it possible for users to have access to large amounts of on-line music data, including music acoustic signals, lyrics, style/mood labels, and user-assigned tags. This progress has made music listening more fun, but has raised the issue of how to organize this data and, more generally, how computer programs can assist users in their music experience. An important subject in computer-aided music listening is music retrieval, i.e., the issue of efficiently helping users locate the music they are looking for. Traditionally, songs were organized in a hierarchical structure such as genre->artist->album->track, to facilitate the users’ navigation. However, the intentions of the users are often hard to capture in such a simply organized structure. The users may want to listen to music of a particular mood, style or topic; and/or any songs similar to some given music samples. This motivated us to work on a user-centric music retrieval system to improve users’ satisfaction with the system. Traditional music information retrieval research was mainly concerned with classification, clustering, identification, and similarity search of acoustic data of music by way of feature extraction algorithms and machine learning techniques. More recently, music information retrieval research has focused on utilizing other types of data, such as lyrics, user access patterns, and user-defined tags, and on targeting non-genre categories for classification, such as mood labels and styles. This dissertation focused on investigating and developing effective data mining techniques for (1) organizing and annotating music data with styles, moods and user-assigned tags; (2) performing effective analysis of music data with features from diverse information sources; and (3) recommending songs to users utilizing both content features and user access patterns.