
    Towards automatic construction of diverse, high-quality image dataset

    University of Technology Sydney. Faculty of Engineering and Information Technology. The availability of labeled image datasets has been shown to be critical for high-level image understanding, and it continuously drives progress in feature design and model development. However, manual labeling is both time-consuming and labor-intensive. To reduce the cost of manual annotation, there has been increased research interest in automatically constructing image datasets by exploiting web images. Datasets constructed by existing methods tend to suffer from low accuracy and low diversity. They also tend to have weak domain adaptation ability, which is known as the “dataset bias problem”. This research aims to automatically collect accurate and diverse images for given queries from the Web and to construct a domain-robust image dataset. Within this thesis, various methods are developed and presented to address the following research challenges. The first is that retrieved web images are usually noisy: how can noise be removed to construct a relatively high-accuracy dataset? The second is that collected web images often have low diversity: how can the dataset bias problem be addressed to construct a domain-robust dataset? In Chapter 3, a framework is presented to address the problem of polysemy in the process of constructing a high-accuracy dataset. Visual polysemy means that a word has several semantic (text) senses that are visually (image) distinct. Resolving polysemy helps to choose appropriate visual senses for sense-specific image collection, thereby improving the accuracy of the collected images. Unlike previous methods, which leveraged human-developed knowledge such as Wikipedia or dictionaries to handle polysemy, we propose to automate the process of discovering and distinguishing multiple visual senses from untagged corpora.
In Chapter 4, a domain-robust framework is presented for image dataset construction. To address the dataset bias problem, our framework consists of three main stages. First, we obtain candidate query expansions by searching the Google Books Ngram Corpus. Then, by treating word-word (semantic) distance and visual-visual (visual) distance as features from two different views, we formulate the pruning of noisy query expansions as a multi-view learning problem. Finally, by treating each selected query expansion as a “bag” and the images therein as “instances”, we formulate image selection and noise removal as a multi-instance learning problem. In this way, images from different distributions can be kept while noise is filtered out. Chapter 5 details a method for removing noisy images and selecting accurate ones. The accuracy of selected images is limited by two issues: noisy query expansions that are not filtered out, and indexing errors of the image search engine. To deal with the noisy query expansions, we divide them into two types and propose to remove noise based on visual consistency and relevancy, respectively. To handle noise induced by indexing errors, we classify the noisy images into three categories and filter out each category with a separate mechanism. Chapter 6 proposes an approach for enhancing classifier learning by using the collected web images. Unlike previous works, our approach improves the accuracy and robustness of the classifier while greatly reducing the dependence on time and labor. Specifically, we propose a new instance-level MIL model to select a subset of training images from each selected privileged information and simultaneously learn the optimal classifiers based on the selected images. Chapter 7 concludes the thesis and outlines the scope of future work.
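
The bag-and-instance framing described for Chapter 4 can be illustrated with a minimal sketch. This is not the thesis's model: the function name `select_instances`, the fixed threshold, and the hand-written relevance scores are all assumptions made for illustration, whereas the thesis learns its selection jointly. The sketch only shows the standard multi-instance assumption that a bag counts as positive if at least one of its instances is positive.

```python
from typing import Dict, List


def select_instances(
    bags: Dict[str, List[float]], threshold: float = 0.5
) -> Dict[str, List[int]]:
    """Keep, per bag, the indices of instances whose score clears the threshold.

    Each bag is a (hypothetical) query expansion and each score is a
    placeholder relevance score for one retrieved image. A bag whose best
    score is below the threshold is dropped entirely.
    """
    kept: Dict[str, List[int]] = {}
    for name, scores in bags.items():
        if not scores or max(scores) < threshold:
            continue  # no instance looks positive, so treat the whole bag as noise
        kept[name] = [i for i, s in enumerate(scores) if s >= threshold]
    return kept


# Hypothetical bags: query expansions mapped to per-image scores.
bags = {
    "apple fruit": [0.9, 0.2, 0.7],
    "apple logo": [0.1, 0.3],
}
selected = select_instances(bags)
```

Here the second bag is discarded as a whole, while the first keeps only its two high-scoring images, which mirrors the idea of keeping images from useful expansions while filtering noise.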

    Deep Learning for Person Reidentification Using Support Vector Machines

    © 2017 Mengyu Xu et al. Due to variations in viewpoint, pose, and illumination, a given individual may appear considerably different across camera views. Tracking individuals across camera networks with non-overlapping fields of view is still a challenging problem. Previous works mainly treat feature representation and metric learning separately, which tends to yield suboptimal solutions. To address this issue, we propose a novel framework that performs feature representation learning and metric learning jointly. Unlike previous works, we represent pairs of pedestrian images as a new resized input and use a linear Support Vector Machine in place of the softmax activation function for similarity learning. Dropout and data augmentation are also employed in this model to prevent the network from overfitting. Extensive experiments on two publicly available datasets, VIPeR and CUHK01, demonstrate the effectiveness of the proposed approach.
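
As a rough illustration of what replacing softmax with a linear SVM usually means in practice, the sketch below computes a mean hinge loss over similarity scores. The abstract does not state which SVM loss variant the authors use, so the plain (L1) hinge loss, the function name `hinge_loss`, and the example scores are assumptions. Labels are taken to be +1 for a pair showing the same person and -1 otherwise.

```python
from typing import Sequence


def hinge_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean hinge loss max(0, 1 - y * s) for labels y in {-1, +1}.

    `scores` stands in for the raw outputs of the network's final linear
    layer for each image pair (an assumption for this sketch).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    total = sum(max(0.0, 1.0 - y * s) for s, y in zip(scores, labels))
    return total / len(scores)


# Hypothetical pair scores: a confident match, a weak non-match, a weak match.
loss = hinge_loss([2.0, -0.5, 0.3], [1, -1, 1])
```

Only pairs that fall inside the margin contribute to the loss, so the confident match in the example adds nothing, whereas a softmax cross-entropy loss would keep penalizing every pair to some degree.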