Visual attribute discovery and analyses from Web data

Abstract

Visual attributes are important for describing and understanding an object's appearance. For an object classification or recognition task, an algorithm needs to infer the visual attributes of an object in order to compare, categorize, or recognize objects. In a zero-shot learning scenario, the algorithm depends on visual attributes to describe an unknown object, since no training samples are available. Because different object categories usually share common attributes (e.g., many animals have four legs, a tail, and fur), explicitly modeling attributes not only allows previously learned attributes to be transferred to a novel category but also reduces the number of training samples required for that category, which is important when training data are limited. Although a larger number of visual attributes helps an algorithm describe an image in greater detail, it also requires a larger training set. In the supervised scenario, data collection can be both costly and time-consuming. To mitigate these costs, this dissertation exploits weakly supervised data from the Web to construct computational methodologies for discovering visual attributes and for analyzing them across time and domains.

This dissertation first presents an automatic approach to learning hundreds of visual attributes from the open-world vocabulary of the Web using a convolutional neural network. The proposed method seeks to understand visual attributes in terms of how they are perceived inside deep neural networks. By analyzing neural activations, the system can identify the degree to which an attribute is visually perceptible and can localize the attribute in an image. Moreover, the approach exploits the layered structure of the deep model to determine the semantic depth of each attribute.

Beyond visual attribute discovery, this dissertation explores how visual styles (i.e., attributes that correspond to multiple visual concepts) change across time; these are referred to as visual trends. To this end, this dissertation introduces several deep neural networks for estimating when objects were made, together with analyses of the neural activations and their entropy to gain insight into the deep networks. To demonstrate these dating frameworks in real-world applications, they are applied to analyze the influence of vintage fashion on runway collections, as well as the influence of runway collections on street fashion.

Finally, this dissertation introduces an approach to recognizing and transferring visual attributes across domains in a realistic manner. Given two input images from different domains, 1) a shopping image and 2) a scene image, this dissertation proposes a generative adversarial network that transfers the product pixels from the shopping image to the scene image such that 1) the output image looks realistic and 2) the visual attributes of the product are preserved.

In summary, this dissertation utilizes weakly supervised data from the Web for visual attribute discovery and for analysis across time and domains.
Beyond the novel computational methodology for each problem, this dissertation demonstrates that the proposed approaches can be applied to many real-world applications, such as dating historical objects, visual trend prediction and analysis, cross-domain image label transfer, and cross-domain pixel transfer for home decoration.
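As a rough, hypothetical illustration of the activation-analysis idea described above (not the dissertation's actual method), the following Python sketch localizes an attribute by weighting the final convolutional feature maps of a pretrained CNN with the weights of a linear classifier, in the spirit of class activation mapping. The choice of model, layer, and attribute index are assumptions made for illustration only.

# Hypothetical sketch: CAM-style localization of a visual "attribute" by
# weighting the last convolutional feature maps of a pretrained CNN with
# the weights of a linear classifier. Illustrative only; not the
# dissertation's pipeline.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

features = {}
def hook(module, inputs, output):
    features["maps"] = output  # (1, 512, H, W) activations before pooling

model.layer4.register_forward_hook(hook)

image = torch.randn(1, 3, 224, 224)      # stand-in for a preprocessed photo
with torch.no_grad():
    logits = model(image)

attr_idx = logits.argmax(dim=1).item()   # treat one output unit as the "attribute"
w = model.fc.weight[attr_idx]            # (512,) classifier weights for that unit
heatmap = torch.einsum("c,chw->hw", w, features["maps"][0])
heatmap = F.relu(heatmap)
heatmap = heatmap / (heatmap.max() + 1e-8)  # normalized spatial evidence map

High values in the resulting map indicate image regions that contribute evidence for the attribute, which is one simple way to probe where, and how strongly, a network perceives an attribute.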
