7 research outputs found

    Probabilistic web image gathering

    We propose a new method for automated large-scale gathering of Web images relevant to specified concepts. Our main goal is to build a knowledge base associated with as many concepts as possible for large-scale object recognition studies. A second goal is to support the building of more accurate text-based indexes for Web images. In our method, good-quality candidate sets of images for each keyword are gathered based on analysis of the surrounding HTML text. The gathered images are then segmented into regions, and a model of the probability distribution of regions for the concept is computed using an iterative algorithm based on previous work on statistical image annotation. The learned model is then applied to identify which images are visually relevant to the concept implied by the keyword; implicitly, which regions of the images are relevant is also determined. Our experiments reveal that the new method performs much better than Google Image Search and a simple method based on more standard content-based image retrieval methods.
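
    The iterative step described above can be pictured with a minimal sketch: fit a mixture model to the region features of the current candidate set, score every image by the likelihood of its regions, and re-fit on the top-scoring images. This is an assumed simplification for illustration, not the authors' exact algorithm; the region features, component count, and keep ratio are placeholders.

```python
# Hedged sketch of the iterative region-model idea (not the paper's exact
# algorithm): fit a mixture model to region features from the text-selected
# candidates, score images by their regions' likelihood, re-fit on the
# top-scoring images, and repeat until the selection stabilises.
import numpy as np
from sklearn.mixture import GaussianMixture

def gather_relevant(images, n_components=8, n_iter=5, keep_ratio=0.5):
    """images: list of (image_id, region_feats) pairs, where region_feats
    is an (n_regions, n_dims) matrix of segmented-region descriptors."""
    selected = images  # start from all text-based candidates
    for _ in range(n_iter):
        regions = np.vstack([feats for _, feats in selected])
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag").fit(regions)
        # Score each image by the mean log-likelihood of its regions;
        # implicitly, high-likelihood regions are the relevant ones.
        scores = [(gmm.score_samples(feats).mean(), img_id, feats)
                  for img_id, feats in images]
        scores.sort(reverse=True, key=lambda t: t[0])
        keep = max(1, int(len(scores) * keep_ratio))
        selected = [(img_id, feats) for _, img_id, feats in scores[:keep]]
    return [img_id for img_id, _ in selected]
```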

    Semantic Learning and Web Image Mining with Image Recognition and Classification

    Image mining is more than just an extension of data mining to the image domain. Web image mining is a technique for extracting knowledge directly from images on the WWW. Since the main targets of conventional Web mining are numerical and textual data, Web mining for image data is in demand: there is a huge amount of image data as well as text data on the Web. However, mining image data from the Web has received less attention than mining text data, since treating the semantics of images is much more difficult. This paper proposes a novel image recognition and image classification technique that uses a large number of images automatically gathered from the Web as learning images. For classification the system uses image-feature-based search as exploited in content-based image retrieval (CBIR), which does not restrict the target images the way conventional image recognition methods do, together with a support vector machine (SVM), one of the most efficient and widely used statistical methods for generic image classification and well suited to the learning task. The experiments show that the proposed system outperforms some existing search systems.
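
    As described, the classifier side of the system reduces to training an SVM on CBIR-style feature vectors extracted from Web-gathered learning images. Below is a minimal sketch of that recipe; `extract_features` is a hypothetical stand-in for whatever colour/texture descriptor the system actually computes, and the RBF kernel and C value are assumptions.

```python
# Minimal sketch: SVM trained on CBIR-style features of Web-gathered images.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_web_image_classifier(pos_images, neg_images, extract_features):
    """pos_images: Web-gathered learning images for the concept;
    neg_images: counter-examples; extract_features: image -> 1-D vector."""
    X = np.array([extract_features(im) for im in pos_images + neg_images])
    y = np.array([1] * len(pos_images) + [0] * len(neg_images))
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, y)
    # Classify a new image with clf.predict(extract_features(img)[None]).
    return clf
```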

    A Comprehensive Review on the Relevance Feedback in Visual Information Retrieval

    Visual information retrieval in images and video has been developing rapidly and is an important research field in content-based information indexing and retrieval, automatic annotation, and the structuring of images. A visual information system can make use of relevance feedback, whereby the user progressively refines the search result by marking images in the result as relevant, not relevant, or neutral to the search query, and then repeating the search with the new information. With a comprehensive review as its main portion, this paper also suggests some novel solutions and perspectives throughout the discussion; in particular, it introduces the concept of negative bootstrap, which opens up interesting avenues for future research.

    Keywords: bootstrapping, CBIR (content-based image retrieval), relevance feedback, VIR (visual information retrieval).

    I. INTRODUCTION. There has been a renewed spurt of research activity in visual information retrieval. Basically, two kinds of information are associated with a visual object (image or video): information about the object, called its metadata, and information contained within the object, called visual features. Metadata is alphanumeric and generally expressible as a schema of a relational or object-oriented database. Visual features are derived through computational processes, typically image processing, computer vision, and computational-geometry routines executed on the visual object. The simplest visual features that can be computed are based on the pixel values of the raw data, and several early image database systems [1] used pixels as the basis of their data models. In many specific applications, the process of visual feature extraction is limited by the availability of fast, implementable techniques in image processing and computer vision.

    II. RELATED WORK. Initially developed in document retrieval (Salton 1989), relevance feedback was transformed and introduced into content-based multimedia retrieval, mainly content-based image retrieval (CBIR) [3].
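
    The feedback loop described above has a classical, textbook instantiation: Rocchio query refinement, from the document-retrieval tradition (Salton 1989) that the review cites as the origin of relevance feedback. The sketch below pulls the query feature vector toward images marked relevant and pushes it away from those marked not relevant (neutral marks are simply ignored); the alpha, beta, and gamma weights are conventional defaults, not values from any surveyed system.

```python
# Rocchio-style query-point movement: the textbook form of relevance
# feedback, shown here over image feature vectors.
import numpy as np

def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """query: 1-D feature vector; relevant / non_relevant: lists of
    feature vectors of images the user marked in the result set."""
    q = alpha * query
    if relevant:
        q += beta * np.mean(relevant, axis=0)   # move toward relevant images
    if non_relevant:
        q -= gamma * np.mean(non_relevant, axis=0)  # move away from the rest
    return q  # re-run the similarity search with the refined query
```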

    ConceptMap: Mining noisy web data for concept learning

    We attack the problem of learning concepts automatically from noisy Web image search results. The idea is based on discovering common characteristics shared among subsets of images by posing a method that is able to organise the data while eliminating irrelevant instances. We propose a novel clustering and outlier detection method, namely Concept Map (CMAP). Given an image collection returned for a concept query, CMAP provides clusters pruned of outliers. Each cluster is used to train a model representing a different characteristic of the concept. The proposed method outperforms state-of-the-art studies on the task of learning from noisy Web data for low-level attributes as well as high-level object categories. It is also competitive with supervised methods in learning scene concepts. Moreover, results on naming faces support the generalisation capability of the CMAP framework to different domains. CMAP is capable of working at large scale with no supervision by exploiting the available sources. © 2014 Springer International Publishing
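
    A rough sketch of the pipeline shape the abstract describes, under assumed simplifications: k-means for clustering, distance-to-centre quantiles for outlier pruning, and linear SVMs as the per-cluster models. CMAP's own clustering and outlier criteria are more involved than this.

```python
# Hedged sketch of the CMAP pipeline shape: cluster noisy search results,
# prune far-from-centre instances as outliers, and train one model per
# surviving cluster, each capturing a different characteristic of the
# concept.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def concept_map(X, background, n_clusters=10, outlier_quantile=0.8):
    """X: (n, d) features of images returned for the concept query;
    background: (m, d) features of random negative images."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    models = []
    for k in range(n_clusters):
        member = km.labels_ == k
        cutoff = np.quantile(dists[member], outlier_quantile)
        inliers = X[member & (dists <= cutoff)]  # outliers pruned
        if len(inliers) < 2:
            continue  # cluster too small after pruning
        Xk = np.vstack([inliers, background])
        yk = np.concatenate([np.ones(len(inliers)),
                             np.zeros(len(background))])
        models.append(LinearSVC().fit(Xk, yk))
    return models  # score a novel image with the max over per-cluster models
```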

    Mining web images for concept learning

    Thesis (M.S.): Bilkent University, Department of Computer Engineering and the Graduate School of Engineering and Science, 2014. Includes bibliographical references (leaves 56-64). Cataloged from the PDF version of the thesis.

    We attack the problem of learning concepts automatically from noisy Web image search results. The idea is based on discovering common characteristics shared among category images by posing two novel methods that are able to organise the data while eliminating irrelevant instances. We propose a novel clustering and outlier detection method, namely Concept Map (CMAP). Given an image collection returned for a concept query, CMAP provides clusters pruned of outliers. Each cluster is used to train a model representing a different characteristic of the concept. The second method is Association through Model Evolution (AME). It prunes the data iteratively, progressively finding a better set of images via an evaluation score computed at each iteration. The idea is based on capturing the discriminativeness and representativeness of each instance against a large number of random images and eliminating the outliers. The final model is used for classification of novel images. Applying these two methods to different benchmark problems, we observed comparable or better results relative to state-of-the-art methods.

    By Eren Golge. M.S.
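
    The AME loop, as the abstract states it, can be sketched as follows; the classifier choice, drop ratio, and iteration count are assumptions for illustration, not the thesis's actual settings.

```python
# Sketch of the AME idea: iteratively train a classifier against a large
# pool of random negatives, score each candidate by the model's confidence
# (the per-iteration evaluation score), and drop the least discriminative/
# representative instances before the next round.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ame_prune(X, random_negatives, n_iter=5, drop_ratio=0.1):
    """X: (n, d) candidate image features; random_negatives: (m, d)."""
    keep = np.arange(len(X))
    for _ in range(n_iter):
        Xtr = np.vstack([X[keep], random_negatives])
        ytr = np.concatenate([np.ones(len(keep)),
                              np.zeros(len(random_negatives))])
        model = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
        conf = model.predict_proba(X[keep])[:, 1]  # evaluation score
        order = np.argsort(conf)                   # lowest-confidence first
        n_drop = int(len(keep) * drop_ratio)
        if n_drop:
            keep = keep[order[n_drop:]]            # eliminate the outliers
    return keep, model  # surviving indices and the final classifier
```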

    Language and Perceptual Categorization in Computational Visual Recognition

    Computational visual recognition, or giving computers the ability to understand images as well as humans do, is a core problem in Computer Vision. Traditional recognition systems often describe visual content by producing a set of isolated labels, object locations, or even by trying to annotate every pixel in an image with a category. People instead describe the visual world using language. The rich, visually descriptive language produced by people incorporates information from human intuition, world knowledge, visual saliency, and common sense that goes beyond detecting individual visual concepts like objects, attributes, or scenes. Moreover, due to the rising popularity of social media, there exist billions of images with associated text on the web, yet systems that can leverage this type of annotation or try to connect language and vision are scarce. In this dissertation, we propose new approaches that explore the connections between language and vision at several levels of detail by combining techniques from Computer Vision and Natural Language Understanding. We first present a data-driven technique for understanding and generating image descriptions using natural language, including automatically collecting a large-scale dataset of images with visually descriptive captions. Then we introduce a system for retrieving short visually descriptive phrases for describing some part or aspect of an image, and a simple technique to generate full image descriptions by stitching short phrases. Next we introduce an approach for collecting and generating referring expressions for objects in natural scenes at a much larger scale than previous studies. Finally, we describe methods for learning how to name objects by using intuitions from perceptual categorization related to basic-level and entry-level categories. The main contribution of this thesis is in advancing our knowledge of how to leverage language and intuitions from human perception to create visual recognition systems that can better learn from and communicate with people.

    Doctor of Philosophy
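
    The data-driven description technique mentioned above is, at its core, retrieval-based: transfer captions from visually similar images in a large captioned collection. A minimal sketch follows, with the feature extractor and database assumed; the dissertation's actual system is considerably richer.

```python
# Hedged sketch of retrieval-based description: return the captions of the
# k visually most similar images in a captioned collection.
import numpy as np

def describe_by_retrieval(query_feat, db_feats, db_captions, k=3):
    """query_feat: (d,) feature of the query image; db_feats: (n, d)
    features of captioned images; db_captions: list of n caption strings."""
    # Cosine similarity between the query and every database image.
    q = query_feat / np.linalg.norm(query_feat)
    D = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = D @ q
    top = np.argsort(-sims)[:k]  # indices of the k nearest neighbours
    return [db_captions[i] for i in top]
```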

    Recognition of Human Actions in Videos (ビデオ映像に対する人間動作の認識)

    Our overall purpose in this dissertation is the automatic construction of a large-scale action database from Web data, which could be helpful for better exploration of action recognition. We conducted large-scale experiments on 100 human actions and 12 non-human actions and obtained promising results. The dissertation consists of six chapters, briefly introduced below.

    In Chapter 1, recent approaches to action recognition, the necessity of building a large-scale action database, and its difficulties are described. Our work to solve the problem is then concisely explained.

    In Chapter 2, the first work, which introduces a framework for automatically extracting relevant video shots of specific actions from Web videos, is described in detail. The framework first selects relevant videos for a given action from among thousands of Web videos using tag co-occurrence, and then divides the selected videos into video shots. The shots are ranked based on their visual linkage, and the top-ranked shots are taken to be the shots most related to the action. Our method of adopting Web images for shot ranking is also introduced. Finally, large-scale experiments on 100 human actions and 12 non-human actions and their results are described.

    In Chapter 3, the second work, which aims to further improve the shot ranking of the above framework with a novel ranking method, is introduced. Our proposed method, VisualTextualRank, is an extension of a conventional method, VisualRank, which is applied to shot ranking in Chapter 2. VisualTextualRank effectively employs both textual and visual information extracted from the data. Our experimental results showed that using our method instead of the conventional ranking method retrieves more relevant shots.

    In Chapter 4, the third work, which aims to obtain more informative and representative video features, is described. Building on a conventional method of extracting spatio-temporal features adopted in Chapters 2 and 3, we propose extracting spatio-temporal features by triangulation of dense SURF keypoints. Shape features of the triangles, along with visual and motion features of their points, are combined to form our features. By applying our feature extraction method to the framework introduced in Chapter 2, we show that more relevant video shots can be retrieved at the top. Furthermore, the effectiveness of our method is validated on action classification for UCF-101 and UCF-50, which are well-known large-scale datasets. The experimental results demonstrate that our features are comparable and complementary to the state of the art.

    In Chapter 5, the final work, which focuses on recognition of hand-motion-based actions, is introduced. We propose a system of hand detection and tracking for unconstrained videos and extract hand-movement-based features from the detected and tracked hand regions. These features are expected to help improve results for hand-motion-based actions. To evaluate the performance of our system on hand detection, we use the VideoPose2.0 dataset, a challenging dataset of uncontrolled videos. To validate the effectiveness of our features, we conduct experiments on fine-grained action recognition with the "playing instruments" group in the UCF-101 dataset. The experimental results show the efficiency of our system.

    In Chapter 6, our works, with their major points and findings, are summarized. We also consider the potential of applying the results obtained in our works to further research.

    The University of Electro-Communications (電気通信大学), 201
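
    The VisualRank baseline of Chapters 2 and 3 is a PageRank-style random walk over a visual-similarity graph of shots, so shots similar to many other candidate shots rise to the top. A minimal sketch is below; VisualTextualRank additionally mixes textual similarity into the transition matrix, which is omitted here as the details are not given in the abstract.

```python
# PageRank-style power iteration over a visual-similarity graph of shots,
# in the spirit of VisualRank. Assumes every shot has some similarity mass
# (positive column sums in S).
import numpy as np

def visual_rank(S, damping=0.85, n_iter=100, tol=1e-8):
    """S: (n, n) symmetric non-negative visual-similarity matrix of shots."""
    n = len(S)
    P = S / S.sum(axis=0, keepdims=True)  # column-stochastic transitions
    r = np.full(n, 1.0 / n)               # uniform initial ranking
    for _ in range(n_iter):
        r_new = damping * P @ r + (1 - damping) / n
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r  # higher score = more representative shot for the action
```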
