2,530 research outputs found

    Focused image search in the social Web.

    Get PDF
    Recently, social multimedia-sharing websites, which allow users to upload, annotate, and share online photo or video collections, have become increasingly popular. The user tags or annotations constitute the new multimedia meta-data . We present an image search system that exploits both image textual and visual information. First, we use focused crawling and DOM Tree based web data extraction methods to extract image textual features from social networking image collections. Second, we propose the concept of visual words to handle the image\u27s visual content for fast indexing and searching. We also develop several user friendly search options to allow users to query the index using words and image feature descriptions (visual words). The developed image search system tries to bridge the gap between the scalable industrial image search engines, which are based on keyword search, and the slower content based image retrieval systems developed mostly in the academic field and designed to search based on image content only. We have implemented a working prototype by crawling and indexing over 16,056 images from flickr.com, one of the most popular image sharing websites. Our experimental results on a working prototype confirm the efficiency and effectiveness of the methods, that we proposed

    Weaving Entities into Relations: From Page Retrieval to Relation Mining on the Web

    Get PDF
    With its sheer amount of information, the Web is clearly an important frontier for data mining. While Web mining must start with content on the Web, there is no effective ``search-based'' mechanism to help sifting through the information on the Web. Our goal is to provide a such online search-based facility for supporting query primitives, upon which Web mining applications can be built. As a first step, this paper aims at entity-relation discovery, or E-R discovery, as a useful function-- to weave scattered entities on the Web into coherent relations. To begin with, as our proposal, we formalize the concept of E-R discovery. Further, to realize E-R discovery, as our main thesis, we abstract tuple ranking-- the essential challenge of E-R discovery-- as pattern-based cooccurrence analysis. Finally, as our key insight, we observe that such relation mining shares the same core functions as traditional page-retrieval systems, which enables us to build the new E-R discovery upon today's search engines, almost for free. We report our system prototype and testbed, WISDM-ER, with real Web corpus. Our case studies have demonstrated a high promise, achieving 83%-91% accuracy for real benchmark queries-- and thus the real possibilities of enabling ad-hoc Web mining tasks with online E-R discovery
    • …
    corecore