3,512 research outputs found

    Hybrid Information Retrieval Model For Web Images

    Full text link
    The Bing Bang of the Internet in the early 90's increased dramatically the number of images being distributed and shared over the web. As a result, image information retrieval systems were developed to index and retrieve image files spread over the Internet. Most of these systems are keyword-based which search for images based on their textual metadata; and thus, they are imprecise as it is vague to describe an image with a human language. Besides, there exist the content-based image retrieval systems which search for images based on their visual information. However, content-based type systems are still immature and not that effective as they suffer from low retrieval recall/precision rate. This paper proposes a new hybrid image information retrieval model for indexing and retrieving web images published in HTML documents. The distinguishing mark of the proposed model is that it is based on both graphical content and textual metadata. The graphical content is denoted by color features and color histogram of the image; while textual metadata are denoted by the terms that surround the image in the HTML document, more particularly, the terms that appear in the tags p, h1, and h2, in addition to the terms that appear in the image's alt attribute, filename, and class-label. Moreover, this paper presents a new term weighting scheme called VTF-IDF short for Variable Term Frequency-Inverse Document Frequency which unlike traditional schemes, it exploits the HTML tag structure and assigns an extra bonus weight for terms that appear within certain particular HTML tags that are correlated to the semantics of the image. Experiments conducted to evaluate the proposed IR model showed a high retrieval precision rate that outpaced other current models.Comment: LACSC - Lebanese Association for Computational Sciences, http://www.lacsc.org/; International Journal of Computer Science & Emerging Technologies (IJCSET), Vol. 3, No. 1, February 201

    Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval

    Get PDF
    Although more and more language pairs are covered by machine translation services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application which needs translation functionality of a relatively low level of sophistication since current models for information retrieval (IR) are still based on a bag-of-words. The Web provides a vast resource for the automatic construction of parallel corpora which can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this paper, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.Comment: 37 page

    Flexible information retrieval: some research trends

    Get PDF
    In this paper some research trends in the field of Information Retrieval are presented. The focus is on the definition of flexible systems, i.e. systems that can represent and manage the vagueness and uncertainty which is characteristic of the process of information searching and retrieval. In this paper the application of soft computing techniques is considered, in particular fuzzy set theory

    Accessing Textual Information Embedded in Internet Images

    No full text
    Indexing and searching for WWW pages is relying on analysing text. Current technology cannot process the text embedded in images on WWW pages. This paper argues that this is a significant problem as text in image form is usually semantically important (e.g. headers, titles). The results of a recent study are presented to show that the majority (76%) of words embedded in images do not appear elsewhere in the main text and that the majority (56%) of ALT tag descriptions of images are incorrect or do not exist at all. Research under way to devise tools to extract text from images based on the way humans perceive colour differences is outlined and results are presented

    Image retrieval by hypertext links

    Get PDF
    This paper presents a model for retrieval of images from a large World Wide Web based collection. Rather than considering complex visual recognition algorithms, the model presented is based on combining evidence of the text content and hypertext structure of the Web. The paper shows that certain types of query are amply served by this form of representation. It also presents a novel means of gathering relevance judgements

    USING SOCIAL ANNOTATIONS TO IMPROVE WEB SEARCH

    Get PDF
    Web-based tagging systems, which include social bookmarking systems such as Delicious, have become increasingly popular. These systems allow participants to annotate or tag web resources. This research examined the use of social annotations to improve the quality of web searches. The research involved three components. First, social annotations were used to index resources. Two annotation-based indexing methods were proposed: annotation based indexing and full text with annotation indexing. Second, social annotations were used to improve search result ranking. Six annotation based ranking methods were proposed: Popularity Count, Propagate Popularity Count, Query Weighted Popularity Count, Query Weighted Propagate Popularity Count, Match Tag Count and Normalized Match Tag Count. Third, social annotations were used to both index and rank resources. The result from the first experiment suggested that both static feature and similarity feature should be considered when using social annotations to re-rank search result. The result of the second experiment showed that using only annotation as an index of resources may not be a good idea. Since social Annotations could be viewed as a high level concept of the content, combining them to the content of resource could add some more important concepts to the resources. Last but not least, the result from the third experiment confirmed that the combination of using social annotations to rank the search result and using social annotations as resource index augmentation provided a promising rank of search results. It showed that social annotations could benefit web search
    • …
    corecore