869 research outputs found

    Hybrid Information Retrieval Model For Web Images

    The Big Bang of the Internet in the early 1990s dramatically increased the number of images distributed and shared over the web. As a result, image information retrieval systems were developed to index and retrieve image files spread over the Internet. Most of these systems are keyword-based: they search for images using their textual metadata, and they are therefore imprecise, because describing an image in human language is inherently vague. Content-based image retrieval systems, by contrast, search for images using their visual information. However, content-based systems are still immature and not very effective, as they suffer from low retrieval recall and precision. This paper proposes a new hybrid image information retrieval model for indexing and retrieving web images published in HTML documents. The distinguishing feature of the proposed model is that it is based on both graphical content and textual metadata. The graphical content is represented by the color features and color histogram of the image, while the textual metadata are represented by the terms that surround the image in the HTML document, in particular the terms that appear in the p, h1, and h2 tags, together with the terms that appear in the image's alt attribute, filename, and class-label. Moreover, this paper presents a new term weighting scheme called VTF-IDF, short for Variable Term Frequency-Inverse Document Frequency, which, unlike traditional schemes, exploits the HTML tag structure and assigns an extra bonus weight to terms that appear within particular HTML tags that are correlated with the semantics of the image. Experiments conducted to evaluate the proposed IR model showed a high retrieval precision rate that outpaced other current models. Comment: LACSC - Lebanese Association for Computational Sciences, http://www.lacsc.org/; International Journal of Computer Science & Emerging Technologies (IJCSET), Vol. 3, No. 1, February 201
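
    The tag-aware weighting idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the VTF-IDF formula or any bonus values, so everything below is an assumption made for illustration: the `TAG_BONUS` table, the function names `variable_tf` and `vtf_idf`, and the use of a plain logarithmic IDF. It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical bonus weights per HTML context (assumed values, not from the paper).
TAG_BONUS = {"alt": 3.0, "filename": 2.5, "h1": 2.0, "h2": 1.5, "class": 1.5, "p": 1.0}


def variable_tf(tagged_terms):
    """Accumulate a tag-dependent weight per occurrence instead of a flat count of 1."""
    tf = Counter()
    for term, tag in tagged_terms:
        tf[term.lower()] += TAG_BONUS.get(tag, 1.0)
    return tf


def vtf_idf(docs):
    """docs: one list of (term, tag) pairs per image; returns one weight dict per image."""
    tfs = [variable_tf(doc) for doc in docs]
    n_docs = len(docs)
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # document frequency: count each term once per image
    return [
        {term: weight * math.log(n_docs / df[term]) for term, weight in tf.items()}
        for tf in tfs
    ]


docs = [
    [("sunset", "alt"), ("beach", "p")],
    [("beach", "p"), ("city", "h1")],
]
weights = vtf_idf(docs)
```

    In this toy collection, "beach" occurs in both images and therefore receives zero weight, while "sunset" (found in an alt attribute) outweighs "city" (found in an h1 tag) because of the larger assumed bonus.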

    Evaluating the retrieval effectiveness of Web search engines using a representative query sample

    Search engine retrieval effectiveness studies are usually small-scale, using only limited query samples. Furthermore, queries are selected by the researchers. We address these issues by taking a random representative sample of 1,000 informational and 1,000 navigational queries from a major German search engine and comparing Google's and Bing's results based on this sample. Jurors were found through crowdsourcing, and data was collected using specialised software, the Relevance Assessment Tool (RAT). We found that while Google outperforms Bing on both query types, the difference in performance for informational queries was rather small. However, for navigational queries, Google found the correct answer in 95.3 per cent of cases, whereas Bing found the correct answer only 76.6 per cent of the time. We conclude that search engine performance on navigational queries is of great importance, as users in this case can clearly identify queries that have returned correct results. Performance on this query type may therefore contribute to explaining user satisfaction with search engines.

    Access to information in digital libraries : users and digital divide

    Recognising the importance of information and knowledge in all spheres of human life, the recently held World Summit on Information Society came up with a plan of action for building a global information society. The goal of the world information society initiatives is the same as that of digital library research and development: to make information and knowledge accessible to everyone in the world. Digital libraries have progressed very rapidly over the past ten or so years. This paper addresses the two most important aspects of the information society: information users and the digital divide. Findings of some large-scale studies on human information behaviour on the web and in digital libraries are discussed. The major findings of a study on access to electronic resources by university students are then presented. It is proposed that a one-stop window approach with a task-based information organisation and access system may be the way forward.

    WAQS : a web-based approximate query system

    The Web is often viewed as a gigantic database that holds vast stores of information and provides ubiquitous accessibility to end-users. Since its inception, the Internet has experienced explosive growth both in the number of users and in the amount of content available on it. However, searching for information on the Web has become increasingly difficult. Although query languages have long been part of database management systems, the standard query language, Structured Query Language (SQL), is not suitable for Web content retrieval. In this dissertation, a new technique for document retrieval on the Web is presented. This technique is designed to allow detailed retrieval and hence reduce the number of matches returned by typical search engines. The main objective of this technique is to allow the query to be based not just on keywords but also on the location of the keywords within the logical structure of a document. In addition, the technique provides approximate search capabilities based on the notion of Distance and Variable Length Don't Cares. The proposed techniques have been implemented in a system called Web-Based Approximate Query System, which contains an SQL-like query language called Web-Based Approximate Query Language. Web-Based Approximate Query Language has also been integrated with EnviroDaemon, an environmental domain-specific search engine. It provides EnviroDaemon with more detailed searching capabilities than keyword-based search alone. Implementation details, technical results and future work are presented in this dissertation.
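
    As a rough illustration of approximate matching with a distance measure and variable-length don't cares, the sketch below computes an edit distance in which a wildcard in the pattern absorbs any run of text characters at no cost. The abstract does not define its Distance notion or the query syntax, so this assumes Levenshtein distance and a `*` wildcard; the function name `vldc_distance` is hypothetical and this is not the Web-Based Approximate Query Language itself.

```python
def vldc_distance(pattern: str, text: str, wildcard: str = "*") -> int:
    """Edit distance between pattern and text, where each wildcard in the
    pattern matches any substring of text (including the empty one) for free."""
    m, n = len(pattern), len(text)
    # dp[i][j] = cheapest way to align pattern[:i] with text[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        dp[0][j] = j  # empty pattern: every text character is an insertion
    for i in range(1, m + 1):
        is_wild = pattern[i - 1] == wildcard
        # Against empty text a wildcard costs nothing; a literal must be deleted.
        dp[i][0] = dp[i - 1][0] + (0 if is_wild else 1)
        for j in range(1, n + 1):
            if is_wild:
                # Match nothing, or absorb one more text character for free.
                dp[i][j] = min(dp[i - 1][j], dp[i][j - 1])
            else:
                substitution = dp[i - 1][j - 1] + (pattern[i - 1] != text[j - 1])
                dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[m][n]
```

    For example, `vldc_distance("data*retrieval", "database retrieval")` returns 0, because the wildcard absorbs "base ". A query processor built on this idea could accept a document element whenever the distance stays below a user-chosen threshold.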

    Multimedia Chinese Web Search Engines: A Survey

    The objective of this paper is to explore the state of multimedia search functionality on major general and dedicated Web search engines in the Chinese language. The authors studied: a) how many Chinese Web search engines presently make use of multimedia searching, and b) the type of multimedia search functionality available. Specifically, the following were examined: a) multimedia features - features allowing multimedia search; and b) extent of personalization - the extent to which a search engine Web site allows users to control multimedia search. Overall, Chinese Web search engines offer limited multimedia searching functionality. The significance of the study is based on two factors: a) little research has been conducted on Chinese Web search engines, and b) the instrument used in the study and the results obtained by this research could help users, Web designers, and Web search engine developers. By and large, general Web search engines support more multimedia features than specialized ones.

    Concept hierarchy across languages in text-based image retrieval: a user evaluation

    The University of Sheffield participated in Interactive ImageCLEF 2005 with a comparative user evaluation of two interfaces: one displaying search results as a list, the other organizing retrieved images into a hierarchy of concepts displayed on the interface as an interactive menu. Data was analysed with respect to effectiveness (number of images retrieved), efficiency (time needed) and user satisfaction (opinions from questionnaires). Effectiveness and efficiency were calculated at both 5 minutes (CLEF condition) and at final time. The list was marginally more effective than the menu at 5 minutes (no statistical significance), but the two were equal at final time, showing that the menu needs more time to be used effectively. The list was more efficient at both 5 minutes and final time, although the difference was not statistically significant. Users preferred the menu (75% vs. 25% for the list), indicating it to be an interesting and engaging feature. An inspection of the logs showed that 11% of effective terms (i.e. no stop-words, single terms) were not translated and that another 5% were mistranslated. Some of those terms were used by all participants and were fundamental for some of the tasks. Untranslated and mistranslated terms negatively affected the search, hierarchy generation, and results display. More work has to be carried out to test the system under different settings, e.g. using a dictionary instead of MT, which appears to be ineffective in translating users’ queries, which are rarely grammatically correct. The evaluation also indicated directions for a new interface design that allows the user to check query translation (in both input and output) and that incorporates visual content image retrieval to improve result organization.

    How people find videos

    At present very little is known about how people locate and view videos 'in the wild'. This study draws a rich picture of everyday video seeking strategies and video information needs, based on an ethnographic study of New Zealand university students. These insights into the participants' activities and motivations suggest potentially useful facilities for a video digital library.

    Finding video on the web

    At present very little is known about how people locate and view videos. This study draws a rich picture of everyday video seeking strategies and video information needs, based on an ethnographic study of New Zealand university students. These insights into the participants’ activities and motivations suggest potentially useful facilities for a video digital library.

    Template Mining for Information Extraction from Digital Documents

    Published or submitted for publication.