20,115 research outputs found

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    A three-year study on the freshness of Web search engine databases

    Get PDF
    This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Yahoo, and MSN/Live.com. We conducted a test of the updates of 40 daily updated pages and 30 irregularly updated pages, respectively. We used data from a time span of six weeks in the years 2005, 2006, and 2007. We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness. Frequency distributions for the pages’ ages are skewed, which means that search engines do differentiate between often- and seldom-updated pages. This is confirmed by the difference between the average ages of daily updated pages and our control group of pages. Indexing patterns are often irregular, and there seems to be no clear policy regarding when to revisit Web pages. A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another

    Distributed Information Retrieval using Keyword Auctions

    Get PDF
    This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions

    Distributed resource discovery using a context sensitive infrastructure

    Get PDF
    Distributed Resource Discovery in a World Wide Web environment using full-text indices will never scale. The distinct properties of WWW information (volume, rate of change, topical diversity) limits the scaleability of traditional approaches to distributed Resource Discovery. An approach combining metadata clustering and query routing can, on the other hand, be proven to scale much better. This paper presents the Content-Sensitive Infrastructure, which is a design building on these results. We also present an analytical framework for comparing scaleability of different distribution strategies

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    Experiments in terabyte searching, genomic retrieval and novelty detection for TREC 2004

    Get PDF
    In TREC2004, Dublin City University took part in three tracks, Terabyte (in collaboration with University College Dublin), Genomic and Novelty. In this paper we will discuss each track separately and present separate conclusions from this work. In addition, we present a general description of a text retrieval engine that we have developed in the last year to support our experiments into large scale, distributed information retrieval, which underlies all of the track experiments described in this document

    Optimising metadata to make high-value content more accessible to Google users

    Get PDF
    Purpose: This paper shows how information in digital collections that have been catalogued using high-quality metadata can be retrieved more easily by users of search engines such as Google. Methodology/approach: The research and proposals described arose from an investigation into the observed phenomenon that pages from the Glasgow Digital Library (gdl.cdlr.strath.ac.uk) were regularly appearing near the top of Google search results shortly after publication, without any deliberate effort to achieve this. The reasons for this phenomenon are now well understood and are described in the second part of the paper. The first part provides context with a review of the impact of Google and a summary of recent initiatives by commercial publishers to make their content more visible to search engines. Findings/practical implications: The literature research provides firm evidence of a trend amongst publishers to ensure that their online content is indexed by Google, in recognition of its popularity with Internet users. The practical research demonstrates how search engine accessibility can be compatible with use of established collection management principles and high-quality metadata. Originality/value: The concept of data shoogling is introduced, involving some simple techniques for metadata optimisation. Details of its practical application are given, to illustrate how those working in academic, cultural and public-sector organisations could make their digital collections more easily accessible via search engines, without compromising any existing standards and practices
    corecore