
    CC-interop: COPAC/Clumps Continuing Technical Cooperation. Final Project Report

    As far as is known, CC-interop was, and remains, the first project of its kind anywhere in the world. Its basic aim was to test the feasibility of cross-searching between physical and virtual union catalogues, using COPAC and the three functioning "clumps" or virtual union catalogues (CAIRNS, InforM25 and RIDING), all funded or part-funded by JISC in recent years. The key issues investigated were technical interoperability of catalogues, use of collection level descriptions to search union catalogues dynamically, quality of standards in cataloguing and indexing practices, and usability of union catalogues for real users. The conclusions of the project were expected to, and indeed do, contribute to the development of the JISC Information Environment and to the ongoing debate about the feasibility and desirability of creating a national UK catalogue. They also bear on collection level descriptions (CLDs) and the wider services of JISC's Information Environment Services Registry (IESR). The results of the project will also have applicability for the common information environment, particularly through the landscaping work done via SCONE/CAIRNS. This work is relevant not just to HE and not just to digital materials; it encompasses other sectors and domains and caters for print resources as well. Key findings are grouped thematically as follows:
    - System performance when inter-linking COPAC and the Z39.50 clumps: the various individual Z39.50 configurations permit technical interoperability relatively easily, but only limited semantic interoperability is possible. Disparate cataloguing and indexing practices impair semantic interoperability, not just for catalogues but also for CLDs and for descriptions of services (such as those constituting JISC's IESR).
    - Creating dynamic landscaping through CLDs: routines can be written to output collection description databases in formats usable by other UK users of CLDs, including developers of the JISC Information Environment.
    - Searching a distributed (virtual) catalogue or clump via Z39.50: Z39.50-to-Z39.50 middleware permits a distributed catalogue to be searched via Z39.50 from such disparate user services as another virtual union catalogue or clump, a physical union catalogue like COPAC, an individual Z client, and other IE services. The breakthrough in this Z39.50-to-Z39.50 conundrum came with the discovery that the JISC-funded JAFER software (a result of the 5/99 programme) meets many of the requirements and can be used by the current clumps services. Within this middleware it is technically possible for the user to select all or a subset of the available end-destination Z39.50 servers; we call this "landscaping" (see the sketch after this list).
    - Comparing results processing between COPAC and clumps: most distributed services (clumps) do not bring back complete result sets from associated Z servers, in order to save time for users. COPAC's on-the-fly results-processing routines could feasibly be applied to the clumps services. An automated search set up to repeat its query of 17 catalogues in a clump (InforM25) hourly over nearly three months returned surprisingly good results; for example, over 90% of responses were received in less than one second, and no servers showed slower response times in periods of traditionally heavy OPAC use (mid-morning to early evening).
    - User behaviour when cross-searching catalogues: users value a number of on-screen features, including the ability to refine a search and a clear indication that a search is processing. Users also want information about the availability of an item as well as the holdings data. Search tools such as Google and Amazon shape user behaviour, raising expectations of more information than is normally available from a library catalogue. Some librarians interviewed distrusted the data sources in virtual union catalogues, believing that there was not true interoperability.
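
    The "landscaping" idea above, selecting a subset of Z39.50 targets and fanning the query out through middleware, can be sketched in a few lines. The sketch below is illustrative only: `z3950_search` is a hypothetical placeholder for a real Z39.50 client call (such as one built on JAFER), and the host names are invented.

```python
# Hedged sketch of landscaped fan-out searching. z3950_search() is a
# hypothetical stand-in for a real Z39.50 client call; hosts are invented.
from concurrent.futures import ThreadPoolExecutor, as_completed

TARGETS = {
    "copac":    ("z3950.copac.example", 210, "main"),
    "cairns":   ("z3950.cairns.example", 210, "default"),
    "inform25": ("z3950.inform25.example", 210, "default"),
}

def z3950_search(host, port, database, query, max_records=20):
    """Placeholder: send one query to one Z39.50 server and return
    up to max_records brief records."""
    raise NotImplementedError("wire up a real Z39.50 client here")

def landscaped_search(query, selected=None, timeout=10):
    """Query every target, or only the user-selected subset ("landscaping"),
    concurrently, and merge whatever partial result sets come back."""
    chosen = {name: TARGETS[name] for name in (selected or TARGETS)}
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(chosen))) as pool:
        futures = {
            pool.submit(z3950_search, host, port, db, query): name
            for name, (host, port, db) in chosen.items()
        }
        for fut in as_completed(futures, timeout=timeout):
            name = futures[fut]
            try:
                merged[name] = fut.result()
            except Exception as exc:
                merged[name] = exc  # a slow or down server must not sink the search
    return merged

# e.g. landscaped_search('ti="digital libraries"', selected=["copac", "cairns"])
```

    Merging whatever partial sets arrive mirrors the project's finding that most clumps deliberately return incomplete result sets to save users time.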

    An examination of automatic video retrieval technology on access to the contents of an historical video archive

    Purpose – This paper aims to provide an initial understanding of the constraints that historical video collections pose for video retrieval technology, and of the potential that online access offers to both archive and users. Design/methodology/approach – A small and unique collection of videos on customs and folklore was used as a case study. Multiple methods were employed to investigate the effectiveness of the technology and the modality of user access. Automatic keyframe extraction was tested on the visual content, while the audio stream was used for automatic classification of speech and music clips. User access (search vs browse) was assessed in a controlled user evaluation. A focus group and a survey provided insight into the actual use of the analogue archive. The results of these multiple studies were then compared and integrated (triangulation). Findings – The amateur material challenged automatic techniques for video and audio indexing, suggesting that the technology must be tested against the material before deciding on a digitisation strategy. Two user interaction modalities, browsing vs searching, were tested in a user evaluation. Results show users preferred searching, but browsing becomes essential when the search engine fails to match query and indexed words. Browsing was also valued for serendipitous discovery; however, the organisation of the archive was judged cryptic and therefore of limited use. This indicates that the categorisation of an online archive should be designed for users who might not understand the current classification. The focus group and the survey showed clearly the advantage of online access even when the quality of the video surrogate is poor. The evidence gathered suggests that the creation of a digital version of a video archive requires a rethinking of the collection in terms of the new medium: a new archive should be specially designed to exploit the potential that the digital medium offers. Similarly, users' needs have to be considered before designing the digital library interface, as they are likely to differ from those imagined. Originality/value – This paper is a first attempt to understand the advantages offered, and the limitations imposed, by video retrieval technology for small video archives like those often found in special collections
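
    Keyframe extraction of the kind tested here is often approximated with a simple shot-boundary heuristic: emit a keyframe whenever consecutive frames differ sharply. The sketch below (using the OpenCV Python bindings) illustrates that general technique only; it is not the pipeline used in the study, and the threshold is a guess that noisy amateur footage would likely defeat, which is precisely the paper's point.

```python
# Shot-boundary heuristic for keyframe extraction: emit a keyframe when
# the mean absolute difference between consecutive greyscale frames
# exceeds a threshold. Illustrative only; not the study's pipeline.
# Requires opencv-python; the threshold value is a guess.
import cv2

def extract_keyframes(video_path, threshold=30.0):
    cap = cv2.VideoCapture(video_path)
    keyframes, prev, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:                      # end of stream (or unreadable file)
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # First frame, or a large intensity jump, starts a new shot.
        if prev is None or cv2.absdiff(grey, prev).mean() > threshold:
            keyframes.append((index, frame))
        prev = grey
        index += 1
    cap.release()
    return keyframes  # list of (frame_number, BGR image)

# e.g. for n, img in extract_keyframes("folklore.mp4"):
#          cv2.imwrite(f"keyframe_{n:06d}.png", img)
```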

    Compressed Text Indexes: From Theory to Practice!

    A compressed full-text self-index represents a text in compressed form and still answers queries efficiently. This technology represents a breakthrough over the text indexing techniques of the previous decade, whose indexes required several times the size of the text. Although it is relatively new, the technology has matured to the point where theoretical research is giving way to practical developments. Nonetheless, this transition requires significant programming skills, a deep engineering effort, and a strong algorithmic background to dig into the research results. To date, only isolated implementations and focused comparisons of compressed indexes have been reported, and they have lacked a common API, which prevented their reuse or deployment within other applications. The goal of this paper is to fill this gap. First, we present the existing implementations of compressed indexes from a practitioner's point of view. Second, we introduce the Pizza&Chili site, which offers tuned implementations and a standardized API for the most successful compressed full-text self-indexes, together with effective testbeds and scripts for their automatic validation and testing. Third, we show the results of our extensive experiments on these implementations, with the aim of demonstrating the practical relevance of this novel and exciting technology
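
    The contract such a self-index exposes is small: count the occurrences of a pattern, locate their positions, and extract any substring without consulting the original text. The sketch below mirrors those operations over a plain, uncompressed suffix array purely to show the interface; a real self-index (an FM-index or compressed suffix array, as on Pizza&Chili) answers the same queries in space close to that of the compressed text.

```python
# The canonical self-index operations: count, locate, extract. Backed here
# by a plain (uncompressed) suffix array purely to show the contract that
# Pizza&Chili-style indexes implement in compressed space.
# Requires Python 3.10+ (bisect with key=).
import bisect

class NaiveSelfIndex:
    def __init__(self, text):
        self.text = text  # a real self-index replaces the text entirely
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])

    def _range(self, pattern):
        # Suffixes sharing `pattern` as a prefix form one contiguous SA run.
        key = lambda i: self.text[i:i + len(pattern)]
        lo = bisect.bisect_left(self.sa, pattern, key=key)
        hi = bisect.bisect_right(self.sa, pattern, key=key)
        return lo, hi

    def count(self, pattern):
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern):
        lo, hi = self._range(pattern)
        return sorted(self.sa[lo:hi])

    def extract(self, start, length):
        return self.text[start:start + length]

idx = NaiveSelfIndex("mississippi")
assert idx.count("ssi") == 2 and idx.locate("ssi") == [2, 5]
assert idx.extract(1, 4) == "issi"
```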

    Toward Entity-Aware Search

    As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning--entity as input and entity as output, we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results we obtained so far have shown clear promise of entity-aware search, in its usefulness, effectiveness, efficiency and scalability
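
    The local-plus-global intuition behind ranking entities can be made concrete with a toy scorer: reward a candidate entity for appearing close to the query keywords within a page (local evidence) and for being corroborated by many distinct pages (global evidence). This is only an illustration of that flavour, not the Impression Model or EntityRank as defined in the thesis; the '#' tagging convention and all data are invented here.

```python
# Toy local+global entity scorer: within-page proximity to query keywords
# (local evidence) times cross-page corroboration (global evidence).
# NOT the Impression Model / EntityRank formulation; illustration only.
import math
from collections import defaultdict

def score_candidates(pages, keywords):
    """pages: list of token lists; tokens starting with '#' are entity candidates."""
    local = defaultdict(float)    # strongest proximity evidence per entity
    global_df = defaultdict(int)  # number of distinct pages supporting it
    for tokens in pages:
        kw_pos = [i for i, t in enumerate(tokens) if t in keywords]
        seen = set()
        for i, t in enumerate(tokens):
            if not t.startswith("#") or not kw_pos:
                continue
            dist = min(abs(i - p) for p in kw_pos)
            local[t] = max(local[t], 1.0 / (1 + dist))  # closer => stronger
            seen.add(t)
        for t in seen:
            global_df[t] += 1
    # Combine: strong in-page match, corroborated by independent pages.
    return sorted(((local[t] * math.log(1 + global_df[t]), t) for t in local),
                  reverse=True)

pages = [["acme", "support", "phone", "#555-0100"],
         ["call", "acme", "at", "#555-0100"],
         ["random", "#555-0199", "page", "phone"]]
print(score_candidates(pages, {"acme", "phone"}))  # '#555-0100' ranks first
```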

    Delivering the Maori-language newspapers on the Internet

    Although any collection of historical newspapers provides a particularly rich and valuable record of events and of social and political commentary, the content tends to be difficult to access and extremely time-consuming to browse or search. The advent of digital libraries has meant that, for electronically stored text, full-text searching is now a tool readily available to researchers, or indeed anyone wishing to have access to specific information in text. Text in this form can be readily distributed via CD-ROM or the Internet, with a significant impact on accessibility over traditional microfiche or hard-copy distribution. For the majority of text now being generated de novo, availability in electronic form is standard, hence the increasing use of full-text search facilities. For legacy text available only in printed form, however, the provision of these electronic search tools depends on first capturing digital facsimile images of the printed text and then converting those images to electronic text through optical character recognition (OCR). This article describes a project undertaken at the University of Waikato over the period 1999 to 2001 to produce a full-text searchable version of the Niupepa or Maori-language newspaper collection for delivery over the Internet
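
    The pipeline described, facsimile image to OCR text to full-text index, can be sketched briefly. The sketch below assumes the pytesseract and Pillow libraries and a Tesseract language pack for Maori ("mri", installed separately); it is a generic illustration, not the tooling the 1999 to 2001 Niupepa project actually used.

```python
# Sketch of the legacy-text pipeline: page image -> OCR -> electronic
# text -> inverted index for full-text search. Assumes pytesseract and
# Pillow; the 'mri' (Maori) Tesseract language pack is an assumption.
from collections import defaultdict
from PIL import Image
import pytesseract

def ocr_page(image_path, lang="mri"):
    """Convert one digital facsimile image to plain text."""
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)

def build_inverted_index(pages):
    """pages: {page_id: text}. Returns term -> set of page ids."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

def search(index, term):
    return sorted(index.get(term.lower(), set()))

# e.g. pages = {p: ocr_page(p) for p in ["niupepa_001.png", "niupepa_002.png"]}
#      idx = build_inverted_index(pages); search(idx, "whenua")
```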

    The Freshness of Web Search Engines’ Databases

    This study measures the frequency with which search engines update their indices. To this end, 38 websites that are updated on a daily basis were analysed over a time-span of six weeks. The search engines analysed were Google, Yahoo and MSN. We find that Google performs best overall, with the most pages updated on a daily basis, but only MSN is able to update all pages within a time-span of less than 20 days. The other two engines have considerably older outliers. In terms of indexing patterns, we find different approaches at the different engines: while MSN shows clear update patterns, Google shows some outliers, and the update process of the Yahoo index seems to be quite chaotic. The implication is that the quality of different search engine indices varies, and that more than one engine should be used when searching for current content
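
    The underlying measurement is simple bookkeeping: each analysed site changes daily, so the age of the copy a search engine holds can be read off as the gap between the check date and the date reflected in the engine's index. A minimal sketch of that bookkeeping follows; how the indexed copy is retrieved is engine-specific and omitted, and the sample numbers are invented.

```python
# Bookkeeping sketch for index-freshness measurement. Each observation
# pairs the date we checked with the date reflected in the engine's
# indexed copy of a daily-updated page. Sample data is invented.
from datetime import date, timedelta

def freshness_report(observations, fresh_within=1):
    """observations: list of (check_date, date_in_index) pairs."""
    ages = [(check - indexed).days for check, indexed in observations]
    return {
        "share_up_to_date": sum(a <= fresh_within for a in ages) / len(ages),
        "worst_outlier_days": max(ages),  # the paper found only MSN kept this < 20
    }

# Four simulated daily checks of one site, with index lags of 0, 1, 0, 19 days.
obs = [(date(2006, 3, 10 + i), date(2006, 3, 10 + i) - timedelta(days=lag))
       for i, lag in enumerate([0, 1, 0, 19])]
print(freshness_report(obs))  # {'share_up_to_date': 0.75, 'worst_outlier_days': 19}
```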