Search CORE


Using semantic indexing to improve searching performance in web archives

Authors: Khan, Arshad; Martin, David J.; Tiropanis, Thanassis
Publication date: 28/01/2013

The sheer volume of electronic documents being published on the Web can overwhelm users if search is not properly addressed. The problem is particularly acute in archives and repositories that hold large collections of web resources or, more precisely, web pages and other web objects. With the existing search capabilities of web archives, results can be compromised by the size of the data, the heterogeneity of content, and changes in scientific terminology and meaning. In this research, we will explore whether semantic web technologies, particularly ontology-based annotation and retrieval, can improve the precision of search results in multi-disciplinary web archives.

Southampton (e-Prints Soton)

National Centre for Research Methods: NCRM EPrints Repository

Towards improved performance and interoperability in distributed and physical union catalogues

Authors: Macgregor, George; Nicolaides, Fraser
Publication venue: Emerald
Publication date: 01/01/2004

Purpose: This paper details research undertaken to determine the key differences in performance between certain centralised (physical) and distributed (virtual) bibliographic catalogue services, and to suggest strategies for improving interoperability and performance in, and between, the physical and virtual models.
Design/methodology/approach: Methodically defined searches of a centralised catalogue service and selected distributed catalogues were conducted using the Z39.50 information retrieval protocol, which allows search types to be semantically defined. The methodology also included two workshops with systems librarians and cataloguers, which informed the suggested strategies for improving performance and interoperability in both environments.
Findings: Technical interoperability between the centralised and distributed models was achieved easily; however, the various individual configurations permitted only limited semantic interoperability. Suggested strategies for improving performance and semantic interoperability include more prescriptive cataloguing and indexing guidelines, greater participation in the Program for Cooperative Cataloging (PCC), consideration of future FRBR migration, and greater disclosure to end users.
Practical implications: This paper informs the LIS research community and union catalogue administrators, and also has numerous practical implications for those establishing distributed systems based on Z39.50 and SRW, as well as for those establishing centralised systems.
Originality/value: The paper moves the discussion of Z39.50-based systems beyond anecdotal evidence, provides recommendations based on testing, and is closely informed by the UK cataloguing and systems librarian community.

Crossref

University of Strathclyde Institutional Repository

Content-Aware DataGuides for Indexing Large Collections of XML Documents

Authors: Bry, François; Meuss, Holger; Schulz, Klaus U.; Weigel, Felix
Publication date: 01/01/2003

XML is well suited to modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments show the CADG to be 50-90% faster than the DataGuide for various kinds of queries and documents, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies the query characteristics that predominantly influence the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.

CiteSeerX

Open Access LMU
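The CADG abstract contrasts two strategies: matching paths and keywords separately and joining the results in a third step, versus a path summary whose entries already carry keyword postings. The following Python sketch is a simplified illustration of that contrast under stated assumptions, not the authors' CADG: the class `ContentAwarePathIndex` and its methods are hypothetical, query paths are treated as simple suffix matches (roughly like the XPath `//a/b`), and only each element's own leading text is indexed.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

TOKEN = re.compile(r"\w+")


class ContentAwarePathIndex:
    """Hypothetical, simplified path summary with per-path keyword postings.

    Every distinct root-to-element label path is stored once (as in a
    DataGuide). Each path also keeps its own keyword -> nodes postings,
    so (path, keyword) queries need no separate join step.
    """

    def __init__(self):
        # path -> all nodes reached by that path (the structural summary)
        self._extents = defaultdict(set)
        # path -> keyword -> nodes on that path whose text contains the keyword
        self._postings = defaultdict(lambda: defaultdict(set))
        # keyword -> nodes anywhere (used only by the baseline method)
        self._keywords = defaultdict(set)

    def add_document(self, doc_id, xml_text):
        root = ET.fromstring(xml_text)
        counter = 0
        stack = [(root, (root.tag,))]
        while stack:  # depth-first traversal, preorder numbering of elements
            elem, path = stack.pop()
            node = (doc_id, counter)
            counter += 1
            self._extents[path].add(node)
            # Simplification: only the element's leading text is indexed, not tail text.
            for word in TOKEN.findall((elem.text or "").lower()):
                self._postings[path][word].add(node)
                self._keywords[word].add(node)
            for child in reversed(list(elem)):
                stack.append((child, path + (child.tag,)))

    @staticmethod
    def _matches(path, suffix):
        # Treat the query path as a suffix of the stored root-to-node path.
        return len(suffix) > 0 and path[-len(suffix):] == tuple(suffix)

    def search(self, path_suffix, keyword):
        """Match path and keyword together via the per-path postings."""
        hits = set()
        for path, postings in self._postings.items():
            if self._matches(path, path_suffix):
                hits |= postings.get(keyword.lower(), set())
        return hits

    def search_separately(self, path_suffix, keyword):
        """Baseline: match structure and content independently, then join."""
        by_path = set()
        for path, extent in self._extents.items():
            if self._matches(path, path_suffix):
                by_path |= extent
        by_keyword = self._keywords.get(keyword.lower(), set())
        return by_path & by_keyword


idx = ContentAwarePathIndex()
idx.add_document("d1", "<article><title>XML indexing</title>"
                       "<body><p>Path index</p></body></article>")
idx.add_document("d2", "<book><chapter><title>Indexing text</title></chapter></book>")
print(sorted(idx.search(("title",), "indexing")))  # [('d1', 1), ('d2', 2)]
```

Both methods return the same nodes. The difference is that `search` reads keyword postings already stored under each distinct path, so there is no separate join of path matches and keyword matches at query time; this mirrors, in a very reduced form, the idea of a precomputed content/structure join described in the abstract.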

Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and Bottom-up approaches

Authors: Enser, Peter G.B.; Hare, Jonathon S.; Lewis, Paul H.; Martinez, Kirk; Sandom, Christine J.; Sinclair, Patrick A. S.
Publication date: 01/01/2006

Semantic representation of multimedia information is vital for enabling the kind of multimedia search capabilities that professional searchers require. Manual annotation is often not possible because of the sheer scale of the multimedia information that needs indexing. This paper explores the ways in which we are using both top-down, ontologically driven approaches and bottom-up, automatic-annotation approaches to provide retrieval facilities to users. We also discuss many of the current techniques that we are investigating to combine these top-down and bottom-up approaches.

CiteSeerX

Southampton (e-Prints Soton)