13,716 research outputs found
Using semantic indexing to improve searching performance in web archives
The sheer volume of electronic documents being published on the Web can be overwhelming for users if the searching aspect is not properly addressed. This problem is particularly acute inside archives and repositories containing large collections of web resources or, more precisely, web pages and other web objects. Using the existing search capabilities in web archives, results can be compromised because of the size of data, content heterogeneity and changes in scientific terminologies and meanings. During the course of this research, we will explore whether semantic web technologies, particularly ontology-based annotation and retrieval, could improve precision in search results in multi-disciplinary web archives
Towards improved performance and interoperability in distributed and physical union catalogues
Purpose of this paper: This paper details research undertaken to determine the key differences in the performance of certain centralised (physical) and distributed (virtual) bibliographic catalogue services, and to suggest strategies for improving interoperability and performance in, and between, physical and virtual models. Design/methodology/approach: Methodically defined searches of a centralised catalogue service and selected distributed catalogues were conducted using the Z39.50 information retrieval protocol, allowing search types to be semantically defined. The methodology also entailed the use of two workshops comprising systems librarians and cataloguers to inform suggested strategies for improving performance and interoperability within both environments. Findings: Technical interoperability was permitted easily between centralised and distributed models, however the various individual configurations permitted only limited semantic interoperability. Significant prescription in cataloguing and indexing guidelines, greater participation in the Program for Collaborative Cataloging (PCC), consideration of future 'FRBR' migration, and greater disclosure to end users are some of the suggested strategies to improve performance and semantic interoperability. Practical implications: This paper informs the LIS research community and union catalogue administrators, but also has numerous practical implications for those establishing distributed systems based on Z39.50 and SRW, as well as those establishing centralised systems. What is original/value of the paper?: The paper moves the discussion of Z39.50 based systems away from anecdotal evidence and provides recommendations based on testing and is intimately informed by the UK cataloguing and systems librarian community
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the wellknown
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and Bottom-up approaches
Semantic representation of multimedia information is vital for enabling the kind of multimedia search capabilities that professional searchers require. Manual annotation is often not possible because of the shear scale of the multimedia information that needs indexing. This paper explores the ways in which we are using both top-down, ontologically driven approaches and bottom-up, automatic-annotation approaches to provide retrieval facilities to users. We also discuss many of the current techniques that we are investigating to combine these top-down and bottom-up approaches
- …