9,020 research outputs found
Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database
This paper investigates the impact of three approaches to XML retrieval:
using Zettair, a full-text information retrieval system; using eXist, a native
XML database; and using a hybrid system that takes full article answers from
Zettair and uses eXist to extract elements from those articles. For the
content-only topics, we undertake a preliminary analysis of the INEX 2003
relevance assessments in order to identify the types of highly relevant
document components. Further analysis identifies two complementary sub-cases of
relevance assessments ("General" and "Specific") and two categories of topics
("Broad" and "Narrow"). We develop a novel retrieval module that for a
content-only topic utilises the information from the resulting answer list of a
native XML database and dynamically determines the preferable units of
retrieval, which we call "Coherent Retrieval Elements". The results of our
experiments show that -- when each of the three systems is evaluated against
different retrieval scenarios (such as different cases of relevance
assessments, different topic categories and different choices of evaluation
metrics) -- the XML retrieval systems exhibit varying behaviour and the best
performance can be reached for different values of the retrieval parameters. In
the case of INEX 2003 relevance assessments for the content-only topics, our
newly developed hybrid XML retrieval system is substantially more effective
than either Zettair or eXist, and yields a robust and a very effective XML
retrieval.Comment: Postprint version. The editor version can be accessed through the DO
PFTijah: text search in an XML database system
This paper introduces the PFTijah system, a text search system that is integrated with an XML/XQuery database management system. We present examples of its use, we explain some of the system internals, and discuss plans for future work. PFTijah is part of the open source release of MonetDB/XQuery
Queensland University of Technology at TREC 2005
The Information Retrieval and Web Intelligence (IR-WI) research group is a research team at the Faculty of Information Technology, QUT, Brisbane, Australia. The IR-WI group participated in the Terabyte and Robust track at TREC 2005, both for the first time. For the Robust track we applied our existing information retrieval system that was originally designed for use with structured (XML) retrieval to the domain of document retrieval. For the Terabyte track we experimented with an open source IR system, Zettair and performed two types of experiments. First, we compared Zettairâs performance on both a high-powered supercomputer and a distributed system across seven midrange personal computers. Second, we compared Zettairâs performance when a standard TREC title is used, compared with a natural language query, and a query expanded with synonyms. We compare the systems both in terms of efficiency and retrieval performance. Our results indicate that the distributed system is faster than the supercomputer, while slightly decreasing retrieval performance, and that natural language queries also slightly decrease retrieval performance, while our query expansion technique significantly decreased performance
HaIRST: Harvesting Institutional Resources in Scotland Testbed. Final Project Report
The HaIRST project conducted research into the design, implementation and deployment of a pilot service for UK-wide access of autonomously created institutional resources in Scotland, the aim being to investigate and advise on some of the technical, cultural, and organisational requirements associated with the deposit, disclosure, and discovery of institutional resources in the JISC Information Environment. The project involved a consortium of Scottish higher and further education institutions, with significant assistance from the Scottish Library and Information Council. The project investigated the use of technologies based on the Open Archives Initiative (OAI), including the implementation of OAI-compatible repositories for metadata which describe and link to institutional digital resources, the use of the OAI protocol for metadata harvesting (OAI-PMH) to automatically copy the metadata from multiple repositories to a central repository, and the creation of a service to search and identify resources described in the central repository. An important aim of the project was to identify issues of metadata interoperability arising from the requirements of individual institutional repositories and their impact on services based on the aggregation of metadata through harvesting. The project also sought to investigate issues in using these technologies for a wide range of resources including learning, teaching and administrative materials as well as the research and scholarly communication materials considered by many of the other projects in the JISC Focus on Access to Institutional Resources (FAIR) Programme, of which HaIRST was a part. The project tested and implemented a number of open source software packages supporting OAI, and was successful in creating a pilot service which provides effective information retrieval of a range of resources created by the project consortium institutions. The pilot service has been extended to cover research and scholarly communication materials produced by other Scottish universities, and administrative materials produced by a non-educational institution in Scotland. It is an effective testbed for further research and development in these areas. The project has worked extensively with a new OAI standard for 'static repositories' which offers a low-barrier, low-cost mechanism for participation in OAI-based consortia by smaller institutions with a low volume of resources. The project identified and successfully tested tools for transforming pre-existing metadata into a format compliant with OAI standards. The project identified and assessed OAI-related documentation in English from around the world, and has produced metadata for retrieving and accessing it. The project created a Web-based advisory service for institutions and consortia. The OAI Scotland Information Service (OAISIS) provides links to related standards, guidance and documentation, and discusses the findings of HaIRST relating to interoperability and the pilot harvesting service. The project found that open source packages relating to OAI can be installed and made to interoperate to create a viable method of sharing institutional resources within a consortium. HaIRST identified issues affecting the interoperability of shared metadata and suggested ways of resolving them to improve the effectiveness and efficiency of shared information retrieval environments based on OAI. The project demonstrated that application of OAI technologies to administrative materials is an effective way for institutions to meet obligations under Freedom of Information legislation
Large scale evaluations of multimedia information retrieval: the TRECVid experience
Information Retrieval is a supporting technique which underpins a broad range of content-based applications including retrieval, filtering, summarisation, browsing, classification, clustering, automatic linking, and others. Multimedia information retrieval (MMIR) represents those applications when applied to multimedia information such as image, video, music, etc. In this presentation and extended abstract we are primarily concerned with MMIR as applied to information in digital video format. We begin with a brief overview of large scale evaluations of IR tasks in areas such as text, image and music, just to illustrate that this phenomenon is not just restricted to MMIR on video. The main contribution, however, is a set of pointers and a summarisation of the work done as part of TRECVid, the annual benchmarking exercise for video retrieval tasks
DCU and ISI@INEX 2010: Ad-hoc and data-centric tracks
We describe the participation of Dublin City University (DCU)and the Indian Statistical Institute (ISI) in INEX 2010. The main contributions of this paper are: i) a simplified version of Hierarchical Language Model (HLM) which involves scoring XML elements with a combined probability of generating the given query from itself and the top level article node, is shown to outperform the baselines of Language Model (LM) and Vector Space Model (VSM) scoring of XML elements; ii) the Expectation Maximization (EM) feedback in LM is shown to be the most effective on the domain specic collection of IMDB; iii) automated removal of sentences indicating aspects of irrelevance from the narratives
of INEX ad-hoc topics is shown to improve retrieval eectiveness
ATLAS: A flexible and extensible architecture for linguistic annotation
We describe a formal model for annotating linguistic artifacts, from which we
derive an application programming interface (API) to a suite of tools for
manipulating these annotations. The abstract logical model provides for a range
of storage formats and promotes the reuse of tools that interact through this
API. We focus first on ``Annotation Graphs,'' a graph model for annotations on
linear signals (such as text and speech) indexed by intervals, for which
efficient database storage and querying techniques are applicable. We note how
a wide range of existing annotated corpora can be mapped to this annotation
graph model. This model is then generalized to encompass a wider variety of
linguistic ``signals,'' including both naturally occuring phenomena (as
recorded in images, video, multi-modal interactions, etc.), as well as the
derived resources that are increasingly important to the engineering of natural
language processing systems (such as word lists, dictionaries, aligned
bilingual corpora, etc.). We conclude with a review of the current efforts
towards implementing key pieces of this architecture.Comment: 8 pages, 9 figure
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective.
The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines.
From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research
Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness
In this paper we present a systematic analysis of document
retrieval using unstructured and structured queries within
the score region algebra (SRA) structured retrieval framework. The behavior of diÂźerent retrieval models, namely
Boolean, tf.idf, GPX, language models, and Okapi, is tested
using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation.
The analysis is performed on a numerous experiments
evaluated on TREC and CLEF collections, using manually
generated unstructured and structured queries. Unstructured queries range from the short title queries to long title
+ description + narrative queries. For generating structured
queries we exploit the knowledge of the document structure
and the content used to semantically describe or classify
documents. We show that such structured information can
be utilized in retrieval engines to give more precise answers to user queries then when using unstructured queries
- âŠ