10,673 research outputs found
Investigation into Indexing XML Data Techniques
The rapid development of XML technology improves the WWW, since the XML data has many advantages and has become a common technology for transferring data cross the internet. Therefore, the objective of this research is to investigate and study the XML indexing techniques in terms of their structures. The main goal of this investigation is to identify the main limitations of these techniques and any other open issues.
Furthermore, this research considers most common XML indexing techniques and performs a comparison between them. Subsequently, this work makes an argument to find out these limitations. To conclude, the main problem of all the XML indexing techniques is the trade-off between the
size and the efficiency of the indexes. So, all the indexes become large in order to perform well, and none of them is suitable for all users’ requirements. However, each one of these techniques has some advantages in somehow
Enhancing Content-And-Structure Information Retrieval using a Native XML Database
Three approaches to content-and-structure XML retrieval are analysed in this
paper: first by using Zettair, a full-text information retrieval system; second
by using eXist, a native XML database, and third by using a hybrid XML
retrieval system that uses eXist to produce the final answers from likely
relevant articles retrieved by Zettair. INEX 2003 content-and-structure topics
can be classified in two categories: the first retrieving full articles as
final answers, and the second retrieving more specific elements within articles
as final answers. We show that for both topic categories our initial hybrid
system improves the retrieval effectiveness of a native XML database. For
ranking the final answer elements, we propose and evaluate a novel retrieval
model that utilises the structural relationships between the answer elements of
a native XML database and retrieves Coherent Retrieval Elements. The final
results of our experiments show that when the XML retrieval task focusses on
highly relevant elements our hybrid XML retrieval system with the Coherent
Retrieval Elements module is 1.8 times more effective than Zettair and 3 times
more effective than eXist, and yields an effective content-and-structure XML
retrieval
Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database
This paper investigates the impact of three approaches to XML retrieval:
using Zettair, a full-text information retrieval system; using eXist, a native
XML database; and using a hybrid system that takes full article answers from
Zettair and uses eXist to extract elements from those articles. For the
content-only topics, we undertake a preliminary analysis of the INEX 2003
relevance assessments in order to identify the types of highly relevant
document components. Further analysis identifies two complementary sub-cases of
relevance assessments ("General" and "Specific") and two categories of topics
("Broad" and "Narrow"). We develop a novel retrieval module that for a
content-only topic utilises the information from the resulting answer list of a
native XML database and dynamically determines the preferable units of
retrieval, which we call "Coherent Retrieval Elements". The results of our
experiments show that -- when each of the three systems is evaluated against
different retrieval scenarios (such as different cases of relevance
assessments, different topic categories and different choices of evaluation
metrics) -- the XML retrieval systems exhibit varying behaviour and the best
performance can be reached for different values of the retrieval parameters. In
the case of INEX 2003 relevance assessments for the content-only topics, our
newly developed hybrid XML retrieval system is substantially more effective
than either Zettair or eXist, and yields a robust and a very effective XML
retrieval.Comment: Postprint version. The editor version can be accessed through the DO
Information extraction from multimedia web documents: an open-source platform and testbed
The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques have been integrated into a single, but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval
The NASA Astrophysics Data System: Data Holdings
Since its inception in 1993, the ADS Abstract Service has become an
indispensable research tool for astronomers and astrophysicists worldwide. In
those seven years, much effort has been directed toward improving both the
quantity and the quality of references in the database. From the original
database of approximately 160,000 astronomy abstracts, our dataset has grown
almost tenfold to approximately 1.5 million references covering astronomy,
astrophysics, planetary sciences, physics, optics, and engineering. We collect
and standardize data from approximately 200 journals and present the resulting
information in a uniform, coherent manner. With the cooperation of journal
publishers worldwide, we have been able to place scans of full journal articles
on-line back to the first volumes of many astronomical journals, and we are
able to link to current version of articles, abstracts, and datasets for
essentially all of the current astronomy literature. The trend toward
electronic publishing in the field, the use of electronic submission of
abstracts for journal articles and conference proceedings, and the increasingly
prominent use of the World Wide Web to disseminate information have enabled the
ADS to build a database unparalleled in other disciplines.
The ADS can be accessed at http://adswww.harvard.eduComment: 24 pages, 1 figure, 6 tables, 3 appendice
DCU and ISI@INEX 2010: Ad-hoc and data-centric tracks
We describe the participation of Dublin City University (DCU)and the Indian Statistical Institute (ISI) in INEX 2010. The main contributions of this paper are: i) a simplified version of Hierarchical Language Model (HLM) which involves scoring XML elements with a combined probability of generating the given query from itself and the top level article node, is shown to outperform the baselines of Language Model (LM) and Vector Space Model (VSM) scoring of XML elements; ii) the Expectation Maximization (EM) feedback in LM is shown to be the most effective on the domain specic collection of IMDB; iii) automated removal of sentences indicating aspects of irrelevance from the narratives
of INEX ad-hoc topics is shown to improve retrieval eectiveness
Fast, linked, and open – the future of taxonomic publishing for plants: launching the journal PhytoKeys
The paper describes the focus, scope and the rationale of PhytoKeys, a newly established, peer-reviewed, open-access journal in plant systematics. PhytoKeys is launched to respond to four main challenges of our time: (1) Appearance of electronic publications as amendments or even alternatives to paper publications; (2) Open Access (OA) as a new publishing model; (3) Linkage of electronic registers, indices and aggregators that summarize information on biological species through taxonomic names or their persistent identifiers (Globally Unique Identifiers or GUIDs; currently Life Science Identifiers or LSIDs); (4) Web 2.0 technologies that permit the semantic markup of, and semantic enhancements to, published biological texts. The journal will pursue cutting-edge technologies in publication and dissemination of biodiversity information while strictly following the requirements of the current International Code of Botanical Nomenclature (ICBN)
- …