Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the well-known
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
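The idea of matching keywords and paths at the same time can be illustrated with a toy index. This is only a sketch under simplifying assumptions, not the CADG data structure from the paper: each distinct label path is stored once (as in a DataGuide) together with the set of words found in the text directly under it, so a path can be rejected during path matching if the keyword never occurs there. The function names `build_index` and `match` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(xml_docs):
    """Map each distinct label path to the words found in text directly under it."""
    path_keywords = defaultdict(set)
    for doc in xml_docs:
        root = ET.fromstring(doc)
        stack = [(root, "/" + root.tag)]
        while stack:
            elem, path = stack.pop()
            # Only the element's leading text is indexed; tail text is ignored here.
            path_keywords[path].update((elem.text or "").lower().split())
            for child in elem:
                stack.append((child, path + "/" + child.tag))
    return path_keywords


def match(index, path_suffix, keyword):
    """Return paths ending in path_suffix whose content contains keyword (one pass)."""
    keyword = keyword.lower()
    return sorted(
        path for path, words in index.items()
        if path.endswith(path_suffix) and keyword in words
    )


docs = [
    "<book><title>XML indexing</title><author>Smith</author></book>",
    "<article><title>Graph data</title></article>",
]
index = build_index(docs)
print(match(index, "/title", "xml"))
```

A structure-only index would return both `/book/title` and `/article/title` for the suffix `/title` and leave the keyword filtering to a later join; here the keyword test happens inside the path lookup.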
File-based storage of Digital Objects and constituent datastreams: XMLtapes and Internet Archive ARC files
This paper introduces the write-once/read-many XMLtape/ARC storage approach
for Digital Objects and their constituent datastreams. The approach combines
two interconnected file-based storage mechanisms that are made accessible in a
protocol-based manner. First, XML-based representations of multiple Digital
Objects are concatenated into a single file named an XMLtape. An XMLtape is a
valid XML file; its format definition is independent of the choice of the
XML-based complex object format by which Digital Objects are represented. The
creation of indexes for both the identifier and the creation datetime of the
XML-based representation of the Digital Objects facilitates OAI-PMH-based
access to Digital Objects stored in an XMLtape. Second, ARC files, as
introduced by the Internet Archive, are used to contain the constituent
datastreams of the Digital Objects in a concatenated manner. An index for the
identifier of the datastream facilitates OpenURL-based access to an ARC file.
The interconnection between XMLtapes and ARC files is provided by conveying the
identifiers of ARC files associated with an XMLtape as administrative
information in the XMLtape, and by including OpenURL references to constituent
datastreams of a Digital Object in the XML-based representation of that Digital
Object.
Comment: 12 pages, 1 figure (camera-ready copy for ECDL 2005
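The combination of a concatenated file with identifier and datetime indexes described in this abstract can be sketched as follows. This is a minimal illustration, not the actual XMLtape format: the `<tape>` wrapper element, the function names, and the in-memory index layout are all assumptions made for the example. Datetimes are assumed to be ISO 8601 strings in one uniform format, so that plain string comparison orders them correctly.

```python
import io


def write_tape(records):
    """Concatenate XML fragments into one file and index them.

    records: iterable of (identifier, datetime_string, xml_fragment).
    Returns (tape_bytes, id_index, date_index).
    """
    tape = io.BytesIO()
    tape.write(b"<tape>\n")  # hypothetical wrapper element
    id_index = {}            # identifier -> (byte offset, byte length)
    date_index = []          # (datetime_string, identifier), sorted below
    for identifier, stamp, fragment in records:
        data = fragment.encode("utf-8")
        id_index[identifier] = (tape.tell(), len(data))
        tape.write(data)
        tape.write(b"\n")
        date_index.append((stamp, identifier))
    tape.write(b"</tape>\n")
    date_index.sort()
    return tape.getvalue(), id_index, date_index


def read_record(tape_bytes, id_index, identifier):
    """Fetch one record by identifier without parsing the whole tape."""
    offset, length = id_index[identifier]
    return tape_bytes[offset:offset + length].decode("utf-8")


def ids_since(date_index, from_stamp):
    """Identifiers of records created at or after from_stamp (harvest-style query)."""
    return [identifier for stamp, identifier in date_index if stamp >= from_stamp]
```

The identifier index supports direct retrieval of a single object, while the datetime index supports the kind of date-based selective harvesting that OAI-PMH access relies on.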
Innovative Evaluation System – IESM: An Architecture for the Database Management System for Mobile Application
Mobile applications have developed rapidly in recent years, especially in academic settings, for example the student response systems [1-8] used in universities and other educational institutions. However, no effective and scalable Database Management System has been reported that supports fast and reliable data storage and retrieval for them. This paper presents a Database Management Architecture for an Innovative Evaluation System based on Mobile Learning Applications. It analyzes and investigates the need for a relatively stable, independent and extensible data model for faster data storage and retrieval. It concludes by emphasizing the need for further investigation into high throughput, so as to support multimedia data such as video clips, images and documents
Extending Sitemaps for ResourceSync
The documents used in the ResourceSync synchronization framework are based on
the widely adopted document format defined by the Sitemap protocol. In order to
address requirements of the framework, extensions to the Sitemap format were
necessary. This short paper describes the concerns we had about introducing
such extensions, the tests we did to evaluate their validity, and aspects of
the framework to address them.
Comment: 4 pages, 6 listings, accepted at JCDL 201
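To make the notion of a Sitemap extension concrete, the sketch below builds a small Sitemap document that carries extra metadata in a second XML namespace. The `rs` namespace URI and the `rs:md` element with `capability` and `length` attributes follow my reading of the ResourceSync specification and should be checked against it; the function name `build_resource_list` is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"  # assumed ResourceSync namespace


def build_resource_list(resources):
    """Serialize (loc, lastmod, length) tuples as a Sitemap with rs: extensions."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # Document-level metadata added by the extension namespace.
    ET.SubElement(urlset, f"{{{RS_NS}}}md", {"capability": "resourcelist"})
    for loc, lastmod, length in resources:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        # Per-resource metadata, also in the extension namespace.
        ET.SubElement(url, f"{{{RS_NS}}}md", {"length": str(length)})
    return ET.tostring(urlset, encoding="unicode")


print(build_resource_list([("http://example.com/res1", "2013-01-02T13:00:00Z", 8876)]))
```

Because the added elements live in their own namespace, a consumer that only understands plain Sitemaps can still read the `loc` and `lastmod` values.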
Organizing the Internet
This paper examines XML and its relationships with SGML (Standard Generalized Markup Language) and HTML (HyperText Markup Language). It examines the importance of metatags and the XML Document Type Definition (DTD) and proposed alternatives. It looks at the differences between the two types of XML data: “valid” and “well-formed” documents
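The distinction between well-formed and valid documents can be shown with a short sketch. Well-formedness is a purely syntactic property that any XML parser checks; validity additionally requires conformance to a DTD or similar schema. Python's standard `xml.etree.ElementTree` parser does not validate against DTDs, so the `conforms` function below is only a toy stand-in that checks permitted child elements against a hand-written table; both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML (tags nested and closed correctly)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def conforms(text, allowed):
    """Toy validity check: every child tag must be permitted under its parent.

    allowed maps a parent tag to the set of child tags it may contain.
    A real DTD also constrains order, cardinality, and attributes.
    """
    root = ET.fromstring(text)
    return all(
        child.tag in allowed.get(parent.tag, set())
        for parent in root.iter()
        for child in parent
    )


rules = {"note": {"to", "body"}}
print(is_well_formed("<note><to/><body/></note>"), conforms("<note><to/><body/></note>", rules))
print(is_well_formed("<note><cc/></note>"), conforms("<note><cc/></note>", rules))
print(is_well_formed("<note><to></note>"))
```

The second document is well-formed but fails the content rules, which is the situation the abstract's "well-formed" versus "valid" contrast refers to.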
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies into an optimized
solution to a specific real-world problem, big data systems are no exception.
As far as the storage aspect of a big data system is concerned, the primary
facet is the storage infrastructure, and NoSQL seems to be the right technology
to fulfill its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents a
feature and use case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed
The DIGMAP geo-temporal web gazetteer service
This paper presents the DIGMAP geo-temporal Web gazetteer service, a system providing access to names of places, historical periods, and associated geo-temporal information. Within the DIGMAP project, this gazetteer serves as the unified repository of geographic and temporal information, assisting in the recognition and disambiguation of geo-temporal expressions over text, as well as in resource searching and indexing. We describe the data integration methodology, the handling of temporal information and some of the applications that use the gazetteer. Initial evaluation results show that the proposed system can adequately support several tasks related to geo-temporal information extraction and retrieval
CONTEXT-BASED AUTOSUGGEST ON GRAPH DATA
Autosuggest is an important feature in any search application. Currently, most applications only suggest a single term based on how frequently that term appears in the indexed documents or how often it is searched for. These approaches might not provide the most relevant suggestions because users often enter a series of related query terms to answer a question they have in mind. In this project, we implemented the Smart Solr Suggester plugin using a context-based approach that takes into account the relationships among search keywords. In particular, we used the keywords that the user has chosen so far in the search text box as the context to autosuggest their next incomplete keyword. This context-based approach uses the relationships between entities in the graph data that the user is searching on and therefore would provide more meaningful suggestions
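The context-based approach can be sketched in a few lines. This is an illustrative assumption about how such ranking might work, not the Smart Solr Suggester plugin's implementation: the graph is a plain adjacency dictionary, and a candidate is scored by how many of the already chosen keywords it is directly connected to. The function name `suggest` and the sample data are hypothetical.

```python
from collections import Counter


def suggest(graph, context, prefix, limit=5):
    """Complete `prefix` using entities linked to the keywords chosen so far.

    graph: dict mapping an entity to the set of entities it is related to.
    context: keywords the user has already selected.
    """
    scores = Counter()
    for term in context:
        for neighbour in graph.get(term, ()):
            if neighbour.startswith(prefix) and neighbour not in context:
                scores[neighbour] += 1  # one vote per connected context keyword
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


graph = {
    "tom hanks": {"toy story", "the terminal"},
    "pixar": {"toy story", "toy story 2"},
}
print(suggest(graph, ["tom hanks", "pixar"], "toy"))
```

A frequency-only suggester would rank completions of "toy" the same way regardless of what was typed before; here an entity related to both context keywords is ranked above one related to only a single keyword.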
DSpace How-To Guide: Tips and tricks for managing common DSpace chores
This short booklet introduces the most common non-obvious customization tasks for newcomers to DSpace administration. It has been written against the current stable version 1.3.2 of DSpace.
We have tried to include instructions for different operating systems as required;
most customizations, however, work identically cross-platform