Full-Text and Structural Indexing of XML Documents on B+-Tree
SUMMARY XML query processing is one of the most active areas of database research. Although past research has focused mainly on processing structural XML queries, there is growing demand for full-text search over XML documents. In this paper, we propose XICS (XML Indices for Content and Structural search), which aims at high-speed processing of both full-text and structural queries over XML documents. An important design principle of our indices is the use of a B+-tree. To represent the structural information of XML trees, each node in the XML tree is labeled with an identifier. The identifier contains an integer representing the path information from the root node. XICS consists of two types of indices, the COB-tree (COntent B+-tree) and the STB-tree (STructure B+-tree). Each search key of the COB-tree is a pair consisting of a text fragment in the XML document and the identifier of the leaf node that contains the text, whereas the search keys of the STB-tree are the node identifiers. By using a node identifier in the search keys, we can retrieve only the entries that match the path information in the query. The STB-tree can filter nodes using structural conditions in queries, while the COB-tree can filter nodes using text conditions. We have implemented a COB-tree and an STB-tree using GiST and examined index size and query processing time. Our experimental results show the efficiency of XICS in query processing.
key words: XML query processing, full-text search, B+-tree, node labeling scheme
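The key layout described in the summary can be illustrated with a small sketch. This is not the authors' GiST implementation: a sorted Python list stands in for the B+-tree, and the names `SortedIndex`, `build_indices`, and `search_text_on_path`, as well as the `(path_id, ordinal)` form of the node identifier, are assumptions made for illustration only.

```python
import bisect


class SortedIndex:
    """Stand-in for a B+-tree: a sorted list of keys with range scans."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        bisect.insort(self._keys, key)

    def range(self, lo, hi):
        """Return all keys k with lo <= k < hi."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return self._keys[i:j]


def build_indices(nodes):
    """nodes: iterable of (path, ordinal, text_or_None).

    Assumption: a node identifier is (path_id, ordinal), where path_id is an
    integer standing for the path from the root node.
    """
    path_ids = {}
    cob, stb = SortedIndex(), SortedIndex()
    for path, ordinal, text in nodes:
        pid = path_ids.setdefault(path, len(path_ids))
        stb.insert((pid, ordinal))                   # STB key: node identifier
        if text:
            for token in text.lower().split():
                cob.insert((token, pid, ordinal))    # COB key: (text, node identifier)
    return path_ids, cob, stb


def search_text_on_path(cob, path_ids, token, path):
    """Find nodes on `path` containing `token` with a single range scan."""
    pid = path_ids.get(path)
    if pid is None:
        return []
    # Shorter tuples sort before longer ones sharing the same prefix, so this
    # range covers exactly the keys starting with (token, pid).
    hits = cob.range((token.lower(), pid), (token.lower(), pid + 1))
    return [(p, o) for _, p, o in hits]


nodes = [
    ("/book/title", 1, "XML indexing"),
    ("/book/chapter/title", 2, "Indexing structure"),
    ("/book/chapter/para", 3, "B+-tree indexing of XML text"),
]
path_ids, cob, stb = build_indices(nodes)
print(search_text_on_path(cob, path_ids, "indexing", "/book/chapter/title"))
```

Because the path integer follows the text fragment in the composite key, entries for the same text on other paths fall outside the scanned range, which mirrors the filtering by path information that the summary describes.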
Keyword-Based Querying for the Social Semantic Web
Enabling non-experts to publish data on the web is an important
achievement of the social web and one of the primary goals of the social
semantic web. Making the data easily accessible, in turn, has received
little attention, which is problematic from the point of view of
incentives: users are likely to be less motivated to participate in the
creation of content if the use of this content is mostly reserved to
experts.
Querying in semantic wikis, for example, is typically realized in terms of
full text search over the textual content and a web query language such as
SPARQL for the annotations. This approach has two shortcomings that limit
the extent to which data can be leveraged by users: combined queries over
content and annotations are not possible, and users either are restricted
to expressing their query intent using simple but vague keyword queries or
have to learn a complex web query language.
The work presented in this dissertation investigates a more suitable form
of querying for semantic wikis that consolidates two seemingly conflicting
characteristics of query languages, ease of use and expressiveness. This
work was carried out in the context of the semantic wiki KiWi, but the
underlying ideas apply more generally to the social semantic and social
web.
We begin by defining a simple modular conceptual model for the KiWi wiki
that enables rich and expressive knowledge representation. One component of
this model is structured tags, an annotation formalism that is simple yet
flexible and expressive, and aims at bridging the gap between atomic tags
and RDF. The viability of the approach is confirmed by a user study, which
finds that structured tags are suitable for quickly annotating evolving
knowledge and are perceived well by the users.
The main contribution of this dissertation is the design and
implementation of KWQL, a query language for semantic wikis. KWQL combines
keyword search and web querying to enable querying that scales with user
experience and information need: basic queries are easy to express; as the
search criteria become more complex, more expertise is needed to formulate
the corresponding query. A novel aspect of KWQL is that it combines both
paradigms in a bottom-up fashion. It treats neither of the two as an
extension to the other, but instead integrates both in one framework. The
language allows for rich combined queries of full text, metadata, document
structure, and informal to formal semantic annotations. KWilt, the KWQL
query engine, provides the full expressive power of first-order queries,
but at the same time can evaluate basic queries at almost the speed of the
underlying search engine. KWQL is accompanied by the visual query language
visKWQL, and an editor that displays both the textual and visual form of
the current query and reflects changes to either representation in the
other. A user study shows that participants quickly learn to construct
KWQL and visKWQL queries, even when given only a short introduction.
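The idea of querying that scales from bare keywords to qualified conditions can be sketched as follows. This is not KWQL syntax or the KWilt engine: the `Page` class, the `matches` function, and its `title` and `tag` parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical wiki page with full text and simple tag annotations."""
    title: str
    text: str
    tags: set = field(default_factory=set)


def matches(page, keywords=(), title=None, tag=None):
    """Bare keywords match anywhere; title/tag narrow the match to one field."""
    haystack = " ".join([page.title, page.text, *page.tags]).lower()
    if any(k.lower() not in haystack for k in keywords):
        return False
    if title is not None and title.lower() not in page.title.lower():
        return False
    if tag is not None and tag.lower() not in {t.lower() for t in page.tags}:
        return False
    return True


pages = [
    Page("Java", "Java is an island of Indonesia.", {"geography"}),
    Page("Java basics", "Java is a programming language.", {"programming"}),
]

# Basic query: a keyword alone is simple to write but vague.
print([p.title for p in pages if matches(p, ["java"])])
# Combined query: the same keyword plus a condition on the annotations.
print([p.title for p in pages if matches(p, ["java"], tag="programming")])
```

The basic form needs no knowledge of the data model, while the qualified form restricts the same search using annotations, which is the kind of combined query over content and annotations that the abstract describes.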
KWQL allows users to sift the wealth of structure and annotations in an
information system for relevant data. If relevant data constitutes a
substantial fraction of all data, ranking becomes important. To this end,
we propose PEST, a novel ranking method that propagates relevance among
structurally related or similarly annotated data. Extensive experiments,
including a user study on a real-life wiki, show that PEST improves the
quality of the ranking over a range of existing ranking approaches.
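The general idea of propagating relevance among related items can be sketched as below. This is a generic damped propagation under stated assumptions, not the PEST algorithm itself: the function name `propagate_relevance`, the parameters `alpha` and `iterations`, and the symmetric `neighbours` map are all hypothetical.

```python
def propagate_relevance(initial, neighbours, alpha=0.3, iterations=30):
    """Spread keyword-match scores to related items.

    initial:    dict item -> score from the plain keyword match
    neighbours: dict item -> list of related items (assumed symmetric, and
                every listed item is assumed to be a key of `initial`)
    alpha:      share of the final score that comes from related items
    """
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for item in initial:
            # Each related item splits its current score evenly among
            # its own neighbours.
            incoming = sum(
                scores[m] / len(neighbours[m]) for m in neighbours.get(item, ())
            )
            updated[item] = (1 - alpha) * initial[item] + alpha * incoming
        scores = updated
    return scores


initial = {"A": 1.0, "B": 0.0, "C": 0.0}           # only A matches the query
neighbours = {"A": ["B"], "B": ["A"], "C": []}     # B is related to A, C is isolated
scores = propagate_relevance(initial, neighbours)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy example the related page B ends up ranked above the unrelated page C even though neither matches the query directly, which is the effect that relevance propagation is meant to achieve.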