Cataloged from PDF version of article.
The advent of the Web has raised new searching and querying problems. Keyword
matching based querying techniques, which are widely used by search
engines, return thousands of Web documents for a single query, and most of these
documents are generally unrelated to the users’ information needs. A recent promising
approach to serving these needs better is to index the Web using metadata and
annotations.
In this thesis, we model and query Web-based information resources using
metadata for improved Web searching capabilities. Employing metadata for
querying the Web increases the precision of the query outputs by returning semantically
more meaningful results. Our Web data model, named “Web information
space model”, consists of Web-based information resources (HTML/XML documents
on the Web), expert advice repositories (domain-expert-specified metadata
for information resources), and personalized information about users (captured
as user profiles that indicate users’ preferences about experts as well as users’
knowledge about topics). Expert advice is specified using topics and relationships
among topics (i.e., metalinks), along the lines of the recently proposed topic maps
standard. Topics and metalinks constitute metadata that describe the contents of
the underlying Web information resources. Experts assign scores to topics, metalinks,
and information resources to represent their “importance”. User
profiles store users’ preferences and navigational history information about the
information resources that the user visits. User preferences, knowledge level on
topics, and history information are used for personalizing the Web search, and
improving the precision of the results returned to the user.
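As an illustration only, the three components of the Web information space model described above could be sketched as plain Python data structures. All class names, field names, score ranges, and the ranking rule (expert preference weight multiplied by the resource's importance score, with already visited resources skipped) are assumptions made for this sketch; the abstract does not specify them, and the sketch leaves topic knowledge unused in the ranking.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    importance: float  # expert-assigned score; a [0, 1] range is assumed


@dataclass
class Metalink:
    kind: str          # relationship type, e.g. "prerequisite" (hypothetical)
    source: str        # name of the source topic
    target: str        # name of the target topic
    importance: float


@dataclass
class Resource:
    url: str
    topics: list[str]
    importance: float


@dataclass
class ExpertAdvice:
    expert: str
    topics: dict[str, Topic] = field(default_factory=dict)
    metalinks: list[Metalink] = field(default_factory=list)
    resources: list[Resource] = field(default_factory=list)


@dataclass
class UserProfile:
    expert_preference: dict[str, float]   # weight the user gives each expert
    topic_knowledge: dict[str, float]     # stored, but not used by the ranking below
    visited: set[str] = field(default_factory=set)  # navigational history


def personalized_ranking(advice_list, profile, topic):
    """Rank unvisited resources on `topic` by preference weight * importance."""
    ranked = []
    for advice in advice_list:
        weight = profile.expert_preference.get(advice.expert, 0.0)
        for res in advice.resources:
            if topic in res.topics and res.url not in profile.visited:
                ranked.append((weight * res.importance, res.url))
    return sorted(ranked, reverse=True)
```

Under these assumptions, a resource scored highly by an expert the user trusts outranks a resource scored highly by an expert the user prefers less.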
We store expert advice and user profiles in an object-relational database
management system, and extend SQL for efficient querying of Web-based information
resources through the Web information space model. The SQL extensions
include clauses for propagating input importance scores to output tuples, a
clause that specifies the query stopping condition, and new operators (i.e., text similarity
based selection, text similarity based join, and topic closure). Importance
score propagation and the query stopping condition allow query outputs to be ranked
and the output size to be limited. The text similarity based operators and the topic closure
operator support sophisticated querying facilities. We develop a new algebra
called Sideway Value generating Algebra (SVA) to process these SQL extensions.
We also propose evaluation algorithms for the text similarity based SVA directional
join operator, and report experimental results on the performance of the
operator. We demonstrate experimentally the effectiveness of metadata-based
personalized Web search through SQL extensions over the Web information space
model against keyword matching based Web search techniques.
Özel, Selma Ayşe
Ph.D
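To make the idea of a text similarity based join with score propagation and a stopping condition more concrete, the following is a minimal sketch in Python. It is not the thesis's SVA operator or its evaluation algorithms: the cosine measure over raw term counts, the product used to propagate importance scores, the top-k form of the stopping condition, and the naive nested-loop evaluation are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts using raw term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def similarity_join(left, right, threshold, top_k):
    """Join (text, importance) tuples whose texts are similar enough.

    The output score is left score * right score * similarity (an assumed
    propagation function). Results are ranked by that score, and only the
    top_k tuples are returned (an assumed stopping condition).
    """
    out = []
    for l_text, l_score in left:
        for r_text, r_score in right:
            sim = cosine(l_text, r_text)
            if sim >= threshold:
                out.append((l_text, r_text, l_score * r_score * sim))
    out.sort(key=lambda row: row[2], reverse=True)
    return out[:top_k]
```

A call such as `similarity_join(left, right, threshold=0.4, top_k=10)` returns at most ten joined tuples, ordered by their propagated score.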