5 research outputs found
Metadata-based and personalized web querying
Cataloged from PDF version of article.
The advent of the Web has raised new searching and querying problems. Keyword
matching based querying techniques, which have been widely used by search
engines, return thousands of Web documents for a single query, and most of these
documents are generally unrelated to the users’ information needs. To better
serve the information needs of Web users, a recent promising
approach is to index the Web by using metadata and annotations.
In this thesis, we model and query Web-based information resources using
metadata for improved Web searching capabilities. Employing metadata for
querying the Web increases the precision of the query outputs by returning semantically
more meaningful results. Our Web data model, named “Web information
space model”, consists of Web-based information resources (HTML/XML documents
on the Web), expert advice repositories (domain-expert-specified metadata
for information resources), and personalized information about users (captured
as user profiles that indicate users’ preferences about experts as well as users’
knowledge about topics). Expert advice is specified using topics and relationships
among topics (i.e., metalinks), along the lines of the recently proposed topic maps
standard. Topics and metalinks constitute metadata that describe the contents of
the underlying Web information resources. Experts assign scores to topics, metalinks,
and information resources to represent their “importance”. User
profiles store users’ preferences and navigational history information about the
information resources that the user visits. User preferences, knowledge level on
topics, and history information are used for personalizing the Web search, and
improving the precision of the results returned to the user.
We store expert advice and user profiles in an object-relational database
management system, and extend SQL for efficient querying of Web-based information
resources through the Web information space model. SQL extensions
include clauses for propagating input importance scores to output tuples, a
clause that specifies the query stopping condition, and new operators (i.e., text similarity
based selection, text similarity based join, and topic closure). Importance
score propagation and query stopping condition allow ranking of query outputs,
and limiting the output size. Text similarity based operators and topic closure
operator support sophisticated querying facilities. We develop a new algebra
called Sideway Value generating Algebra (SVA) to process these SQL extensions.
We also propose evaluation algorithms for the text similarity based SVA directional
join operator, and report experimental results on the performance of the
operator. We demonstrate experimentally the effectiveness of metadata-based
personalized Web search through SQL extensions over the Web information space
model against keyword matching based Web search techniques.
Özel, Selma Ayşe
Ph.D
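The abstract above says that importance score propagation and a query stopping condition allow query outputs to be ranked and limited in size. The following is a minimal Python sketch of that idea, not the thesis's SQL syntax or its SVA operators. The function name `ranked_join`, the tuple layouts, the example URLs, and the choice of multiplying input scores to obtain the output score are all assumptions made for illustration.

```python
from itertools import product


def ranked_join(topics, resources, top_k):
    """Join topics to resources on topic name, propagate scores, keep top_k.

    topics:    list of (topic_name, importance_score)
    resources: list of (url, topic_name, importance_score)
    The output score is the product of the two input scores (an assumed
    propagation function; the source does not specify one).
    """
    joined = []
    for (t_name, t_score), (r_url, r_topic, r_score) in product(topics, resources):
        if t_name == r_topic:
            joined.append((r_url, t_name, t_score * r_score))
    # Rank by propagated score, highest first.
    joined.sort(key=lambda row: row[2], reverse=True)
    # Stopping condition (assumed form): return at most top_k tuples.
    return joined[:top_k]


topics = [("databases", 0.9), ("web search", 0.6)]
resources = [
    ("http://example.org/a", "databases", 0.5),
    ("http://example.org/b", "web search", 1.0),
    ("http://example.org/c", "databases", 0.8),
]
top = ranked_join(topics, resources, top_k=2)
```

Here the stopping condition is a simple bound on the number of output tuples. A score threshold would be another plausible form.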
Effective early termination techniques for text similarity join operator
The text similarity join operator joins two relations if their join attributes are textually similar to each other, and it has a variety of application domains, including integration and querying of data from heterogeneous resources, data cleansing, and data mining. Although the text similarity join operator is widely used, its processing is expensive due to the huge number of similarity computations performed. In this paper, we incorporate some shortcut evaluation techniques from the Information Retrieval domain, namely the Harman, quit, continue, and maximal similarity filter heuristics, into the previously proposed text similarity join algorithms to reduce the number of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic-based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with the continue and maximal similarity filter heuristics. © Springer-Verlag Berlin Heidelberg 2005
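To make the cost described above concrete, here is a minimal Python sketch of a nested-loop text similarity join. It is not the paper's algorithm and includes none of its early termination heuristics. It assumes whitespace tokenization, TF-IDF weighting with a smoothed IDF, cosine similarity, and a user-supplied threshold. The names `tfidf_vectors` and `similarity_join` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return unit-length TF-IDF vectors (dicts) for a list of strings."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency
    vectors = []
    for d in docs:
        vec = {term: tf * math.log(1 + n / df[term]) for term, tf in d.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def similarity_join(left, right, threshold):
    """Pair every left value with every right value whose cosine
    similarity is at least threshold (|left| * |right| comparisons)."""
    vectors = tfidf_vectors(left + right)
    lv, rv = vectors[:len(left)], vectors[len(left):]
    result = []
    for i, a in enumerate(lv):
        for j, b in enumerate(rv):
            # Dot product of unit vectors equals cosine similarity.
            sim = sum(w * b.get(t, 0.0) for t, w in a.items())
            if sim >= threshold:
                result.append((left[i], right[j], sim))
    return result


left = ["text similarity join operator", "web search engines"]
right = ["similarity join for text", "image compression"]
pairs = similarity_join(left, right, threshold=0.3)
```

Every left tuple is compared with every right tuple, which is the quadratic number of similarity computations that early termination heuristics aim to reduce.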
Effective early termination techniques for text similarity join operator
This work was presented as a paper at the 20th International Symposium on Computer and Information Sciences, held in Istanbul, Turkey, on 26-28 October 2005.
Inst Elec & Elect Engineers, Turkey Sect; Boğaziçi Üniversites
Metadata-Based and Personalized Web Querying
METADATA-BASED AND PERSONALIZED WEB QUERYING Ozel Ph.D. in Computer Engineering Supervisor: Prof. Dr