Quasi-SLCA based Keyword Query Processing over Probabilistic XML Data
The probabilistic threshold query is one of the most common queries in
uncertain databases, where a result satisfying the query must also have a
probability that meets the threshold requirement. In this paper, we investigate
probabilistic threshold keyword queries (PrTKQ) over XML data, which have not
been studied before. We first introduce the notion of quasi-SLCA and use it to
represent results for a PrTKQ with the consideration of possible world
semantics. Then we design a probabilistic inverted (PI) index that can be used
to quickly return the qualified answers and filter out the unqualified ones
based on our proposed lower/upper bounds. After that, we propose two efficient
and comparable algorithms: a Baseline Algorithm and a PI index-based Algorithm.
To accelerate the algorithms, we also utilize a probability density
function. An empirical study using real and synthetic data sets has verified
the effectiveness and the efficiency of our approaches.
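The abstract above mentions returning qualified answers and filtering out unqualified ones using lower/upper probability bounds. As a rough illustration of that general pruning idea only (not the paper's PI index or its actual bounds), the sketch below assumes each candidate node already carries a lower and an upper bound on its probability. The names `threshold_filter`, `Candidate`, and `exact_probability` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical candidate record: (node id, lower bound, upper bound).
Candidate = Tuple[str, float, float]


def threshold_filter(
    candidates: Iterable[Candidate],
    threshold: float,
    exact_probability: Callable[[str], float],
) -> List[str]:
    """Return ids of candidates whose probability meets the threshold.

    The exact (assumed expensive) probability is computed only when the
    bounds alone cannot decide the outcome.
    """
    results: List[str] = []
    for node_id, lower, upper in candidates:
        if lower >= threshold:
            # The lower bound already qualifies: accept without exact work.
            results.append(node_id)
        elif upper < threshold:
            # The upper bound cannot reach the threshold: prune.
            continue
        elif exact_probability(node_id) >= threshold:
            # Bounds are inconclusive: fall back to the exact computation.
            results.append(node_id)
    return results
```

With a threshold of 0.5, a candidate bounded by [0.8, 0.9] is accepted directly, one bounded by [0.1, 0.3] is discarded directly, and only a candidate bounded by [0.4, 0.7] triggers the exact computation.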
Search and Result Presentation in Scientific Workflow Repositories
We study the problem of searching a repository of complex hierarchical
workflows whose component modules, both composite and atomic, have been
annotated with keywords. Since keyword search does not use the graph structure
of a workflow, we develop a model of workflows using context-free bag grammars.
We then give efficient polynomial-time algorithms that, given a workflow and a
keyword query, determine whether some execution of the workflow matches the
query. Based on these algorithms we develop a search and ranking solution that
efficiently retrieves the top-k grammars from a repository. Finally, we propose
a novel result presentation method for grammars matching a keyword query, based
on representative parse-trees. The effectiveness of our approach is validated
through an extensive experimental evaluation.
Context-Based Autosuggest on Graph Data
Autosuggest is an important feature in any search application. Currently, most applications only suggest a single term based on how frequently that term appears in the indexed documents or how often it is searched for. These approaches might not provide the most relevant suggestions because users often enter a series of related query terms to answer a question they have in mind. In this project, we implemented the Smart Solr Suggester plugin using a context-based approach that takes into account the relationships among search keywords. In particular, we used the keywords that the user has chosen so far in the search text box as the context to autosuggest their next incomplete keyword. This context-based approach uses the relationships between entities in the graph data that the user is searching and would therefore provide more meaningful suggestions.
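To make the context-based idea above concrete, here is a minimal sketch that ranks prefix matches by how many already-chosen keywords they are linked to in an entity graph. It is an illustration under stated assumptions, not the Smart Solr Suggester plugin's implementation: the graph is assumed to be an undirected adjacency map held in memory, and the function name `suggest` and its parameters are hypothetical.

```python
from typing import Dict, List, Set


def suggest(
    graph: Dict[str, Set[str]],
    context: List[str],
    prefix: str,
    limit: int = 5,
) -> List[str]:
    """Suggest completions for `prefix`, favouring entities related to `context`.

    `graph` maps each entity to the set of entities it is linked to
    (assumed symmetric). `context` holds the keywords chosen so far.
    """
    prefix = prefix.lower()
    scored = []
    for entity, neighbours in graph.items():
        # Keep only unchosen entities that complete the typed prefix.
        if not entity.lower().startswith(prefix) or entity in context:
            continue
        # Score = number of context keywords directly linked to this entity.
        score = sum(1 for keyword in context if keyword in neighbours)
        scored.append((-score, entity))
    # Highest score first; ties are broken alphabetically.
    scored.sort()
    return [entity for _, entity in scored[:limit]]
```

For example, with "einstein" already chosen and "rel" typed, an entity such as "relativity" that is linked to "einstein" would rank ahead of unrelated prefix matches, whereas a purely frequency-based suggester would ignore that relationship.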
Distributed Information Retrieval using Keyword Auctions
This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions.
Semantic Modeling of Analytic-based Relationships with Direct Qualification
Successfully modeling state and analytics-based semantic relationships of
documents enhances representation, importance, relevancy, provenience, and
priority of the document. These attributes are the core elements that form the
machine-based knowledge representation for documents. However, modeling
document relationships that can change over time can be inelegant, limited,
complex or overly burdensome for semantic technologies. In this paper, we
present Direct Qualification (DQ), an approach for modeling any semantically
referenced document, concept, or named graph with results from associated
applied analytics. The proposed approach supplements the traditional
subject-object relationships by providing a third leg to the relationship; the
qualification of how and why the relationship exists. To illustrate, we show a
prototype of an event-based system with a realistic use case for applying DQ to
relevancy analytics of PageRank and Hyperlink-Induced Topic Search (HITS).
Comment: Proceedings of the 2015 IEEE 9th International Conference on Semantic
Computing (IEEE ICSC 2015)
The State-of-the-arts in Focused Search
The continuous influx of text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user’s topic of interest has gone beyond searching for domain-specific or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.