A competitive environment for exploratory query expansion
Most information workers query digital libraries many times a day. Yet people have little opportunity to hone their skills in a controlled environment, or compare their performance with others in an objective way. Conversely, although search engine logs record how users evolve queries, they lack crucial information about the user's intent. This paper describes an environment for exploratory query expansion that pits users against each other and lets them compete, and practice, in their own time and on their own workstation. The system captures query evolution behavior on predetermined information-seeking tasks. It is publicly available, and the code is open source so that others can set up their own competitive environments
An Analytical Study of Large SPARQL Query Logs
With the adoption of RDF as the data model for Linked Data and the Semantic
Web, query specification from end-users has become more and more common at
SPARQL endpoints. In this paper, we conduct an in-depth analytical study of
the queries formulated by end-users and harvested from large and up-to-date
query logs from a wide variety of RDF data sources. As opposed to previous
studies, ours is the first assessment on a voluminous query corpus, spanning
over several years and covering many representative SPARQL endpoints. Apart
from the syntactical structure of the queries, which already exhibits
interesting results on this generalized corpus, we drill deeper into the
structural characteristics related to the graph and hypergraph representation
of queries. We outline the most common shapes of queries when visually
displayed as pseudographs, and characterize their (hyper-)tree width.
Moreover, we analyze the evolution of queries over time, by introducing the
novel concept of a streak, i.e., a sequence of queries that appear as
subsequent modifications of a seed query. Our study offers several fresh
insights on the already rich query features of real SPARQL queries formulated
by real users, and brings us to draw a number of conclusions and pinpoint
future directions for SPARQL query evaluation, query optimization, tuning,
and benchmarking
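The streak idea can be illustrated with a minimal sketch. The abstract does not say how "subsequent modification" is measured, so the example assumes a simple rule: consecutive log entries belong to the same streak when their textual similarity is above a threshold. The function names, the use of `difflib`, and the 0.75 threshold are illustrative assumptions, not the authors' method.

```python
from difflib import SequenceMatcher


def is_modification(previous, current, threshold=0.75):
    """Assumed rule: two queries are related if their text is similar enough."""
    return SequenceMatcher(None, previous, current).ratio() >= threshold


def find_streaks(queries, threshold=0.75):
    """Group consecutive, similar queries of a log into streaks.

    A streak starts at a seed query and grows while each new query looks
    like a modification of the one before it. Single queries are skipped.
    """
    streaks = []
    current = []
    for query in queries:
        if current and is_modification(current[-1], query, threshold):
            current.append(query)
        else:
            if len(current) > 1:
                streaks.append(current)
            current = [query]  # this query becomes the next seed
    if len(current) > 1:
        streaks.append(current)
    return streaks


log = [
    "SELECT ?s WHERE { ?s a :Person }",
    "SELECT ?s WHERE { ?s a :Person } LIMIT 10",
    "ASK { :x :y :z }",
]
streaks = find_streaks(log)
```

Here the first two entries form one streak, because the second only appends a `LIMIT` clause to the seed, while the unrelated `ASK` query starts a new seed and is never extended.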
Co-evolution of RDF Datasets
Linking Data initiatives have fostered the publication of a large number of RDF
datasets in the Linked Open Data (LOD) cloud, as well as the development of
query processing infrastructures to access these data in a federated fashion.
However, different experimental studies have shown that the availability of LOD
datasets cannot always be ensured, so RDF data replication is required for
reliable federated query frameworks. Although it enhances data
availability, RDF data replication requires synchronization and conflict
resolution when replicas and source datasets are allowed to change data over
time, i.e., co-evolution management needs to be provided to ensure consistency.
In this paper, we tackle the problem of RDF data co-evolution and devise an
approach for conflict resolution during co-evolution of RDF datasets. Our
proposed approach is property-oriented and allows for exploiting semantics
about RDF properties during co-evolution management. The quality of our
approach is empirically evaluated in different scenarios on the DBpedia-live
dataset. Experimental results suggest that the proposed techniques have a
positive impact on the quality of data in source datasets and replicas.
Comment: 18 pages, 4 figures, Accepted in ICWE, 201
A Query Integrator and Manager for the Query Web
We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions
NEW DYNAMIC QUERY OPTIMIZATION TECHNIQUE IN RELATIONAL DATABASE MANAGEMENT SYSTEMS
The query optimizer is an important component in the architecture of a relational database management system. This component is responsible for translating a user-submitted query into an efficient query evolution program which can be executed against the database. The existing query evolution algorithm tries to find the best possible plan to execute a query in a minimum amount of time, using mostly semi-accurate statistical information (e.g. sizes of temporary relations, selectivity factors, and availability of resources). It is a static approach for generating an optimal or close-to-optimal execution plan, which in turn can increase the execution cost of the query. To reduce the execution cost of the query, I propose a new dynamic query optimization algorithm that is based on a greedy dynamic programming algorithm and uses randomized strategies. It reduces the execution cost of queries and the use of system resources, and it works efficiently with both distributed and centralized databases
A New Approach to Tagging Data in the Astronomical Literature
Data Tags are strings used in journals to indicate the origin of the
archival data and to enable the reader to recover the data. The NASA/IPAC
Infrared Science Archive (IRSA) has recently introduced a new approach to production
of data tags and recovery of data from them. Many of the data access
services at the IRSA return filtered data sets (such as subsets of source catalogs)
and dynamically created products (such as image cutouts); these dynamically
created products are not saved permanently at the archive. Rather than tag the
data sets from which the query result sets are drawn, the archive tags the query
that generates the results. A single tag can, then, encode a complex dynamic
data set and simplify the embedding of tags in manuscripts and journals. By
logging user queries and all the parameters for those queries as Data Tags, IRSA
can re-create the query and rerun the IRSA service using the same search parameters
used when the Data Tag was created. At the same time, the logs give
a simple count of the actual numbers of queries made to the archive, a powerful
metric of archive usage unobtainable from the Apache web server logs. Currently,
IRSA creates tags for queries to more than 20 data sets, including the
Infrared Astronomical Satellite (IRAS), Cosmic Evolution Survey (COSMOS)
and Spitzer Space Telescope Legacy Data Sets. These tags are returned by the
spatial query engine, Atlas. IRSA plans to create tags for queries to the rest
of its services in late Spring 2007. The archive provides a simple web interface
which recovers a data set that corresponds to the input data tag. Archived data
sets may evolve in time due to improved calibrations or augmentations to the
data set. IRSA’s query based approach guarantees that users always receive the
best available data sets
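The query-based tagging described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not IRSA's implementation: the class `QueryTagLog`, the `TAG-` prefix, the hashing scheme, and the `toy_search` service are all hypothetical. The point is only that the tag identifies a logged query (service plus parameters), so recovery means rerunning the query against whatever data the service currently holds.

```python
import hashlib
import json


class QueryTagLog:
    """Hypothetical log that tags queries rather than result sets."""

    def __init__(self):
        self._entries = {}  # tag -> logged service name and parameters
        self._count = 0     # every logged query, including repeats

    def record(self, service, params):
        """Log a query and return a tag derived from its parameters."""
        canonical = json.dumps({"service": service, "params": params},
                               sort_keys=True)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        tag = "TAG-" + digest[:12]
        self._entries[tag] = {"service": service, "params": dict(params)}
        self._count += 1
        return tag

    def recover(self, tag, services):
        """Rerun the logged query using the same search parameters."""
        entry = self._entries[tag]
        return services[entry["service"]](**entry["params"])

    def query_count(self):
        """Number of queries made, a usage metric taken from the log."""
        return self._count


def toy_search(ra, dec, radius):
    """Toy stand-in for an archive service: a box search on a tiny catalog."""
    catalog = [("src1", 10.0, 20.0), ("src2", 10.5, 20.2), ("src3", 50.0, -5.0)]
    return [name for name, r, d in catalog
            if abs(r - ra) <= radius and abs(d - dec) <= radius]


log = QueryTagLog()
tag = log.record("toy_search", {"ra": 10.0, "dec": 20.0, "radius": 1.0})
recovered = log.recover(tag, {"toy_search": toy_search})
```

Because the tag stores parameters rather than rows, a later call to `recover` reflects any recalibration or augmentation of the underlying catalog.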
Database queries and constraints via lifting problems
Previous work has demonstrated that categories are useful and expressive
models for databases. In the present paper we build on that model, showing that
certain queries and constraints correspond to lifting problems, as found in
modern approaches to algebraic topology. In our formulation, each so-called
SPARQL graph pattern query corresponds to a category-theoretic lifting problem,
whereby the set of solutions to the query is precisely the set of lifts. We
interpret constraints within the same formalism and then investigate some basic
properties of queries and constraints. In particular, to any database \pi we
can associate a certain derived database \Qry(\pi) of queries on \pi. As an
application, we explain how giving users access to certain parts of
\Qry(\pi), rather than direct access to \pi, improves one's ability to
manage the impact of schema evolution
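For readers unfamiliar with the term, a generic lifting problem can be written out as follows. The names i, p, u, v and \ell are illustrative; the abstract does not state which maps the database and the query play in the authors' formulation, so this shows only the general shape of the problem and of its solution set.

```latex
% Generic lifting problem: a commutative square of maps.
\[
  i \colon A \to B, \qquad p \colon X \to Y, \qquad
  u \colon A \to X, \qquad v \colon B \to Y, \qquad
  p \circ u = v \circ i .
\]
% A lift (a solution) is a diagonal map making both triangles commute.
\[
  \ell \colon B \to X, \qquad \ell \circ i = u, \qquad p \circ \ell = v .
\]
% The solution set of the problem is the set of all such lifts.
\[
  \mathrm{Lifts}(u, v) = \{\, \ell \colon B \to X \mid \ell \circ i = u,\ p \circ \ell = v \,\}
\]
```

In the abstract's terms, the result set of a graph pattern query is exactly such a set of lifts.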