A Domain-Independent Algorithm for Plan Adaptation
The paradigms of transformational planning, case-based planning, and plan
debugging all involve a process known as plan adaptation - modifying or
repairing an old plan so it solves a new problem. In this paper we provide a
domain-independent algorithm for plan adaptation, demonstrate that it is sound,
complete, and systematic, and compare it to other adaptation algorithms in the
literature. Our approach is based on a view of planning as searching a graph of
partial plans. Generative planning starts at the graph's root and moves from
node to node using plan-refinement operators. In planning by adaptation, a
library plan - an arbitrary node in the plan graph - is the starting point for
the search, and the plan-adaptation algorithm can both apply the same
refinement operators available to a generative planner and retract
constraints and steps from the plan. Our algorithm's completeness ensures that
the adaptation algorithm will eventually search the entire graph and its
systematicity ensures that it will do so without redundantly searching any
parts of the graph. Comment: See http://www.jair.org/ for any accompanying file
A limit process for partial match queries in random quadtrees and 2-d trees
We consider the problem of recovering items matching a partially specified
pattern in multidimensional trees (quadtrees and k-d trees). We assume the
traditional model where the data consist of independent and uniform points in
the unit square. For this model, in a structure on $n$ points, it is known that
the number of nodes $C_n(\xi)$ to visit in order to report the items matching
a random query $\xi$, independent and uniformly distributed on $[0,1]$,
satisfies $E[C_n(\xi)] \sim \kappa n^{\beta}$, where $\kappa$ and $\beta$
are explicit constants. We develop an approach based on the analysis of
the cost $C_n(s)$ of any fixed query $s \in [0,1]$, and give precise estimates
for the variance and limit distribution of the cost $C_n(x)$. Our results
permit us to describe a limit process for the costs $C_n(x)$ as $x$ varies in
$[0,1]$; one of the consequences is that $E[\max_{x \in [0,1]} C_n(x)] \sim \gamma n^{\beta}$; this settles a question of
Devroye [Pers. Comm., 2000]. Comment: Published at http://dx.doi.org/10.1214/12-AAP912 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org). arXiv admin note: text
overlap with arXiv:1107.223
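The cost studied in this abstract can be illustrated with a minimal sketch, assuming a plain 2-d tree that alternates the splitting coordinate by depth and a query that fixes only the first coordinate. The names `Node`, `insert`, `build`, and `partial_match_cost` are hypothetical and are not taken from the paper.

```python
import random


class Node:
    """A node of a 2-d tree storing one point (x, y)."""

    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(root, point, depth=0):
    """Insert a point, splitting on x at even depths and on y at odd depths."""
    if root is None:
        return Node(point)
    axis = depth % 2
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def build(points):
    """Build a 2-d tree by inserting the points in the given order."""
    root = None
    for p in points:
        root = insert(root, p)
    return root


def partial_match_cost(root, x, depth=0):
    """Count the nodes visited by a query that fixes the first coordinate to x."""
    if root is None:
        return 0
    cost = 1  # this node is visited
    if depth % 2 == 0:
        # The node splits on the specified coordinate: follow one side only.
        child = root.left if x < root.point[0] else root.right
        cost += partial_match_cost(child, x, depth + 1)
    else:
        # The node splits on the unspecified coordinate: follow both sides.
        cost += partial_match_cost(root.left, x, depth + 1)
        cost += partial_match_cost(root.right, x, depth + 1)
    return cost


if __name__ == "__main__":
    random.seed(0)
    n = 2000
    tree = build([(random.random(), random.random()) for _ in range(n)])
    # The cost depends on the query position x; print it for a few values.
    for x in (0.1, 0.5, 0.9):
        print(x, partial_match_cost(tree, x))
```

Running the script prints the number of visited nodes for three fixed queries on one random tree; repeating it over many trees would give an empirical view of how the cost varies with the query position.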
Performance comparison of point and spatial access methods
In the past few years a large number of multidimensional point access methods, also called
multiattribute index structures, have been suggested, all of them claiming good performance. Since no
performance comparison of these structures under arbitrary (strongly correlated, nonuniform, in short
"ugly") data distributions and under various types of queries has been performed, database
researchers and designers have been hesitant to use any of these new point access methods. As shown in
a recent paper, such point access methods are not only important in traditional database applications.
In new applications such as CAD/CIM and geographic or environmental information systems, access
methods for spatial objects are needed. As recently shown, such access methods are based on point
access methods in terms of functionality and performance. Our performance comparison naturally
consists of two parts. In part I we will compare multidimensional point access methods, whereas in
part II spatial access methods for rectangles will be compared. In part I we present a survey and
classification of existing point access methods. Then we carefully select the following four methods
for implementation and performance comparison under seven different data files (distributions) and
various types of queries: the 2-level grid file, the BANG file, the hB-tree and a new scheme, called
the BUDDY hash tree. We were surprised to see one method emerge as the clear winner: the
BUDDY hash tree. It exhibits at least 20% better average performance than its competitors and is
robust under ugly data and queries. In part II we compare spatial access methods for rectangles.
After presenting a survey and classification of existing spatial access methods we carefully selected
the following four methods for implementation and performance comparison under six different data
files (distributions) and various types of queries: the R-tree, the BANG file, PLOP hashing and the
BUDDY hash tree. The results showed two winners: the BANG file and the BUDDY hash tree.
This comparison is a first step towards a standardized testbed or benchmark. We offer our data and
query files to each designer of a new point or spatial access method such that he can run his
implementation in our testbed
Multidimensional Range Queries on Modern Hardware
Range queries over multidimensional data are an important part of database
workloads in many applications. Their execution may be accelerated by using
multidimensional index structures (MDIS), such as kd-trees or R-trees. As for
most index structures, the usefulness of this approach depends on the
selectivity of the queries, and conventional wisdom has held that a simple scan beats
MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom
is largely based on evaluations that are almost two decades old, performed on
data being held on disks, applying IO-optimized data structures, and using
single-core systems. The question is whether this rule of thumb still holds
when multidimensional range queries (MDRQ) are performed on modern
architectures with large main memories holding all data, multi-core CPUs and
data-parallel instruction sets. In this paper, we study the question of whether
and how much modern hardware influences the performance ratio between index
structures and scans for MDRQ. To this end, we conservatively adapted three
popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit
features of modern servers and compared their performance to different flavors
of parallel scans using multiple (synthetic and real-world) analytical
workloads over multiple (synthetic and real-world) datasets of varying size,
dimensionality, and skew. We find that all approaches benefit considerably from
using main memory and parallelization, yet to varying degrees. Our evaluation
indicates that, on current machines, scanning should be favored over parallel
versions of classical MDIS even for very selective queries
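As a minimal sketch of the scan baseline discussed in this abstract, the following answers a multidimensional range query by checking every point against the query box, and reports the query's selectivity (the fraction of the dataset it returns). The function names `range_scan` and `selectivity` are hypothetical; the paper's parallel, data-parallel scan variants are not reproduced here.

```python
def range_scan(points, lo, hi):
    """Return all points p with lo[i] <= p[i] <= hi[i] in every dimension i."""
    return [
        p for p in points
        if all(l <= v <= h for v, l, h in zip(p, lo, hi))
    ]


def selectivity(points, lo, hi):
    """Fraction of the dataset matched by the range query (0.0 for no data)."""
    if not points:
        return 0.0
    return len(range_scan(points, lo, hi)) / len(points)


if __name__ == "__main__":
    data = [(1, 1), (2, 5), (3, 3), (6, 2)]
    print(range_scan(data, lo=(2, 2), hi=(6, 5)))
    print(selectivity(data, lo=(2, 2), hi=(6, 5)))
```

A scan touches every point regardless of selectivity, which is why the comparison against index structures hinges on how much of the dataset a query accesses.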
The study of probability model for compound similarity searching
The main task of an Information Retrieval (IR) system is to retrieve relevant documents according to the user's query. One of the most popular IR retrieval models is the Vector Space Model. This model assumes relevance based on similarity, which is defined as the distance between query and document in the concept space. All currently existing chemical compound database systems have adopted the vector space model to calculate the similarity of a database entry to a query compound. However, it assumes that the fragments represented by the bits are independent of one another, which is not necessarily true. Hence, the possibility of applying another IR model, the Probabilistic Model (PM), to chemical compound searching is explored. This model estimates the probability that a chemical structure has the same bioactivity as a target compound. It is envisioned that by ranking chemical structures in decreasing order of their probability of relevance to the query structure, the effectiveness of a molecular similarity searching system can be increased. Both fragment dependence and fragment independence assumptions are taken into consideration in improving the compound similarity searching system. After conducting a series of simulated similarity searches, it is concluded that the PM approaches performed better than existing similarity searching, giving better results on all evaluation criteria. In terms of which probability model performs better, the BD model showed improvement over the BIR model
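The vector-space baseline described in this abstract can be sketched as follows, assuming each compound is represented by the set of fragment bits that are switched on and that similarity is measured by the cosine between binary vectors. The cosine measure, the function names, and the toy fingerprints are assumptions for illustration; the abstract does not say which coefficient the existing systems use.

```python
import math


def cosine_similarity(bits_a, bits_b):
    """Cosine between two binary fingerprints given as sets of on-bit indices."""
    if not bits_a or not bits_b:
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / math.sqrt(len(bits_a) * len(bits_b))


def rank_by_similarity(query, database):
    """Return compound names sorted by decreasing similarity to the query."""
    scored = [(cosine_similarity(query, bits), name) for name, bits in database.items()]
    # Sort by score descending, then by name for a stable, reproducible order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    # Hypothetical fingerprints: each integer is the index of a fragment bit.
    database = {"a": {1, 2, 3}, "b": {1, 2}, "c": {4}}
    print(rank_by_similarity({1, 2, 3}, database))
```

Each bit contributes to the score independently of the others, which is the independence assumption the abstract questions.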
Substructure Discovery Using Minimum Description Length and Background Knowledge
The ability to identify interesting and repetitive substructures is an
essential component to discovering knowledge in structural data. We describe a
new version of our SUBDUE substructure discovery system based on the minimum
description length principle. The SUBDUE system discovers substructures that
compress the original data and represent structural concepts in the data. By
replacing previously-discovered substructures in the data, multiple passes of
SUBDUE produce a hierarchical description of the structural regularities in the
data. SUBDUE uses a computationally-bounded inexact graph match that identifies
similar, but not identical, instances of a substructure and finds an
approximate measure of closeness of two substructures when under computational
constraints. In addition to the minimum description length principle, other
background knowledge can be used by SUBDUE to guide the search towards more
appropriate substructures. Experiments in a variety of domains demonstrate
SUBDUE's ability to find substructures capable of compressing the original data
and to discover structural concepts important to the domain. Description of
Online Appendix: This is a compressed tar file containing the SUBDUE discovery
system, written in C. The program accepts as input databases represented in
graph form, and will output discovered substructures with their corresponding
value. Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this article
Location-based indexing for mobile context-aware access to a digital library
Mobile information systems need to collaborate with each other to provide seamless information access to the user. Information about the user and their context provides the points of contact between the systems. Location is the most basic user context.
TIP is a mobile tourist information system that provides location-based access to documents in the digital library Greenstone. This paper identifies the challenges for providing efficient access to location-based information using the various access modes a tourist requires on their travels. We discuss our extended 2DR-tree approach to meet these challenges