34,424 research outputs found
On trip planning queries in spatial databases
In this paper we discuss a new type of query in Spatial Databases, called Trip Planning Query (TPQ). Given a set of points P in space, where each point belongs to a category, and given two points s and e, TPQ asks for the best trip that starts at s, passes through exactly one point from each category, and ends at e. An example of a TPQ is when a user wants to visit a set of different places and at the same time minimize the total travelling cost, e.g. what is the shortest travelling plan for me to visit an automobile shop, a CVS pharmacy outlet, and a Best Buy shop along my trip from A to B? The trip planning query is an extension of the well-known TSP problem and therefore is NP-hard. The difficulty of this query lies in the existence of multiple choices for each category. In this paper, we first study fast approximation algorithms for the trip planning query in a metric space, assuming that the data set fits in main memory, and give the theory analysis of their approximation bounds. Then, the trip planning query is examined for data sets that do not fit in main memory and must be stored on disk. For the disk-resident data, we consider two cases. In one case, we assume that the points are located in Euclidean space and indexed with an Rtree. In the other case, we consider the problem of points that lie on the edges of a spatial network (e.g. road network) and the distance between two points is defined using the shortest distance over the network. Finally, we give an experimental evaluation of the proposed algorithms using synthetic data sets generated on real road networks
On trip planning queries in spatial databases
In this paper we discuss a new type of query in Spatial Databases, called Trip Planning Query (TPQ). Given a set of points P in space, where each point belongs to a category, and given two points s and e, TPQ asks for the best trip that starts at s, passes through exactly one point from each category, and ends at e. An example of a TPQ is when a user wants to visit a set of different places and at the same time minimize the total travelling cost, e.g. what is the shortest travelling plan for me to visit an automobile shop, a CVS pharmacy outlet, and a Best Buy shop along my trip from A to B? The trip planning query is an extension of the well-known TSP problem and therefore is NP-hard. The difficulty of this query lies in the existence of multiple choices for each category. In this paper, we first study fast approximation algorithms for the trip planning query in a metric space, assuming that the data set fits in main memory, and give the theory analysis of their approximation bounds. Then, the trip planning query is examined for data sets that do not fit in main memory and must be stored on disk. For the disk-resident data, we consider two cases. In one case, we assume that the points are located in Euclidean space and indexed with an Rtree. In the other case, we consider the problem of points that lie on the edges of a spatial network (e.g. road network) and the distance between two points is defined using the shortest distance over the network. Finally, we give an experimental evaluation of the proposed algorithms using synthetic data sets generated on real road networks
Hybrid Information Retrieval Model For Web Images
The Bing Bang of the Internet in the early 90's increased dramatically the
number of images being distributed and shared over the web. As a result, image
information retrieval systems were developed to index and retrieve image files
spread over the Internet. Most of these systems are keyword-based which search
for images based on their textual metadata; and thus, they are imprecise as it
is vague to describe an image with a human language. Besides, there exist the
content-based image retrieval systems which search for images based on their
visual information. However, content-based type systems are still immature and
not that effective as they suffer from low retrieval recall/precision rate.
This paper proposes a new hybrid image information retrieval model for indexing
and retrieving web images published in HTML documents. The distinguishing mark
of the proposed model is that it is based on both graphical content and textual
metadata. The graphical content is denoted by color features and color
histogram of the image; while textual metadata are denoted by the terms that
surround the image in the HTML document, more particularly, the terms that
appear in the tags p, h1, and h2, in addition to the terms that appear in the
image's alt attribute, filename, and class-label. Moreover, this paper presents
a new term weighting scheme called VTF-IDF short for Variable Term
Frequency-Inverse Document Frequency which unlike traditional schemes, it
exploits the HTML tag structure and assigns an extra bonus weight for terms
that appear within certain particular HTML tags that are correlated to the
semantics of the image. Experiments conducted to evaluate the proposed IR model
showed a high retrieval precision rate that outpaced other current models.Comment: LACSC - Lebanese Association for Computational Sciences,
http://www.lacsc.org/; International Journal of Computer Science & Emerging
Technologies (IJCSET), Vol. 3, No. 1, February 201
Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an
explosion in available data. This is referred to as big data. In a research
setting, three aspects of the data are often viewed as the main sources of
challenges when attempting to enable value creation from big data: volume,
velocity and variety. Many studies address volume or velocity, while much fewer
studies concern the variety. Metric space is ideal for addressing variety
because it can accommodate any type of data as long as its associated distance
notion satisfies the triangle inequality. To accelerate search in metric space,
a collection of indexing techniques for metric data have been proposed.
However, existing surveys each offers only a narrow coverage, and no
comprehensive empirical study of those techniques exists. We offer a survey of
all the existing metric indexes that can support exact similarity search, by i)
summarizing all the existing partitioning, pruning and validation techniques
used for metric indexes, ii) providing the time and storage complexity analysis
on the index construction, and iii) report on a comprehensive empirical
comparison of their similarity query processing performance. Here, empirical
comparisons are used to evaluate the index performance during search as it is
hard to see the complexity analysis differences on the similarity query
processing and the query performance depends on the pruning and validation
abilities related to the data distribution. This article aims at revealing
different strengths and weaknesses of different indexing techniques in order to
offer guidance on selecting an appropriate indexing technique for a given
setting, and directing the future research for metric indexes
Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database
This paper investigates the impact of three approaches to XML retrieval:
using Zettair, a full-text information retrieval system; using eXist, a native
XML database; and using a hybrid system that takes full article answers from
Zettair and uses eXist to extract elements from those articles. For the
content-only topics, we undertake a preliminary analysis of the INEX 2003
relevance assessments in order to identify the types of highly relevant
document components. Further analysis identifies two complementary sub-cases of
relevance assessments ("General" and "Specific") and two categories of topics
("Broad" and "Narrow"). We develop a novel retrieval module that for a
content-only topic utilises the information from the resulting answer list of a
native XML database and dynamically determines the preferable units of
retrieval, which we call "Coherent Retrieval Elements". The results of our
experiments show that -- when each of the three systems is evaluated against
different retrieval scenarios (such as different cases of relevance
assessments, different topic categories and different choices of evaluation
metrics) -- the XML retrieval systems exhibit varying behaviour and the best
performance can be reached for different values of the retrieval parameters. In
the case of INEX 2003 relevance assessments for the content-only topics, our
newly developed hybrid XML retrieval system is substantially more effective
than either Zettair or eXist, and yields a robust and a very effective XML
retrieval.Comment: Postprint version. The editor version can be accessed through the DO
- …