A storage and access architecture for efficient query processing in spatial database systems
Due to the high complexity of objects and queries and also due to extremely
large data volumes, geographic database systems impose stringent requirements on their
storage and access architecture with respect to efficient query processing. Performance
improving concepts such as spatial storage and access structures, approximations, object
decompositions and multi-phase query processing have been suggested and analyzed as
single building blocks. In this paper, we describe a storage and access architecture which
is composed of the above building blocks in a modular fashion. Additionally, we incorporate
into our architecture a new ingredient, the scene organization, for efficiently
supporting set-oriented access for large-area region queries. An experimental performance
comparison demonstrates that the concept of scene organization leads to considerable
performance improvements for large-area region queries, by a factor of up to 150.
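The multi-phase query processing named among the building blocks can be sketched as a filter-and-refine pipeline: a cheap test on approximations (here, bounding boxes) prunes candidates before the expensive exact test. This is a minimal sketch under assumed names and a toy geometry representation, not the paper's actual architecture.

```python
# Minimal filter-and-refine sketch. The bounding-box approximation, the
# dictionary object layout, and all function names are illustrative
# assumptions, not taken from the paper.

def bbox_intersects(a, b):
    """Axis-aligned boxes a, b = (xmin, ymin, xmax, ymax)."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def region_query(objects, query_box, exact_test):
    """Phase 1: cheap filter on approximations (bounding boxes).
       Phase 2: exact geometric refinement only on surviving candidates."""
    candidates = [o for o in objects if bbox_intersects(o["bbox"], query_box)]
    return [o for o in candidates if exact_test(o["geometry"], query_box)]

# Toy usage: geometries are point lists; the exact test checks any point inside.
def any_point_inside(points, box):
    return any(box[0] <= x <= box[2] and box[1] <= y <= box[3] for (x, y) in points)

objs = [
    {"bbox": (0, 0, 2, 2), "geometry": [(1, 1)]},
    {"bbox": (5, 5, 8, 8), "geometry": [(6, 6)]},
]
hits = region_query(objs, (0, 0, 3, 3), any_point_inside)
```

The filter phase is what spatial access structures accelerate; only the (hopefully small) candidate set pays the cost of exact geometry.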
Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently
In this paper we define the notion
of a probabilistic neighborhood in spatial data: Let a set P of n points in
the plane, a query point q, a distance metric dist,
and a monotonically decreasing function f be
given. Then a point p in P belongs to the probabilistic neighborhood of q with respect to f with probability f(dist(p,q)). We envision
applications in facility location, sensor networks, and other scenarios where a
connection between two entities becomes less likely with increasing distance. A
straightforward query algorithm would determine a probabilistic neighborhood in
Theta(n) time by probing each point in P.
To answer the query in sublinear time for the planar case, we augment a
quadtree suitably and design a corresponding query algorithm. Our theoretical
analysis shows that -- for certain distributions of planar P -- our algorithm
answers a query in O((|N(q)| + sqrt(n)) log n) time with high probability
(whp), where N(q) denotes the returned neighborhood. This matches up to a logarithmic factor the cost induced by
quadtree-based algorithms for deterministic queries and is asymptotically
faster than the straightforward approach whenever |N(q)| = o(n / log n).
As practical proofs of concept we use two applications, one in the Euclidean
and one in the hyperbolic plane. In particular, our results yield the first
generator for random hyperbolic graphs with arbitrary temperatures in
subquadratic time. Moreover, our experimental data show the usefulness of our
algorithm even if the point distribution is unknown or not uniform: The running
time savings over the pairwise probing approach constitute at least one order
of magnitude already for a modest number of points and queries.
The final publication is available at Springer via
http://dx.doi.org/10.1007/978-3-319-44543-4_3
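The straightforward linear-time baseline the abstract contrasts against can be sketched directly: probe every point p and include it with probability f(dist(p, q)). The decay function and sample points below are illustrative assumptions.

```python
# Linear-time probing baseline for a probabilistic neighborhood query.
# The function f and the point set are illustrative, not from the paper.

import math
import random

def probabilistic_neighborhood(points, q, f, rng=random):
    """Include each point p with probability f(dist(p, q))."""
    result = []
    for p in points:
        d = math.dist(p, q)           # Euclidean distance (Python 3.8+)
        if rng.random() < f(d):
            result.append(p)
    return result

# Example: inclusion probability decays exponentially with distance,
# so nearby points are almost surely included, far ones almost surely not.
decay = lambda d: math.exp(-d)
pts = [(0.0, 0.0), (0.1, 0.0), (50.0, 50.0)]
nbhd = probabilistic_neighborhood(pts, (0.0, 0.0), decay, random.Random(1))
```

The quadtree-based algorithm of the paper avoids exactly this point-by-point probing by handling whole subtrees of far-away points at once.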
Query processing of spatial objects: Complexity versus Redundancy
The management of complex spatial objects in applications, such as geography and cartography,
imposes stringent new requirements on spatial database systems, in particular on efficient
query processing. As shown before, the performance of spatial query processing can be improved
by decomposing complex spatial objects into simple components. Up to now, only decomposition
techniques generating a linear number of very simple components, e.g. triangles or trapezoids, have
been considered. In this paper, we will investigate the natural trade-off between the complexity of
the components and the redundancy, i.e. the number of components, with respect to its effect on
efficient query processing. In particular, we present two new decomposition methods generating
a better balance between the complexity and the number of components than previously known
techniques. We compare these new decomposition methods to the traditional undecomposed representation
as well as to the well-known decomposition into convex polygons with respect to their
performance in spatial query processing. This comparison points out that for a wide range of query
selectivity the new decomposition techniques clearly outperform both the undecomposed representation
and the convex decomposition method. More important than the absolute gain in performance
by a factor of up to an order of magnitude is the robust performance of our new decomposition
techniques over the whole range of query selectivity.
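The idea of trading component complexity against redundancy can be illustrated with a toy point query against a decomposed representation: the object is stored as several simple components, and a query tests them one by one. The rectangle components here are an illustrative assumption, not the paper's trapezoid or convex decompositions.

```python
# Toy decomposed representation: an L-shaped rectilinear region stored as
# two axis-aligned rectangles instead of one complex polygon. More components
# mean more redundancy, but each per-component test is trivial.

def in_rect(p, r):
    """r = (xmin, ymin, xmax, ymax)"""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def point_query(components, p):
    """The object contains p iff any simple component contains p."""
    return any(in_rect(p, r) for r in components)

# L-shape = union of a horizontal and a vertical bar.
l_shape = [(0, 0, 4, 1), (0, 0, 1, 4)]
inside = point_query(l_shape, (3, 0.5))   # in the horizontal bar
outside = point_query(l_shape, (3, 3))    # in the L's notch
```

Simpler components make each test cheaper, but a finer decomposition multiplies the number of tests; that is precisely the trade-off the paper quantifies.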
Electro-responsivity in electrolyte-free and solution processed Bragg stacks
Achieving active manipulation of colours has huge implications in optoelectronics, as colour engineering can be exploited in a number of applications, ranging from displays to lighting. In the last decade, the synergy of the highly pure colours of 1D photonic crystals, also known as Bragg stacks, with electro-tunable materials has been proposed as an interesting route to attain such a technologically relevant effect. However, recent works rely on the use of liquid electrolytes, which can pose issues in terms of chemical and environmental stability. Here, we report a proof-of-concept of an electrolyte-free and solution-processed electro-responsive Bragg stack. We integrate an electro-responsive plasmonic metal oxide, namely indium tin oxide (ITO), in a 1D photonic crystal structure made of alternating layers of ITO and TiO2 nanoparticles. In such a device, we observed a maximum blue-shift of 23 nm upon the application of an external bias (10 V). Our data suggest that electrochromism can be attained in all-solid-state systems by combining a judicious selection of the constituent materials with device architecture optimisation.
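As a rough, hedged illustration of why an index change shifts the colour (not taken from the paper): at normal incidence the first-order reflection peak of a two-material Bragg stack sits near the standard relation lambda = 2(n1·d1 + n2·d2), so lowering one layer's refractive index under bias blue-shifts the peak. All indices and thicknesses below are invented illustrative values, not device parameters.

```python
# Back-of-the-envelope Bragg peak estimate, first order, normal incidence.
# Formula: lambda = 2 * (n1*d1 + n2*d2). All numbers are illustrative
# assumptions, NOT the measured device values from the paper.

def bragg_peak_nm(n1, d1_nm, n2, d2_nm):
    return 2.0 * (n1 * d1_nm + n2 * d2_nm)

lam0 = bragg_peak_nm(1.8, 80.0, 2.2, 70.0)    # stack with unbiased ITO index
lam1 = bragg_peak_nm(1.72, 80.0, 2.2, 70.0)   # ITO index lowered under bias
blue_shift = lam0 - lam1                      # peak moves to shorter wavelength
```

The sketch only shows the direction and order of magnitude of the effect; the real device response also depends on carrier accumulation depth and layer porosity.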
Model-based probabilistic frequent itemset mining
Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle a large amount of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods that capture the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of the algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition as to which model-based approach fits best to different types of data sets. © 2012 The Author(s).
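The Poisson-model idea can be sketched concretely: under tuple uncertainty, an itemset's support count is a sum of independent per-transaction containment indicators (a Poisson-binomial), which the model approximates by a Poisson with lambda equal to the sum of those probabilities. This is a minimal sketch of that approximation; the per-transaction probabilities and threshold are invented for illustration.

```python
# Poisson approximation of an itemset's frequentness probability
# P(support >= minsup) under tuple uncertainty. The probabilities below
# are illustrative assumptions, not from the paper's datasets.

import math

def frequentness_probability(probs, minsup):
    """Approximate P(support >= minsup) via a Poisson(sum(probs)) tail."""
    lam = sum(probs)                      # expected support count
    p_below = sum(math.exp(-lam) * lam**k / math.factorial(k)
                  for k in range(minsup)) # P(support < minsup)
    return 1.0 - p_below

# Itemset contained in 5 uncertain transactions with these probabilities:
p = frequentness_probability([0.9, 0.8, 0.5, 0.3, 0.1], minsup=2)
```

Evaluating this closed-form tail replaces enumerating possible worlds, which is the source of the reported speedups.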
Considerations about multistep community detection
The problem and implications of community detection in networks have raised a
huge attention, for its important applications in both natural and social
sciences. A number of algorithms have been developed to solve this problem,
addressing either speed optimization or the quality of the partitions
calculated. In this paper we propose a multi-step procedure bridging the
fastest, but less accurate algorithms (coarse clustering), with the slowest,
most effective ones (refinement). By adopting heuristic ranking of the nodes,
and classifying a fraction of them as `critical', a refinement step can be
restricted to this subset of the network, thus saving computational time.
Preliminary numerical results are discussed, showing improvement of the final
partition.
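The proposed pipeline can be sketched in a few lines: after a coarse partition, rank nodes by a heuristic, mark the top fraction as 'critical', and refine only those. The ranking heuristic (fraction of neighbours outside the node's community), the local-moving refinement, and the toy graph are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of coarse-then-refine community detection restricted to 'critical'
# nodes. Heuristic, refinement rule, and graph are illustrative assumptions.

def boundary_score(node, comm, adj):
    """Fraction of a node's neighbours assigned to a different community."""
    nbrs = adj[node]
    return sum(comm[v] != comm[node] for v in nbrs) / len(nbrs) if nbrs else 0.0

def refine_critical(adj, comm, critical_fraction=0.25):
    ranked = sorted(adj, key=lambda v: boundary_score(v, comm, adj), reverse=True)
    critical = ranked[:max(1, int(critical_fraction * len(ranked)))]
    for v in critical:   # move v to the majority community of its neighbours
        counts = {}
        for u in adj[v]:
            counts[comm[u]] = counts.get(comm[u], 0) + 1
        comm[v] = max(counts, key=counts.get)
    return comm

# Toy graph of two triangles; node 3 is miscoarsened into community 0.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
comm = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
refine_critical(adj, comm)
```

Only the critical subset is revisited by the expensive step, which is where the computational saving over whole-network refinement comes from.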
The Epistemology of Intentionality: Notional Constituents vs. Direct Grasp
Franz Brentano is well known for highlighting the importance of intentionality, but he said curiously little about the nature of intentionality. According to Mark Textor, there is a deep reason for this: Brentano took intentionality to be a conceptual primitive the nature of which is revealed only in direct grasp. Although there is certainly textual support for this interpretation, it appears in tension with Brentano’s repeated attempts to analyze intentionality in terms of ‘notional constituents’ – aspects of intentionality which cannot come apart in reality but which can be conceptually distinguished. After bringing out this tension, I explore some options for resolving it, ultimately offering my own favored interpretation.
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines the most essential data mining entities in a three-layered
ontological structure comprising a specification, an implementation and an application
layer. It provides a representational framework for the description of mining
structured data, and in addition provides taxonomies of datasets, data mining tasks,
generalizations, data mining algorithms and constraints, based on the type of data.
OntoDM-core is designed to support a wide range of applications/use cases, such as
semantic annotation of data mining algorithms, datasets and results; annotation of
QSAR studies in the context of drug discovery investigations; and disambiguation of
terms in text mining. The ontology has been thoroughly assessed following the practices
in ontology engineering, is fully interoperable with many domain resources and
is easy to extend.
Improving cluster recovery with feature rescaling factors
The data preprocessing stage is crucial in clustering. Features may describe entities using different scales. To rectify this, one usually applies feature normalisation, aiming to rescale features so that none of them overpowers the others in the objective function of the selected clustering algorithm. In this paper, we argue that the rescaling procedure should not treat all features identically. Instead, it should favour the features that are more meaningful for clustering. With this in mind, we introduce a feature rescaling method that takes into account the within-cluster degree of relevance of each feature. Our comprehensive simulation study, carried out on real and synthetic data, with and without noise features, demonstrates that clustering methods using the proposed data normalisation strategy clearly outperform those that use traditional data normalisation.
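One way to realise within-cluster relevance rescaling is to weight each feature by the inverse of its average within-cluster dispersion, so features that are compact inside clusters count more. The exact weighting scheme below is an assumed illustration, not necessarily the paper's formula; the data are a toy example.

```python
# Relevance-based feature rescaling: features with low within-cluster
# dispersion (i.e., informative for the partition) receive larger weights.
# The weighting formula and toy data are illustrative assumptions.

import numpy as np

def relevance_rescale(X, labels, eps=1e-12):
    """X: (n, m) data matrix; labels: initial cluster label per row."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    disp = np.zeros(X.shape[1])
    for k in np.unique(labels):
        Xk = X[labels == k]
        disp += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)  # within-cluster SSE
    weights = 1.0 / (disp / len(X) + eps)   # inverse average dispersion
    return X * weights, weights

# Feature 0 separates the two clusters tightly; feature 1 is noisy.
X = [[0.0, 5.0], [0.1, -4.0], [10.0, 3.0], [10.1, -6.0]]
Xs, w = relevance_rescale(X, [0, 0, 1, 1])
```

In practice the initial labels can come from a cheap clustering pass, after which the rescaled data are re-clustered; the noisy feature's influence on the distance function is strongly damped.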