The Times of the Document and Information Retrieval (Les temps du document et la recherche d'information)
http://dn.revuesonline.com/article.jsp?articleId=5637 This article presents an overview of the links between information retrieval and the temporal aspects of documents. An initial analysis leads us to distinguish the time evoked by the discourse of documents from the position of those documents in historical time. The time of the universe of discourse must be taken into account in the indexing phase of document retrieval. It can be handled by named-entity extraction and, at a finer level, by linguistic analysis that determines temporal relations. Processing cataloguing information that does not follow very strict standards is in fact a closely related problem. Publication time, which in traditional publishing is the main temporal cataloguing datum, becomes in the world of digital documents a fundamental datum for modelling the evolution of documents. We introduce the notions of "mutable" and immutable collections. We also discuss questions of granularity in the representation of time.
Conditional Anomaly Detection with Soft Harmonic Functions
In this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response or a class label. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label to detect anomalous mislabeling. We further regularize the solution to avoid the detection of isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method on several synthetic and UCI ML datasets in detecting unusual labels when compared to several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset where we seek to identify unusual patient-management decisions.
Aggregate Nearest Neighbor Queries in Spatial Databases
Given two spatial datasets P (e.g., facilities) and Q (queries), an aggregate nearest neighbor (ANN) query retrieves the point(s) of P with the smallest aggregate distance(s) to points in Q. Assuming, for example, n users at locations q_1, ..., q_n, an ANN query outputs the facility p ∈ P that minimizes the sum of distances |pq_i| for 1 ≤ i ≤ n that the users have to travel in order to meet there. Similarly, another ANN query may report the point p ∈ P that minimizes the maximum distance that any user has to travel, or the minimum distance from some user to his/her closest facility. If Q fits in memory and P is indexed by an R-tree, we develop algorithms for aggregate nearest neighbors that capture several versions of the problem, including weighted queries and incremental reporting of results. Then, we analyze their performance and propose cost models for query optimization. Finally, we extend our techniques for disk-resident queries and approximate ANN retrieval. The efficiency of the algorithms and the accuracy of the cost models are evaluated through extensive experiments with real and synthetic datasets.
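The three aggregate functions named in the abstract (sum, max, min) can be illustrated with a brute-force scan. This is only a minimal sketch for intuition, not the R-tree based algorithms the paper develops; the function name `aggregate_nn` and the sample coordinates are hypothetical.

```python
import math

def aggregate_nn(P, Q, agg=sum):
    """Brute-force aggregate nearest neighbor (illustrative baseline only).

    Returns the point p in P that minimizes agg(|p q_i| for q_i in Q),
    together with that aggregate distance. `agg` may be sum, max or min.
    """
    best_p, best_val = None, math.inf
    for p in P:
        val = agg(math.dist(p, q) for q in Q)
        if val < best_val:
            best_p, best_val = p, val
    return best_p, best_val

# Hypothetical data: three facilities (P) and three user locations (Q).
facilities = [(0.0, 0.0), (2.0, 2.0), (5.0, 5.0)]
users = [(1.0, 1.0), (3.0, 3.0), (2.0, 1.0)]

sum_best, sum_cost = aggregate_nn(facilities, users, sum)  # total travel
max_best, max_cost = aggregate_nn(facilities, users, max)  # worst-off user
min_best, min_cost = aggregate_nn(facilities, users, min)  # closest user
```

The scan costs O(|P| · |Q|) distance computations, which is the cost that an index on P is meant to avoid.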
A geo-service semantic integration in Spatial Data Infrastructures
In this paper we focus on the semantic heterogeneity problem as one of the main challenges in current Spatial Data Infrastructures (SDIs). We first report on the state of the art in reducing such heterogeneity in SDIs. We then consider a particular geo-service integration scenario. We discuss an approach to semantically coordinating geographic services, which is based on a view of the semantics of web service coordination, implemented by using the Lightweight Coordination Calculus (LCC) language. In this approach, service providers share explicit knowledge of the interactions in which their services are engaged and these models of interaction are used operationally as the anchor for describing the semantics of the interaction. We achieve web service discovery and integration by using semantic matching between particular interactions and web service descriptions. For this purpose we introduce a specific solution, called structure preserving semantic matching. We present a real world application scenario to illustrate how semantic integration of geo web services can be performed by using this approach. Finally, we provide a preliminary evaluation of the solution discussed.
Efficient Indexing Structure for Trajectories in Geographical Information Systems
Location technologies such as GPS and telegraphy are producing more and more data on moving objects. Spatio-temporal databases store information about the positions of individual objects over time. Real-world applications of spatio-temporal data include vehicle navigation, the migration of people, and the tracking and monitoring of air-, sea- and land-based vehicles. A spatio-temporal database is needed to manage these data and to solve the problems that arise in spatio-temporal applications. A spatio-temporal database adopts an exhaustive search strategy for querying the trajectories. This is very time-consuming when processing large datasets for the given spatio-temporal query conditions. As a result, efficient spatio-temporal indexing methods are in high demand to improve the performance of the system in searching such large datasets. Computer Science Departmen
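To make the cost of the exhaustive search strategy concrete, the following minimal sketch answers a spatio-temporal range query by scanning every sample of every trajectory. The data layout (object id mapped to a list of (t, x, y) samples), the function name `exhaustive_range_query`, and the sample values are assumptions made for illustration; they are not taken from the paper.

```python
def exhaustive_range_query(trajectories, x_min, x_max, y_min, y_max, t_min, t_max):
    """Return ids of objects with at least one recorded sample inside the
    spatial window [x_min, x_max] x [y_min, y_max] during [t_min, t_max].

    Every sample may be inspected, so the cost grows linearly with the
    total number of stored positions. Only recorded samples are tested;
    positions between samples are not interpolated.
    """
    hits = []
    for obj_id, samples in trajectories.items():
        for t, x, y in samples:
            if t_min <= t <= t_max and x_min <= x <= x_max and y_min <= y <= y_max:
                hits.append(obj_id)
                break  # one matching sample is enough for this object
    return hits

# Hypothetical trajectories: object id -> list of (t, x, y) samples.
trajectories = {
    "bus-1": [(0, 0.0, 0.0), (10, 1.0, 1.0), (20, 2.0, 2.0)],
    "bus-2": [(0, 9.0, 9.0), (10, 8.0, 8.0), (20, 7.0, 7.0)],
    "bus-3": [(0, 1.5, 1.5), (30, 1.5, 1.5)],
}

result = exhaustive_range_query(trajectories, 0.5, 2.5, 0.5, 2.5, 5, 25)
```

Here "bus-3" lies inside the spatial window but has no sample in the time interval, so only "bus-1" qualifies. A spatio-temporal index would aim to discard most trajectories without reading their samples.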
Multi-Dimensional Joins
We present three novel algorithms for performing multi-dimensional
joins and an in-depth survey and analysis of a low-dimensional
spatial join. The first algorithm, the Iterative Spatial Join,
performs a spatial join on low-dimensional data and is based
on a plane-sweep technique.
As we show analytically and experimentally,
the Iterative Spatial Join performs well when internal memory is
limited, compared to competing methods. This suggests that
the Iterative Spatial Join would be useful for very large data sets
or in situations where internal memory is a shared resource and
is therefore limited, such as with today's database engines which
share internal memory amongst several queries. Furthermore, the
performance of the Iterative Spatial Join is predictable and has
no parameters which need to be tuned, unlike other algorithms.
The second algorithm, the Quickjoin algorithm,
performs a higher-dimensional
similarity join in which pairs of objects that lie within a
certain distance epsilon of each other are reported.
The Quickjoin algorithm overcomes drawbacks of competing methods,
such as requiring embedding methods on the data first or using
multi-dimensional indices, which limit
the ability to discriminate between objects in each
dimension, thereby degrading performance.
A formal analysis is provided of the Quickjoin method, and
experiments show that the Quickjoin method significantly outperforms
competing methods.
The third algorithm adapts
incremental join techniques to improve the
speed of calculating the Hausdorff distance, which
is used in applications such as image matching, image analysis,
and surface approximations.
The nearest neighbor incremental join technique for indices that
are based on hierarchical containment uses a priority queue
of index node pairs and bounds on the distance values between
pairs, both of which need to be modified in order to calculate the
Hausdorff distance. Results of experiments are described that
confirm the performance improvement.
Finally, a survey is provided which,
instead of just summarizing the literature and presenting each
technique in its entirety, describes the distinct components of
the different techniques and decomposes each technique into
an overall framework for performing a spatial join.
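For reference, the two quantities discussed above can be written as brute-force baselines: the similarity join reports all pairs within distance epsilon, and the Hausdorff distance is the larger of the two directed max-min distances. This is a minimal sketch of the definitions only; it is not the Quickjoin algorithm or the incremental join technique, and the function names and sample points are hypothetical.

```python
import math

def epsilon_join(A, B, eps):
    """All pairs (a, b) with Euclidean distance at most eps (O(|A||B|) scan)."""
    return [(a, b) for a in A for b in B if math.dist(a, b) <= eps]

def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest neighbor in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Hypothetical 2-D point sets.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (4.0, 0.0)]

pairs = epsilon_join(A, B, 1.0)
h = hausdorff(A, B)
```

Both baselines compare every pair of points, which is the quadratic cost that the algorithms in the abstract are designed to reduce.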