    Document Times and Information Retrieval (Les temps du document et la recherche d'information)

    http://dn.revuesonline.com/article.jsp?articleId=5637
    This article presents an overview of the links between information retrieval and the temporal aspects of documents. An initial analysis leads us to distinguish the time referred to by the discourse of documents from the position of those documents in historical time. The time of the universe of discourse must be taken into account in the indexing phase of document retrieval. It can be handled by named-entity extraction and, more finely, by linguistic analysis to determine temporal relations. Processing cataloguing information that does not follow very strict standards is in fact a closely related problem. Publication time, which in traditional publishing is the main temporal cataloguing datum, becomes in the world of digital documents a fundamental piece of data for modelling how documents evolve. We introduce the notions of "mutable" and "immutable" collections. We also discuss questions of granularity in the representation of time.
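The abstract notes that time in the universe of discourse can be captured by named-entity extraction and raises the question of granularity. As a rough illustration only, the sketch below assumes a regular-expression extractor that recognizes just ISO-style dates (`2004-03-12`), year-months (`2001-12`) and bare four-digit years. A real system would need the finer linguistic analysis the article describes. Each expression is mapped to a half-open interval of days tagged with its granularity. The pattern and the function name `extract_intervals` are hypothetical.

```python
import re
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical pattern: YYYY, YYYY-MM or YYYY-MM-DD delimited by word boundaries.
TEMPORAL = re.compile(r"\b(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?\b")

Interval = Tuple[date, date, str]  # (start inclusive, end exclusive, granularity)


def extract_intervals(text: str) -> List[Interval]:
    """Map each recognized temporal expression to a day interval and its granularity."""
    intervals: List[Interval] = []
    for match in TEMPORAL.finditer(text):
        year_s, month_s, day_s = match.groups()
        year = int(year_s)
        try:
            if day_s:  # day granularity: exactly one day
                start = date(year, int(month_s), int(day_s))
                end, granularity = start + timedelta(days=1), "day"
            elif month_s:  # month granularity: first of this month to first of next
                month = int(month_s)
                start = date(year, month, 1)
                end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
                granularity = "month"
            else:  # year granularity: the whole calendar year
                start, end, granularity = date(year, 1, 1), date(year + 1, 1, 1), "year"
        except (ValueError, OverflowError):
            continue  # skip impossible dates such as 2004-13 or year 0000
        intervals.append((start, end, granularity))
    return intervals


if __name__ == "__main__":
    for interval in extract_intervals("Published 2004-03-12, about events of 1998 and 2001-12."):
        print(interval)
```

Every four-digit number is read as a year, so the sketch over-generates on page numbers or quantities. It only illustrates how extracted expressions can be stored as intervals at different granularities for indexing.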

    Conditional Anomaly Detection with Soft Harmonic Functions

    In this paper, we consider the problem of conditional anomaly detection, which aims to identify data instances with an unusual response or class label. We develop a new non-parametric approach to conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label in order to detect anomalous mislabeling. We further regularize the solution to avoid detecting isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method in detecting unusual labels on several synthetic and UCI ML datasets, compared with several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset, where we seek to identify unusual patient-management decisions.
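The abstract describes the method only at a high level. The sketch below is one reading of it, assuming the standard soft harmonic objective. With labels y ∈ {−1, +1}, a Gaussian similarity graph with Laplacian L, a constant label-confidence weight c, and a regularizer γ added to the Laplacian, the soft labels ℓ solve (L + γI + cI)ℓ = c·y. An instance whose soft label disagrees with its recorded label (large −yᵢℓᵢ) is flagged. The graph construction, `sigma`, `c`, `gamma`, and the function name are assumptions; the paper's exact weighting and regularization may differ.

```python
import math
from typing import List, Sequence


def soft_harmonic_scores(X: Sequence[Sequence[float]], y: Sequence[int],
                         sigma: float = 1.0, c: float = 1.0, gamma: float = 0.1,
                         iters: int = 500) -> List[float]:
    """Anomaly score -y_i * l_i, where l solves (L + gamma*I + c*I) l = c*y (assumed form)."""
    n = len(X)
    # Dense Gaussian similarity graph without self-loops (an assumed construction).
    W = [[0.0 if i == j else math.exp(-math.dist(X[i], X[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Jacobi iteration on (deg_i + gamma + c) l_i - sum_j W_ij l_j = c * y_i.
    # The system is strictly diagonally dominant when gamma + c > 0, so it converges.
    l = [float(v) for v in y]
    for _ in range(iters):
        l = [(c * y[i] + sum(W[i][j] * l[j] for j in range(n))) / (deg[i] + gamma + c)
             for i in range(n)]
    # A soft label pointing away from the recorded label yields a high score.
    return [-y[i] * l[i] for i in range(n)]
```

For example, with two tight clusters in which one point of the positive cluster is labelled −1, that point receives the only positive score. Jacobi iteration is used here only to avoid a linear-algebra dependency; for larger graphs a sparse linear solver would be the natural choice.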

    Aggregate Nearest Neighbor Queries in Spatial Databases

    Given two spatial datasets P (e.g., facilities) and Q (queries), an aggregate nearest neighbor (ANN) query retrieves the point(s) of P with the smallest aggregate distance(s) to the points in Q. Assuming, for example, n users at locations q₁, ..., qₙ, an ANN query outputs the facility p ∈ P that minimizes the sum of the distances |pqᵢ|, for 1 ≤ i ≤ n, that the users have to travel in order to meet there. Similarly, another ANN query may report the point p ∈ P that minimizes the maximum distance that any user has to travel, or the minimum distance from some user to his/her closest facility. Assuming that Q fits in memory and P is indexed by an R-tree, we develop algorithms for aggregate nearest neighbors that capture several versions of the problem, including weighted queries and incremental reporting of results. We then analyze their performance and propose cost models for query optimization. Finally, we extend our techniques to disk-resident queries and approximate ANN retrieval. The efficiency of the algorithms and the accuracy of the cost models are evaluated through extensive experiments with real and synthetic datasets.
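To make the three aggregate functions concrete, the sketch below answers a sum, max, or min ANN query by brute force, scanning every point of P and computing its (optionally weighted) distances to all of Q. It is a reference sketch under that assumption, not the R-tree algorithms the abstract describes: it costs O(|P|·|Q|) distance computations and is mainly useful for pinning down query semantics or checking an indexed implementation. The names `ann_query` and `AGGREGATES` are hypothetical.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]

# Aggregate over the (optionally weighted) distances from a candidate p to every q in Q.
AGGREGATES: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,  # total distance travelled by all users
    "max": max,  # distance travelled by the farthest user
    "min": min,  # distance from the closest user
}


def ann_query(P: Sequence[Point], Q: Sequence[Point], agg: str = "sum", k: int = 1,
              weights: Optional[Sequence[float]] = None) -> List[Tuple[float, Point]]:
    """Return the k points of P with the smallest aggregate distance to Q, as (score, p)."""
    aggregate = AGGREGATES[agg]
    w = list(weights) if weights is not None else [1.0] * len(Q)
    if len(w) != len(Q):
        raise ValueError("need one weight per query point")

    def score(p: Point) -> float:
        return aggregate([wi * math.dist(p, q) for wi, q in zip(w, Q)])

    # Brute force: every point of P is scored; no R-tree pruning is attempted.
    return heapq.nsmallest(k, ((score(p), p) for p in P))
```

With P = [(0, 0), (4, 3), (10, 10)] and Q = [(1, 1), (5, 5)], the sum and max variants both return (4, 3), while the min variant returns (0, 0), the facility closest to a single user.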

    A geo-service semantic integration in Spatial Data Infrastructures

    In this paper we focus on semantic heterogeneity as one of the main challenges in current Spatial Data Infrastructures (SDIs). We first review the state of the art in reducing such heterogeneity in SDIs. We then consider a particular geo-service integration scenario. We discuss an approach to semantically coordinating geographic services that is based on a view of the semantics of web service coordination and implemented in the Lightweight Coordination Calculus (LCC) language. In this approach, service providers share explicit knowledge of the interactions in which their services are engaged, and these models of interaction are used operationally as the anchor for describing the semantics of the interaction. We achieve web service discovery and integration by semantically matching particular interactions against web service descriptions. For this purpose we introduce a specific solution called structure-preserving semantic matching. We present a real-world application scenario to illustrate how semantic integration of geo web services can be performed with this approach. Finally, we provide a preliminary evaluation of the proposed solution.

    Efficient Indexing Structure for Trajectories in Geographical Information Systems

    Location technologies such as GPS and telegraphy are producing more and more data about moving objects. Spatio-temporal databases store information about the positions of individual objects over time and are needed to manage these data and solve the problems of spatio-temporal applications, such as vehicle navigation, the study of human migration, and the tracking and monitoring of air-, sea-, and land-based vehicles. A spatio-temporal database that lacks a suitable index adopts an exhaustive search strategy for querying trajectories, which is very time-consuming when processing large datasets under given spatio-temporal query conditions. As a result, efficient spatio-temporal indexing methods are needed to improve the performance of the system when searching such large datasets.
    Computer Science Department
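The abstract motivates indexing but does not describe the proposed structure. Purely to illustrate why an index beats an exhaustive scan, the sketch below uses a swapped-in technique, a uniform grid over (x, y, t), rather than the paper's method. It assumes trajectories are stored as individual timestamped samples and that a query asks which objects have a sample inside a space-time box; segments between samples are ignored. `GridIndex`, `scan`, and the cell sizes are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Sample = Tuple[int, float, float, float]  # (object_id, x, y, t): one position fix
Cell = Tuple[int, int, int]


class GridIndex:
    """Uniform grid over (x, y, t); each cell keeps the samples that fall inside it."""

    def __init__(self, cell_xy: float, cell_t: float) -> None:
        self.cell_xy = cell_xy
        self.cell_t = cell_t
        self.cells: DefaultDict[Cell, List[Sample]] = defaultdict(list)

    def _cell(self, x: float, y: float, t: float) -> Cell:
        return (int(x // self.cell_xy), int(y // self.cell_xy), int(t // self.cell_t))

    def insert(self, sample: Sample) -> None:
        _, x, y, t = sample
        self.cells[self._cell(x, y, t)].append(sample)

    def range_query(self, x0: float, x1: float, y0: float, y1: float,
                    t0: float, t1: float) -> Set[int]:
        """Ids of objects with at least one sample in [x0,x1] x [y0,y1] x [t0,t1]."""
        lo = self._cell(x0, y0, t0)
        hi = self._cell(x1, y1, t1)
        hits: Set[int] = set()
        # Only cells overlapping the query box are visited, not the whole dataset.
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    for oid, x, y, t in self.cells.get((i, j, k), ()):
                        if x0 <= x <= x1 and y0 <= y <= y1 and t0 <= t <= t1:
                            hits.add(oid)
        return hits


def scan(samples: Iterable[Sample], x0: float, x1: float, y0: float, y1: float,
         t0: float, t1: float) -> Set[int]:
    """Exhaustive-search baseline: test every stored sample."""
    return {oid for oid, x, y, t in samples
            if x0 <= x <= x1 and y0 <= y <= y1 and t0 <= t <= t1}
```

A grid is simple but sensitive to cell size: too small and a query visits many empty cells, too large and each visited cell holds many samples to filter. That sensitivity is one motivation for tree-based alternatives such as R-tree variants.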

    Multi-Dimensional Joins

    We present three novel algorithms for performing multi-dimensional joins, together with an in-depth survey and analysis of low-dimensional spatial joins. The first algorithm, the Iterative Spatial Join, performs a spatial join on low-dimensional data and is based on a plane-sweep technique. As we show analytically and experimentally, the Iterative Spatial Join performs well compared with competing methods when internal memory is limited. This suggests that it would be useful for very large data sets, or in situations where internal memory is a shared and therefore limited resource, such as today's database engines, which share internal memory among several queries. Furthermore, the performance of the Iterative Spatial Join is predictable and, unlike other algorithms, it has no parameters that need to be tuned. The second algorithm, Quickjoin, performs a higher-dimensional similarity join that reports pairs of objects lying within a distance epsilon of each other. Quickjoin overcomes drawbacks of competing methods, such as having to apply embedding methods to the data first or to use multi-dimensional indices, which limit the ability to discriminate between objects in each dimension and thereby degrade performance. A formal analysis of Quickjoin is provided, and experiments show that it significantly outperforms competing methods. The third algorithm adapts incremental join techniques to speed up the calculation of the Hausdorff distance, which is used in applications such as image matching, image analysis, and surface approximation. The incremental nearest-neighbor join technique for indices based on hierarchical containment uses a priority queue of index node pairs and bounds on the distances between pairs, both of which must be modified to calculate the Hausdorff distance. Experimental results confirm the performance improvement. Finally, rather than simply summarizing the literature and presenting each technique in its entirety, the survey describes the distinct components of the different techniques and decomposes each technique into an overall framework for performing a spatial join.
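The Iterative Spatial Join is described only as being based on plane sweep. The sketch below shows a basic in-memory plane-sweep intersection join between two sets of axis-aligned rectangles. It is not the Iterative Spatial Join itself, which targets limited internal memory. Rectangles are assumed to be closed `(xmin, ymin, xmax, ymax)` tuples, and `plane_sweep_join` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed box


def plane_sweep_join(R: Sequence[Rect], S: Sequence[Rect]) -> List[Tuple[int, int]]:
    """Return sorted index pairs (i, j) such that R[i] and S[j] intersect."""
    sets = (R, S)
    # One event per rectangle at its left edge, processed from left to right.
    events = sorted([(r[0], 0, i) for i, r in enumerate(R)] +
                    [(s[0], 1, j) for j, s in enumerate(S)])
    active: Tuple[List[int], List[int]] = ([], [])  # rectangles crossing the sweep line
    pairs: List[Tuple[int, int]] = []
    for x, side, idx in events:
        rect = sets[side][idx]
        other = 1 - side
        others = sets[other]
        # Retire rectangles of the other set that end before the sweep line.
        active[other][:] = [k for k in active[other] if others[k][2] >= x]
        # The remaining ones overlap rect in x; report those that also overlap in y.
        for k in active[other]:
            o = others[k]
            if rect[1] <= o[3] and o[1] <= rect[3]:
                pairs.append((idx, k) if side == 0 else (k, idx))
        active[side].append(idx)
    return sorted(pairs)
```

Because events are sorted by left edge, each intersecting pair is reported exactly once, when the second of its two rectangles reaches the sweep line. The active lists here are plain Python lists scanned linearly; more careful implementations keep them in structures ordered on y.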