17 research outputs found

    Les temps du document et la recherche d'information

    http://dn.revuesonline.com/article.jsp?articleId=5637
    This article presents an overview of the links between information retrieval and the temporal aspects of documents. A first analysis leads us to distinguish the time evoked by the discourse of documents from the position of those documents in historical time. The time of the universe of discourse must be taken into account during the indexing phase of document retrieval. It can be handled by named-entity extraction and, more finely, by linguistic analysis that determines temporal relations. Processing cataloguing information that does not follow strict standards is in fact a closely related problem. The publication date, which in traditional publishing is the main cataloguing datum of a temporal nature, becomes in the world of the digital document a fundamental datum for modelling the evolution of documents. We introduce the notions of "mutable" and immutable collections, and we also discuss questions of the granularity of time representation.
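    The named-entity extraction of discourse time mentioned above can be sketched minimally as follows; real systems use full temporal taggers, and the regex and year range here are illustrative assumptions, not the article's method:

```python
import re

# Extract explicit year mentions as a crude proxy for the time evoked
# by a document's discourse (as opposed to its publication date).
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def extract_discourse_years(text):
    """Return the distinct years mentioned in the text, sorted."""
    return sorted({int(y) for y in YEAR.findall(text)})

doc = "The treaty of 1648 reshaped Europe; by 1715 the balance had shifted."
print(extract_discourse_years(doc))  # -> [1648, 1715]
```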

    Conditional Anomaly Detection with Soft Harmonic Functions

    In this paper, we consider the problem of conditional anomaly detection, which aims to identify data instances with an unusual response or class label. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of each label in order to detect anomalous mislabelings. We further regularize the solution to avoid detecting isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method on several synthetic and UCI ML datasets in detecting unusual labels, compared to several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset, where we seek to identify unusual patient-management decisions.
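    The idea of a regularized harmonic solution over a similarity graph can be sketched as below; this is a generic graph-based formulation, not the paper's exact estimator, and the regularization weight is an assumption:

```python
import numpy as np

def soft_harmonic_confidence(W, y, reg=0.1):
    """Label-confidence scores via a regularized harmonic solution.

    W   : (n, n) symmetric similarity matrix over instances
    y   : (n,) observed labels in {-1, +1}
    reg : soft-constraint weight pulling the solution towards y

    Minimizes f' L f + reg * ||f - y||^2, i.e. smooths the labels over
    the graph. An instance whose score's sign disagrees with its
    observed label is a candidate conditional anomaly (mislabeling).
    """
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # graph Laplacian
    n = len(y)
    return np.linalg.solve(L + reg * np.eye(n), reg * np.asarray(y, float))
```

    For example, a +1 label sitting inside a tightly connected cluster of -1 labels receives a negative score and is flagged.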

    A geo-service semantic integration in Spatial Data Infrastructures

    In this paper we focus on the semantic heterogeneity problem as one of the main challenges in current Spatial Data Infrastructures (SDIs). We first report on the state of the art in reducing such heterogeneity in SDIs. We then consider a particular geo-service integration scenario and discuss an approach for semantically coordinating geographic services, based on a view of the semantics of web service coordination and implemented using the Lightweight Coordination Calculus (LCC) language. In this approach, service providers share explicit knowledge of the interactions in which their services are engaged, and these models of interaction are used operationally as the anchor for describing the semantics of the interaction. We achieve web service discovery and integration through semantic matching between particular interactions and web service descriptions. For this purpose we introduce a specific solution, called structure preserving semantic matching. We present a real-world application scenario to illustrate how semantic integration of geo web services can be performed with this approach. Finally, we provide a preliminary evaluation of the solution discussed.
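    The notion of structure preserving matching over tree-shaped service descriptions can be roughly illustrated as follows; the synonym table, scoring, and service names are illustrative assumptions, not the paper's algorithm:

```python
# Match two tree-structured service descriptions so that a child node may
# only match beneath matched parents (preserving structure). Trees are
# (label, [children]) pairs; labels match on equality or via a toy
# synonym table standing in for real semantic matching.
SYNONYMS = {("get_map", "retrieve_map"), ("bbox", "bounding_box")}

def labels_match(a, b):
    return a == b or (a, b) in SYNONYMS or (b, a) in SYNONYMS

def tree_match(t1, t2):
    """Count matched nodes; children are compared only under matched roots."""
    (l1, c1), (l2, c2) = t1, t2
    if not labels_match(l1, l2):
        return 0
    score, used = 1, set()
    for child in c1:
        best, best_j = 0, None
        for j, other in enumerate(c2):
            if j not in used:
                s = tree_match(child, other)
                if s > best:
                    best, best_j = s, j
        if best_j is not None:
            used.add(best_j)
            score += best
    return score

wms = ("get_map", [("bbox", []), ("layers", [])])
svc = ("retrieve_map", [("bounding_box", []), ("style", [])])
print(tree_match(wms, svc))  # -> 2 (root and bbox match; layers does not)
```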

    Multi-Dimensional Joins

    We present three novel algorithms for performing multi-dimensional joins and an in-depth survey and analysis of a low-dimensional spatial join. The first algorithm, the Iterative Spatial Join, performs a spatial join on low-dimensional data and is based on a plane-sweep technique. As we show analytically and experimentally, the Iterative Spatial Join performs well when internal memory is limited, compared to competing methods. This suggests that the Iterative Spatial Join would be useful for very large data sets, or in situations where internal memory is a shared and therefore limited resource, as in today's database engines, which share internal memory among several queries. Furthermore, the performance of the Iterative Spatial Join is predictable, and it has no parameters that need to be tuned, unlike other algorithms. The second algorithm, the Quickjoin algorithm, performs a higher-dimensional similarity join in which pairs of objects that lie within a certain distance epsilon of each other are reported. The Quickjoin algorithm overcomes drawbacks of competing methods, such as requiring embedding methods on the data first or using multi-dimensional indices, which limit the ability to discriminate between objects in each dimension, thereby degrading performance. A formal analysis is provided of the Quickjoin method, and experiments show that it significantly outperforms competing methods. The third algorithm adapts incremental join techniques to improve the speed of calculating the Hausdorff distance, which is used in applications such as image matching, image analysis, and surface approximations. The nearest neighbor incremental join technique for indices based on hierarchical containment uses a priority queue of index node pairs and bounds on the distance values between pairs, both of which need to be modified in order to calculate the Hausdorff distance. Results of experiments are described that confirm the performance improvement. Finally, a survey is provided that, rather than simply summarizing the literature and presenting each technique in its entirety, describes the distinct components of the different techniques and decomposes each technique into an overall framework for performing a spatial join.
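    The epsilon similarity join that Quickjoin solves can be stated as a short sketch; sorting on one coordinate and sweeping gives a plane-sweep flavour of pruning, whereas Quickjoin itself partitions the space recursively around pivots, so this only illustrates the problem being solved:

```python
import math

def epsilon_join(points, eps):
    """Report all pairs of 2-D points within distance eps of each other.

    Sorting by x lets the sweep stop comparing once the x-gap alone
    exceeds eps, pruning most distant pairs before the exact check.
    """
    pts = sorted(points)                         # sort by x-coordinate
    out = []
    for i, p in enumerate(pts):
        j = i + 1
        while j < len(pts) and pts[j][0] - p[0] <= eps:
            if math.dist(p, pts[j]) <= eps:      # exact distance check
                out.append((p, pts[j]))
            j += 1
    return out

pts = [(0.0, 0.0), (0.5, 0.1), (3.0, 3.0), (3.2, 3.1)]
print(epsilon_join(pts, 1.0))  # two close pairs, one per cluster
```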

    Geographical places as a personalisation element: extracting profiles from human activities and services of visited places in mobility logs

    Collecting personal mobility traces of individuals is currently feasible on a large scale due to the popularity of position-aware mobile phones. Statistical analysis of GPS data streams collected with a mobile phone can reveal several interesting measures, such as the geographical places an individual visits most frequently. Applying probabilistic models to such data sets can predict the next place to visit, and when. Several practical applications can utilise the results of such analysis. The current state of the art, however, is limited in terms of the qualitative analysis of personal mobility logs: without explicit user interactions, little semantics can be inferred from a GPS log. This work proposes utilising the common human activities and services provided at certain place types to extract semantically rich profiles from personal mobility logs. The resulting profiles include a spatial, temporal and generic thematic description of a user. The work introduces several pre-processing methods for GPS data streams collected with personal mobile devices, which improved the quality of the place extraction process from GPS logs. The thesis also introduces a method for extracting place semantics from multiple data sources: a textual corpus of functional descriptions of human activities and services associated with certain geographic place types is analysed to identify the frequent linguistic patterns used to describe such terms, and the patterns found are then matched against multiple textual data sources of place semantics to extract such terms for a collection of place types. The results were evaluated in comparison to an equivalent expert ontology, as well as to semantics collected from the general public. Finally, the work proposes a model for the resulting profiles and the algorithms needed to build and utilise them, along with an encoding mark-up language. A simulated mobile application was developed to demonstrate the usability of the resulting profiles and to support their evaluation.
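    The place-extraction pre-processing alluded to above is commonly done with a stay-point heuristic like the following; the distance and duration thresholds are assumptions, and the thesis's own pre-processing methods may differ:

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371000 * 2 * math.asin(math.sqrt(a))

def stay_points(fixes, dist_m=100, min_s=300):
    """Extract visited places from a GPS log.

    fixes: list of (lat, lon, unix_time), time-ordered. A visit is a run
    of fixes staying within dist_m metres of the run's first fix for at
    least min_s seconds; its centroid is reported as a place.
    """
    places, i = [], 0
    while i < len(fixes):
        j = i
        while (j + 1 < len(fixes)
               and haversine_m(fixes[i][:2], fixes[j + 1][:2]) <= dist_m):
            j += 1
        if fixes[j][2] - fixes[i][2] >= min_s:
            run = fixes[i:j + 1]
            places.append((sum(f[0] for f in run) / len(run),
                           sum(f[1] for f in run) / len(run)))
        i = j + 1
    return places
```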

    Mobility mining for time-dependent urban network modeling

    Mobility planning, monitoring and analysis in such a complex ecosystem as a city are very challenging. Our contributions are expected to be a small step forward towards a more integrated vision of mobility management. The main hypothesis behind this thesis is that the transportation offer and the mobility demand are greatly coupled, and thus both need to be thoroughly and consistently represented in a digital manner so as to enable good-quality data-driven advanced analysis. Data-driven analytics solutions rely on measurements. However, sensors only provide a measure of movements that have already occurred (and associated magnitudes, such as vehicles per hour). For a movement to happen there are two main requirements: i) the demand (the need or interest) and ii) the offer (the feasibility and resources). In addition, for good measurement, the sensor needs to be located at an adequate location and be able to collect data at the right moment. All this information needs to be digitalised accordingly in order to apply advanced data analytic methods and take advantage of a good digital transportation resource representation. Our main contributions, focused on mobility data mining over urban transportation networks, can be summarised in three groups. The first group consists of a comprehensive description of a digital multimodal transport infrastructure representation from global and local perspectives. The second group is oriented towards matching diverse sensor data onto the transportation network representation, including a quantitative analysis of map-matching algorithms. The final group of contributions covers the prediction of short-term demand based on various measures of urban mobility.
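    Map-matching in its simplest geometric form can be sketched as below, snapping each GPS fix to the nearest point on any road segment in planar coordinates; the thesis analyses far more sophisticated algorithms, so this is only a baseline illustration:

```python
def project(p, a, b):
    """Closest point to p on segment a-b (all 2-D tuples)."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return a
    # Clamp the projection parameter to stay on the segment.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def match(p, segments):
    """Return the nearest on-road point for fix p over all segments."""
    def d2(q):
        return (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return min((project(p, a, b) for a, b in segments), key=d2)

roads = [((0, 0), (10, 0)), ((0, 0), (0, 10))]
print(match((3.0, 0.4), roads))  # -> (3.0, 0.0), on the horizontal road
```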

    Spatial Probabilistic Temporal Databases

    Research in spatio-temporal probabilistic reasoning examines algorithms for handling data from cell phone triangulation, GPS systems, movement prediction software, and other inexact but useful sources. In this thesis I describe a probabilistic model theory for such data. The Spatial PrObabilistic Temporal database framework (or SPOT database framework) provides methods for interpreting, checking the consistency of, automatically revising, and querying such databases. This thesis examines two different semantics within the SPOT framework and presents polynomial-time consistency checking algorithms for both. It introduces several revision techniques for repairing inconsistent databases and compares them to the AGM axioms for belief state revision, identifying an algorithm that, by changing only the probability bounds in the SPOT atoms, can repair a SPOT database in polynomial time while still satisfying the AGM axioms. Also included is an investigation into optimistic and cautious versions of a selection query that returns all objects in a given region with at least (or at most) a certain probability. For these queries, I introduce an indexing structure akin to the R-tree, called a SPOT tree, and show experiments where indexing speeds up selection with both artificial and real-world data. I also introduce query preprocessing techniques that bound the sets of solutions with both circumscribing and inscribing regions, and find that these too improve query times in practice. By covering semantics, consistency checking, database revision, indexing, and query preprocessing techniques for SPOT databases, this thesis provides a significant step towards a SPOT database framework that can be applied to real-world problems involving the vast amount of semi-accurate spatio-temporal data available today.
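    The optimistic versus cautious selection queries described above can be sketched over simplified atoms carrying probability bounds; real SPOT atoms associate bounds with regions rather than single points, and the field names here are illustrative assumptions:

```python
from collections import namedtuple

# Simplified SPOT-style atom: object `obj` is at point (x, y) at time t
# with probability somewhere in the interval [lo, hi].
Atom = namedtuple("Atom", "obj x y t lo hi")

def select(atoms, region, t, threshold, cautious=False):
    """Objects in `region` ((xmin, ymin, xmax, ymax)) at time t whose
    probability meets `threshold`. Cautious selection uses the lower
    bound (the object is certainly that likely to be there); optimistic
    selection uses the upper bound (it may be that likely)."""
    xmin, ymin, xmax, ymax = region
    hits = set()
    for a in atoms:
        if a.t == t and xmin <= a.x <= xmax and ymin <= a.y <= ymax:
            p = a.lo if cautious else a.hi
            if p >= threshold:
                hits.add(a.obj)
    return hits

db = [Atom("ship1", 2, 3, 5, 0.4, 0.9),
      Atom("ship2", 2, 4, 5, 0.1, 0.3)]
print(select(db, (0, 0, 5, 5), 5, 0.5))                 # optimistic: {'ship1'}
print(select(db, (0, 0, 5, 5), 5, 0.5, cautious=True))  # cautious: set()
```

    The gap between the two answers reflects the width of the probability intervals; revision that tightens the bounds shrinks it.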