963 research outputs found

    Efficient Data Modelling, Indexing and Processing in Large Datasets

    Full text link
    Many devices and applications in social networks and on-line services are producing, storing, and using description, location, and occurrence time of objects. There are various systems to study, model, index, and process a huge amount of data. In this thesis, we study graphs and publish/subscribe systems. Firstly, we study the problem of continuously updating top-k messages with the highest ranks, each of which contains all the requested keywords when the rank of a message calculates based on freshness and distance to query’s location. Since new incoming messages are arriving all the time and the score of existing top-k results are decreasing over time, providing the most recent information needs continuously computing and maintaining the best results. We propose an efficient indexing and matching method using keywords, location, and the most recent top-k results of queries. Secondly, we study the problem of the decomposition of (k,s)-core. As both the user engagement of nodes and the strength of relationships are important, the (k, s)-core model is proposed in the literature to discover strong communities. Nevertheless, the decomposition algorithm regarding (k,s)-core is not yet investigated. We propose (k,s)-core algorithms to decompose a graph into its hierarchical structures considering both user engagement and tie strength. We first present the basic (k,s)-core decomposition methods. Then, we propose the advanced algorithms DES and DEK which index the support of edges to enable higher-level cost-sharing in the peeling process. In addition, effective pruning strategies are applied to DES/DEK to further enhance performance. Moreover, we build a novel index based on the decomposition result and investigate efficient (k,s)-core query algorithm based on our index. Finally, we develop efficient algorithm for maintaining the (k, s)-core index of the dynamic graph where vertices and edges are inserted and deleted. The algorithm, uses pruning strategies by exploiting the lower and upper bounds of the core number. We define a new Smax core and develop an efficient method for updating (k,s) numbers of nodes

    ETGP: Top-K Geography-Text P/S Approach without Threshold

    Get PDF
    Social media are more and more popular. Subsequently, geography-text data has caused wide attention. Different from the traditional publish/subscribe (P/S), geography-text data is published and subscribed in the form of dynamic data flow in the mobile network. The difference raises higher demands for facility. However, previous top-k geography-text P/S approaches want to set a set of thresholds. A user should take time to set a threshold for each subscription, which is not facile enough. The threshold yields many weaknesses to users. Therefore, we herein propose an efficient top-k geography-text P/S approach that excludes the threshold, called ETGP. Our approach does not need users to set any threshold. Subsequently, the ETGP returns the highest score results to the subscriber without setting a threshold. Therefore, our approach can lessen redundant computations, promote the query integrity rate, and make P/S system easier for the user to use. Comprehensive experiments prove the efficiency of the proposed approach with high facility

    THREE TEMPORAL PERSPECTIVES ON DECENTRALIZED LOCATION-AWARE COMPUTING: PAST, PRESENT, FUTURE

    Get PDF
    Durant les quatre dernières décennies, la miniaturisation a permis la diffusion à large échelle des ordinateurs, les rendant omniprésents. Aujourd’hui, le nombre d’objets connectés à Internet ne cesse de croitre et cette tendance n’a pas l’air de ralentir. Ces objets, qui peuvent être des téléphones mobiles, des véhicules ou des senseurs, génèrent de très grands volumes de données qui sont presque toujours associés à un contexte spatiotemporel. Le volume de ces données est souvent si grand que leur traitement requiert la création de système distribués qui impliquent la coopération de plusieurs ordinateurs. La capacité de traiter ces données revêt une importance sociétale. Par exemple: les données collectées lors de trajets en voiture permettent aujourd’hui d’éviter les em-bouteillages ou de partager son véhicule. Un autre exemple: dans un avenir proche, les données collectées à l’aide de gyroscopes capables de détecter les trous dans la chaussée permettront de mieux planifier les interventions de maintenance à effectuer sur le réseau routier. Les domaines d’applications sont par conséquent nombreux, de même que les problèmes qui y sont associés. Les articles qui composent cette thèse traitent de systèmes qui partagent deux caractéristiques clés: un contexte spatiotemporel et une architecture décentralisée. De plus, les systèmes décrits dans ces articles s’articulent autours de trois axes temporels: le présent, le passé, et le futur. Les systèmes axés sur le présent permettent à un très grand nombre d’objets connectés de communiquer en fonction d’un contexte spatial avec des temps de réponses proche du temps réel. Nos contributions dans ce domaine permettent à ce type de système décentralisé de s’adapter au volume de donnée à traiter en s’étendant sur du matériel bon marché. Les systèmes axés sur le passé ont pour but de faciliter l’accès a de très grands volumes données spatiotemporelles collectées par des objets connectés. En d’autres termes, il s’agit d’indexer des trajectoires et d’exploiter ces indexes. Nos contributions dans ce domaine permettent de traiter des jeux de trajectoires particulièrement denses, ce qui n’avait pas été fait auparavant. Enfin, les systèmes axés sur le futur utilisent les trajectoires passées pour prédire les trajectoires que des objets connectés suivront dans l’avenir. Nos contributions permettent de prédire les trajectoires suivies par des objets connectés avec une granularité jusque là inégalée. Bien qu’impliquant des domaines différents, ces contributions s’articulent autour de dénominateurs communs des systèmes sous-jacents, ouvrant la possibilité de pouvoir traiter ces problèmes avec plus de généricité dans un avenir proche. -- During the past four decades, due to miniaturization computing devices have become ubiquitous and pervasive. Today, the number of objects connected to the Internet is in- creasing at a rapid pace and this trend does not seem to be slowing down. These objects, which can be smartphones, vehicles, or any kind of sensors, generate large amounts of data that are almost always associated with a spatio-temporal context. The amount of this data is often so large that their processing requires the creation of a distributed system, which involves the cooperation of several computers. The ability to process these data is important for society. For example: the data collected during car journeys already makes it possible to avoid traffic jams or to know about the need to organize a carpool. Another example: in the near future, the maintenance interventions to be carried out on the road network will be planned with data collected using gyroscopes that detect potholes. The application domains are therefore numerous, as are the prob- lems associated with them. The articles that make up this thesis deal with systems that share two key characteristics: a spatio-temporal context and a decentralized architec- ture. In addition, the systems described in these articles revolve around three temporal perspectives: the present, the past, and the future. Systems associated with the present perspective enable a very large number of connected objects to communicate in near real-time, according to a spatial context. Our contributions in this area enable this type of decentralized system to be scaled-out on commodity hardware, i.e., to adapt as the volume of data that arrives in the system increases. Systems associated with the past perspective, often referred to as trajectory indexes, are intended for the access to the large volume of spatio-temporal data collected by connected objects. Our contributions in this area makes it possible to handle particularly dense trajectory datasets, a problem that has not been addressed previously. Finally, systems associated with the future per- spective rely on past trajectories to predict the trajectories that the connected objects will follow. Our contributions predict the trajectories followed by connected objects with a previously unmet granularity. Although involving different domains, these con- tributions are structured around the common denominators of the underlying systems, which opens the possibility of being able to deal with these problems more generically in the near future

    Design and Implementation of a Middleware for Uniform, Federated and Dynamic Event Processing

    Get PDF
    In recent years, real-time processing of massive event streams has become an important topic in the area of data analytics. It will become even more important in the future due to cheap sensors, a growing amount of devices and their ubiquitous inter-connection also known as the Internet of Things (IoT). Academia, industry and the open source community have developed several event processing (EP) systems that allow users to define, manage and execute continuous queries over event streams. They achieve a significantly better performance than the traditional store-then-process'' approach in which events are first stored and indexed in a database. Because EP systems have different roots and because of the lack of standardization, the system landscape became highly heterogenous. Today's EP systems differ in APIs, execution behaviors and query languages. This thesis presents the design and implementation of a novel middleware that abstracts from different EP systems and provides a uniform API, execution behavior and query language to users and developers. As a consequence, the presented middleware overcomes the problem of vendor lock-in and different EP systems are enabled to cooperate with each other. In practice, event streams differ dramatically in volume and velocity. We show therefore how the middleware can connect to not only different EP systems, but also database systems and a native implementation. Emerging applications such as the IoT raise novel challenges and require EP to be more dynamic. We present extensions to the middleware that enable self-adaptivity which is needed in context-sensitive applications and those that deal with constantly varying sets of event producers and consumers. Lastly, we extend the middleware to fully support the processing of events containing spatial data and to be able to run distributed in the form of a federation of heterogenous EP systems

    Continuous Top-k Queries over Real-Time Web Streams

    Get PDF
    The Web has become a large-scale real-time information system forcing us to revise both how to effectively assess relevance of information for a user and how to efficiently implement information retrieval and dissemination functionality. To increase information relevance, Real-time Web applications such as Twitter and Facebook, extend content and social-graph relevance scores with " real-time " user generated events (e.g. re-tweets, replies, likes). To accommodate high arrival rates of information items and user events we explore a pub-lish/subscribe paradigm in which we index queries and update on the fly their results each time a new item and relevant events arrive. In this setting, we need to process continuous top-k text queries combining both static and dynamic scores. To the best of our knowledge, this is the first work addressing how non-predictable, dynamic scores can be handled in a continuous top-k query setting

    Web-oriented Event Processing

    Get PDF
    How can the Web be made situation-aware? Event processing is a suitable technology for gaining the necessary real-time results. The Web, however, has many users and many application domains. Thus, we developed multi-schema friendly data models allowing the re-use and mix from diverse users and application domains. Furthermore, our methods describe protocols to exchange events on the Web, algorithms to execute the language and to calculate access rights

    Semantically-Enabled Sensor Plug & Play for the Sensor Web

    Get PDF
    Environmental sensors have continuously improved by becoming smaller, cheaper, and more intelligent over the past years. As consequence of these technological advancements, sensors are increasingly deployed to monitor our environment. The large variety of available sensor types with often incompatible protocols complicates the integration of sensors into observing systems. The standardized Web service interfaces and data encodings defined within OGC’s Sensor Web Enablement (SWE) framework make sensors available over the Web and hide the heterogeneous sensor protocols from applications. So far, the SWE framework does not describe how to integrate sensors on-the-fly with minimal human intervention. The driver software which enables access to sensors has to be implemented and the measured sensor data has to be manually mapped to the SWE models. In this article we introduce a Sensor Plug & Play infrastructure for the Sensor Web by combining (1) semantic matchmaking functionality, (2) a publish/subscribe mechanism underlying the SensorWeb, as well as (3) a model for the declarative description of sensor interfaces which serves as a generic driver mechanism. We implement and evaluate our approach by applying it to an oil spill scenario. The matchmaking is realized using existing ontologies and reasoning engines and provides a strong case for the semantic integration capabilities provided by Semantic Web research

    Acquisition and Declarative Analytical Processing of Spatio-Temporal Observation Data

    Get PDF
    A generic framework for spatio-temporal observation data acquisition and declarative analytical processing has been designed and implemented in this Thesis. The main contributions of this Thesis may be summarized as follows: 1) generalization of a data acquisition and dissemination server, with great applicability in many scientific and industrial domains, providing flexibility in the incorporation of different technologies for data acquisition, data persistence and data dissemination, 2) definition of a new hybrid logical-functional paradigm to formalize a novel data model for the integrated management of entity and sampled data, 3) definition of a novel spatio-temporal declarative data analysis language for the previous data model, 4) definition of a data warehouse data model supporting observation data semantics, including application of the above language to the declarative definition of observation processes executed during observation data load, and 5) column-oriented parallel and distributed implementation of the spatial analysis declarative language. The huge amount of data to be processed forces the exploitation of current multi-core hardware architectures and multi-node cluster infrastructures

    Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster

    Full text link
    The widespread use of GPS-enabled smartphones along with the popularity of micro-blogging and social networking applications, e.g., Twitter and Facebook, has resulted in the generation of huge streams of geo-tagged textual data. Many applications require real-time processing of these streams. For example, location-based e-coupon and ad-targeting systems enable advertisers to register millions of ads to millions of users. The number of users is typically very high and they are continuously moving, and the ads change frequently as well. Hence sending the right ad to the matching users is very challenging. Existing streaming systems are either centralized or are not spatial-keyword aware, and cannot efficiently support the processing of rapidly arriving spatial-keyword data streams. This paper presents Tornado, a distributed spatial-keyword stream processing system. Tornado features routing units to fairly distribute the workload, and furthermore, co-locate the data objects and the corresponding queries at the same processing units. The routing units use the Augmented-Grid, a novel structure that is equipped with an efficient search algorithm for distributing the data objects and queries. Tornado uses evaluators to process the data objects against the queries. The routing units minimize the redundant communication by not sending data updates for processing when these updates do not match any query. By applying dynamically evaluated cost formulae that continuously represent the processing overhead at each evaluator, Tornado is adaptive to changes in the workload. Extensive experimental evaluation using spatio-textual range queries over real Twitter data indicates that Tornado outperforms the non-spatio-textually aware approaches by up to two orders of magnitude in terms of the overall system throughput
    corecore