537 research outputs found
Quality of Service Aware Data Stream Processing for Highly Dynamic and Scalable Applications
Huge volumes of georeferenced data streams arrive daily at data stream management systems deployed to serve highly scalable and dynamic applications. There are innumerable ways in which those loads can be exploited to gain deep insights in various domains. Decision makers require interactive visualizations of such data, in the form of maps and dashboards, for decision making and strategic planning. Data streams normally exhibit fluctuation and oscillation in arrival rates, as well as skewness. Those are the two predominant factors that greatly impact the overall quality of service. Data stream management systems must therefore be attuned to those factors, and to the spatial shape of the data, which may amplify their negative impact. Current systems do not natively support services with quality guarantees for dynamic scenarios, leaving the handling of those logistics to the user, which is challenging and cumbersome. Three workloads are predominant for any data stream: batch processing, scalable storage and stream processing. In this thesis, we have designed a quality of service aware system, SpatialDSMS, comprising several subsystems that cover those workloads and any mixed workload that results from combining them. Most importantly, we have natively incorporated quality of service optimizations for processing avalanches of georeferenced data streams in highly dynamic application scenarios. This has been achieved transparently on top of the codebases of emerging de facto standard, best-in-class representatives, thus relieving users in the presentation layer from having to reason about those services. Instead, users express their queries with quality goals, and our system's optimizers compile them into query plans with an embedded quality guarantee, leaving logistic handling to the underlying layers.
We have developed standard-compliant prototypes for all the subsystems that constitute SpatialDSMS
MASTAQ: A Middleware Architecture for Sensor Applications with Statistical Quality Constraints
We present the design goals and functional components of MASTAQ, a data management middleware for pervasive applications that utilize sensor data. MASTAQ allows applications to specify their quality-of-information (QoI) preferences (in terms of statistical metrics over the data) independent of the underlying network topology. It then achieves energy efficiency by adaptively activating and querying only the subset of sensor nodes needed to meet the target QoI bounds. We also present a closed-loop feedback mechanism based on broadcasting of activation probabilities, which allows MASTAQ to activate the appropriate number of sensors without requiring any inter-sensor coordination or knowledge of the actual deployment.
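The closed-loop idea in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not MASTAQ's actual controller: the damped multiplicative update rule, the function names `update_probability` and `simulate`, and all parameter values are hypothetical. It shows a base station that broadcasts one activation probability, lets each sensor decide independently, and corrects the probability from the number of reports it observes.

```python
import random


def update_probability(p, active, target, damping=0.5, p_min=1e-4):
    """Return the next activation probability to broadcast.

    The base station only sees how many sensors reported (`active`); it does
    not need to know how many sensors are deployed. (Hypothetical rule.)
    """
    ratio = target / max(active, 1)       # >1 means too few sensors reported
    p_next = p * ratio ** damping         # damped multiplicative correction
    return min(1.0, max(p_min, p_next))   # keep it a valid probability


def simulate(num_sensors, target, rounds, p0=0.5, seed=0):
    """Simulate the feedback loop; return the broadcast probability per round."""
    rng = random.Random(seed)
    p, history = p0, []
    for _ in range(rounds):
        # Each sensor decides independently, with no inter-sensor coordination.
        active = sum(rng.random() < p for _ in range(num_sensors))
        p = update_probability(p, active, target)
        history.append(p)
    return history


if __name__ == "__main__":
    hist = simulate(num_sensors=1000, target=50, rounds=20)
    print(f"final broadcast probability: {hist[-1]:.3f}")
```

With 1000 simulated sensors and a target of 50 reports, the broadcast probability should settle near 50/1000, even though the controller never uses the deployment size.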
Adaptive Filters for Continuous Queries over Distributed Data Stream
We consider an environment where distributed data sources continuously stream updates to a centralized processor that monitors continuous queries over the distributed data. Significant communication overhead is incurred in the presence of rapid update streams, and we propose a new technique for reducing the overhead. Users register continuous queries with precision requirements at the central stream processor, which installs filters at remote data sources. The filters adapt to changing conditions to minimize stream rates while guaranteeing that all continuous queries still receive the updates necessary to provide answers of adequate precision at all times. Our approach enables applications to trade precision for communication overhead at a fine granularity by individually adjusting the precision constraints of continuous queries over streams in a multi-query workload
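A minimal sketch of the source-side filtering idea follows, assuming a numeric stream per source and a SUM query. The class `BoundFilter` and the helper `sum_bound` are illustrative names, not the paper's API, and the adaptation of bound widths over time is deliberately left out.

```python
class BoundFilter:
    """Source-side filter: suppress updates that stay inside a cached bound."""

    def __init__(self, initial_value, width):
        self.width = width
        self._recenter(initial_value)

    def _recenter(self, value):
        self.low = value - self.width / 2
        self.high = value + self.width / 2

    def offer(self, value):
        """Return True if `value` must be transmitted to the processor."""
        if self.low <= value <= self.high:
            return False          # bound still valid, nothing to send
        self._recenter(value)     # bound violated: send and re-center
        return True


def sum_bound(filters):
    """Interval guaranteed to contain the true SUM over all sources."""
    return (sum(f.low for f in filters), sum(f.high for f in filters))
```

In this sketch the width of the SUM interval equals the sum of the per-source widths, so a query's precision constraint caps the total width that can be distributed among its sources. Wider bounds at a volatile source mean fewer transmissions, at the price of a looser answer.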
The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis
Datastores today rely on distribution and replication to achieve improved
performance and fault-tolerance. But correctness of many applications depends
on strong consistency properties - something that can impose substantial
overheads, since it requires coordinating the behavior of multiple nodes. This
paper describes a new approach to achieving strong consistency in distributed
systems while minimizing communication between nodes. The key insight is to
allow the state of the system to be inconsistent during execution, as long as
this inconsistency is bounded and does not affect transaction correctness. In
contrast to previous work, our approach uses program analysis to extract
semantic information about permissible levels of inconsistency and is fully
automated. We then employ a novel homeostasis protocol to allow sites to
operate independently, without communicating, as long as any inconsistency is
governed by appropriate treaties between the nodes. We discuss mechanisms for
optimizing treaties based on workload characteristics to minimize
communication, as well as a prototype implementation and experiments that
demonstrate the benefits of our approach on common transactional benchmarks
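To make the notion of a treaty concrete, here is a hand-written sketch for a replicated stock counter whose global invariant is that stock never goes negative. It is an assumption-laden illustration: the paper derives treaties automatically through program analysis, whereas the `Site` class, the `renegotiate` function and the even split of remaining stock below are hypothetical.

```python
class Site:
    """A replica that may sell up to `allowance` units without coordination."""

    def __init__(self, name, allowance):
        self.name = name
        self.allowance = allowance   # local treaty: sold <= allowance
        self.sold = 0

    def try_sell(self, qty=1):
        """Apply the sale locally if the treaty still holds afterwards."""
        if self.sold + qty <= self.allowance:
            self.sold += qty
            return True
        return False                 # treaty would break: must synchronize


def renegotiate(sites, stock):
    """Synchronize all sites and split the remaining stock into new treaties.

    Returns the remaining global stock after accounting for local sales.
    """
    remaining = stock - sum(s.sold for s in sites)
    share, extra = divmod(remaining, len(sites))
    for i, s in enumerate(sites):
        s.sold = 0
        s.allowance = share + (1 if i < extra else 0)
    return remaining
```

As long as the allowances sum to at most the global stock, every site can sell locally without communicating. Communication happens only when a sale would violate a local treaty, at which point the sites synchronize and agree on new treaties.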
Optimizing Notifications of Subscription-Based Forecast Queries
Integrating sophisticated statistical methods into database management systems is gaining more and more attention in research and industry. One important statistical method is time series forecasting, which is crucial for decision management in many domains. In this context, previous work addressed the processing of ad-hoc and recurring forecast queries. In contrast, we focus on subscription-based forecast queries that arise when an application (subscriber) continuously requires forecast values for further processing. Forecast queries exhibit the unique characteristic that the underlying forecast model is updated with each new actual value, so better forecast values might become available. However, (re-)sending new forecast values to the subscriber for every new value is infeasible because this can cause significant overhead at the subscriber side. The subscriber therefore wishes to be notified only when forecast values have changed in a way that is relevant to the application. In this paper, we reduce the costs of the subscriber by optimizing the notifications sent to the subscriber, i.e., by balancing the number of notifications and the notification length. We introduce a generic cost model to capture arbitrary subscriber cost functions and discuss different optimization approaches that reduce the subscriber costs while ensuring constrained deviations of the forecast values. Our experimental evaluation on real datasets shows the validity of our approach with low computational costs
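The trade-off between notification count and subscriber cost can be sketched with a simple threshold rule. This is an illustrative simplification, not the paper's optimization approach: the linear cost function, the relative-deviation test and the names `max_relative_deviation`, `notification_cost` and `run_subscription` are all assumptions, and the shifting of the forecast horizon over time is ignored.

```python
def max_relative_deviation(sent, current):
    """Largest relative gap between the forecasts the subscriber holds and new ones."""
    return max(abs(c - s) / max(abs(s), 1e-12) for s, c in zip(sent, current))


def notification_cost(num_values, fixed_cost=10.0, cost_per_value=1.0):
    """Hypothetical linear subscriber cost: per-notification overhead plus per-value work."""
    return fixed_cost + cost_per_value * num_values


def run_subscription(forecast_stream, threshold):
    """Send a notification only when held forecasts drift past `threshold`.

    `forecast_stream` yields one list of forecast values per model update.
    Returns (number of notifications sent, total subscriber cost).
    """
    sent, count, total = None, 0, 0.0
    for current in forecast_stream:
        if sent is None or max_relative_deviation(sent, current) > threshold:
            sent = list(current)
            count += 1
            total += notification_cost(len(current))
    return count, total
```

A larger threshold lowers the subscriber's cost but lets the forecasts it holds deviate further from the current model output; the fixed per-notification cost is what makes fewer, longer notifications attractive.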
Quality and context-aware smart health care: Evaluating the cost-quality dynamics
Many emerging pervasive health-care applications require the determination of a variety of context attributes of an individual's activities and medical parameters and her surrounding environment. Context is a high-level representation of an entity's state, which captures activities, relationships, capabilities, etc. In practice, high-level context measures are often difficult to sense from a single data source and must instead be inferred using multiple sensors embedded in the environment. A key challenge in deploying context-driven health-care applications involves energy-efficient determination or inference of high-level context information from low-level sensor data streams. Because this abstraction has the potential to reduce the quality of the context information, it is also necessary to model the tradeoff between the cost of sensor data collection and the quality of the inferred context. This article describes a model of context inference in pervasive computing, the associated research challenges, and the significant practical impact of intelligent use of such context in pervasive health-care environments
Analyzing the Impact of RDF Graph Structure on Dataset Search: A Case Study with ACORDAR
RDF plays a central role in the era of the Semantic Web, enabling a structured representation of datasets and their relationships.
The complex nature of RDF graph structures significantly influences the retrieval of datasets, offering a blend of both challenges and possibilities.
Delving deeply into the ACORDAR case study, the work unveils how graph structures influence dataset retrieval and the organization of data.
Furthermore, it introduces serialization methods within RDF, emphasizing the importance of URIs and the capabilities of SPARQL.
Presenting the ACORDAR reproducibility, the research underscores the significance of metadata in dataset search.
Exploring potential avenues for future research in dataset search, the investigation integrates graph structures and harnesses emerging technologies from the Semantic Web era
Trade-off among timeliness, messages and accuracy for large-scale information management
The increasing amount of data and the number of nodes in large-scale environments
require new techniques for information management. Examples of such environments
are the decentralized infrastructures of Computational Grid and Computational
Cloud applications. These large-scale applications need different kinds
of aggregated information such as resource monitoring, resource discovery or economic
information. The challenge of providing timely and accurate information
in large-scale environments arises from the distribution of the information. Reasons
for delays in distributed information systems are long information transmission
times due to the distribution, churn and failures.
A problem of large applications such as peer-to-peer (P2P) systems is the increasing
retrieval time of the information due to the decentralization of the data
and the failure proneness. However, many applications need timely information
provision, and some users and applications accept inexact results if the information
arrives on time. Another problem is the increasing network consumption when the application
scales to millions of users and data. Using approximation techniques allows
reducing the retrieval time and the network consumption. However, the usage of
approximation techniques decreases the accuracy of the results. Thus, the remaining
problem is to offer a trade-off in order to solve the conflicting requirements of
fast information retrieval, accurate results and low messaging cost.
Our goal is to reach a self-adaptive decision mechanism to offer a trade-off
among the retrieval time, the network consumption and the accuracy of the result.
Self-adaption enables distributed software to modify its behavior based on
changes in the operating environment. In large-scale information systems that use
hierarchical data aggregation, we apply self-adaptation to control the approximation
used for the information retrieval and to reduce the network consumption and
the retrieval time. The hypothesis of the thesis is that approximation techniques can
reduce the retrieval time and the network consumption while guaranteeing the
accuracy of the results and respecting user-defined priorities.
First, the presented research addresses the problem of a trade-off among
timely information retrieval, accurate results and low messaging cost by proposing
a summarization algorithm for resource discovery in P2P-content networks.
After identifying how summarization can improve the discovery process, we propose
an algorithm which uses a precision-recall metric to compare the accuracy
and to offer a user-driven trade-off. Second, we propose an algorithm that applies
a self-adaptive decision making on each node. The decision is about the pruning
of the query and returning the result instead of continuing the query. The pruning
reduces the retrieval time and the network consumption at the cost of a lower accuracy
in contrast to continuing the query. The algorithm uses an analytic hierarchy
process to assess the user’s priorities and to propose a trade-off in order to satisfy
the accuracy requirements with a low message cost and a short delay.
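The per-node decision described above can be sketched as follows. This is an assumption-based illustration rather than the thesis's algorithm: it approximates analytic hierarchy process (AHP) priority weights with the row geometric-mean method, and the function names, the three criteria ordering (timeliness, message cost, accuracy) and the scoring of the two options are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights via the row geometric-mean method.

    pairwise[i][j] states how much more important criterion i is than j.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def should_prune(weights, prune_scores, continue_scores):
    """Prune the query if pruning has the higher weighted score.

    Scores are per criterion in [0, 1], higher is better, in the same order
    as the weights (here: timeliness, message cost, accuracy).
    """
    def score(option):
        return sum(w * s for w, s in zip(weights, option))
    return score(prune_scores) > score(continue_scores)
```

A user who ranks timeliness above message cost and message cost above accuracy gets weights that favor pruning early; reversing the ranking makes the node continue the query in order to preserve accuracy.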
A quantitative analysis evaluates our presented algorithms with a simulator,
which is fed with real data of a network topology and the nodes’ attributes. The
usage of a simulator instead of the prototype allows the evaluation in a large scale
of several thousands of nodes. The algorithm for content summarization is evaluated
with half a million resources and with different query types. The self-adaptive
algorithm is evaluated with a simulator of several thousands of nodes
that are created from real data. A qualitative analysis addresses the integration
of the simulator’s components in existing market frameworks for Computational
Grid and Cloud applications.
The proposed content summarization algorithm reduces the information retrieval
time from a logarithmic increase to a constant factor. Furthermore, the
message size is reduced significantly by applying the summarization technique.
For the user, a precision-recall metric allows defining the relation between the retrieval
time and the accuracy. The self-adaptive algorithm reduces the number of
messages needed from an exponential increase to a constant factor. At the same
time, the retrieval time is reduced to a constant factor under an increasing number
of nodes. Finally, the algorithm delivers the data with the required accuracy
adjusting the depth of the query according to the network conditions.
The functionality implemented in the simulator (such as the aggregation process
and the query language) is verified through the integration of prototypes. The
introduced algorithms are promising candidates for information aggregation in
future large-scale information management systems.
CAPS: Energy-Efficient Processing of Continuous Aggregate Queries in Sensor Networks
In this paper, we design and evaluate an energy efficient data retrieval architecture for continuous aggregate queries in wireless sensor networks. We show how the modification of precision in one sensor affects the sample-reporting frequency of other sensors, and how the precisions of a group of sensors may be collectively modified to achieve the target Quality of Information (QoI) with higher energy-efficiency. The proposed Collective Adaptive Precision Setting (CAPS) architecture is then extended to exploit the observed temporal correlation among successive sensor samples for even greater energy efficiency. Detailed simulations with synthetic and real data traces demonstrate how the combination of weak consistency semantics and temporal correlation can dramatically lower the energy consumption in practical sensor environments.