896 research outputs found

    Automated Assessment of Aggregate Query Imprecision in Dynamic Environments

    Abstract. Queries are widely used for acquiring data distributed in opportunistically formed mobile networks. However, when queries are executed in such dynamic settings, the returned result may not be consistent, i.e., it may not accurately reflect the state of the environment. It can thus be difficult to reason about the meaning of a query’s result. Reasoning about imperfections in the result becomes even more complex when in-network aggregation is employed, since only a single aggregate value is returned. We define the semantics of aggregate queries in terms of a qualitative description of consistency and a quantitative measure of imprecision. We provide a protocol that performs in-network aggregation while simultaneously generating quality assessments for the query result. The protocol enables intuitive interpretations of the semantics associated with an aggregate query’s execution in a dynamic environment.

    Trade-off among timeliness, messages and accuracy for large-scale information management

    The increasing amount of data and the growing number of nodes in large-scale environments require new techniques for information management. Examples of such environments are the decentralized infrastructures of Computational Grid and Computational Cloud applications. These large-scale applications need different kinds of aggregated information, such as resource monitoring, resource discovery or economic information. The challenge of providing timely and accurate information in large-scale environments arises from the distribution of the information. Delays in distributed information systems are caused by long transmission times due to the distribution, by churn and by failures. A problem of large applications such as peer-to-peer (P2P) systems is the increasing retrieval time of the information due to the decentralization of the data and the failure proneness. However, many applications need timely information provision. Another problem is the increasing network consumption when the application scales to millions of users and data. Using approximation techniques reduces the retrieval time and the network consumption. However, approximation techniques also decrease the accuracy of the results. Thus, the remaining problem is to offer a trade-off that resolves the conflicting requirements of fast information retrieval, accurate results and low messaging cost. Our goal is a self-adaptive decision mechanism that offers a trade-off among the retrieval time, the network consumption and the accuracy of the result. Self-adaptation enables distributed software to modify its behavior based on changes in the operating environment. In large-scale information systems that use hierarchical data aggregation, we apply self-adaptation to control the approximation used for information retrieval and to reduce the network consumption and the retrieval time. 
The hypothesis of the thesis is that approximation techniques can reduce the retrieval time and the network consumption while guaranteeing the accuracy of the results and respecting user-defined priorities. First, the presented research addresses the trade-off among timely information retrieval, accurate results and low messaging cost by proposing a summarization algorithm for resource discovery in P2P content networks. After identifying how summarization can improve the discovery process, we propose an algorithm that uses a precision-recall metric to compare the accuracy and to offer a user-driven trade-off. Second, we propose an algorithm that applies self-adaptive decision making on each node. Each node decides whether to prune the query and return the result or to continue the query. Pruning reduces the retrieval time and the network consumption at the cost of lower accuracy compared with continuing the query. The algorithm uses an analytic hierarchy process to assess the user’s priorities and to propose a trade-off that satisfies the accuracy requirements with a low message cost and a short delay. A quantitative analysis evaluates the presented algorithms with a simulator, which is fed with real data on a network topology and the nodes’ attributes. Using a simulator instead of the prototype allows evaluation at a large scale of several thousand nodes. The content summarization algorithm is evaluated with half a million resources and with different query types. The self-adaptive algorithm is evaluated with a simulator of several thousand nodes that are created from real data. A qualitative analysis addresses the integration of the simulator’s components into existing market frameworks for Computational Grid and Cloud applications. The proposed content summarization algorithm reduces the information retrieval time from a logarithmic increase to a constant factor. 
Furthermore, the message size is reduced significantly by applying the summarization technique. For the user, a precision-recall metric allows defining the relation between the retrieval time and the accuracy. The self-adaptive algorithm reduces the number of messages needed from an exponential increase to a constant factor. At the same time, the retrieval time is reduced to a constant factor under an increasing number of nodes. Finally, the algorithm delivers the data with the required accuracy, adjusting the depth of the query according to the network conditions. Information management requires new techniques that cope with the growing amount of data and nodes in large-scale environments. Examples of such environments are the decentralized infrastructures of Computational Grid and Cloud. Large-scale applications need different kinds of aggregated information, such as resource monitoring and economic information. The challenge of providing fast and accurate information in large-scale environments arises from the distribution of the information. One reason is that the information system has to deal with the adaptability and failures of these environments. A problem with very large applications such as peer-to-peer (P2P) systems is the increasing information retrieval time due to the decentralization of the data and the failure proneness. However, many applications need timely information provision. In addition, some users and applications accept inaccuracies in the results if the information is delivered on time. Moreover, the growing network consumption raises another problem for the scalability of the system. Using approximation techniques reduces the retrieval time and the network consumption. However, the use of approximation techniques decreases the accuracy of the results. 
Thus, the remaining problem is to offer a trade-off that resolves the conflicting requirements of fast information retrieval, accurate results and low messaging cost. Our goal is to obtain a fully self-adaptive decision mechanism that offers a trade-off among retrieval time, network consumption and accuracy of the result. Self-adaptation enables distributed software to modify its behavior based on changes in the operating environment. In large-scale information systems that use hierarchical data aggregation, self-adaptation makes it possible to control the approximation used for information retrieval and reduces the network consumption and the retrieval time. The main hypothesis of this thesis is that approximation techniques can reduce the retrieval time and the network consumption while guaranteeing an adequate accuracy defined by the user. The research presented introduces a content summarization algorithm for resource discovery in P2P content networks. After identifying how summarization can improve the discovery process, we propose a metric that is used to compare accuracy and to offer a user-defined trade-off. Next, we introduce a new algorithm that applies self-adaptation in order to satisfy the accuracy requirements with a low message cost and a short delay. Based on the user’s priorities, the algorithm finds a trade-off automatically. The quantitative analysis evaluates the presented algorithms with a simulator to allow an evaluation with several thousand nodes. The simulator is fed with data from a real network topology and real node attributes. The content summarization algorithm is evaluated with half a million resources and with different query types. The qualitative analysis evaluates the integration of the simulator’s components into existing market frameworks for Computational Grid and Cloud applications. 
Thus, the implemented functionality of the simulator (such as the aggregation process and the query language) is verified through the integration of prototypes. The proposed content summarization algorithm reduces the information retrieval time from a logarithmic increase to a constant factor. In addition, it significantly reduces the message size. For the user, a precision-recall metric makes it possible to define the relation between the level of accuracy and the information retrieval time. At the same time, the retrieval time is reduced to a constant factor under an increasing number of nodes. Finally, the algorithm delivers the data with the required accuracy and adjusts the depth of the query according to the network conditions. The introduced algorithms are promising for information aggregation in future large-scale information management systems. Postprint (published version
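
The precision-recall metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes the exact query result is known for comparison; the function name `precision_recall` and the node identifiers are hypothetical and are not taken from the thesis.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against the exact result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Convention assumed here: an empty set yields 1.0 rather than a division by zero.
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical data: the exact answer versus the answer from a summarized lookup.
exact_result = {"node-a", "node-b", "node-c", "node-d"}
summarized_result = {"node-a", "node-b", "node-e"}

precision, recall = precision_recall(summarized_result, exact_result)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A user-driven trade-off could then be expressed as a minimum acceptable precision or recall, with the approximation made coarser as long as both stay above that threshold.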

    Who you gonna call? Analyzing Web Requests in Android Applications

    Relying on ubiquitous Internet connectivity, applications on mobile devices frequently perform web requests during their execution. They fetch data for users to interact with, invoke remote functionalities, or send user-generated content or meta-data. These requests collectively reveal common practices of mobile application development, like what external services are used and how, and they point to possible negative effects like security and privacy violations, or impacts on battery life. In this paper, we assess different ways to analyze what web requests Android applications make. We start by presenting dynamic data collected from running 20 randomly selected Android applications and observing their network activity. Next, we present a static analysis tool, Stringoid, that analyzes string concatenations in Android applications to estimate constructed URL strings. Using Stringoid, we extract URLs from 30,000 Android applications, and compare the performance with a simpler constant extraction analysis. Finally, we present a discussion of the advantages and limitations of dynamic and static analyses when extracting URLs, as we compare the data extracted by Stringoid from the same 20 applications with the dynamically collected data.
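
The "simpler constant extraction analysis" used as a baseline can be pictured with a short sketch. This is an assumption about what such a baseline might look like, not Stringoid's implementation: it scans string constants for URL-like text with a regular expression. The pattern `URL_RE`, the function `extract_constant_urls` and the sample constants are all hypothetical.

```python
import re

# Hypothetical pattern: an http(s) scheme followed by anything up to whitespace or a quote.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_constant_urls(string_constants):
    """Collect URL-like substrings found in individual string constants."""
    found = set()
    for constant in string_constants:
        found.update(URL_RE.findall(constant))
    return sorted(found)


# Hypothetical constants as they might appear in an application's constant pool.
constants = [
    "http://api.example.com/v1/users?id=",  # only a prefix: the id is appended at run time
    "Loading...",
    "see https://example.org/help for details",
]
print(extract_constant_urls(constants))
```

The first constant shows the limitation of this baseline: it recovers only the fixed prefix of a URL that the application completes by concatenation, which is the case a concatenation-aware analysis is meant to handle.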

    Performance assessment of real-time data management on wireless sensor networks

    Technological advances in recent years have brought Wireless Sensor Networks (WSNs), which aim at performing environmental monitoring and data collection, to maturity. This sort of network is composed of hundreds, thousands or possibly even millions of tiny smart computers known as wireless sensor nodes, which may be battery powered and equipped with sensors, a radio transceiver, a Central Processing Unit (CPU) and some memory. However, due to the small size and the low-cost requirements of the nodes, sensor node resources such as processing power, storage and especially energy are very limited. Once the sensors have taken their measurements from the environment, the problem of storing and querying the data arises. In fact, the sensors have restricted storage capacity, and the ongoing interaction between sensors and environment results in huge amounts of data. Techniques for data storage and querying in WSNs can be based on either external storage or local storage. External storage, called the warehousing approach, is a centralized system in which the data gathered by the sensors are periodically sent to a central database server where user queries are processed. Local storage, on the other hand, called the distributed approach, exploits the computational capabilities of the sensors, which act as local databases. The data is stored in a central database server and in the devices themselves, enabling one to query both. WSNs are used in a wide variety of applications, which may perform certain operations on the collected sensor data. For certain applications, such as real-time applications, the sensor data must closely reflect the current state of the targeted environment. However, the environment changes constantly and the data is collected at discrete moments in time. As such, the collected data has a temporal validity, and as time advances it becomes less accurate, until it no longer reflects the state of the environment. 
Thus, these applications, found in areas such as industrial automation, aviation and sensor networks, must query and analyze the data within a bounded time in order to make decisions and react efficiently. In this context, the design of efficient real-time data management solutions is necessary to deal with both time constraints and energy consumption. This thesis studies real-time data management techniques for WSNs. In particular, it focuses on the challenges of handling real-time data storage and querying for WSNs and on efficient real-time data management solutions for WSNs. First, the main specifications of real-time data management are identified and the real-time data management solutions for WSNs available in the literature are presented. Second, in order to provide an energy-efficient real-time data management solution, the techniques used to manage data and queries in WSNs based on the distributed paradigm are studied in depth. In fact, many research works argue that the distributed approach, rather than warehousing, is the most energy-efficient way of managing data and queries in WSNs. In addition, this approach can provide quasi real-time query processing because the most current data is retrieved from the network. Third, based on these two studies and considering the complexity of developing, testing and debugging this kind of system, a model for a simulation framework for real-time database management on WSNs that uses a distributed approach is proposed, together with its implementation. This helps to explore various real-time database techniques on WSNs before deployment, saving money and time. Moreover, the proposed model may be improved by adding the simulation of protocols or by placing part of this simulator on another available simulator. To validate the model, a case study considering real-time constraints as well as energy constraints is discussed. 
Fourth, a new architecture that combines statistical modeling techniques with the distributed approach, and a query processing algorithm that optimizes real-time user query processing, are proposed. This combination allows a query processing algorithm based on admission control that uses the error tolerance and the probabilistic confidence interval as admission parameters. Experiments based on real-world data sets as well as synthetic data sets demonstrate that the proposed solution optimizes real-time query processing to save more energy while meeting low latency. Fundação para a Ciência e Tecnologi

    Privacy, Access Control, and Integrity for Large Graph Databases

    Graph data are extensively utilized in social networks, collaboration networks, geo-social networks, and communication networks. Their growing usage in cyberspace poses daunting security and privacy challenges. Data publication requires privacy-protection mechanisms to guard against information breaches. In addition, access control mechanisms can be used to allow controlled sharing of data. Provision of privacy protection, access control, and data integrity for graph data requires a holistic approach to data management and secure query processing. This thesis presents such an approach. In particular, the thesis addresses two notable challenges for graph databases: i) how to ensure users’ privacy in published graph data under access control policy enforcement, and ii) how to verify the integrity and query results of graph datasets. To address the first challenge, a privacy-protection framework under role-based access control (RBAC) policy constraints is proposed. The design of such a framework poses a trade-off problem, which is proved to be NP-complete. Novel heuristic solutions are provided to solve the constraint problem. To the best of our knowledge, this is the first scheme that studies the trade-off between RBAC policy constraints and privacy protection for graph data. To address the second challenge, a cryptographic security model based on Hash-based Message Authentication Codes (HMACs) is proposed. The model ensures integrity and completeness verification of data and query results under both two-party and third-party data distribution environments. Unique solutions based on HMACs for integrity verification of graph data are developed, and a detailed security analysis is provided for the proposed schemes. Extensive experimental evaluations are conducted to illustrate the performance of the proposed algorithms.
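
As a rough illustration of HMAC-based integrity checking for graph data, the sketch below tags a single edge and verifies it later. It is not the scheme developed in the thesis: the serialization format, the function names `edge_tag` and `verify_edge`, and the key are assumptions, and it covers only per-edge integrity, not the completeness of query results.

```python
import hashlib
import hmac


def edge_tag(key: bytes, src: str, dst: str) -> str:
    """Compute an HMAC-SHA256 tag over a length-prefixed serialization of one edge."""
    # Length prefixes keep ("ab", "c") and ("a", "bc") from serializing identically.
    message = f"{len(src)}:{src}|{len(dst)}:{dst}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_edge(key: bytes, src: str, dst: str, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(edge_tag(key, src, dst), tag)


# Hypothetical usage: the data owner tags an edge, a client holding the key verifies it.
secret_key = b"demo-key-not-for-production"
tag = edge_tag(secret_key, "alice", "bob")
print(verify_edge(secret_key, "alice", "bob", tag))    # unmodified edge
print(verify_edge(secret_key, "alice", "carol", tag))  # tampered edge
```

Because the verifier needs the same secret key as the tagger, this sketch fits a two-party setting; a third-party distribution setting would need additional machinery that is not shown here.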

    A technique for determining viable military logistics support alternatives

    Today's US military operates well beyond the scope of protecting and defending the United States. Its operations now include, but are not limited to, humanitarian aid, disaster relief, and conflict resolution. This broad spectrum of operational environments has necessitated a transformation of the individual military services into a hybrid force that can leverage the inherent and emerging capabilities from the strengths of those under the umbrella of the Department of Defense (DOD); this concept has been termed Joint Operations. Supporting Joint Operations requires a new approach to determining a viable military logistics support system. The logistics architecture for these operations has to accommodate scale, time, varied mission objectives, and imperfect information. Compounding the problem is the human-in-the-loop (HITL) decision maker (DM), who is a necessary component for quickly assessing and planning logistics support activities. Past outcomes are not necessarily good indicators of future results, but they can provide a reasonable starting point for planning and predicting specific needs for future requirements. Forecasting the logistical support structure and commodities needed for any resource-intensive environment has progressed well beyond stable demand assumptions to a point at which dynamic and nonlinear environments can be captured with some degree of fidelity and accuracy. While these advances are important, there is no holistic approach that allows exploration of the operational environment or design space and guides the military logistician in a methodical way in support of military forecasting activities. To bridge this capability gap, a method called A Technique for Logistics Architecture Selection (ATLAS) has been developed. 
This thesis describes the ATLAS method and applies it to a notional military scenario that involves the Navy concept of Seabasing and the Marine Corps concept of Distributed Operations applied to a platoon-sized element. This work uses modeling and simulation to incorporate expert opinion and knowledge of military operations, dynamic reasoning methods, and certainty analysis to create a decision support system (DSS) that can be used to provide the DM with an enhanced view of the logistics environment and the variables that impact specific measures of effectiveness. Ph.D. Committee Chair: Mavris, Dimitri; Committee Member: Fahringer, Philip; Committee Member: Nixon, Janel; Committee Member: Schrage, Daniel; Committee Member: Soban, Danielle; Committee Member: Vachtsevanos, Georg

    Early aspects: aspect-oriented requirements engineering and architecture design

    This paper reports on the third Early Aspects: Aspect-Oriented Requirements Engineering and Architecture Design Workshop, which was held in Lancaster, UK, on March 21, 2004. The workshop included a presentation session and working sessions in which particular topics on early aspects were discussed. The primary goal of the workshop was to focus on the challenges of defining methodical software development processes for aspects from early on in the software life cycle and to explore the potential of proposed methods and techniques to scale up to industrial applications.

    A Probabilistic Framework for Imitating Human Race Driver Behavior

    Understanding and modeling human driver behavior is crucial for advanced vehicle development. However, unique driving styles, inconsistent behavior, and complex decision processes render it a challenging task, and existing approaches often lack variability or robustness. To approach this problem, we propose Probabilistic Modeling of Driver behavior (ProMoD), a modular framework which splits the task of driver behavior modeling into multiple modules. A global target trajectory distribution is learned with Probabilistic Movement Primitives, clothoids are utilized for local path generation, and the corresponding choice of actions is performed by a neural network. Experiments in a simulated car racing setting show considerable advantages in imitation accuracy and robustness compared to other imitation learning algorithms. The modular architecture of the proposed framework facilitates straightforward extensibility in driving line adaptation and sequencing of multiple movement primitives for future research.

    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSNs) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at a very high resolution. However, the deployment of a large number of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSNs in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues of spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSNs. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-off among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.