2,778 research outputs found

    Distributed top-k aggregation queries at large

    Get PDF
    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network

    Distributed Mega-Datasets: The Need for Novel Computing Primitives

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.With the ongoing digitalization, an increasing number of sensors is becoming part of our digital infrastructure. These sensors produce highly, even globally, distributed data streams. The aggregate data rate of these streams far exceeds local storage and computing capabilities. Yet, for radical new services (e.g., predictive maintenance and autonomous driving), which depend on various control loops, this data needs to be analyzed in a timely fashion. In this position paper, we outline a system architecture that can effectively handle distributed mega-datasets using data aggregation. Hereby, we point out two research challenges: The need for (1) novel computing primitives that allow us to aggregate data at scale across multiple hierarchies (i.e., time and location) while answering a multitude of a priori unknown queries, and (2) transfer optimizations that enable rapid local and global decision making.EC/H2020/679158/EU/Resolving the Tussle in the Internet: Mapping, Architecture, and Policy Making/ResolutioNe

    Interactive Data Exploration of Distributed Raw Files: A Systematic Mapping Study

    Get PDF
    When exploring big amounts of data without a clear target, providing an interactive experience becomes really dif cult, since this tentative inspection usually defeats any early decision on data structures or indexing strategies. This is also true in the physics domain, speci cally in high-energy physics, where the huge volume of data generated by the detectors are normally explored via C++ code using batch processing, which introduces a considerable latency. An interactive tool, when integrated into the existing data management systems, can add a great value to the usability of these platforms. Here, we intend to review the current state-of-the-art of interactive data exploration, aiming at satisfying three requirements: access to raw data les, stored in a distributed environment, and with a reasonably low latency. This paper follows the guidelines for systematic mapping studies, which is well suited for gathering and classifying available studies.We summarize the results after classifying the 242 papers that passed our inclusion criteria. While there are many proposed solutions that tackle the problem in different manners, there is little evidence available about their implementation in practice. Almost all of the solutions found by this paper cover a subset of our requirements, with only one partially satisfying the three. The solutions for data exploration abound. It is an active research area and, considering the continuous growth of data volume and variety, is only to become harder. There is a niche for research on a solution that covers our requirements, and the required building blocks are there

    Characterizing the IoT ecosystem at scale

    Get PDF
    Internet of Things (IoT) devices are extremely popular with home, business, and industrial users. To provide their services, they typically rely on a backend server in- frastructure on the Internet, which collectively form the IoT Ecosystem. This ecosys- tem is rapidly growing and offers users an increasing number of services. It also has been a source and target of significant security and privacy risks. One notable exam- ple is the recent large-scale coordinated global attacks, like Mirai, which disrupted large service providers. Thus, characterizing this ecosystem yields insights that help end-users, network operators, policymakers, and researchers better understand it, obtain a detailed view, and keep track of its evolution. In addition, they can use these insights to inform their decision-making process for mitigating this ecosystem’s security and privacy risks. In this dissertation, we characterize the IoT ecosystem at scale by (i) detecting the IoT devices in the wild, (ii) conducting a case study to measure how deployed IoT devices can affect users’ privacy, and (iii) detecting and measuring the IoT backend infrastructure. To conduct our studies, we collaborated with a large European Internet Service Provider (ISP) and a major European Internet eXchange Point (IXP). They rou- tinely collect large volumes of passive, sampled data, e.g., NetFlow and IPFIX, for their operational purposes. These data sources help providers obtain insights about their networks, and we used them to characterize the IoT ecosystem at scale. We start with IoT devices and study how to track and trace their activity in the wild. We developed and evaluated a scalable methodology to accurately detect and monitor IoT devices with limited, sparsely sampled data in the ISP and IXP. Next, we conduct a case study to measure how a myriad of deployed devices can affect the privacy of ISP subscribers. Unfortunately, we found that the privacy of a substantial fraction of IPv6 end-users is at risk. We noticed that a single device at home that encodes its MAC address into the IPv6 address could be utilized as a tracking identifier for the entire end-user prefix—even if other devices use IPv6 privacy extensions. Our results showed that IoT devices contribute the most to this privacy leakage. Finally, we focus on the backend server infrastructure and propose a methodology to identify and locate IoT backend servers operated by cloud services and IoT vendors. We analyzed their IoT traffic patterns as observed in the ISP. Our analysis sheds light on their diverse operational and deployment strategies. The need for issuing a priori unknown network-wide queries against large volumes of network flow capture data, which we used in our studies, motivated us to develop Flowyager. It is a system built on top of existing traffic capture utilities, and it relies on flow summarization techniques to reduce (i) the storage and transfer cost of flow captures and (ii) query response time. We deployed a prototype of Flowyager at both the IXP and ISP.Internet-of-Things-GerĂ€te (IoT) sind aus vielen Haushalten, BĂŒrorĂ€umen und In- dustrieanlagen nicht mehr wegzudenken. Um ihre Dienste zu erbringen, nutzen IoT- GerĂ€te typischerweise auf eine Backend-Server-Infrastruktur im Internet, welche als Gesamtheit das IoT-Ökosystem bildet. Dieses Ökosystem wĂ€chst rapide an und bie- tet den Nutzern immer mehr Dienste an. Das IoT-Ökosystem ist jedoch sowohl eine Quelle als auch ein Ziel von signifikanten Risiken fĂŒr die Sicherheit und PrivatsphĂ€re. Ein bemerkenswertes Beispiel sind die jĂŒngsten groß angelegten, koordinierten globa- len Angriffe wie Mirai, durch die große Diensteanbieter gestört haben. Deshalb ist es wichtig, dieses Ökosystem zu charakterisieren, eine ganzheitliche Sicht zu bekommen und die Entwicklung zu verfolgen, damit Forscher, EntscheidungstrĂ€ger, Endnutzer und Netzwerkbetreibern Einblicke und ein besseres VerstĂ€ndnis erlangen. Außerdem können alle Teilnehmer des Ökosystems diese Erkenntnisse nutzen, um ihre Entschei- dungsprozesse zur Verhinderung von Sicherheits- und PrivatsphĂ€rerisiken zu verbes- sern. In dieser Dissertation charakterisieren wir die Gesamtheit des IoT-Ökosystems indem wir (i) IoT-GerĂ€te im Internet detektieren, (ii) eine Fallstudie zum Einfluss von benutzten IoT-GerĂ€ten auf die PrivatsphĂ€re von Nutzern durchfĂŒhren und (iii) die IoT-Backend-Infrastruktur aufdecken und vermessen. Um unsere Studien durchzufĂŒhren, arbeiten wir mit einem großen europĂ€ischen Internet- Service-Provider (ISP) und einem großen europĂ€ischen Internet-Exchange-Point (IXP) zusammen. Diese sammeln routinemĂ€ĂŸig fĂŒr operative Zwecke große Mengen an pas- siven gesampelten Daten (z.B. als NetFlow oder IPFIX). Diese Datenquellen helfen Netzwerkbetreibern Einblicke in ihre Netzwerke zu erlangen und wir verwendeten sie, um das IoT-Ökosystem ganzheitlich zu charakterisieren. Wir beginnen unsere Analysen mit IoT-GerĂ€ten und untersuchen, wie diese im Inter- net aufgespĂŒrt und verfolgt werden können. Dazu entwickelten und evaluierten wir eine skalierbare Methodik, um IoT-GerĂ€te mit Hilfe von eingeschrĂ€nkten gesampelten Daten des ISPs und IXPs prĂ€zise erkennen und beobachten können. Als NĂ€chstes fĂŒhren wir eine Fallstudie durch, in der wir messen, wie eine Unzahl von eingesetzten GerĂ€ten die PrivatsphĂ€re von ISP-Nutzern beeinflussen kann. Lei- der fanden wir heraus, dass die PrivatsphĂ€re eines substantiellen Teils von IPv6- Endnutzern bedroht ist. Wir entdeckten, dass bereits ein einzelnes GerĂ€t im Haus, welches seine MAC-Adresse in die IPv6-Adresse kodiert, als Tracking-Identifikator fĂŒr das gesamte Endnutzer-PrĂ€fix missbraucht werden kann — auch wenn andere GerĂ€te IPv6-Privacy-Extensions verwenden. Unsere Ergebnisse zeigten, dass IoT-GerĂ€te den Großteil dieses PrivatsphĂ€re-Verlusts verursachen. Abschließend fokussieren wir uns auf die Backend-Server-Infrastruktur und wir schla- gen eine Methodik zur Identifizierung und Lokalisierung von IoT-Backend-Servern vor, welche von Cloud-Diensten und IoT-Herstellern betrieben wird. Wir analysier- ten Muster im IoT-Verkehr, der vom ISP beobachtet wird. Unsere Analyse gibt Auf- schluss ĂŒber die unterschiedlichen Strategien, wie IoT-Backend-Server betrieben und eingesetzt werden. Die Notwendigkeit a-priori unbekannte netzwerkweite Anfragen an große Mengen von Netzwerk-Flow-Daten zu stellen, welche wir in in unseren Studien verwenden, moti- vierte uns zur Entwicklung von Flowyager. Dies ist ein auf bestehenden Netzwerkverkehrs- Tools aufbauendes System und es stĂŒtzt sich auf die Zusammenfassung von Verkehrs- flĂŒssen, um (i) die Kosten fĂŒr Archivierung und Transfer von Flow-Daten und (ii) die Antwortzeit von Anfragen zu reduzieren. Wir setzten einen Prototypen von Flowyager sowohl im IXP als auch im ISP ein

    Power efficiency through tuple ranking in wireless sensor network monitoring

    Get PDF
    In this paper, we present an innovative framework for efficiently monitoring Wireless Sensor Networks (WSNs). Our framework, coined KSpot, utilizes a novel top-k query processing algorithm we developed, in conjunction with the concept of in-network views, in order to minimize the cost of query execution. For ease of exposition, consider a set of sensors acquiring data from their environment at a given time instance. The generated information can conceptually be thought as a horizontally fragmented base relation R. Furthermore, the results to a user-defined query Q, registered at some sink point, can conceptually be thought as a view V . Maintaining consistency between V and R is very expensive in terms of communication and energy. Thus, KSpot focuses on a subset Vâ€Č (⊆ V ) that unveils only the k highest-ranked answers at the sink, for some user defined parameter k. To illustrate the efficiency of our framework, we have implemented a real system in nesC, which combines the traditional advantages of declarative acquisition frameworks, like TinyDB, with the ideas presented in this work. Extensive real-world testing and experimentation with traces from University of California-Berkeley, the University of Washington and Intel Research Berkeley, show that KSpot provides an up to 66% of energy savings compared to TinyDB, minimizes both the size and number of packets transmitted over the network (up to 77%), and prolongs the longevity of a WSN deployment to new scales
    • 

    corecore