Revealing Hidden Hierarchical Heavy Hitters in network traffic
© 2018 Association for Computing Machinery. The idea of enabling advanced in-network monitoring functionality has lately been fostered by the advent of massive data-plane programmability. A specific example is the detection of traffic aggregates, i.e., heavy hitters, with programmable switches. So far, proposed solutions implement the mining process by partitioning the network stream into disjoint windows. This practice allows efficient implementations but comes at a well-known cost: the results are tightly coupled with the traffic and the windows' characteristics. This poster quantifies the limitations of disjoint-time-window approaches by showing that they hardly cope with traffic dynamics. We report the results of our analysis and unveil that up to 34% of the total number of hierarchical heavy hitters might not be detected with those approaches. This is a call for a new set of windowless algorithms to be implemented with the match-action paradigm.
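The boundary effect described above can be reproduced in a few lines (a toy illustration with made-up addresses and thresholds, not the poster's methodology): a flow whose packets straddle a window boundary falls below the threshold in every individual window while exceeding it over the whole stream.

```python
from collections import Counter

# Hypothetical packet stream (source IP per slot). "10.0.0.9" sends 6
# packets, but they straddle the boundary between two 6-slot windows.
stream = ["10.0.0.9"] * 3 + ["8.8.8.8"] * 3 + ["10.0.0.9"] * 3 + ["1.1.1.1"] * 3

THRESHOLD = 5  # a flow is a heavy hitter once it reaches 5 packets
WINDOW = 6

def window_heavy_hitters(stream, window, threshold):
    # Count each disjoint window independently, as window-based schemes do.
    hitters = set()
    for start in range(0, len(stream), window):
        counts = Counter(stream[start:start + window])
        hitters |= {k for k, v in counts.items() if v >= threshold}
    return hitters

# Disjoint windows see 3 packets per window for "10.0.0.9": nothing detected.
print(window_heavy_hitters(stream, WINDOW, THRESHOLD))                 # set()
# A windowless (whole-stream) count sees all 6 packets.
print({k for k, v in Counter(stream).items() if v >= THRESHOLD})       # {'10.0.0.9'}
```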
Lightweight Techniques for Private Heavy Hitters
This paper presents a new protocol for solving the private heavy-hitters
problem. In this problem, there are many clients and a small set of
data-collection servers. Each client holds a private bitstring. The servers
want to recover the set of all popular strings, without learning anything else
about any client's string. A web-browser vendor, for instance, can use our
protocol to figure out which homepages are popular, without learning any user's
homepage. We also consider the simpler private subset-histogram problem, in
which the servers want to count how many clients hold strings in a particular
set without revealing this set to the clients.
Our protocols use two data-collection servers and, in a protocol run, each
client sends only a single message to the servers. Our protocols protect
client privacy against arbitrary misbehavior by one of the servers and our
approach requires no public-key cryptography (except for secure channels), nor
general-purpose multiparty computation. Instead, we rely on incremental
distributed point functions, a new cryptographic tool that allows a client to
succinctly secret-share the labels on the nodes of an exponentially large
binary tree, provided that the tree has a single non-zero path. Along the way,
we develop new general tools for providing malicious security in applications
of distributed point functions.
In an experimental evaluation with two servers on opposite sides of the U.S.,
the servers can find the 200 most popular strings among a set of 400,000
client-held 256-bit strings in 54 minutes. Our protocols are highly
parallelizable. We estimate that with 20 physical machines per logical server,
our protocols could compute heavy hitters over ten million clients in just over
one hour of computation.
Comment: To appear in IEEE Security & Privacy 202
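The tree traversal at the heart of this style of protocol can be sketched in the clear (a toy, non-private reference; the real protocol evaluates these counts on secret-shared data via incremental distributed point functions): grow popular prefixes level by level, keeping a prefix only if enough clients hold a string that starts with it.

```python
from collections import Counter

def popular_strings(strings, threshold, bits):
    # Non-private sketch of the level-by-level prefix search: a prefix
    # survives only if at least `threshold` clients hold a string
    # beginning with it, so unpopular subtrees are never expanded.
    live = {""}
    for depth in range(1, bits + 1):
        counts = Counter(s[:depth] for s in strings if s[:depth - 1] in live)
        live = {p for p, c in counts.items() if c >= threshold}
    return live

# Hypothetical 4-bit client strings, threshold 3:
clients = ["0000"] * 3 + ["0011"] * 2 + ["1111"]
print(popular_strings(clients, 3, 4))  # {'0000'}
```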
Vogue: Faster Computation of Private Heavy Hitters
Consider the problem of securely identifying τ-heavy hitters, where, given a set of client inputs, the goal is to identify those inputs held by at least τ clients in a privacy-preserving manner. Towards this, we design a novel system, Vogue, whose key highlight in comparison to prior works is that it ensures complete privacy and does not leak any information other than the heavy hitters. In doing so, Vogue aims to achieve as efficient a solution as possible. To showcase these efficiency improvements, we benchmark our solution and observe that it requires around 14 minutes to compute the heavy hitters for τ = 900 on 256-bit inputs when considering 400K clients. This is in contrast to the state-of-the-art solution, which requires over an hour for the same. In addition to the static input setting described above, Vogue also accounts for streaming inputs and provides a protocol that outperforms the state of the art therein. The efficiency improvements witnessed while computing heavy hitters in both the static and streaming input settings are attributed to our new secure stable compaction protocol, whose round complexity is independent of the size of the input array to be compacted.
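The stable compaction step can be illustrated in the clear (a toy reference, not the secure protocol, and the name and interface below are illustrative): entries flagged for keeping move to the front in their original order.

```python
def stable_compact(values, keep):
    # Plain reference for stable compaction: entries whose keep-bit is 1
    # move to the front, preserving their relative order; the rest follow.
    # Vogue performs an oblivious version of this under MPC, with round
    # complexity independent of len(values).
    kept = [v for v, k in zip(values, keep) if k]
    dropped = [v for v, k in zip(values, keep) if not k]
    return kept + dropped

print(stable_compact(["a", "b", "c", "d"], [0, 1, 0, 1]))  # ['b', 'd', 'a', 'c']
```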
PLASMA: Private, Lightweight Aggregated Statistics against Malicious Adversaries with Full Security
The private heavy-hitters problem is a data-collection task where many clients possess private bit strings, and data-collection servers aim to identify the most popular strings without learning anything about the clients' inputs. The recent work of Poplar constructed a protocol for private heavy hitters, but its solution was susceptible to additive attacks by a malicious server, compromising both the correctness and the security of the protocol.
In this paper, we introduce PLASMA, a private analytics framework that addresses these challenges by using three data-collection servers and a novel primitive, called verifiable incremental distributed point function (VIDPF). PLASMA allows each client to non-interactively send a message to the servers as its input and then go offline. Our new VIDPF primitive employs lightweight techniques based on efficient hashing and allows the servers to non-interactively validate client inputs and preemptively reject malformed ones.
PLASMA drastically reduces the communication overhead incurred by the servers using our novel batched consistency checks. Specifically, our server-to-server communication depends only on the number of malicious clients, as opposed to the total number of clients, yielding substantial improvements over both Poplar and other state-of-the-art sorting-based protocols. Compared to recent works, PLASMA enables both client input validation and succinct communication, while ensuring full security. At runtime, PLASMA computes the 1000 most popular strings among a set of 1 million client-held 32-bit strings in 67 seconds, and among 256-bit strings in less than 20 minutes.
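The intuition behind batched consistency checking can be sketched with plain hashing (an illustrative analogy, not PLASMA's actual VIDPF-based check): servers exchange one digest per batch and drill down only into batches that disagree, so communication grows with the number of inconsistent entries rather than the total number of clients.

```python
import hashlib

def digest(batch):
    # One digest summarizes a whole batch. Fixed-size byte entries are
    # assumed here, so no length-prefixing is needed in this sketch.
    h = hashlib.sha256()
    for item in batch:
        h.update(item)
    return h.digest()

def find_mismatches(a, b):
    # Compare batch digests; recurse into halves only when they disagree.
    if digest(a) == digest(b):
        return []
    if len(a) == 1:
        return [0]
    mid = len(a) // 2
    left = find_mismatches(a[:mid], b[:mid])
    right = find_mismatches(a[mid:], b[mid:])
    return left + [mid + i for i in right]

view_a = [b"ok"] * 8
view_b = list(view_a)
view_b[5] = b"zz"                       # one inconsistent client entry
print(find_mismatches(view_a, view_b))  # [5]
```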
From Controlled Data-Center Environments to Open Distributed Environments: Scalable, Efficient, and Robust Systems with Extended Functionality
The past two decades have witnessed several paradigm shifts in computing environments, starting with cloud computing, which offers on-demand allocation of storage, network, compute, and memory resources, as well as other services, in a pay-as-you-go billing model, and ending with the rise of permissionless blockchain technology, a decentralized computing paradigm with lower trust assumptions and a limitless number of participants. Unlike in the cloud, where all the computing resources are owned by some trusted cloud provider, permissionless blockchains allow computing resources owned by possibly malicious parties to join and leave their network without obtaining permission from a centralized trusted authority. Still, in the presence of malicious parties, permissionless blockchain networks can perform general computations and make progress. Cloud computing is powered by geographically distributed data centers controlled and managed by trusted cloud service providers and promises theoretically infinite computing resources. On the other hand, permissionless blockchains are powered by open networks of geographically distributed computing nodes owned by entities that are not necessarily known or trusted. This paradigm shift requires a reconsideration of distributed data management protocols and distributed system designs that assume low latency across system components, inelastic computing resources, or fully trusted computing resources. In this dissertation, we propose new system designs and optimizations that address the scalability and efficiency of distributed data management systems in cloud environments. We also propose several protocols and new programming paradigms to extend the functionality and enhance the robustness of permissionless blockchains.
The work presented spans global-scale transaction processing, large-scale stream processing, atomic transaction processing across permissionless blockchains, and extending the functionality and use cases of permissionless blockchains. In all these directions, the focus is on rethinking system and protocol designs to account for novel cloud and permissionless blockchain assumptions. For global-scale transaction processing, we propose GPlacer, a placement optimization framework that decides replica placement for fully and partially geo-replicated databases. For large-scale stream processing, we propose Cache-on-Track (CoT), an adaptive and elastic client-side cache that addresses server-side load imbalances that occur in large-scale distributed storage layers. In permissionless blockchain transaction processing, we propose AC3WN, the first correct cross-chain commitment protocol that guarantees atomicity of cross-chain transactions. Also, we propose TXSC, a transactional smart contract programming framework. TXSC provides smart contract developers with transaction primitives that allow them to write smart contracts without the need to reason about the anomalies that can arise from concurrent smart contract function executions. In addition, we propose a forward-looking architecture that unifies permissioned and permissionless blockchains and exploits the running infrastructure of permissionless blockchains to build global asset management systems.
Rethinking Routing and Peering in the era of Vertical Integration of Network Functions
Content providers typically control digital content consumption services and earn most of their revenue through an all-you-can-eat model via subscriptions or hyper-targeted advertisements. Revamping the existing Internet architecture and design, vertical integration, where a content provider and an access ISP act as a single body in a sugarcane form, seems to be the recent trend. As this vertical integration trend emerges in the ISP market, it is questionable whether the existing routing architecture will suffice in terms of sustainable economics, peering, and scalability. The current routing will likely need careful modifications and smart innovations to ensure effective and reliable end-to-end packet delivery. This involves developing new features for handling traffic with reduced latency, tackling routing scalability issues in a more secure way, and offering new services at lower cost. Considering that prices of DRAM and TCAM in legacy routers are not necessarily decreasing at the desired pace, cloud computing can be a great solution for managing the increasing computation and memory complexity of routing functions in a centralized manner with optimized expenses. Focusing on the attributes associated with existing routing cost models, and exploring a hybrid approach to SDN, we also compare recent trends in cloud pricing (for both storage and service) to evaluate whether it would be economically beneficial to integrate cloud services with legacy routing for improved cost-efficiency. In terms of peering, using the US as a case study, we show the overlaps between access ISPs and content providers to explore the viability of such a future in terms of peering between the new emerging content-dominated sugarcane ISPs and the health of Internet economics.
To this end, we introduce meta-peering, a term that encompasses automation efforts related to peering, from identifying a list of ISPs likely to peer, to injecting control-plane rules, to continuously monitoring and flagging any violation. Meta-peering is one of the many outcroppings of the vertical integration procedure and could be offered to ISPs as a standalone service.
Methodologies for Traffic Characterization in Communication Networks
Doctoral thesis in Methodologies for Traffic Characterization in Communication Networks. Keywords: Internet Traffic, Internet Applications, Internet Attacks, Traffic Profiling, Multi-Scale Analysis
Abstract: Nowadays, the Internet can be seen as an ever-changing platform where new and different types of services and applications are constantly emerging. In fact, many of the existing dominant applications, such as social networks, have appeared recently, being rapidly adopted by the user community. All these new applications required the implementation of novel communication protocols that present different network requirements, according to the service they deploy. All this diversity and novelty has led to an increasing need to accurately profile Internet users, by mapping their traffic to the originating application, in order to improve many network management tasks such as resource optimization, network performance, service personalization and security. However, accurately mapping traffic to its originating application is a difficult task due to the inherent complexity of existing network protocols and to several restrictions that prevent the analysis of the contents of the generated traffic. In fact, many technologies, such as traffic encryption, are widely deployed to assure and protect the confidentiality and integrity of communications over the Internet. On the other hand, many legal constraints also forbid the analysis of clients' traffic in order to protect their confidentiality and privacy. Consequently, novel traffic discrimination methodologies are necessary for accurate traffic classification and user profiling. This thesis proposes several identification methodologies for accurate Internet traffic profiling while coping with the different mentioned restrictions and with the existing encryption techniques. By analyzing the several frequency components present in the captured traffic and inferring the presence of the different network- and user-related events, the proposed approaches are able to create a profile for each one of the analyzed Internet applications. The use of several probabilistic models will allow the accurate association of the analyzed traffic to the corresponding application. Several enhancements will also be proposed in order to allow the identification of hidden illicit patterns and the real-time classification of captured traffic. In addition, a new network management paradigm for wired and wireless networks will be proposed. The analysis of layer-2 traffic metrics and the different frequency components present in the captured traffic allows efficient user profiling in terms of the used web application. Finally, some usage scenarios for these methodologies will be presented and discussed.
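The frequency-component analysis the thesis relies on can be sketched with a naive discrete Fourier transform over a packet-count time series (a minimal, self-contained illustration with synthetic data; the thesis's own methodology is far richer):

```python
import math

def dominant_period(samples):
    # Naive DFT: return the period (in samples) of the strongest non-DC
    # frequency component, i.e., the dominant periodicity in the traffic.
    n = len(samples)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = sum(-s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return n / best_k

# Hypothetical packets-per-second series with a strong 8-second cycle:
series = [10 + 5 * math.cos(2 * math.pi * t / 8) for t in range(64)]
print(dominant_period(series))  # 8.0
```

An application whose protocol exchanges keep-alives or chunks at a characteristic rate leaves such a spectral fingerprint even when its payload is encrypted.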
Differential Privacy - A Balancing Act
Data privacy is an ever-important aspect of data analyses. Historically, a plethora of privacy techniques have been introduced to protect data, but few have stood the test of time. From investigating the overlap between big data research and security and privacy research, I have found that differential privacy presents itself as a promising defender of data privacy. Differential privacy is a rigorous, mathematical notion of privacy. Nevertheless, privacy comes at a cost. In order to achieve differential privacy, we need to introduce some form of inaccuracy (i.e., error) into our analyses. Hence, practitioners need to engage in a balancing act between accuracy and privacy when adopting differential privacy. As a consequence, understanding this accuracy/privacy trade-off is vital to being able to use differential privacy in real data analyses. In this thesis, I aim to bridge the gap between differential privacy in theory and differential privacy in practice. Most notably, I aim to convey a better understanding of the accuracy/privacy trade-off by 1) implementing tools to tweak accuracy/privacy in a real use case, 2) presenting a methodology for empirically predicting error, and 3) systematizing and analyzing known accuracy improvement techniques for differentially private algorithms. Additionally, I put differential privacy into context by investigating how it can be applied in the automotive domain. Using the automotive domain as an example, I introduce the main challenges that constitute the balancing act, and provide advice for moving forward.
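The accuracy/privacy balancing act is concrete in even the simplest differentially private primitive, the Laplace mechanism for a counting query (a minimal sketch; the epsilon values and counts below are illustrative):

```python
import math
import random

def laplace_count(true_count, epsilon):
    # Laplace mechanism for a counting query (sensitivity 1): add noise
    # with scale 1/epsilon, drawn via inverse-CDF sampling. Smaller
    # epsilon means stronger privacy but larger expected error.
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return true_count + noise

random.seed(0)
mean_abs_err = lambda eps: sum(abs(laplace_count(100, eps) - 100) for _ in range(2000)) / 2000
# Expected absolute error is 1/epsilon: tightening privacy tenfold costs
# roughly tenfold accuracy (typically ~10 vs ~1 here).
print(mean_abs_err(0.1), mean_abs_err(1.0))
```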
Efficient estimation of statistical functions while preserving client-side privacy
Aggregating service users' personal data for analytical purposes is a common practice in today's Internet economy. However, distrust in the data aggregator, data breaches, and the risk of subpoenas pose significant challenges to the availability of data. The framework of differential privacy is enjoying wide attention due to its scalability and the rigour of the privacy protection it provides, and has become a de facto standard for facilitating privacy-preserving information extraction. In this dissertation, we design and implement resource-efficient algorithms for three fundamental data analysis primitives (marginal, range, and count queries) while providing strong differential privacy guarantees.
The first two queries are studied in the strict scenario of untrusted aggregation (a.k.a. the local model), in which the data collector is allowed to access only the noisy/perturbed version of users' data, not their true data. To the best of our knowledge, marginal and range queries had not been studied in detail in the local setting before our work. We show that our simple data transformation techniques achieve high accuracy in practice and can be used for performing more interesting analyses.
Finally, we revisit the problem of count queries under trusted aggregation. This setting can also be viewed as a relaxation of the local model called limited-precision local differential privacy. We first discover certain weaknesses in a well-known optimization framework that lead to solutions exhibiting pathological behaviours. We then propose additional constraints in the framework that remove these weaknesses without compromising too much on utility.
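The untrusted-aggregation (local) setting can be illustrated with the classic randomized-response mechanism for a count query (a minimal sketch of the local model; the dissertation's marginal and range mechanisms are more involved, and the numbers below are illustrative):

```python
import math
import random

def randomized_response(bit, epsilon):
    # Local model: each client reports its true bit only with probability
    # e^eps / (1 + e^eps), so the aggregator never sees raw data.
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

def debiased_count(reports, epsilon):
    # Invert the known flipping probability: E[sum] = (2p-1)*true + n*(1-p),
    # giving an unbiased estimate of the true number of 1-bits.
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    n = len(reports)
    return (sum(reports) - n * (1.0 - p)) / (2.0 * p - 1.0)

random.seed(0)
true_bits = [1] * 3000 + [0] * 7000   # 3000 of 10000 clients hold the item
reports = [randomized_response(b, 1.0) for b in true_bits]
# The estimate lands close to 3000 even though every single report is noisy.
print(debiased_count(reports, 1.0))
```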
Understanding the Real World through the Analysis of User Behavior and Topics in Online Social Media
Physical events happening in the real world usually trigger reactions and discussions in the digital world, a world most often represented by Online Social Media such as Twitter or Facebook. Mining these reactions through social sensors offers a fast and low-cost way to explain what is happening in the physical world. A thorough understanding of these discussions and the context behind them has become critical for many applications like business or political analysis. This context includes the characteristics of the population participating in a discussion, or when it is being discussed, or why. As an example, we demonstrate how the time of day affects the prediction of traffic on highways through the analysis of social media content. Obtaining an understanding of what is happening online, and its ramifications in the real world, can be enabled through the automatic summarization of Social Media. Trending topics are offered as a high-level content recommendation system, where users are suggested related content if they deem the displayed topics interesting. However, identifying the characteristics of the users focused on each topic can boost the importance even of topics that are not popular or bursty. We define a way to characterize groups of users that are focused on such topics and propose an efficient and accurate algorithm to extract such communities. Through qualitative and quantitative experimentation, we observe that topics with a strong community focus are interesting and more likely to catch the attention of users. Consequently, as trending topic extraction algorithms become more sophisticated and report additional information, like the characteristics of the users that participate in a trend, significant and novel privacy issues arise. We introduce a statistical attack to infer sensitive attribute values of Online Social Network users that utilizes such reported community-aware trending topics.
Additionally, we provide an algorithmic methodology that alters an existing community-aware trending topic algorithm so that it preserves the privacy of the involved users while still reporting trending topics with a satisfactory level of utility. From the user's perspective, we explore the idea of a cyborg that can constantly monitor its owner's privacy and alert them when necessary. However, apart from individuals, the notion of privacy can also extend to a group of people (or community). We study how non-private behavior of individuals can lead to exposure of the identity of a larger group. This exposure poses certain dangers, like online harassment targeted at the members of a group, potential physical attacks, group identity shift, etc. We discuss how this new privacy notion can be modeled and identify a set of core challenges and potential solutions.