Revealing Hidden Hierarchical Heavy Hitters in network traffic
© 2018 Association for Computing Machinery. The idea of enabling advanced in-network monitoring functionality has lately been fostered by the advent of massive data-plane programmability. A specific example is the detection of traffic aggregates, i.e., heavy hitters, with programmable switches. So far, proposed solutions implement the mining process by partitioning the network stream into disjoint windows. This practice allows efficient implementations but comes at a well-known cost: the results are tightly coupled with the traffic and the windows' characteristics. This poster quantifies the limitations of disjoint-time-window approaches by showing that they hardly cope with traffic dynamics. We report the results of our analysis and unveil that up to 34% of the total number of hierarchical heavy hitters might not be detected with those approaches. This is a call for a new set of windowless algorithms to be implemented with the match-action paradigm.
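The boundary effect described above can be reproduced in a few lines (a toy illustration with made-up addresses and thresholds, not the poster's methodology): a flow whose packets straddle a window boundary falls below the threshold in every individual window while exceeding it over the whole stream.

```python
from collections import Counter

# Hypothetical packet stream (source IP per slot). "10.0.0.9" sends 6
# packets, but they straddle the boundary between two 6-slot windows.
stream = ["10.0.0.9"] * 3 + ["8.8.8.8"] * 3 + ["10.0.0.9"] * 3 + ["1.1.1.1"] * 3

THRESHOLD = 5  # a flow is a heavy hitter once it reaches 5 packets
WINDOW = 6

def window_heavy_hitters(stream, window, threshold):
    # Count each disjoint window independently, as window-based schemes do.
    hitters = set()
    for start in range(0, len(stream), window):
        counts = Counter(stream[start:start + window])
        hitters |= {k for k, v in counts.items() if v >= threshold}
    return hitters

# Disjoint windows see 3 packets per window for "10.0.0.9": nothing detected.
print(window_heavy_hitters(stream, WINDOW, THRESHOLD))                 # set()
# A windowless (whole-stream) count sees all 6 packets.
print({k for k, v in Counter(stream).items() if v >= THRESHOLD})       # {'10.0.0.9'}
```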
Lightweight Techniques for Private Heavy Hitters
This paper presents a new protocol for solving the private heavy-hitters
problem. In this problem, there are many clients and a small set of
data-collection servers. Each client holds a private bitstring. The servers
want to recover the set of all popular strings, without learning anything else
about any client's string. A web-browser vendor, for instance, can use our
protocol to figure out which homepages are popular, without learning any user's
homepage. We also consider the simpler private subset-histogram problem, in
which the servers want to count how many clients hold strings in a particular
set without revealing this set to the clients.
Our protocols use two data-collection servers and, in a protocol run, each
client sends only a single message to the servers. Our protocols protect
client privacy against arbitrary misbehavior by one of the servers and our
approach requires no public-key cryptography (except for secure channels), nor
general-purpose multiparty computation. Instead, we rely on incremental
distributed point functions, a new cryptographic tool that allows a client to
succinctly secret-share the labels on the nodes of an exponentially large
binary tree, provided that the tree has a single non-zero path. Along the way,
we develop new general tools for providing malicious security in applications
of distributed point functions.
In an experimental evaluation with two servers on opposite sides of the U.S.,
the servers can find the 200 most popular strings among a set of 400,000
client-held 256-bit strings in 54 minutes. Our protocols are highly
parallelizable. We estimate that with 20 physical machines per logical server,
our protocols could compute heavy hitters over ten million clients in just over
one hour of computation.
Comment: To appear in IEEE Security & Privacy 202
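The tree traversal at the heart of this style of protocol can be sketched in the clear (a toy, non-private reference; the real protocol evaluates these counts on secret-shared data via incremental distributed point functions): grow popular prefixes level by level, keeping a prefix only if enough clients hold a string that starts with it.

```python
from collections import Counter

def popular_strings(strings, threshold, bits):
    # Non-private sketch of the level-by-level prefix search: a prefix
    # survives only if at least `threshold` clients hold a string
    # beginning with it, so unpopular subtrees are never expanded.
    live = {""}
    for depth in range(1, bits + 1):
        counts = Counter(s[:depth] for s in strings if s[:depth - 1] in live)
        live = {p for p, c in counts.items() if c >= threshold}
    return live

# Hypothetical 4-bit client strings, threshold 3:
clients = ["0000"] * 3 + ["0011"] * 2 + ["1111"]
print(popular_strings(clients, 3, 4))  # {'0000'}
```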
Vogue: Faster Computation of Private Heavy Hitters
Consider the problem of securely identifying τ-heavy hitters, where, given a set of client inputs, the goal is to identify those inputs held by at least τ clients in a privacy-preserving manner. Towards this, we design a novel system, Vogue, whose key highlight in comparison to prior works is that it ensures complete privacy and does not leak any information other than the heavy hitters. In doing so, Vogue aims to achieve as efficient a solution as possible. To showcase these efficiency improvements, we benchmark our solution and observe that it requires around 14 minutes to compute the heavy hitters for τ = 900 on 256-bit inputs when considering 400K clients. This is in contrast to the state-of-the-art solution, which requires over an hour for the same. In addition to the static input setting described above, Vogue also accounts for streaming inputs and provides a protocol that outperforms the state of the art therein. The efficiency improvements witnessed while computing heavy hitters in both the static and streaming input settings are attributed to our new secure stable compaction protocol, whose round complexity is independent of the size of the input array to be compacted.
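The stable compaction step can be illustrated in the clear (a toy reference, not the secure protocol, and the name and interface below are illustrative): entries flagged for keeping move to the front in their original order.

```python
def stable_compact(values, keep):
    # Plain reference for stable compaction: entries whose keep-bit is 1
    # move to the front, preserving their relative order; the rest follow.
    # Vogue performs an oblivious version of this under MPC, with round
    # complexity independent of len(values).
    kept = [v for v, k in zip(values, keep) if k]
    dropped = [v for v, k in zip(values, keep) if not k]
    return kept + dropped

print(stable_compact(["a", "b", "c", "d"], [0, 1, 0, 1]))  # ['b', 'd', 'a', 'c']
```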
PLASMA: Private, Lightweight Aggregated Statistics against Malicious Adversaries with Full Security
The private heavy-hitters problem is a data-collection task where many clients possess private bit strings, and data-collection servers aim to identify the most popular strings without learning anything about the clients' inputs. The recent work of Poplar constructed a protocol for private heavy hitters, but its solution was susceptible to additive attacks by a malicious server, compromising both the correctness and the security of the protocol.
In this paper, we introduce PLASMA, a private analytics framework that addresses these challenges by using three data-collection servers and a novel primitive, called verifiable incremental distributed point function (VIDPF). PLASMA allows each client to non-interactively send a message to the servers as its input and then go offline. Our new VIDPF primitive employs lightweight techniques based on efficient hashing and allows the servers to non-interactively validate client inputs and preemptively reject malformed ones.
PLASMA drastically reduces the communication overhead incurred by the servers using our novel batched consistency checks. Specifically, our server-to-server communication depends only on the number of malicious clients, as opposed to the total number of clients, yielding substantial improvements over both Poplar and other state-of-the-art sorting-based protocols. Compared to recent works, PLASMA enables both client input validation and succinct communication, while ensuring full security. At runtime, PLASMA computes the 1000 most popular strings among a set of 1 million client-held 32-bit strings in 67 seconds, and among 256-bit strings in less than 20 minutes.
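The intuition behind batched consistency checking can be sketched with plain hashing (an illustrative analogy, not PLASMA's actual VIDPF-based check): servers exchange one digest per batch and drill down only into batches that disagree, so communication grows with the number of inconsistent entries rather than the total number of clients.

```python
import hashlib

def digest(batch):
    # One digest summarizes a whole batch. Fixed-size byte entries are
    # assumed here, so no length-prefixing is needed in this sketch.
    h = hashlib.sha256()
    for item in batch:
        h.update(item)
    return h.digest()

def find_mismatches(a, b):
    # Compare batch digests; recurse into halves only when they disagree.
    if digest(a) == digest(b):
        return []
    if len(a) == 1:
        return [0]
    mid = len(a) // 2
    left = find_mismatches(a[:mid], b[:mid])
    right = find_mismatches(a[mid:], b[mid:])
    return left + [mid + i for i in right]

view_a = [b"ok"] * 8
view_b = list(view_a)
view_b[5] = b"zz"                       # one inconsistent client entry
print(find_mismatches(view_a, view_b))  # [5]
```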
From Controlled Data-Center Environments to Open Distributed Environments: Scalable, Efficient, and Robust Systems with Extended Functionality
The past two decades have witnessed several paradigm shifts in computing environments, starting with cloud computing, which offers on-demand allocation of storage, network, compute, and memory resources, as well as other services, in a pay-as-you-go billing model, and ending with the rise of permissionless blockchain technology, a decentralized computing paradigm with lower trust assumptions and a limitless number of participants. Unlike in the cloud, where all the computing resources are owned by some trusted cloud provider, permissionless blockchains allow computing resources owned by possibly malicious parties to join and leave their network without obtaining permission from a centralized trusted authority. Still, in the presence of malicious parties, permissionless blockchain networks can perform general computations and make progress. Cloud computing is powered by geographically distributed data centers controlled and managed by trusted cloud service providers and promises theoretically infinite computing resources. On the other hand, permissionless blockchains are powered by open networks of geographically distributed computing nodes owned by entities that are not necessarily known or trusted. This paradigm shift requires a reconsideration of distributed data management protocols and distributed system designs that assume low latency across system components, inelastic computing resources, or fully trusted computing resources. In this dissertation, we propose new system designs and optimizations that address the scalability and efficiency of distributed data management systems in cloud environments. We also propose several protocols and new programming paradigms to extend the functionality and enhance the robustness of permissionless blockchains.
The work presented spans global-scale transaction processing, large-scale stream processing, atomic transaction processing across permissionless blockchains, and extending the functionality and use cases of permissionless blockchains. In all these directions, the focus is on rethinking system and protocol designs to account for novel cloud and permissionless blockchain assumptions. For global-scale transaction processing, we propose GPlacer, a placement optimization framework that decides replica placement for fully and partially geo-replicated databases. For large-scale stream processing, we propose Cache-on-Track (CoT), an adaptive and elastic client-side cache that addresses server-side load imbalances that occur in large-scale distributed storage layers. In permissionless blockchain transaction processing, we propose AC3WN, the first correct cross-chain commitment protocol that guarantees atomicity of cross-chain transactions. Also, we propose TXSC, a transactional smart contract programming framework. TXSC provides smart contract developers with transaction primitives that allow them to write smart contracts without the need to reason about the anomalies that can arise from concurrent smart contract function executions. In addition, we propose a forward-looking architecture that unifies permissioned and permissionless blockchains and exploits the running infrastructure of permissionless blockchains to build global asset management systems.
Rethinking Routing and Peering in the era of Vertical Integration of Network Functions
Content providers typically control digital content consumption services and earn most of their revenue through an all-you-can-eat model via subscriptions or hyper-targeted advertisements. Revamping the existing Internet architecture and design, vertical integration, where a content provider and an access ISP act as a single body in a sugarcane form, seems to be the recent trend. As this vertical integration trend emerges in the ISP market, it is questionable whether the existing routing architecture will suffice in terms of sustainable economics, peering, and scalability. The current routing will likely need careful modifications and smart innovations to ensure effective and reliable end-to-end packet delivery. This involves developing new features for handling traffic with reduced latency, tackling routing scalability issues in a more secure way, and offering new services at lower cost. Considering that prices of DRAM and TCAM in legacy routers are not necessarily decreasing at the desired pace, cloud computing can be a great solution for managing the increasing computation and memory complexity of routing functions in a centralized manner with optimized expenses. Focusing on the attributes associated with existing routing cost models, and exploring a hybrid approach to SDN, we also compare recent trends in cloud pricing (for both storage and service) to evaluate whether it would be economically beneficial to integrate cloud services with legacy routing for improved cost-efficiency. In terms of peering, using the US as a case study, we show the overlaps between access ISPs and content providers to explore the viability of such a future in terms of peering between the new emerging content-dominated sugarcane ISPs and the health of Internet economics.
To this end, we introduce meta-peering, a term that encompasses automation efforts related to peering, from identifying a list of ISPs likely to peer, to injecting control-plane rules, to continuously monitoring and flagging any violation. Meta-peering is one of the many outcroppings of the vertical integration procedure and could be offered to ISPs as a standalone service.
Methodologies for Traffic Characterization in Communication Networks
Doctoral thesis in Methodologies for Traffic Characterization in Communication Networks. Keywords: Internet Traffic, Internet Applications, Internet Attacks, Traffic Profiling, Multi-Scale Analysis
Abstract: Nowadays, the Internet can be seen as an ever-changing platform where new and different types of services and applications are constantly emerging. In fact, many of the existing dominant applications, such as social networks, have appeared recently, being rapidly adopted by the user community. All these new applications required the implementation of novel communication protocols that present different network requirements, according to the service they deploy. All this diversity and novelty has led to an increasing need to accurately profile Internet users, by mapping their traffic to the originating application, in order to improve many network management tasks such as resource optimization, network performance, service personalization and security. However, accurately mapping traffic to its originating application is a difficult task due to the inherent complexity of existing network protocols and to several restrictions that prevent the analysis of the contents of the generated traffic. In fact, many technologies, such as traffic encryption, are widely deployed to assure and protect the confidentiality and integrity of communications over the Internet. On the other hand, many legal constraints also forbid the analysis of clients' traffic in order to protect their confidentiality and privacy. Consequently, novel traffic discrimination methodologies are necessary for accurate traffic classification and user profiling. This thesis proposes several identification methodologies for accurate Internet traffic profiling while coping with the different mentioned restrictions and with the existing encryption techniques. By analyzing the several frequency components present in the captured traffic and inferring the presence of the different network- and user-related events, the proposed approaches are able to create a profile for each one of the analyzed Internet applications. The use of several probabilistic models will allow the accurate association of the analyzed traffic to the corresponding application. Several enhancements will also be proposed in order to allow the identification of hidden illicit patterns and the real-time classification of captured traffic. In addition, a new network management paradigm for wired and wireless networks will be proposed. The analysis of layer-2 traffic metrics and the different frequency components present in the captured traffic allows efficient user profiling in terms of the used web application. Finally, some usage scenarios for these methodologies will be presented and discussed.
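The frequency-component analysis the thesis relies on can be sketched with a naive discrete Fourier transform over a packet-count time series (a minimal, self-contained illustration with synthetic data; the thesis's own methodology is far richer):

```python
import math

def dominant_period(samples):
    # Naive DFT: return the period (in samples) of the strongest non-DC
    # frequency component, i.e., the dominant periodicity in the traffic.
    n = len(samples)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = sum(-s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return n / best_k

# Hypothetical packets-per-second series with a strong 8-second cycle:
series = [10 + 5 * math.cos(2 * math.pi * t / 8) for t in range(64)]
print(dominant_period(series))  # 8.0
```

An application whose protocol exchanges keep-alives or chunks at a characteristic rate leaves such a spectral fingerprint even when its payload is encrypted.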
Differential Privacy - A Balancing Act
Data privacy is an ever-important aspect of data analyses. Historically, a plethora of privacy techniques have been introduced to protect data, but few have stood the test of time. From investigating the overlap between big data research and security and privacy research, I have found that differential privacy presents itself as a promising defender of data privacy. Differential privacy is a rigorous, mathematical notion of privacy. Nevertheless, privacy comes at a cost. In order to achieve differential privacy, we need to introduce some form of inaccuracy (i.e., error) into our analyses. Hence, practitioners need to engage in a balancing act between accuracy and privacy when adopting differential privacy. As a consequence, understanding this accuracy/privacy trade-off is vital to being able to use differential privacy in real data analyses. In this thesis, I aim to bridge the gap between differential privacy in theory and differential privacy in practice. Most notably, I aim to convey a better understanding of the accuracy/privacy trade-off by 1) implementing tools to tweak accuracy/privacy in a real use case, 2) presenting a methodology for empirically predicting error, and 3) systematizing and analyzing known accuracy improvement techniques for differentially private algorithms. Additionally, I put differential privacy into context by investigating how it can be applied in the automotive domain. Using the automotive domain as an example, I introduce the main challenges that constitute the balancing act, and provide advice for moving forward.
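The accuracy/privacy balancing act is concrete in even the simplest differentially private primitive, the Laplace mechanism for a counting query (a minimal sketch; the epsilon values and counts below are illustrative):

```python
import math
import random

def laplace_count(true_count, epsilon):
    # Laplace mechanism for a counting query (sensitivity 1): add noise
    # with scale 1/epsilon, drawn via inverse-CDF sampling. Smaller
    # epsilon means stronger privacy but larger expected error.
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return true_count + noise

random.seed(0)
mean_abs_err = lambda eps: sum(abs(laplace_count(100, eps) - 100) for _ in range(2000)) / 2000
# Expected absolute error is 1/epsilon: tightening privacy tenfold costs
# roughly tenfold accuracy (typically ~10 vs ~1 here).
print(mean_abs_err(0.1), mean_abs_err(1.0))
```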
Efficient estimation of statistical functions while preserving client-side privacy
Aggregating service users' personal data for analytical purposes is a common practice in today's Internet economy. However, distrust in the data aggregator, data breaches, and the risk of subpoenas pose significant challenges to the availability of data. The framework of differential privacy is enjoying wide attention due to its scalability and the rigour of the privacy protection it provides, and has become a de facto standard for facilitating privacy-preserving information extraction. In this dissertation, we design and implement resource-efficient algorithms for three fundamental data analysis primitives (marginal, range, and count queries) while providing strong differential privacy guarantees.
The first two queries are studied in the strict scenario of untrusted aggregation (a.k.a. the local model), in which the data collector is allowed to access only the noisy/perturbed version of users' data, not their true data. To the best of our knowledge, marginal and range queries had not been studied in detail in the local setting before our work. We show that our simple data transformation techniques achieve high accuracy in practice and can be used for performing more interesting analyses.
Finally, we revisit the problem of count queries under trusted aggregation. This setting can also be viewed as a relaxation of the local model called limited-precision local differential privacy. We first discover certain weaknesses in a well-known optimization framework that lead to solutions exhibiting pathological behaviours. We then propose additional constraints in the framework that remove these weaknesses without compromising too much on utility.
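The untrusted-aggregation (local) setting can be illustrated with the classic randomized-response mechanism for a count query (a minimal sketch of the local model; the dissertation's marginal and range mechanisms are more involved, and the numbers below are illustrative):

```python
import math
import random

def randomized_response(bit, epsilon):
    # Local model: each client reports its true bit only with probability
    # e^eps / (1 + e^eps), so the aggregator never sees raw data.
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

def debiased_count(reports, epsilon):
    # Invert the known flipping probability: E[sum] = (2p-1)*true + n*(1-p),
    # giving an unbiased estimate of the true number of 1-bits.
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    n = len(reports)
    return (sum(reports) - n * (1.0 - p)) / (2.0 * p - 1.0)

random.seed(0)
true_bits = [1] * 3000 + [0] * 7000   # 3000 of 10000 clients hold the item
reports = [randomized_response(b, 1.0) for b in true_bits]
# The estimate lands close to 3000 even though every single report is noisy.
print(debiased_count(reports, 1.0))
```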
Understanding the Real World through the Analysis of User Behavior and Topics in Online Social Media
Physical events happening in the real world usually trigger reactions and discussions in the digital world, a world most often represented by Online Social Media such as Twitter or Facebook. Mining these reactions through social sensors offers a fast and low-cost way to explain what is happening in the physical world. A thorough understanding of these discussions and the context behind them has become critical for many applications like business or political analysis. This context includes the characteristics of the population participating in a discussion, or when it is being discussed, or why. As an example, we demonstrate how the time of day affects the prediction of traffic on highways through the analysis of social media content. Obtaining an understanding of what is happening online, and its ramifications in the real world, can be enabled through the automatic summarization of Social Media. Trending topics are offered as a high-level content recommendation system, where users are suggested related content if they deem the displayed topics interesting. However, identifying the characteristics of the users focused on each topic can boost the importance even of topics that are not popular or bursty. We define a way to characterize groups of users that are focused on such topics and propose an efficient and accurate algorithm to extract such communities. Through qualitative and quantitative experimentation, we observe that topics with a strong community focus are interesting and more likely to catch the attention of users. Consequently, as trending topic extraction algorithms become more sophisticated and report additional information, like the characteristics of the users that participate in a trend, significant and novel privacy issues arise. We introduce a statistical attack to infer sensitive attribute values of Online Social Network users that utilizes such reported community-aware trending topics.
Additionally, we provide an algorithmic methodology that alters an existing community-aware trending topic algorithm so that it preserves the privacy of the involved users while still reporting trending topics with a satisfactory level of utility. From the user's perspective, we explore the idea of a cyborg that can constantly monitor its owner's privacy and alert them when necessary. However, apart from individuals, the notion of privacy can also extend to a group of people (or community). We study how non-private behavior of individuals can lead to exposure of the identity of a larger group. This exposure poses certain dangers, like online harassment targeted at the members of a group, potential physical attacks, group identity shift, etc. We discuss how this new privacy notion can be modeled and identify a set of core challenges and potential solutions.