The Dark Menace: Characterizing Network-based Attacks in the Cloud
ABSTRACT As the cloud computing market continues to grow, the cloud platform is becoming an attractive target for attackers to disrupt services, steal data, and compromise resources to launch further attacks. In this paper, using three months of NetFlow data in 2013 from a large cloud provider, we present the first large-scale characterization of inbound attacks towards the cloud and outbound attacks from the cloud. We investigate nine types of attacks, ranging from network-level attacks such as DDoS to application-level attacks such as SQL injection and spam. Our analysis covers the complexity, intensity, duration, and distribution of these attacks, highlighting the key challenges in defending against attacks in the cloud. By characterizing the diversity of cloud attacks, we aim to motivate the research community towards developing future security solutions for cloud systems.
filtering
Search engines have primarily focused on presenting the most relevant pages to the user quickly. A less well explored aspect of improving the search experience is to remove or group all near-duplicate documents in the results presented to the user. In this paper, we apply a Bloom filter based similarity detection technique to address this issue by refining the search results presented to the user. First, we present and analyze our technique for finding similar documents using content-defined chunking and Bloom filters, and demonstrate its effectiveness in compactly representing and quickly matching pages for similarity testing. Later, we demonstrate how a number of results for popular and random search queries retrieved from different search engines (Google, Yahoo, MSN) are similar and can be eliminated or re-organized. Finally, we apply our near-duplicate detection technique to show how to effectively remove similar search results and improve the user experience.
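The chunk-and-filter idea behind this abstract can be sketched roughly as follows. This is a simplified illustration rather than the paper's implementation: the filter size, hash count, boundary mask, and the names `BloomFilter`, `content_defined_chunks`, and `similarity` are all assumptions made for the example.

```python
import hashlib
import zlib

class BloomFilter:
    """Minimal Bloom filter; k index positions derived from SHA-256."""
    def __init__(self, size=1024, hashes=4):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

def content_defined_chunks(text, mask=0x3F, window=8, min_len=16):
    """Cut a chunk when a rolling checksum of the last `window` chars
    matches a boundary pattern, so boundaries survive insertions."""
    chunks, start = [], 0
    for i in range(window, len(text)):
        h = zlib.crc32(text[i - window:i].encode())
        if (h & mask) == 0 and i - start >= min_len:
            chunks.append(text[start:i])
            start = i
    chunks.append(text[start:])
    return chunks

def similarity(doc_a, doc_b):
    """Fraction of doc_b's chunks whose fingerprints hit doc_a's filter."""
    bf = BloomFilter()
    for chunk in content_defined_chunks(doc_a):
        bf.add(chunk)
    chunks_b = content_defined_chunks(doc_b)
    hits = sum(1 for chunk in chunks_b if chunk in bf)
    return hits / max(1, len(chunks_b))
```

Because a Bloom filter has no false negatives, identical documents always score 1.0; unrelated documents score low, with occasional false-positive chunk hits bounded by the filter's size and hash count.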
Precision-integrated scalable monitoring
Scalable system monitoring is a fundamental abstraction for large-scale networked systems. The goal of this dissertation is to design and build a scalable monitoring middleware that provides system introspection for large distributed systems and that will facilitate the design, development, and deployment of distributed monitoring applications. This middleware will enable monitoring applications to flexibly control the tradeoff between result precision and communication cost and to improve result accuracy in the face of node failures, network delays, and system reconfigurations. We present PRISM (PRecision-Integrated Scalable Monitoring), a scalable monitoring middleware that provides a global aggregate view of large-scale networked systems and that can serve as a building block for a broad range of distributed monitoring applications by coordinating views of multiple vantage points across the network. To coordinate a global view for system introspection, PRISM faces two key challenges: (1) scalability to large systems and high data volumes and (2) safeguarding accuracy in the face of node and network failures. To address these challenges, we design, implement, and evaluate PRISM, a system that defines precision as a new unified abstraction to enable scalable monitoring. PRISM quantifies (im)precision along a three-dimensional vector: arithmetic imprecision (AI) and temporal imprecision (TI) balance precision against monitoring overhead for scalability while network imprecision (NI) addresses the challenge of providing consistency guarantees despite failures.
Our prototype implementation of PRISM addresses the challenge of providing these metrics while scaling to a large number of nodes and attributes by (1) leveraging Distributed Hash Tables (DHTs) to create scalable aggregation trees, (2) self-tuning AI budgets across nodes in a principled, near-optimal manner to shift precision to where it is useful, (3) pipelining TI delays across tree levels to maximize batching of updates, and (4) applying dual-tree prefix aggregation which exploits symmetry in our DHT topology to drastically reduce the cost of the active probing needed to maintain NI. Through extensive simulations and experiments on four large-scale testbeds, we observe that PRISM provides a key substrate for scalable monitoring by (1) reducing monitoring load by up to two orders of magnitude compared to existing approaches, (2) providing a flexible framework to control the tradeoff between accuracy, bandwidth cost, and response latency, (3) characterizing and improving confidence in the accuracy of results in the face of system disruptions, and (4) improving the observed accuracy by up to an order of magnitude despite churn. We have built several monitoring applications on top of PRISM including a distributed heavy hitter detection service, a distributed monitoring service for Internet-scale systems, and a detection service for monitoring distributed-denial-of-service (DDoS) attacks at the source-side in distributed networked systems. Finally, we demonstrate how the unified precision abstraction enables new monitoring applications by presenting experiences from these applications.
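The arithmetic-imprecision (AI) idea, caching a reported aggregate and suppressing updates that stay within an error budget, can be sketched in a few lines. The class name and interface below are assumptions made for illustration, not PRISM's actual API.

```python
class AIFilter:
    """Suppress aggregate updates that stay within an arithmetic-
    imprecision budget; only out-of-budget changes are propagated
    up the aggregation tree. Illustrative sketch, not PRISM's API."""
    def __init__(self, budget):
        self.budget = budget    # allowed |true value - reported value|
        self.reported = None    # last value actually sent upward

    def update(self, value):
        """Return the value to forward, or None if within budget."""
        if self.reported is None or abs(value - self.reported) > self.budget:
            self.reported = value
            return value
        return None
```

With a budget of 5, a stream of readings 10, 12, 14, 21, 22, 30 forwards only 10, 21, and 30: the intermediate readings stay within the budget of the last reported value, trading bounded numeric error for fewer messages.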
Water security: a Geospatial Framework for urban water resilience
Urban water issues impacting sustainable development can be analyzed, modeled, and mapped through cutting-edge geospatial technologies; however, the water sector in developing countries suffers various spatial data-related problems such as limited coverage, unreliable data, and limited coordination and sharing. Available spatial data are limited to the aggregate level (i.e., national, state, and district levels) and lack the detail needed to make informed policy decisions and allocations. Despite significant advancements in geospatial technologies, their application and integration at the policy and decision-making level are rare. The current research provides a broad GIS-centric framework for actionable science, which focuses on real context and provides geospatial maps and theoretical and practical knowledge to address various water issues. The study demonstrates the application of the proposed Geospatial Framework from technical and institutional perspectives in water-stressed zones in Pune city, showing where and how to solve problems and where proposed actions can have the greatest impact in creating a sustainable, water-secure future. The framework makes it possible for everyone to explore datasets that can provide a baseline for research and analysis, contribute to the process, propose and act on solutions, and benefit from the outcomes and policy recommendations.
HIGHLIGHTS
A Geospatial Framework is developed to measure and monitor water security through geospatial technologies.
The study demonstrates the application of the proposed Geospatial Framework from technical and institutional perspectives in water-stressed zones in Pune city.
The study enables mutually beneficial collaboration with the Municipal Corporation and works toward open, linked geospatial data for water security.
On-demand, Spot, or Both: Dynamic Resource Allocation for Executing Batch Jobs in the Cloud
Abstract Cloud computing provides an attractive computing paradigm in which computational resources are rented on-demand to users with zero capital and maintenance costs. Cloud providers offer different pricing options to meet the computing requirements of a wide variety of applications. An attractive option for batch computing is spot instances, which allow users to place bids for spare computing instances and rent them at a price (often) substantially lower than the fixed on-demand price. However, this raises three main challenges for users: how many instances to rent at any time? what type (on-demand, spot, or both)? and what bid value to use for spot instances? In particular, renting on-demand risks high costs while renting spot instances risks job interruption and delayed completion when the spot market price exceeds the bid. This paper introduces an online learning algorithm for resource allocation to address this fundamental tradeoff between computation cost and performance. Our algorithm dynamically adapts resource allocation by learning from its performance on prior job executions while incorporating the history of spot prices and workload characteristics. We provide theoretical bounds on its performance and prove that the average regret of our approach (compared to the best policy in hindsight) vanishes to zero with time. Evaluation on traces from a large datacenter cluster shows that our algorithm outperforms greedy allocation heuristics and quickly converges to a small set of best performing policies.
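The vanishing-regret guarantee described above is characteristic of multiplicative-weights schemes. The sketch below is a generic textbook Hedge algorithm over a fixed set of candidate allocation policies with assumed per-round loss vectors, shown only to illustrate the regret framework; it is not the paper's algorithm.

```python
import math
import random

def hedge(losses_per_round, eta=0.1, seed=0):
    """Multiplicative-weights (Hedge) over a fixed policy set.
    Each round, a policy is sampled in proportion to its weight;
    weights then shrink exponentially with the observed losses,
    so average regret vs. the best fixed policy vanishes over time.
    Generic sketch, not the paper's allocation algorithm."""
    rng = random.Random(seed)
    n = len(losses_per_round[0])
    weights = [1.0] * n
    total_loss = 0.0
    for losses in losses_per_round:
        total = sum(weights)
        probs = [w / total for w in weights]
        choice = rng.choices(range(n), weights=probs)[0]
        total_loss += losses[choice]
        # Exponential down-weighting of costly policies this round.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total_loss, weights
```

If one policy (say, "mostly spot with a moderate bid") consistently incurs low cost, its weight comes to dominate and the learner's cumulative loss tracks that policy's, which is exactly the hindsight-comparison notion of regret in the abstract.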
TAPER: Tiered Approach for Eliminating Redundancy in Replica Synchronization
We present TAPER, a scalable data replication protocol that synchronizes a large collection of data across multiple geographically distributed replica locations. TAPER can be applied to a broad range of systems, such as software distribution mirrors, content distribution networks, backup and recovery, and federated file systems. TAPER is designed to be bandwidth efficient, scalable and content-based, and it does not require prior knowledge of the replica state. To achieve these properties, TAPER provides: i) four pluggable redundancy elimination phases that balance the trade-off between bandwidth savings and computation overheads, ii) a hierarchical hash tree based directory pruning phase that quickly matches identical data from the granularity of directory trees to individual files, iii) a content-based similarity detection technique using Bloom filters to identify similar files, and iv) a combination of coarse-grained chunk matching with finer-grained block matches to achieve bandwidth efficiency. Through extensive experiments on various datasets, we observe that in comparison with rsync, a widely-used directory synchronization tool, TAPER reduces bandwidth by 15% to 71%, performs faster matching, and scales to a larger number of replicas.
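The hierarchical hash tree pruning phase (item ii) can be sketched with a Merkle-style directory hash: when two subtrees' hashes match, the whole subtree is skipped, and recursion descends only into differing directories. The representation below (nested dicts for directories, bytes for file contents) and the function names are illustrative assumptions, not TAPER's on-disk format.

```python
import hashlib

def dir_hash(tree):
    """Merkle-style hash: files hash their contents, directories hash
    the sorted (name, child-hash) pairs beneath them."""
    if isinstance(tree, bytes):                        # a file
        return hashlib.sha256(tree).hexdigest()
    items = sorted((name, dir_hash(child)) for name, child in tree.items())
    return hashlib.sha256(repr(items).encode()).hexdigest()

def changed_paths(src, dst, path=""):
    """Pruning pass: skip any subtree whose hashes already match;
    recurse only into directories that differ."""
    if dir_hash(src) == dir_hash(dst):
        return []                                      # identical subtree: prune
    if isinstance(src, bytes) or isinstance(dst, bytes):
        return [path or "/"]                           # differing file (or type)
    paths = []
    for name in sorted(set(src) | set(dst)):
        if name not in src or name not in dst:
            paths.append(f"{path}/{name}")             # added or removed entry
        else:
            paths.extend(changed_paths(src[name], dst[name], f"{path}/{name}"))
    return paths
```

The pruning is what makes the first phase cheap: an unchanged top-level directory costs one hash comparison regardless of how many files it contains, and only the differing paths are handed to the later, finer-grained chunk- and block-matching phases.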