725 research outputs found
RepFlow: Minimizing Flow Completion Times with Replicated Flows in Data Centers
Short TCP flows that are critical for many interactive applications in data
centers are plagued by large flows and head-of-line blocking in switches.
Hash-based load balancing schemes such as ECMP aggravate the matter and result
in long-tailed flow completion times (FCT). Previous work on reducing FCT
usually requires custom switch hardware and/or protocol changes. We propose
RepFlow, a simple yet practically effective approach that replicates each short
flow to reduce the completion times, without any change to switches or host
kernels. With ECMP the original and replicated flows traverse distinct paths
with different congestion levels, thereby reducing the probability of having
long queueing delay. We develop a simple analytical model to demonstrate the
potential improvement of RepFlow. Extensive NS-3 simulations and Mininet
implementation show that RepFlow provides 50%--70% speedup in both mean and
99-th percentile FCT for all loads, and offers near-optimal FCT when used with
DCTCP.Comment: To appear in IEEE INFOCOM 201
A Model for Scientific Workflows with Parallel and Distributed Computing
In the last decade we witnessed an immense evolution of the computing infrastructures
in terms of processing, storage and communication. On one hand, developments in hardware architectures have made it possible to run multiple virtual machines on a single physical machine. On the other hand, the increase of the available network communication bandwidth has enabled the widespread use of distributed computing infrastructures, for example based on clusters, grids and clouds. The above factors enabled different scientific communities to aim for the development and implementation of complex scientific applications possibly involving large amounts of data. However, due to their structural complexity, these applications require decomposition models to allow multiple tasks running in parallel and distributed environments.
The scientific workflow concept arises naturally as a way to model applications composed of multiple activities. In fact, in the past decades many initiatives have been
undertaken to model application development using the workflow paradigm, both in
the business and in scientific domains. However, despite such intensive efforts, current
scientific workflow systems and tools still have limitations, which pose difficulties to the
development of emerging large-scale, distributed and dynamic applications.
This dissertation proposes the AWARD model for scientific workflows with parallel
and distributed computing. AWARD is an acronym for Autonomic Workflow Activities
Reconfigurable and Dynamic.
The AWARD model has the following main characteristics.
It is based on a decentralized execution control model where multiple autonomic
workflow activities interact by exchanging tokens through input and output ports. The
activities can be executed separately in diverse computing environments, such as in a
single computer or on multiple virtual machines running on distributed infrastructures,
such as clusters and clouds.
It provides basic workflow patterns for parallel and distributed application decomposition and other useful patterns supporting feedback loops and load balancing. The model is suitable to express applications based on a finite or infinite number of iterations, thus allowing to model long-running workflows, which are typical in scientific experimention. A distintive contribution of the AWARD model is the support for dynamic reconfiguration
of long-running workflows. A dynamic reconfiguration allows to modify the
structure of the workflow, for example, to introduce new activities, modify the connections
between activity input and output ports. The activity behavior can also be modified,
for example, by dynamically replacing the activity algorithm.
In addition to the proposal of a new workflow model, this dissertation presents the
implementation of a fully functional software architecture that supports the AWARD
model. The implemented prototype was used to validate and refine the model across
multiple workflow scenarios whose usefulness has been demonstrated in practice clearly, through experimental results, demonstrating the advantages of the major characteristics and contributions of the AWARD model. The implemented prototype was also used to develop application cases, such as a workflow to support the implementation of the MapReduce model and a workflow to support a text mining application developed by an external user.
The extensive experimental work confirmed the adequacy of the AWARD model and
its implementation for developing applications that exploit parallelism and distribution
using the scientific workflows paradigm
PROVIDE: hiding from automated network scans with proofs of identity
Network scanners are a valuable tool for researchers and administrators, however they are also used by malicious actors to identify vulnerable hosts on a network. Upon the disclosure of a security vulnerability, scans are launched within hours. These opportunistic attackers enumerate blocks of IP addresses in hope of discovering an exploitable host. Fortunately, defensive measures such as port knocking protocols (PKPs) allow a service to remain stealth to unauthorized IP addresses. The service is revealed only when a client includes a special authentication token (AT) in the IP/TCP header. However this AT is generated from a secret shared between the clients/servers and distributed manually to each endpoint. As a result, these defense measures have failed to be widely adopted by other protocols such as HTTP/S due to challenges in distributing the shared secrets. In this paper we propose a scalable solution to this problem for services accessed by domain name. We make the following observation: automated network scanners access servers by IP address, while legitimate clients access the server by name. Therefore a service should only reveal itself to clients who know its name. Based on this principal, we have created a proof of the verifier’s identity (a.k.a. PROVIDE) protocol that allows a prover (legitimate user) to convince a verifier (service) that it is knowledgeable of the verifier’s identity. We present a PROVIDE implementation using a PKP and DNS (PKP+DNS) that uses DNS TXT records to distribute identification tokens (IDT) while DNS PTR records for the service’s domain name are prohibited to prevent reverse DNS lookups. Clients are modified to make an additional DNS TXT query to obtain the IDT which is used by the PKP to generate an AT. The inclusion of an AT in the packet header, generated from the DNS TXT query, is proof the client knows the service’s identity. We analyze the effectiveness of this mechanism with respect to brute force attempts for various strength ATs and discuss practical considerations.This work has been supported by the National Science Foundation (NSF) awards #1430145, #1414119, and #1012798
Content and popularity analysis of Tor hidden services
Tor hidden services allow running Internet services while protecting the
location of the servers. Their main purpose is to enable freedom of speech even
in situations in which powerful adversaries try to suppress it. However,
providing location privacy and client anonymity also makes Tor hidden services
an attractive platform for every kind of imaginable shady service. The ease
with which Tor hidden services can be set up has spurred a huge growth of
anonymously provided Internet services of both types. In this paper we analyse
the landscape of Tor hidden services. We have studied Tor hidden services after
collecting 39824 hidden service descriptors on 4th of Feb 2013 by exploiting
protocol and implementation flaws in Tor: we scanned them for open ports; in
the case of HTTP services, we analysed and classified their content. We also
estimated the popularity of hidden services by looking at the request rate for
hidden service descriptors by clients. We found that while the content of Tor
hidden services is rather varied, the most popular hidden services are related
to botnets.Comment: 6 pages, 3 figures, 2 table
Understanding Security Threats in Cloud
As cloud computing has become a trend in the computing world, understanding its security concerns becomes essential for improving service quality and expanding business scale. This dissertation studies the security issues in a public cloud from three aspects. First, we investigate a new threat called power attack in the cloud. Second, we perform a systematical measurement on the public cloud to understand how cloud vendors react to existing security threats. Finally, we propose a novel technique to perform data reduction on audit data to improve system capacity, and hence helping to enhance security in cloud. In the power attack, we exploit various attack vectors in platform as a service (PaaS), infrastructure as a service (IaaS), and software as a service (SaaS) cloud environments. to demonstrate the feasibility of launching a power attack, we conduct series of testbed based experiments and data-center-level simulations. Moreover, we give a detailed analysis on how different power management methods could affect a power attack and how to mitigate such an attack. Our experimental results and analysis show that power attacks will pose a serious threat to modern data centers and should be taken into account while deploying new high-density servers and power management techniques. In the measurement study, we mainly investigate how cloud vendors have reacted to the co-residence threat inside the cloud, in terms of Virtual Machine (VM) placement, network management, and Virtual Private Cloud (VPC). Specifically, through intensive measurement probing, we first profile the dynamic environment of cloud instances inside the cloud. Then using real experiments, we quantify the impacts of VM placement and network management upon co-residence, respectively. Moreover, we explore VPC, which is a defensive service of Amazon EC2 for security enhancement, from the routing perspective. Advanced Persistent Threat (APT) is a serious cyber-threat, cloud vendors are seeking solutions to ``connect the suspicious dots\u27\u27 across multiple activities. This requires ubiquitous system auditing for long period of time, which in turn causes overwhelmingly large amount of system audit logs. We propose a new approach that exploits the dependency among system events to reduce the number of log entries while still supporting high quality forensics analysis. In particular, we first propose an aggregation algorithm that preserves the event dependency in data reduction to ensure high quality of forensic analysis. Then we propose an aggressive reduction algorithm and exploit domain knowledge for further data reduction. We conduct a comprehensive evaluation on real world auditing systems using more than one-month log traces to validate the efficacy of our approach
Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the ap-
plications by representing the infrastructures
- …