718 research outputs found
A Monitoring System for the BaBar INFN Computing Cluster
Monitoring large clusters is a challenging problem. It is necessary to
observe a large quantity of devices with a reasonably short delay between
consecutive observations. The set of monitored devices may include PCs, network
switches, tape libraries and other equipments. The monitoring activity should
not impact the performances of the system. In this paper we present PerfMC, a
monitoring system for large clusters. PerfMC is driven by an XML configuration
file, and uses the Simple Network Management Protocol (SNMP) for data
collection. SNMP is a standard protocol implemented by many networked
equipments, so the tool can be used to monitor a wide range of devices. System
administrators can display informations on the status of each device by
connecting to a WEB server embedded in PerfMC. The WEB server can produce
graphs showing the value of different monitored quantities as a function of
time; it can also produce arbitrary XML pages by applying XSL Transformations
to an internal XML representation of the cluster's status. XSL Transformations
may be used to produce HTML pages which can be displayed by ordinary WEB
browsers. PerfMC aims at being relatively easy to configure and operate, and
highly efficient. It is currently being used to monitor the Italian
Reprocessing farm for the BaBar experiment, which is made of about 200 dual-CPU
Linux machines.Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla, Ca, USA, March 2003, 10 pages, LaTeX, 4 eps figures. PSN
MOET00
A Survey of Green Networking Research
Reduction of unnecessary energy consumption is becoming a major concern in
wired networking, because of the potential economical benefits and of its
expected environmental impact. These issues, usually referred to as "green
networking", relate to embedding energy-awareness in the design, in the devices
and in the protocols of networks. In this work, we first formulate a more
precise definition of the "green" attribute. We furthermore identify a few
paradigms that are the key enablers of energy-aware networking research. We
then overview the current state of the art and provide a taxonomy of the
relevant work, with a special focus on wired networking. At a high level, we
identify four branches of green networking research that stem from different
observations on the root causes of energy waste, namely (i) Adaptive Link Rate,
(ii) Interface proxying, (iii) Energy-aware infrastructures and (iv)
Energy-aware applications. In this work, we do not only explore specific
proposals pertaining to each of the above branches, but also offer a
perspective for research.Comment: Index Terms: Green Networking; Wired Networks; Adaptive Link Rate;
Interface Proxying; Energy-aware Infrastructures; Energy-aware Applications.
18 pages, 6 figures, 2 table
Metascheduling of HPC Jobs in Day-Ahead Electricity Markets
High performance grid computing is a key enabler of large scale collaborative
computational science. With the promise of exascale computing, high performance
grid systems are expected to incur electricity bills that grow super-linearly
over time. In order to achieve cost effectiveness in these systems, it is
essential for the scheduling algorithms to exploit electricity price
variations, both in space and time, that are prevalent in the dynamic
electricity price markets. In this paper, we present a metascheduling algorithm
to optimize the placement of jobs in a compute grid which consumes electricity
from the day-ahead wholesale market. We formulate the scheduling problem as a
Minimum Cost Maximum Flow problem and leverage queue waiting time and
electricity price predictions to accurately estimate the cost of job execution
at a system. Using trace based simulation with real and synthetic workload
traces, and real electricity price data sets, we demonstrate our approach on
two currently operational grids, XSEDE and NorduGrid. Our experimental setup
collectively constitute more than 433K processors spread across 58 compute
systems in 17 geographically distributed locations. Experiments show that our
approach simultaneously optimizes the total electricity cost and the average
response time of the grid, without being unfair to users of the local batch
systems.Comment: Appears in IEEE Transactions on Parallel and Distributed System
On Resource Pooling and Separation for LRU Caching
Caching systems using the Least Recently Used (LRU) principle have now become
ubiquitous. A fundamental question for these systems is whether the cache space
should be pooled together or divided to serve multiple flows of data item
requests in order to minimize the miss probabilities. In this paper, we show
that there is no straight yes or no answer to this question, depending on
complex combinations of critical factors, including, e.g., request rates,
overlapped data items across different request flows, data item popularities
and their sizes. Specifically, we characterize the asymptotic miss
probabilities for multiple competing request flows under resource pooling and
separation for LRU caching when the cache size is large.
Analytically, we show that it is asymptotically optimal to jointly serve
multiple flows if their data item sizes and popularity distributions are
similar and their arrival rates do not differ significantly; the
self-organizing property of LRU caching automatically optimizes the resource
allocation among them asymptotically. Otherwise, separating these flows could
be better, e.g., when data sizes vary significantly. We also quantify critical
points beyond which resource pooling is better than separation for each of the
flows when the overlapped data items exceed certain levels. Technically, we
generalize existing results on the asymptotic miss probability of LRU caching
for a broad class of heavy-tailed distributions and extend them to multiple
competing flows with varying data item sizes, which also validates the Che
approximation under certain conditions. These results provide new insights on
improving the performance of caching systems
Estimating Dynamic Traffic Matrices by using Viable Routing Changes
Abstract: In this paper we propose a new approach for dealing with the ill-posed nature of traffic matrix estimation. We present three solution enhancers: an algorithm for deliberately changing link weights to obtain additional information that can make the underlying linear system full rank; a cyclo-stationary model to capture both long-term and short-term traffic variability, and a method for estimating the variance of origin-destination (OD) flows. We show how these three elements can be combined into a comprehensive traffic matrix estimation procedure that dramatically reduces the errors compared to existing methods. We demonstrate that our variance estimates can be used to identify the elephant OD flows, and we thus propose a variant of our algorithm that addresses the problem of estimating only the heavy flows in a traffic matrix. One of our key findings is that by focusing only on heavy flows, we can simplify the measurement and estimation procedure so as to render it more practical. Although there is a tradeoff between practicality and accuracy, we find that increasing the rank is so helpful that we can nevertheless keep the average errors consistently below the 10% carrier target error rate. We validate the effectiveness of our methodology and the intuition behind it using commercial traffic matrix data from Sprint's Tier-1 backbon
Corporate influence and the academic computer science discipline. [4: CMU]
Prosopographical work on the four major centers for computer
research in the United States has now been conducted, resulting in big
questions about the independence of, so called, computer science
Understanding Internet topology: principles, models, and validation
Building on a recent effort that combines a first-principles approach to modeling router-level connectivity with a more pragmatic use of statistics and graph theory, we show in this paper that for the Internet, an improved understanding of its physical infrastructure is possible by viewing the physical connectivity as an annotated graph that delivers raw connectivity and bandwidth to the upper layers in the TCP/IP protocol stack, subject to practical constraints (e.g., router technology) and economic considerations (e.g., link costs). More importantly, by relying on data from Abilene, a Tier-1 ISP, and the Rocketfuel project, we provide empirical evidence in support of the proposed approach and its consistency with networking reality. To illustrate its utility, we: 1) show that our approach provides insight into the origin of high variability in measured or inferred router-level maps; 2) demonstrate that it easily accommodates the incorporation of additional objectives of network design (e.g., robustness to router failure); and 3) discuss how it complements ongoing community efforts to reverse-engineer the Internet
Distance-Dependent Kronecker Graphs for Modeling Social Networks
This paper focuses on a generalization of stochastic
Kronecker graphs, introducing a Kronecker-like operator and
defining a family of generator matrices H dependent on distances
between nodes in a specified graph embedding. We prove
that any lattice-based network model with sufficiently small
distance-dependent connection probability will have a Poisson
degree distribution and provide a general framework to prove
searchability for such a network. Using this framework, we focus
on a specific example of an expanding hypercube and discuss
the similarities and differences of such a model with recently
proposed network models based on a hidden metric space. We
also prove that a greedy forwarding algorithm can find very short
paths of length O((log log n)^2) on the hypercube with n nodes,
demonstrating that distance-dependent Kronecker graphs can
generate searchable network models
Understanding CHOKe: throughput and spatial characteristics
A recently proposed active queue management, CHOKe, is stateless, simple to implement, yet surprisingly effective in protecting TCP from UDP flows. We present an equilibrium model of TCP/CHOKe. We prove that, provided the number of TCP flows is large, the UDP bandwidth share peaks at (e+1)/sup -1/=0.269 when UDP input rate is slightly larger than link capacity, and drops to zero as UDP input rate tends to infinity. We clarify the spatial characteristics of the leaky buffer under CHOKe that produce this throughput behavior. Specifically, we prove that, as UDP input rate increases, even though the total number of UDP packets in the queue increases, their spatial distribution becomes more and more concentrated near the tail of the queue, and drops rapidly to zero toward the head of the queue. In stark contrast to a nonleaky FIFO buffer where UDP bandwidth shares would approach 1 as its input rate increases without bound, under CHOKe, UDP simultaneously maintains a large number of packets in the queue and receives a vanishingly small bandwidth share, the mechanism through which CHOKe protects TCP flows
- …