3,981 research outputs found
Metascheduling of HPC Jobs in Day-Ahead Electricity Markets
High performance grid computing is a key enabler of large scale collaborative
computational science. With the promise of exascale computing, high performance
grid systems are expected to incur electricity bills that grow super-linearly
over time. In order to achieve cost effectiveness in these systems, it is
essential for the scheduling algorithms to exploit electricity price
variations, both in space and time, that are prevalent in the dynamic
electricity price markets. In this paper, we present a metascheduling algorithm
to optimize the placement of jobs in a compute grid which consumes electricity
from the day-ahead wholesale market. We formulate the scheduling problem as a
Minimum Cost Maximum Flow problem and leverage queue waiting time and
electricity price predictions to accurately estimate the cost of job execution
at a system. Using trace based simulation with real and synthetic workload
traces, and real electricity price data sets, we demonstrate our approach on
two currently operational grids, XSEDE and NorduGrid. Our experimental setup
collectively constitute more than 433K processors spread across 58 compute
systems in 17 geographically distributed locations. Experiments show that our
approach simultaneously optimizes the total electricity cost and the average
response time of the grid, without being unfair to users of the local batch
systems.Comment: Appears in IEEE Transactions on Parallel and Distributed System
Workload characterization of the shared/buy-in computing cluster at Boston University
Computing clusters provide a complete environment
for computational research, including bio-informatics, machine
learning, and image processing. The Shared Computing Cluster
(SCC) at Boston University is based on a shared/buy-in architecture
that combines shared computers, which are free to be
used by all users, and buy-in computers, which are computers
purchased by users for semi-exclusive use. Although there exists
significant work on characterizing the performance of computing
clusters, little is known about shared/buy-in architectures. Using
data traces, we statistically analyze the performance of the SCC.
Our results show that the average waiting time of a buy-in job
is 16.1% shorter than that of a shared job. Furthermore, we
identify parameters that have a major impact on the performance
experienced by shared and buy-in jobs. These parameters include
the type of parallel environment and the run time limit (i.e., the
maximum time during which a job can use a resource). Finally,
we show that the semi-exclusive paradigm, which allows any SCC
user to use idle buy-in resources for a limited time, increases
the utilization of buy-in resources by 17.4%, thus significantly
improving the performance of the system as a whole.http://people.bu.edu/staro/MIT_Conference_Yoni.pdfAccepted manuscrip
DualTable: A Hybrid Storage Model for Update Optimization in Hive
Hive is the most mature and prevalent data warehouse tool providing SQL-like
interface in the Hadoop ecosystem. It is successfully used in many Internet
companies and shows its value for big data processing in traditional
industries. However, enterprise big data processing systems as in Smart Grid
applications usually require complicated business logics and involve many data
manipulation operations like updates and deletes. Hive cannot offer sufficient
support for these while preserving high query performance. Hive using the
Hadoop Distributed File System (HDFS) for storage cannot implement data
manipulation efficiently and Hive on HBase suffers from poor query performance
even though it can support faster data manipulation.There is a project based on
Hive issue Hive-5317 to support update operations, but it has not been finished
in Hive's latest version. Since this ACID compliant extension adopts same data
storage format on HDFS, the update performance problem is not solved.
In this paper, we propose a hybrid storage model called DualTable, which
combines the efficient streaming reads of HDFS and the random write capability
of HBase. Hive on DualTable provides better data manipulation support and
preserves query performance at the same time. Experiments on a TPC-H data set
and on a real smart grid data set show that Hive on DualTable is up to 10 times
faster than Hive when executing update and delete operations.Comment: accepted by industry session of ICDE201
- …