313,040 research outputs found
Early Observations on Performance of Google Compute Engine for Scientific Computing
Although Cloud computing emerged for business applications in industry,
public Cloud services have been widely accepted and encouraged for scientific
computing in academia. The recently available Google Compute Engine (GCE) is
claimed to support high-performance and computationally intensive tasks, while
little evaluation studies can be found to reveal GCE's scientific capabilities.
Considering that fundamental performance benchmarking is the strategy of
early-stage evaluation of new Cloud services, we followed the Cloud Evaluation
Experiment Methodology (CEEM) to benchmark GCE and also compare it with Amazon
EC2, to help understand the elementary capability of GCE for dealing with
scientific problems. The experimental results and analyses show both potential
advantages of, and possible threats to applying GCE to scientific computing.
For example, compared to Amazon's EC2 service, GCE may better suit applications
that require frequent disk operations, while it may not be ready yet for single
VM-based parallel computing. Following the same evaluation methodology,
different evaluators can replicate and/or supplement this fundamental
evaluation of GCE. Based on the fundamental evaluation results, suitable GCE
environments can be further established for case studies of solving real
science problems.Comment: Proceedings of the 5th International Conference on Cloud Computing
Technologies and Science (CloudCom 2013), pp. 1-8, Bristol, UK, December 2-5,
201
Predicting Intermediate Storage Performance for Workflow Applications
Configuring a storage system to better serve an application is a challenging
task complicated by a multidimensional, discrete configuration space and the
high cost of space exploration (e.g., by running the application with different
storage configurations). To enable selecting the best configuration in a
reasonable time, we design an end-to-end performance prediction mechanism that
estimates the turn-around time of an application using storage system under a
given configuration. This approach focuses on a generic object-based storage
system design, supports exploring the impact of optimizations targeting
workflow applications (e.g., various data placement schemes) in addition to
other, more traditional, configuration knobs (e.g., stripe size or replication
level), and models the system operation at data-chunk and control message
level.
This paper presents our experience to date with designing and using this
prediction mechanism. We evaluate this mechanism using micro- as well as
synthetic benchmarks mimicking real workflow applications, and a real
application.. A preliminary evaluation shows that we are on a good track to
meet our objectives: it can scale to model a workflow application run on an
entire cluster while offering an over 200x speedup factor (normalized by
resource) compared to running the actual application, and can achieve, in the
limited number of scenarios we study, a prediction accuracy that enables
identifying the best storage system configuration
A Factor Framework for Experimental Design for Performance Evaluation of Commercial Cloud Services
Given the diversity of commercial Cloud services, performance evaluations of
candidate services would be crucial and beneficial for both service customers
(e.g. cost-benefit analysis) and providers (e.g. direction of service
improvement). Before an evaluation implementation, the selection of suitable
factors (also called parameters or variables) plays a prerequisite role in
designing evaluation experiments. However, there seems a lack of systematic
approaches to factor selection for Cloud services performance evaluation. In
other words, evaluators randomly and intuitively concerned experimental factors
in most of the existing evaluation studies. Based on our previous taxonomy and
modeling work, this paper proposes a factor framework for experimental design
for performance evaluation of commercial Cloud services. This framework
capsules the state-of-the-practice of performance evaluation factors that
people currently take into account in the Cloud Computing domain, and in turn
can help facilitate designing new experiments for evaluating Cloud services.Comment: 8 pages, Proceedings of the 4th International Conference on Cloud
Computing Technology and Science (CloudCom 2012), pp. 169-176, Taipei,
Taiwan, December 03-06, 201
It\u27s Fun, But Is It Science? Goals and Strategies in a Problem-Based Learning Course
All students at Hampshire College must complete a science requirement in which they demonstrate their understanding of how science is done, examine the work of science in larger contexts, and communicate their ideas effectively. Human Biology: Selected Topics in Medicine is one of 18-20 freshman seminars designed to move students toward completing this requirement. Students work in cooperative groups of 4-6 people to solve actual medical cases about which they receive information progressively. Students assign themselves homework tasks to bring information back for group deliberation. The goal is for case teams to work cooperatively to develop a differential diagnosis and recommend treatment. Students write detailed individual final case reports. Changes observed in student work over six years of developing this course include: increased motivation to pursue work in depth, more effective participation on case teams, increase in critical examination of evidence, and more fully developed arguments in final written reports. As part of a larger study of eighteen introductory science courses in two institutions, several types of pre- and post-course assessments were used to evaluate how teaching approaches might have influenced students’ attitudes about science, their ability to learn science, and their understanding of how scientific knowledge is developed [1]. Preliminary results from interviews and Likert-scale measures suggest improvements in the development of some students’ views of epistemology and in the importance of cooperative group work in facilitating that development
MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data
In recent times, geospatial datasets are growing in terms of size, complexity and heterogeneity. High performance systems are needed to analyze such data to produce actionable insights in an efficient manner. For polygonal a.k.a vector datasets, operations such as I/O, data partitioning, communication, and load balancing becomes challenging in a cluster environment. In this work, we present MPI-Vector-IO 1 , a parallel I/O library that we have designed using MPI-IO specifically for partitioning and reading irregular vector data formats such as Well Known Text. It makes MPI aware of spatial data, spatial primitives and provides support for spatial data types embedded within collective computation and communication using MPI message-passing library. These abstractions along with parallel I/O support are useful for parallel Geographic Information System (GIS) application development on HPC platforms
Optimizing I/O for Big Array Analytics
Big array analytics is becoming indispensable in answering important
scientific and business questions. Most analysis tasks consist of multiple
steps, each making one or multiple passes over the arrays to be analyzed and
generating intermediate results. In the big data setting, I/O optimization is a
key to efficient analytics. In this paper, we develop a framework and
techniques for capturing a broad range of analysis tasks expressible in
nested-loop forms, representing them in a declarative way, and optimizing their
I/O by identifying sharing opportunities. Experiment results show that our
optimizer is capable of finding execution plans that exploit nontrivial I/O
sharing opportunities with significant savings.Comment: VLDB201
ArrayBridge: Interweaving declarative array processing with high-performance computing
Scientists are increasingly turning to datacenter-scale computers to produce
and analyze massive arrays. Despite decades of database research that extols
the virtues of declarative query processing, scientists still write, debug and
parallelize imperative HPC kernels even for the most mundane queries. This
impedance mismatch has been partly attributed to the cumbersome data loading
process; in response, the database community has proposed in situ mechanisms to
access data in scientific file formats. Scientists, however, desire more than a
passive access method that reads arrays from files.
This paper describes ArrayBridge, a bi-directional array view mechanism for
scientific file formats, that aims to make declarative array manipulations
interoperable with imperative file-centric analyses. Our prototype
implementation of ArrayBridge uses HDF5 as the underlying array storage library
and seamlessly integrates into the SciDB open-source array database system. In
addition to fast querying over external array objects, ArrayBridge produces
arrays in the HDF5 file format just as easily as it can read from it.
ArrayBridge also supports time travel queries from imperative kernels through
the unmodified HDF5 API, and automatically deduplicates between array versions
for space efficiency. Our extensive performance evaluation in NERSC, a
large-scale scientific computing facility, shows that ArrayBridge exhibits
statistically indistinguishable performance and I/O scalability to the native
SciDB storage engine.Comment: 12 pages, 13 figure
PROFET: modeling system performance and energy without simulating the CPU
The approaching end of DRAM scaling and expansion of emerging memory technologies is motivating a lot of research in future memory systems. Novel memory systems are typically explored by hardware simulators that are slow and often have a simplified or obsolete abstraction of the CPU. This study presents PROFET, an analytical model that predicts how an application's performance and energy consumption changes when it is executed on different memory systems. The model is based on instrumentation of an application execution on actual hardware, so it already takes into account CPU microarchitectural details such as the data prefetcher and out-of-order engine. PROFET is evaluated on two real platforms: Sandy Bridge-EP E5-2670 and Knights Landing Xeon Phi platforms with various memory configurations. The evaluation results show that PROFET's predictions are accurate, typically with only 2% difference from the values measured on actual hardware. We release the PROFET source code and all input data required for memory system and application profiling. The released package can be seamlessly installed and used on high-end Intel platforms.Peer ReviewedPostprint (author's final draft
Ask a clearer question, get a better answer.
Many undergraduate students struggle to engage with higher order skills such as evaluation and synthesis in written assignments, either because they do not understand that these are the aim of written assessment or because these critical thinking skills require more effort than writing a descriptive essay. Here, we report that students who attended a freely available workshop, in which they were coached to pose a question in the title of their assignment and then use their essay to answer that question, obtained higher marks for their essay than those who did not attend. We demonstrate that this is not a result of latent academic ability amongst students who chose to attend our workshops and suggest this increase in marks was a result of greater engagement with ‘critical thinking’ skills, which are essential for upper 2:1 and 1st class grades. The tutoring method we used holds two particular advantages: First, we allow students to pick their own topics of interest, which increases ownership of learning, which is associated with motivation and engagement in ‘difficult’ tasks. Second, this method integrates the development of ‘inquisitiveness’ and critical thinking into subject specific learning, which is thought to be more productive than trying to develop these skills in isolation
- …