Rhymes: a shared virtual memory system for non-coherent tiled many-core architectures
The rising core count per processor is pushing chip complexity to a level where hardware-based cache coherency protocols become too hard and costly to scale. We need new designs of many-core hardware and software, beyond traditional technologies, to keep up with ever-increasing scalability demands. The Intel Single-chip Cloud Computer (SCC) is a recent research processor exemplifying a new cluster-on-chip architecture that promotes a software-oriented approach, rather than hardware support, to implementing shared memory coherence. This paper presents a shared virtual memory (SVM) system, dubbed Rhymes, tailored to this new kind of processor with non-coherent, hybrid memory architectures. Rhymes features a two-way cache coherence protocol to enforce release consistency for pages allocated in shared physical memory (SPM) and scope consistency for pages in per-core private memory. It also supports page remapping on a per-core basis to boost data locality. We implement Rhymes on the SCC port of the Barrelfish OS. Experimental results show that our SVM outperforms the pure SPM approach used by Intel's software managed coherence (SMC) library by up to 12 times, with superlinear speedups (due to the L2 cache effect) noted for applications with strong data reuse patterns.
StochSoCs: High performance biocomputing simulations for large scale Systems Biology
The stochastic simulation of large-scale biochemical reaction networks is of
great importance for systems biology since it enables the study of inherently
stochastic biological mechanisms at the whole cell scale. Stochastic Simulation
Algorithms (SSA) allow us to simulate the dynamic behavior of complex kinetic
models, but their high computational cost makes them very slow for many
realistically sized problems. We present a pilot service, named WebStoch, developed
in the context of our StochSoCs research project, which lets life scientists with
no high-performance computing expertise run stochastic simulations of
large-scale biological network models, described in the SBML standard format,
over the internet. Biomodels submitted to the service are parsed automatically
and then placed for parallel execution on distributed worker nodes. The workers
are implemented using multi-core and many-core processors, or FPGA accelerators
that can handle the simulation of thousands of stochastic repetitions of
complex biomodels, with possibly thousands of reactions and interacting
species. Using benchmark LCSE biomodels, whose workload can be scaled on
demand, we demonstrate linear speedup and more than two orders of magnitude
higher throughput than existing serial simulators.
Comment: The 2017 International Conference on High Performance Computing &
Simulation (HPCS 2017), 8 pages
Many-Core CPUs Can Deliver Scalable Performance to Stochastic Simulations of Large-Scale Biochemical Reaction Networks
Stochastic simulation of large-scale biochemical reaction networks is becoming essential for Systems Biology. It enables the in-silico investigation of complex biological system dynamics under different conditions and intervention strategies, while also taking into account the inherent "biological noise" that is especially present in the low species count regime. It is, however, a great computational challenge, since in practice we need to execute many repetitions of a complex simulation model to assess the average-case and extreme-case behavior of the dynamical system it represents. The workload grows quickly with both the number of repetitions required and the number of reactions in the bio-model. The worst-case scenario is when there is a need to run thousands of repetitions of a complex model with thousands of reactions. We have developed a stochastic simulation software framework for many- and multi-core CPUs. It is evaluated using Intel's experimental many-core Single-chip Cloud Computer (SCC) CPU and the latest generation consumer-grade Core i7 multi-core Intel CPU, running Gillespie's First Reaction Method exact stochastic simulation algorithm. It is shown that emerging many-core NoC processors can provide scalable performance, achieving linear speedup as simulation work scales in both dimensions.
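For readers unfamiliar with the First Reaction Method named in the abstract above, the following is a minimal serial sketch of the algorithm, not the authors' framework. The function names, the reaction representation (a propensity callable paired with a state-change dictionary), and the one-reaction `DECAY` model are all hypothetical choices made for illustration.

```python
import math
import random


def first_reaction_method(state, reactions, t_end, rng):
    """Run one repetition of Gillespie's First Reaction Method.

    state:     dict mapping species name -> molecule count
    reactions: list of (propensity_fn, change) pairs (hypothetical layout);
               propensity_fn(state) returns the reaction propensity and
               change maps species name -> count delta when it fires
    """
    t = 0.0
    state = dict(state)  # work on a copy so the caller's dict is untouched
    while True:
        tau_min, chosen = math.inf, None
        # Draw a tentative firing time for every reaction; keep the earliest.
        for propensity_fn, change in reactions:
            a = propensity_fn(state)
            if a <= 0.0:
                continue  # reaction cannot fire in the current state
            tau = rng.expovariate(a)
            if tau < tau_min:
                tau_min, chosen = tau, change
        # Stop when nothing can fire or the next event is past the horizon.
        if chosen is None or t + tau_min > t_end:
            return state
        t += tau_min
        for species, delta in chosen.items():
            state[species] += delta


def run_repetitions(n, state, reactions, t_end, seed=0):
    """Run n independent repetitions, each with its own seeded generator."""
    return [
        first_reaction_method(state, reactions, t_end, random.Random(seed + i))
        for i in range(n)
    ]


# Hypothetical toy model: A -> B with rate constant 1.0 per molecule of A.
DECAY = [(lambda s: 1.0 * s["A"], {"A": -1, "B": +1})]
```

Each repetition is independent of the others, which is the property that lets the work be spread across many cores; the sketch runs them in a plain loop only to stay short.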
Ringo: Interactive Graph Analytics on Big-Memory Machines
We present Ringo, a system for analysis of large graphs. Graphs provide a way
to represent and analyze systems of interacting objects (people, proteins,
webpages) with edges between the objects denoting interactions (friendships,
physical interactions, links). Mining graphs provides valuable insights about
individual objects as well as the relationships among them.
In building Ringo, we take advantage of the fact that machines with large
memory and many cores are widely available and also relatively affordable. This
allows us to build an easy-to-use interactive high-performance graph analytics
system. Graphs also need to be built from input data, which often resides in
the form of relational tables. Thus, Ringo provides rich functionality for
manipulating raw input data tables into various kinds of graphs. Furthermore,
Ringo also provides over 200 graph analytics functions that can then be applied
to constructed graphs.
We show that a single big-memory machine provides a very attractive platform
for performing analytics on all but the largest graphs as it offers excellent
performance and ease of use as compared to alternative approaches. With Ringo,
we also demonstrate how to integrate graph analytics with an iterative process
of trial-and-error data exploration and rapid experimentation, common in data
mining workloads.
Comment: 6 pages, 2 figures
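The table-to-graph step described in the Ringo abstract can be illustrated with a small sketch in plain Python. This is not Ringo's API: the function names, the row-as-dictionary table layout, and the column names are assumptions made purely for illustration.

```python
from collections import defaultdict


def table_to_graph(rows, src_col, dst_col):
    """Build a directed adjacency map from a relational edge table.

    rows is a list of dicts (one per table row); src_col and dst_col name
    the columns holding the edge endpoints (hypothetical layout).
    """
    adjacency = defaultdict(set)
    for row in rows:
        source, target = row[src_col], row[dst_col]
        adjacency[source].add(target)
        adjacency[target]  # touch the key so sink nodes also appear
    return dict(adjacency)


def out_degrees(adjacency):
    """A simple analytics function: out-degree of every node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}
```

With a three-row table of `user` and `friend` columns, `table_to_graph(rows, "user", "friend")` gives an adjacency map that functions such as `out_degrees` can then be applied to, mirroring the build-then-analyze workflow the abstract describes.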