19,465 research outputs found
Scalable parallel communications
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups
Optimising Simulation Data Structures for the Xeon Phi
In this paper, we propose a lock-free architecture
to accelerate logic gate circuit simulation using SIMD multi-core
machines. We evaluate its performance on different test circuits
simulated on the Intel Xeon Phi and 2 other machines. Comparisons
are presented of this software/hardware combination with
reported performances of GPU and other multi-core simulation
platforms. Comparisons are also given between the lock free
architecture and a leading commercial simulator running on the
same Intel hardware
Recommended from our members
Benchmarking the Intel®Xeon®Platinum 8160 Processor
This report presents a set of results for different microbenchmarks and applications on the Intel
Xeon Platinum8160 Processor, formerly known as Skylake. For simplicity, we will use both Skylake
and SKX to refer to this processor. We use the Skylake nodes that will be available in Stampede2.
This systemwill provide Intel Knights Landing and Skylake chips interconnected by a 100 Gb/sec
Intel Omni-Path (OPA) network with a fat tree topology. The peak performance of the system will
be 18 PF.Texas Advanced Computing Center (TACC
Model-driven Scheduling for Distributed Stream Processing Systems
Distributed Stream Processing frameworks are being commonly used with the
evolution of Internet of Things(IoT). These frameworks are designed to adapt to
the dynamic input message rate by scaling in/out.Apache Storm, originally
developed by Twitter is a widely used stream processing engine while others
includes Flink, Spark streaming. For running the streaming applications
successfully there is need to know the optimal resource requirement, as
over-estimation of resources adds extra cost.So we need some strategy to come
up with the optimal resource requirement for a given streaming application. In
this article, we propose a model-driven approach for scheduling streaming
applications that effectively utilizes a priori knowledge of the applications
to provide predictable scheduling behavior. Specifically, we use application
performance models to offer reliable estimates of the resource allocation
required. Further, this intuition also drives resource mapping, and helps
narrow the estimated and actual dataflow performance and resource utilization.
Together, this model-driven scheduling approach gives a predictable application
performance and resource utilization behavior for executing a given DSPS
application at a target input stream rate on distributed resources.Comment: 54 page
Is Our Model for Contention Resolution Wrong?
Randomized binary exponential backoff (BEB) is a popular algorithm for
coordinating access to a shared channel. With an operational history exceeding
four decades, BEB is currently an important component of several wireless
standards. Despite this track record, prior theoretical results indicate that
under bursty traffic (1) BEB yields poor makespan and (2) superior algorithms
are possible. To date, the degree to which these findings manifest in practice
has not been resolved.
To address this issue, we examine one of the strongest cases against BEB:
packets that simultaneously begin contending for the wireless channel. Using
Network Simulator 3, we compare against more recent algorithms that are
inspired by BEB, but whose makespan guarantees are superior. Surprisingly, we
discover that these newer algorithms significantly underperform. Through
further investigation, we identify as the culprit a flawed but common
abstraction regarding the cost of collisions. Our experimental results are
complemented by analytical arguments that the number of collisions -- and not
solely makespan -- is an important metric to optimize. We believe that these
findings have implications for the design of contention-resolution algorithms.Comment: Accepted to the 29th ACM Symposium on Parallelism in Algorithms and
Architectures (SPAA 2017
- …