295 research outputs found

Efficient algorithms for factorization and join of blades

Author: Dorst L.
Fontijne D.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

International Migration, Integration and Social Cohesion online publications

Efficient Algorithms for Factorization and Join of Blades

Author: D. Hildenbrand
L. Dorst
T. Bouma
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Geometric Algebra Transformers

Author: Behrends Sönke
Brehmer Johann
Cohen Taco
de Haan Pim
Publication date: 28/05/2023

Problems involving geometric data arise in a variety of fields, including computer vision, robotics, chemistry, and physics. Such data can take numerous forms, such as points, direction vectors, planes, or transformations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric algebra, which offers an efficient 16-dimensional vector space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to E(3), the symmetry group of 3D Euclidean space. As a transformer, GATr is scalable, expressive, and versatile. In experiments with n-body modeling and robotic planning, GATr shows strong improvements over non-geometric baselines
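The 16-dimensional figure in the abstract above can be checked with a short sketch. It assumes the projective geometric algebra in question is generated by four basis vectors, which gives 2^4 = 16 basis blades. The names `BASIS_VECTORS` and `basis_blades` are illustrative only and are not taken from the GATr code base.

```python
from itertools import combinations

# Assumption: four generating basis vectors, labelled e0..e3 for illustration.
BASIS_VECTORS = ("e0", "e1", "e2", "e3")


def basis_blades(vectors=BASIS_VECTORS):
    """Group all basis blades by grade.

    A grade-k basis blade is a product of k distinct basis vectors;
    the grade-0 blade is the scalar, written "1".
    """
    return {
        k: ["".join(c) if c else "1" for c in combinations(vectors, k)]
        for k in range(len(vectors) + 1)
    }


blades = basis_blades()
# Total number of components of a general multivector.
dimension = sum(len(group) for group in blades.values())

for grade, group in blades.items():
    print(grade, group)
print("dimension:", dimension)
```

The grades contain 1, 4, 6, 4, and 1 blades respectively, which sums to the 16 components mentioned in the abstract.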

arXiv.org e-Print Archive

Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management

Author: Cohen Albert
Drach Nathalie
Drebes Andi
Heydemann Karine
Pop Antoniu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/09/2016

Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality. Yet these promises are undermined by non-uniform memory access (NUMA). We show that using NUMA-aware task and data placement, it is possible to preserve the uniform abstraction of both computing and memory resources for task-parallel programming models while achieving high data locality. Our data placement scheme guarantees that all accesses to task output data target the local memory of the accessing core. The complementary task placement heuristic improves the locality of task input data on a best effort basis. Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability by eliminating false dependences and enabling fine-grained dynamic control over data placement. The algorithms are fully automatic, application-independent, performance-portable across NUMA machines, and adapt to dynamic changes. Placement decisions use information about inter-task data dependences readily available in the run-time system and placement information from the operating system. We achieve 94% of local memory accesses on a 192-core system with 24 NUMA nodes, up to 5× higher performance than NUMA-aware hierarchical work-stealing, and even 5.6× compared to static interleaved allocation. Finally, we show that state-of-the-art dynamic page migration by the operating system cannot catch up with frequent affinity changes between cores and data and thus fails to accelerate task-parallel applications

Crossref

INRIA a CCSD electronic archive server

The University of Manchester - Institutional Repository

Optimizing MPI one-sided synchronization mechanisms on Cray's Cascade HPC systems

Author: BELLI ROBERTO
Publication venue: 'Pisa University Press'
Publication date: 03/12/2014

In this work we proposed Notified Access, a new communication model that targets RDMA networks. Our focus was on optimizing producer-consumer computations by avoiding unnecessary synchronization between processes in point-to-point communication. We proposed a communication model in which a notification can be coupled with a single Remote Memory Access (RMA) operation. In our model, the target of an RMA operation is notified directly after the notified operation completes. This approach avoids the use of other synchronization primitives and minimizes synchronization latency while using the full hardware offload typical of high-performance networks. To demonstrate lower overheads than other point-to-point synchronization mechanisms, we implemented it in an open-source MPI-3 library. We evaluated the performance of our implementation with a ping-pong benchmark, a computation/communication overlap benchmark, and three real-world applications: a pipeline stencil, a tree-based reduce, and a task-based Cholesky factorization. Our analysis shows that Notified Access is a valuable primitive for any RMA system, and that the required hardware features are already available in multiple state-of-the-art high-performance networks

Electronic Thesis and Dissertation Archive - Università di Pisa

Galois : a system for parallel execution of irregular algorithms

Author: Nguyen Donald Do
Publication date: 04/09/2015

A programming model which allows users to program with high productivity and which produces high performance executions has been a goal for decades. This dissertation makes progress towards this elusive goal by describing the design and implementation of the Galois system, a parallel programming model for shared-memory, multicore machines. Central to the design is the idea that scheduling of a program can be decoupled from the core computational operator and data structures. However, efficient programs often require application-specific scheduling to achieve best performance. To bridge this gap, an extensible and abstract scheduling policy language is proposed, which allows programmers to focus on selecting high-level scheduling policies while delegating the tedious task of implementing the policy to a scheduler synthesizer and runtime system. Implementations of deterministic and prioritized scheduling also are described. An evaluation of a well-studied benchmark suite reveals that factoring programs into operators, schedulers and data structures can produce significant performance improvements over unfactored approaches. Comparison of the Galois system with existing programming models for graph analytics shows significant performance improvements, often orders of magnitude more, due to (1) better support for the restrictive programming models of existing systems and (2) better support for more sophisticated algorithms and scheduling, which cannot be expressed in other systems.
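The idea of decoupling scheduling from the operator can be illustrated with a minimal sequential sketch. It is not the Galois API and involves no parallelism. The class and function names (`FifoWorklist`, `PriorityWorklist`, `shortest_paths`) and the sample graph are hypothetical, chosen only to show one operator (edge relaxation) running unchanged under two interchangeable scheduling policies.

```python
import heapq
from collections import deque


class FifoWorklist:
    """Scheduling policy: process work items in arrival order (priority ignored)."""

    def __init__(self):
        self._queue = deque()

    def push(self, priority, item):
        self._queue.append(item)

    def pop(self):
        return self._queue.popleft()

    def __bool__(self):
        return bool(self._queue)


class PriorityWorklist:
    """Scheduling policy: process the lowest-priority-value item first."""

    def __init__(self):
        self._heap = []

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, item))

    def pop(self):
        return heapq.heappop(self._heap)[1]

    def __bool__(self):
        return bool(self._heap)


def shortest_paths(graph, source, worklist):
    """Operator: relax outgoing edges; the worklist alone decides the order."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    worklist.push(0, source)
    while worklist:
        u = worklist.pop()
        for v, weight in graph[u]:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                worklist.push(dist[v], v)  # re-schedule the improved node
    return dist


# Hypothetical example graph: node -> list of (neighbour, edge weight).
GRAPH = {"a": [("b", 4), ("c", 1)], "b": [("d", 1)], "c": [("b", 2)], "d": []}

fifo_result = shortest_paths(GRAPH, "a", FifoWorklist())
priority_result = shortest_paths(GRAPH, "a", PriorityWorklist())
```

Both policies reach the same distances here. Only the order of the work, and therefore the number of items processed, differs, which is the kind of choice the abstract describes delegating to a scheduling policy.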

Texas ScholarWorks

Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

Author: Hsieh Shang-Hsien

The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system

NASA Technical Reports Server