
    Conflict-free star-access in parallel memory systems

    We study conflict-free data distribution schemes in parallel memories in multiprocessor system architectures. Given a host graph G, the problem is to map the nodes of G into memory modules such that any instance of a template type T in G can be accessed without memory conflicts. A conflict occurs if two or more nodes of T are mapped to the same memory module. The mapping algorithm should: (i) be fast in terms of data access (possibly mapping each node in constant time); (ii) minimize the required number of memory modules for accessing any instance in G of the given template type; and (iii) guarantee load balancing on the modules. In this paper, we consider conflict-free access to star templates, i.e., to any node of G along with all of its neighbors. Such a template type arises in many classical algorithms, such as breadth-first search in a graph, message broadcasting in networks, and nearest-neighbor-based approximation in numerical computation. We consider the star-template access problem on two specific host graphs, tori and hypercubes, which are also popular interconnection network topologies. The proposed conflict-free mappings on these graphs are fast, use an optimal or provably good number of memory modules, and guarantee load balancing. © 2006 Elsevier Inc. All rights reserved.
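    As a hedged illustration of the star-template idea (not necessarily one of the paper's own schemes), the sketch below maps the nodes of a 2D torus to memory modules with the classic coloring c(i, j) = (i + 2j) mod 5. Whenever both side lengths are multiples of 5, every node and its four neighbors land in five distinct modules; five modules is optimal because a star has five nodes, and the mapping is balanced since each module receives exactly one fifth of the nodes.

        # Hedged sketch: conflict-free star-template access on a 2D torus.
        # Assumes both side lengths are multiples of 5 (a simplification);
        # the paper's mappings are more general than this.

        def module_of(i, j):
            """Map torus node (i, j) to one of 5 memory modules in constant time."""
            return (i + 2 * j) % 5

        def verify_star_conflict_free(n, m):
            """Check that every node and its 4 neighbors hit 5 distinct modules."""
            assert n % 5 == 0 and m % 5 == 0, "wraparound needs sides divisible by 5"
            for i in range(n):
                for j in range(m):
                    star = [(i, j), ((i + 1) % n, j), ((i - 1) % n, j),
                            (i, (j + 1) % m), (i, (j - 1) % m)]
                    if len({module_of(x, y) for x, y in star}) != 5:
                        return False
            return True

        print(verify_star_conflict_free(10, 15))  # True: 5 modules suffice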

    Doctor of Philosophy

    Stochastic methods, dense free-form mapping, atlas construction, and total variation are examples of advanced image processing techniques which are robust but computationally demanding. These algorithms often require a large amount of computational power as well as massive memory bandwidth. These requirements used to be fulfilled only by supercomputers. The development of heterogeneous parallel subsystems and computation-specialized devices such as Graphic Processing Units (GPUs) has brought the requisite power to commodity hardware, opening up opportunities for scientists to experiment and evaluate the influence of these techniques on their research and practical applications. However, harnessing the processing power from modern hardware is challenging. The differences between multicore parallel processing systems and conventional models are significant, often requiring algorithms and data structures to be redesigned significantly for efficiency. It also demands in-depth knowledge about modern hardware architectures to optimize these implementations, sometimes on a per-architecture basis. The goal of this dissertation is to introduce a solution for this problem based on a 3D image processing framework, using high performance APIs at the core level to utilize the parallel processing power of the GPUs. The design of the framework facilitates an efficient application development process, which does not require scientists to have extensive knowledge about GPU systems, and encourages them to harness this power to solve their computationally challenging problems. To present the development of this framework, four main problems are described, and the solutions are discussed and evaluated: (1) essential components of a general 3D image processing library: data structures and algorithms, as well as how to implement these building blocks on the GPU architecture for optimal performance; (2) an implementation of unbiased atlas construction algorithms, an illustration of how to solve a highly complex and computationally expensive algorithm using this framework; (3) an extension of the framework to account for geometry descriptors to solve registration challenges with large scale shape changes and high intensity-contrast differences; and (4) an out-of-core streaming model, which enables developers to implement multi-image processing techniques on commodity hardware.
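    The out-of-core streaming model in item (4) can be pictured with a small, hedged sketch: the volume is memory-mapped and visited slab by slab so that only a bounded working set is resident at a time. The file name, shape, and dtype below are hypothetical, and the dissertation's GPU upload/download machinery is deliberately left out.

        # Hedged sketch of an out-of-core streaming pass over a large 3D volume.
        # Only one slab of slices is materialized in memory at a time.
        import numpy as np

        def streamed_voxel_count(path, shape, level=0.5, slab=32, dtype=np.float32):
            """Count voxels above `level` without loading the whole volume."""
            vol = np.memmap(path, dtype=dtype, mode="r", shape=shape)  # lazy view
            count = 0
            for z0 in range(0, shape[0], slab):
                chunk = np.asarray(vol[z0:z0 + slab])  # materialize one slab only
                count += int((chunk > level).sum())
            return count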

    An Object-oriented Environment for Developing Finite Element Codes for Multi-disciplinary Applications

    The objective of this work is to describe the design and implementation of a framework for building multi-disciplinary finite element programs. The main goals are generality, reusability, extendibility, good performance and memory efficiency. Another objective is preparing the code structure for team development, to ensure the easy collaboration of experts in different fields in the development of multi-disciplinary applications. Kratos, the framework described in this work, contains several tools for the easy implementation of finite element applications and also provides a common platform for the natural interaction of different applications. To achieve this, an innovative variable base interface is designed and implemented. This interface is used at different levels of abstraction and has proven to be very clear and extendible. A very efficient and flexible data structure and an extensible IO are created to overcome difficulties in dealing with multi-disciplinary problems. Several concepts from existing works are also collected and adapted to coupled problems; the use of an interpreter, of different data layouts, and of a variable number of dofs per node are examples of this approach. In order to minimize the possible conflicts arising in the development, a kernel-and-application approach is used. The code is structured in layers to reflect the working space of developers with different fields of expertise. Details are given on the approach chosen to increase performance and efficiency. Examples of the application of Kratos to different multi-disciplinary problems are presented in order to demonstrate the applicability and efficiency of the new object-oriented environment.
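    A hedged sketch of the variable base interface idea follows (the class and variable names are illustrative only, not Kratos' actual API): nodes store values keyed by variable objects, so different applications can attach their own fields without a fixed, predefined set of degrees of freedom per node.

        # Hedged sketch of a variable-keyed nodal data store (illustrative names
        # only, not Kratos' actual classes).

        class Variable:
            def __init__(self, name, default):
                self.name, self.default = name, default

        class Node:
            def __init__(self, node_id, x, y, z):
                self.id, self.coords = node_id, (x, y, z)
                self._data = {}                 # Variable -> value

            def set(self, var, value):
                self._data[var] = value

            def get(self, var):
                return self._data.get(var, var.default)

        TEMPERATURE = Variable("TEMPERATURE", 0.0)
        DISPLACEMENT = Variable("DISPLACEMENT", (0.0, 0.0, 0.0))

        node = Node(1, 0.0, 0.0, 0.0)
        node.set(TEMPERATURE, 293.15)       # a thermal application writes its field
        print(node.get(DISPLACEMENT))       # a structural application reads its own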

    Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism in Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach, in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++, which is based on a concurrent aggregate parallel model.
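    The distribution-and-alignment idea behind HPF can be pictured with a small, hedged sketch: the programmer declares how an array maps onto the memory modules (for instance BLOCK or CYCLIC), and the compiler places data and computation accordingly. The sketch below only illustrates the two mappings conceptually in Python; it is not HPF syntax.

        # Hedged, conceptual sketch of HPF-style BLOCK and CYCLIC distributions
        # of a length-n array over p memory modules.

        def block_owner(i, n, p):
            """BLOCK: contiguous chunks of ceil(n/p) elements per module."""
            chunk = -(-n // p)          # ceiling division
            return i // chunk

        def cyclic_owner(i, p):
            """CYCLIC: element i lives on module i mod p."""
            return i % p

        n, p = 16, 4
        print([block_owner(i, n, p) for i in range(n)])  # [0,0,0,0,1,1,1,1,...]
        print([cyclic_owner(i, p) for i in range(n)])    # [0,1,2,3,0,1,2,3,...]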

    Finite Element Modeling Driven by Health Care and Aerospace Applications

    This thesis concerns the development, analysis, and computer implementation of mesh generation algorithms encountered in finite element modeling in health care and aerospace. The finite element method can reduce a continuous system to a discrete idealization that can be solved in the same manner as a discrete system, provided the continuum is discretized into a finite number of simple geometric shapes (e.g., triangles in two dimensions or tetrahedra in three dimensions). In health care, namely anatomic modeling, a discretization of the biological object is essential to compute tissue deformation for physics-based simulations. This thesis proposes an efficient procedure to convert 3-dimensional imaging data into adaptive lattice-based discretizations of well-shaped tetrahedra or mixed elements (i.e., tetrahedra, pentahedra and hexahedra). This method operates directly on segmented images, thus skipping the surface reconstruction step that is required by traditional Computer-Aided Design (CAD)-based meshing techniques and is convoluted, especially for complex anatomic geometries. Our approach utilizes proper mesh gradation and tissue-specific multi-resolution, without sacrificing fidelity, while maintaining a smooth surface to reflect a certain degree of visual reality. Image-to-mesh conversion can facilitate accurate computational modeling for biomechanical registration of Magnetic Resonance Imaging (MRI) in image-guided neurosurgery. Neuronavigation with deformable registration of preoperative MRI to intraoperative MRI allows the surgeon to view the location of surgical tools relative to the preoperative anatomical (MRI) or functional data (DT-MRI, fMRI), thereby avoiding damage to eloquent areas during tumor resection. This thesis presents a deformable registration framework that utilizes multi-tissue mesh adaptation to map preoperative MRI to intraoperative MRI of patients who have undergone a brain tumor resection. Our enhancements with mesh adaptation improve the accuracy of the registration by more than 5 times compared to rigid and traditional physics-based non-rigid registration, and by more than 4 times compared to publicly available B-Spline interpolation methods. The adaptive framework is parallelized for shared-memory multiprocessor architectures. Performance analysis shows that this method could be applied, on average, in less than two minutes, achieving desirable speed for use in a clinical setting. The last part of this thesis focuses on finite element modeling of CAD data. This is an integral part of the design and optimization of components and assemblies in industry. We propose a new parallel mesh generator for efficient tetrahedralization of piecewise linear complex domains in aerospace. CAD-based meshing algorithms typically improve the shape of the elements in a post-processing step, due to the high complexity and cost of the operations involved. On the contrary, our method optimizes the shape of the elements throughout the generation process to obtain maximum quality, and it utilizes high performance computing to reduce the overheads and improve end-user productivity. The proposed mesh generation technique is a combination of Advancing Front type point placement, direct point insertion, and parallel multi-threaded connectivity optimization schemes. The mesh optimization is based on a speculative (optimistic) approach that has been proven to perform well on shared-memory hardware. The experimental evaluation indicates that this method offers substantial improvements in quality and performance over existing state-of-the-art unstructured grid technology currently incorporated in several commercial systems. The proposed mesh generator will be part of an Extreme-Scale Anisotropic Mesh Generation Environment intended to meet industry expectations and NASA's CFD vision.
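    The core image-to-mesh idea can be sketched in a heavily simplified, hedged form: every labeled voxel of a segmented image is split into six tetrahedra sharing the cube's main diagonal (the classic Kuhn decomposition), which yields a conforming lattice mesh directly from the image. The sketch below is non-adaptive and ungraded, so it illustrates only the starting point of the approach described above, not the thesis' actual algorithm.

        # Hedged sketch: split every labeled voxel of a segmented 3D image into
        # six tetrahedra (Kuhn decomposition along the cube's main diagonal).
        import numpy as np

        CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                   (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
        CUBE_TETS = [(0, 1, 2, 6), (0, 2, 3, 6), (0, 3, 7, 6),
                     (0, 7, 4, 6), (0, 4, 5, 6), (0, 5, 1, 6)]

        def voxels_to_tets(segmentation):
            """Return (points, tets) covering all nonzero voxels of a 3D label image."""
            points, index, tets = [], {}, []

            def pid(p):                      # global id of a lattice point
                if p not in index:
                    index[p] = len(points)
                    points.append(p)
                return index[p]

            for x, y, z in zip(*np.nonzero(segmentation)):
                ids = [pid((int(x) + dx, int(y) + dy, int(z) + dz))
                       for dx, dy, dz in CORNERS]
                tets += [tuple(ids[c] for c in tet) for tet in CUBE_TETS]
            return np.array(points), np.array(tets)

        seg = np.zeros((3, 3, 3), dtype=np.uint8)
        seg[1, 1, 1] = 1
        pts, tets = voxels_to_tets(seg)
        print(len(pts), len(tets))           # 8 lattice points, 6 tetrahedra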

    Modern data analytics in the cloud era

    Cloud computing has been the groundbreaking technology of the last decade.
The ease-of-use of the managed environment, in combination with a nearly unlimited amount of resources and a pay-per-use price model, enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed and used. This thesis focuses on database systems deployed in the cloud environment. We identify three major interaction points of the database engine with the environment that show changed requirements compared to traditional on-premise data warehouse solutions. First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments in the case of elastic scaling. Furthermore, we introduce a buffer pre-heating strategy that mitigates the cold start after scaling and yields an immediate performance benefit from the newly added resources. Second, cloud-based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from the bulk loads or ETL pipelines of a traditional data warehouse solution. Many users do not define database constraints in order to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue, we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution, and offer efficient update support. The concept can be applied to arbitrary constraints, and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints like an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements. Third, data-centric workloads have changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays. In these cases, the database system often acts only as a data provider, while the computational effort takes place in data science or machine learning (ML) environments. As this workflow has several drawbacks, we follow the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this, we identify user-defined functions (UDFs) and machine learning inference as important tasks that would benefit from a deeper engine integration, and investigate approaches to push these operations towards the database engine.
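    A hedged sketch of the PatchIndex idea for an approximate uniqueness constraint follows (class and method names are illustrative, not the system's actual interface): the index records the ids of rows that violate the constraint, so queries can treat all remaining rows as truly unique and handle only the small exception set separately.

        # Hedged sketch of a PatchIndex for an approximate uniqueness constraint.

        class UniquenessPatchIndex:
            def __init__(self, rows, key):
                self.key, self.exceptions = key, set()
                seen = {}                          # value -> first row id
                for rid, row in enumerate(rows):
                    v = row[key]
                    if v in seen:
                        self.exceptions |= {seen[v], rid}
                    else:
                        seen[v] = rid

            def is_clean(self, rid):
                """True if the row satisfies the uniqueness constraint."""
                return rid not in self.exceptions

        rows = [{"id": 1, "k": "a"}, {"id": 2, "k": "b"}, {"id": 3, "k": "a"}]
        idx = UniquenessPatchIndex(rows, key="k")
        clean = [r for i, r in enumerate(rows) if idx.is_clean(i)]    # no dedup needed
        patch = {r["k"] for i, r in enumerate(rows) if not idx.is_clean(i)}
        distinct = len(clean) + len(patch - {r["k"] for r in clean})
        print(distinct)                            # 2 distinct keys: "a" and "b"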

    A framework for developing finite element codes for multi-disciplinary applications

    The world of computational simulation has experienced great progress in recent years and now faces more exigent multi-disciplinary challenges to satisfy new and upcoming demands. The increasing importance of solving multi-disciplinary problems pushes developers to pay more attention to these problems and to deal with the difficulties involved in developing software in this area. Conventional finite element codes have several difficulties in dealing with multi-disciplinary problems. Many of these codes are designed and implemented for solving a certain type of problem, generally involving a single field. Extending these codes to deal with another field of analysis usually raises several problems and requires large amounts of modification and implementation. Some typical difficulties are: a predefined set of degrees of freedom per node, a data structure with a fixed set of defined variables, a global list of variables for all entities, domain-based interfaces, IO restricted in reading new data and writing new results, and algorithm definitions buried inside the code. A common approach is to connect different solvers via a master program which implements the interaction algorithms and also transfers data from one solver to another. This approach has been used successfully in practice, but it results in duplicated implementation and a redundant overhead for storing and transferring data, which may be significant depending on the solvers' data structures. The objective of this work is to design and implement a framework for building multi-disciplinary finite element programs. Generality, reusability, extendibility, good performance and memory efficiency are considered to be the main points in the design and implementation of this framework. Preparing the structure for team development is another objective, because a team of experts in different fields is usually involved in the development of a multi-disciplinary code. Kratos, the framework created in this work, provides several tools for the easy implementation of finite element applications and also provides a common platform for the natural interaction of its applications in different ways. This is achieved not only by a number of innovations but also by collecting and reusing several existing works. In this work an innovative variable base interface is designed and implemented, which is used at different levels of abstraction and has proven to be very clear and extendible. Another innovation is a very efficient and flexible data structure which can be used to store any type of data in a type-safe manner. An extendible IO is also created to overcome another bottleneck in dealing with multi-disciplinary problems. Collecting different concepts from existing works and adapting them to coupled problems is considered to be another contribution of this work; examples are the use of an interpreter, different data organizations, and a variable number of dofs per node. The kernel-and-application approach is used to reduce the possible conflicts arising between developers of different fields, and layers are designed to reflect the working space of different developers, also considering their programming knowledge. Finally, several technical optimizations are applied in order to increase the performance and efficiency of Kratos, which makes it practically usable. This work is completed by demonstrating the framework’s functionality in practice. First, some classical single-field applications, such as thermal, fluid and structural applications, are implemented and used as benchmarks to prove its performance. These applications are then used to solve coupled problems in order to demonstrate the natural interaction facility provided by the framework. Finally, some less classical coupled finite element algorithms are implemented to show its high flexibility and extendibility.
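    The kernel-and-application layering mentioned above can be pictured with a short, hedged sketch (names are illustrative only, not the framework's actual API): each application registers its own variables and element types with a central kernel, so independently developed fields can coexist and be combined for coupled problems without modifying each other's code.

        # Hedged sketch of a kernel-and-application registration pattern.

        class Kernel:
            def __init__(self):
                self.variables = {}          # name -> default value
                self.elements = {}           # name -> element factory

            def register_variable(self, name, default):
                self.variables.setdefault(name, default)

            def register_element(self, name, factory):
                if name in self.elements:
                    raise ValueError(f"element '{name}' already registered")
                self.elements[name] = factory

        class ThermalApplication:
            def register(self, kernel):
                kernel.register_variable("TEMPERATURE", 0.0)
                kernel.register_element("LaplacianElement2D", lambda: "thermal element")

        class StructuralApplication:
            def register(self, kernel):
                kernel.register_variable("DISPLACEMENT", (0.0, 0.0, 0.0))
                kernel.register_element("ElasticElement2D", lambda: "structural element")

        kernel = Kernel()
        for app in (ThermalApplication(), StructuralApplication()):
            app.register(kernel)
        print(sorted(kernel.variables))      # both fields visible to coupled solvers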

    On pulsar radio emission

    This work intends to contribute to the understanding of the radio emission of pulsars. Pulsars are neutron stars with a radius of about 10^6 cm and a mass of about one to three solar masses that rotate with periods between seconds and milliseconds. They exhibit tremendous magnetic fields of 10^8 to 10^13 Gauss. These fields facilitate the conversion of rotational energy mainly into dipole radiation, X-ray emission and the pulsar wind. Less than a thousandth of the total energy loss is emitted as radio emission. This contribution, however, is generated by a collective plasma radiation process that acts coherently on time scales of nanoseconds and below. Since the topic has been an active field of research for nearly half a century, we introduce in Chapter 1 the resulting theoretical concepts and ideas for an emission process and for the structure of the so-called "magnetosphere", the plasma-filled volume around a pulsar. We show that many basic questions have been answered satisfactorily. Questions concerning the emission process, however, suffer from some uncertainty; in particular, the exact energy source of the radio emission remains unclear. The early works of Goldreich and Julian [1969] and Ruderman and Sutherland [1975] predict that high electric fields arise which are capable of driving a strong electric current. To supply the energy that powers the radio emission, rather mildly relativistic particle energies and a moderate current are favourable. How the system converts the current into such a flow is unclear. In fact, the earlier theories are opposed by recent simulations, which also do not predict a relativistic flow near the pulsar. In Chapter 2 we examine the observed radiation and its form, especially in light of the models described above. We notice that the radio emission is generated on extremely short time scales, comparable to the inverse of the plasma frequency. We elaborate on why this places high demands on the theoretical models, in fact leaving only one viable candidate process. We conclude that profound questions of energy flow and energy source remain unanswered by current theory. Furthermore, the compression of the available energy in space and time to a few centimetres and nanoseconds remains unclear, especially given that only a small fraction of the theoretically available energy is converted. Since the fluctuations relevant for this compression of energy take place on intermediate scales of nanoseconds to micro- and milliseconds, it should be possible to detect them observationally. To facilitate this, we analyse the statistics of the receiver equation of radio radiation in Chapter 3, also because this is relevant to other topics of pulsar research. The results presented in Chapter 4 show that the developed Bayesian method surpasses conventional methods for extracting parameters from observational data in both precision and accuracy. The method, for example, weights rotation-phase measurements differently than conventional techniques and assigns more accurate error estimates to single measurements. This is of great relevance to the search for gravitational waves with so-called "pulsar timing arrays", as the validity of the combined measurement depends substantially on understanding the accuracy assigned to the single observations. However, the work on single-observation data with Bayesian techniques also exposes the numerical limits of this method. It is desirable to enable algorithms to include single-observation data in the analysis. Therefore, in Chapter 5 we develop a runtime library that writes currently unneeded data out to hard disk and is capable of managing huge data sets (substantial fractions of the hard disk space, rather than of the main memory). This library has been written in a generic form so that it can also be used in other data-intensive areas of research. While we thereby lay the foundations for evaluating fluctuation models against observational data, we approach the problem from theoretical grounds in Chapter 6. We propose that the energetic coupling of the radio emission could be of magnetic origin, as this is also a relevant mechanism in solar flare physics. We argue in a general way that, for topological reasons, the rotation of the pulsar pumps energy into the magnetic field. This energy can be released again by current decay. We show that the annihilation of electrons and positrons alone may suffice to generate radio emission on non-negligible energy scales. This mechanism does not depend on a relativistic flow and thus does not suffer from the problem of requiring high kinetic particle energies. We conclude that the existing gaps in the theory of the radio emission process could possibly be closed in the future if we analyse observational data with greater statistical accuracy and, especially, if we put more effort into understanding the problem of energy transport. This thesis serves as an example that the scientific investigation of a very theoretical question, such as the origin of radio emission, can lead to results that may be used directly in other areas of research.
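    The stated conversion of rotational energy into dipole radiation can be anchored by the standard textbook relations (quoted here as general background, not as results of the thesis): the magnetic dipole spin-down luminosity, and the customary surface-field estimate for a rotator of period P and period derivative P-dot,

        L_{\mathrm{dip}} \;=\; \frac{2\,\mu^{2}\,\Omega^{4}\sin^{2}\alpha}{3\,c^{3}},
        \qquad \Omega = \frac{2\pi}{P}, \qquad \mu \sim B\,R^{3},
        \qquad B_{\mathrm{surf}} \;\simeq\; 3.2\times 10^{19}\,\sqrt{P\,\dot{P}}\ \mathrm{G},

    which, across typical pulsar periods and period derivatives, roughly reproduces the 10^8 to 10^13 Gauss range quoted above.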

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things devices, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which could eliminate design bugs, is extremely time-consuming, and thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, so they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify whether a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small block of distributed programmable logic instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so as to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there is a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally overlooked rare patterns of multiple faults.
In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.
    PhD. Computer Science & Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd
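    The memory-ordering checking mentioned above can be illustrated with a hedged, heavily simplified sketch: observed orderings are collected into a happens-before graph, and a consistency violation shows up as a cycle. The real framework additionally decomposes the graph into small components across incremental test differences, which is not reproduced here; the operation names and the tiny example are hypothetical.

        # Hedged sketch of happens-before cycle detection on a memory-ordering graph.

        def has_ordering_violation(edges):
            """edges: (earlier_op, later_op) pairs; True if the graph is cyclic."""
            graph = {}
            for a, b in edges:
                graph.setdefault(a, set()).add(b)
                graph.setdefault(b, set())
            WHITE, GREY, BLACK = 0, 1, 2
            color = {node: WHITE for node in graph}

            def dfs(node):
                color[node] = GREY
                for nxt in graph[node]:
                    if color[nxt] == GREY:           # back edge: a cycle exists
                        return True
                    if color[nxt] == WHITE and dfs(nxt):
                        return True
                color[node] = BLACK
                return False

            return any(color[n] == WHITE and dfs(n) for n in graph)

        # A hypothetical set of observed orderings that forms a cycle:
        print(has_ordering_violation([("St0.x", "Ld1.x"), ("Ld1.x", "St1.y"),
                                      ("St1.y", "Ld0.y"), ("Ld0.y", "St0.x")]))  # True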