212 research outputs found

    Hypercube algorithms on mesh connected multicomputers

    Get PDF
    A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) for mapping a type of algorithm that we call CC-cube algorithm onto multicomputers with hypercube, mesh, or torus interconnection topology is proposed. This methodology is suitable when the initial problem can be expressed as a set of processes that communicate through a hypercube topology (a CC-cube algorithm). There are many important algorithms that fit into the CC-cube type. CALMANT is based on three different techniques: (a) the standard embedding to assign the processes of the algorithm to the nodes of the mesh multicomputer; (b) the communication pipelining technique to increase the level of communication parallelism inherent in the CC-cube algorithms; and (c) optimal message-scheduling algorithms proposed in this work in order to avoid conflicts and minimizing in this way the communication time. Although CALMANT is proposed for multicomputers with different interconnection network topologies, the paper only focuses on the particular case of meshes.Peer ReviewedPostprint (published version

    Deformable Simplicial Complexes

    Get PDF
    In this dissertation we present a novel method for deformable interface tracking in 2D and 3D|deformable simplicial complexes (DSC). Deformable interfaces are used in several applications, such as fluid simulation, image analysis, reconstruction or structural optimization. In the DSC method, the interface (curve in 2D; surface in 3D) is represented explicitly as a piecewise linear curve or surface. However, the domain is also subject to discretization: triangulation in 2D; tetrahedralization in 3D. This way, the interface can be alternatively represented as a set of edges/triangles separating triangles/tetrahedra marked as outside from those marked as inside. Such an approach allows for robust topological adaptivity. Among other advantages of the deformable simplicial complexes there are: space adaptivity, ability to handle and preserve sharp features, possibility for topology control. We demonstrate those strengths in several applications. In particular, a novel, DSC-based fluid dynamics solver has been developed during the PhD project. A special feature of this solver is that due to the fact that DSC maintains an explicit interface representation, surface tension is more easily dealt with. One particular advantage of DSC is the fact that as an alternative to topology adaptivity, topology control is also possible. This is exploited in the construction of cut loci on tori where a front expands from a single point on a torus and stops when it self-intersects

    Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies

    Full text link
    The all-to-all collective communications primitive is widely used in machine learning (ML) and high performance computing (HPC) workloads, and optimizing its performance is of interest to both ML and HPC communities. All-to-all is a particularly challenging workload that can severely strain the underlying interconnect bandwidth at scale. This is mainly because of the quadratic scaling in the number of messages that must be simultaneously serviced combined with large message sizes. This paper takes a holistic approach to optimize the performance of all-to-all collective communications on supercomputer-scale direct-connect interconnects. We address several algorithmic and practical challenges in developing efficient and bandwidth-optimal all-to-all schedules for any topology, lowering the schedules to various backends and fabrics that may or may not expose additional forwarding bandwidth, establishing an upper bound on all-to-all throughput, and exploring novel topologies that deliver near-optimal all-to-all performance

    Topology Agnostic Methods for Routing, Reconfiguration and Virtualization of Interconnection Networks

    Get PDF
    Modern computing systems, such as supercomputers, data centers and multicore chips, generally require efficient communication between their different system units; tolerance towards component faults; flexibility to expand or merge; and a high utilization of their resources. Interconnection networks are used in a variety of such computing systems in order to enable communication between their diverse system units. Investigation and proposal of new or improved solutions to topology agnostic routing and reconfiguration of interconnection networks are main objectives of this thesis. In addition, topology agnostic routing and reconfiguration algorithms are utilized in the development of new and flexible approaches to processor allocation. The thesis aims to present versatile solutions that can be used for the interconnection networks of a number of different computing systems. No particular routing algorithm was specified for an interconnection network technology which is now incorporated in Dolphin Express. The thesis states a set of criteria for a suitable routing algorithm, evaluates a number of existing routing algorithms, and recommend that one of the algorithms – which fulfils all of the criteria – is used. Further investigations demonstrate how this routing algorithm inherently supports fault-tolerance, and how it can be optimized for some network topologies. These considerations are also relevant for the InfiniBand interconnection network technology. Reconfiguration of interconnection networks (change of routing function) is a deadlock prone process. Some existing reconfiguration strategies include deadlock avoidance mechanisms that significantly reduce the network service offered to running applications. The thesis expands the area of application for one of the most versatile and efficient reconfiguration algorithms available in the literature, and proposes an optimization of this algorithm that improves the network service offered to running applications. Moreover, a new reconfiguration algorithm is presented that supports a replacement of the routing function without causing performance penalties. Processor allocation strategies that guarantee traffic-containment commonly pose strict requirements on the shape of partitions, and thus achieve only a limited utilization of a system’s computing resources. The thesis introduces two new approaches that are more flexible. Both approaches utilize the properties of a topology agnostic routing algorithm in order to enforce traffic-containment within arbitrarily shaped partitions. Consequently, a high resource utilization as well as isolation of traffic between different partitions is achieved

    Scalable Realtime Rendering and Interaction with Digital Surface Models of Landscapes and Cities

    Get PDF
    Interactive, realistic rendering of landscapes and cities differs substantially from classical terrain rendering. Due to the sheer size and detail of the data which need to be processed, realtime rendering (i.e. more than 25 images per second) is only feasible with level of detail (LOD) models. Even the design and implementation of efficient, automatic LOD generation is ambitious for such out-of-core datasets considering the large number of scales that are covered in a single view and the necessity to maintain screen-space accuracy for realistic representation. Moreover, users want to interact with the model based on semantic information which needs to be linked to the LOD model. In this thesis I present LOD schemes for the efficient rendering of 2.5d digital surface models (DSMs) and 3d point-clouds, a method for the automatic derivation of city models from raw DSMs, and an approach allowing semantic interaction with complex LOD models. The hierarchical LOD model for digital surface models is based on a quadtree of precomputed, simplified triangle mesh approximations. The rendering of the proposed model is proved to allow real-time rendering of very large and complex models with pixel-accurate details. Moreover, the necessary preprocessing is scalable and fast. For 3d point clouds, I introduce an LOD scheme based on an octree of hybrid plane-polygon representations. For each LOD, the algorithm detects planar regions in an adequately subsampled point cloud and models them as textured rectangles. The rendering of the resulting hybrid model is an order of magnitude faster than comparable point-based LOD schemes. To automatically derive a city model from a DSM, I propose a constrained mesh simplification. Apart from the geometric distance between simplified and original model, it evaluates constraints based on detected planar structures and their mutual topological relations. The resulting models are much less complex than the original DSM but still represent the characteristic building structures faithfully. Finally, I present a method to combine semantic information with complex geometric models. My approach links the semantic entities to the geometric entities on-the-fly via coarser proxy geometries which carry the semantic information. Thus, semantic information can be layered on top of complex LOD models without an explicit attribution step. All findings are supported by experimental results which demonstrate the practical applicability and efficiency of the methods

    Man-made Surface Structures from Triangulated Point Clouds

    Get PDF
    Photogrammetry aims at reconstructing shape and dimensions of objects captured with cameras, 3D laser scanners or other spatial acquisition systems. While many acquisition techniques deliver triangulated point clouds with millions of vertices within seconds, the interpretation is usually left to the user. Especially when reconstructing man-made objects, one is interested in the underlying surface structure, which is not inherently present in the data. This includes the geometric shape of the object, e.g. cubical or cylindrical, as well as corresponding surface parameters, e.g. width, height and radius. Applications are manifold and range from industrial production control to architectural on-site measurements to large-scale city models. The goal of this thesis is to automatically derive such surface structures from triangulated 3D point clouds of man-made objects. They are defined as a compound of planar or curved geometric primitives. Model knowledge about typical primitives and relations between adjacent pairs of them should affect the reconstruction positively. After formulating a parametrized model for man-made surface structures, we develop a reconstruction framework with three processing steps: During a fast pre-segmentation exploiting local surface properties we divide the given surface mesh into planar regions. Making use of a model selection scheme based on minimizing the description length, this surface segmentation is free of control parameters and automatically yields an optimal number of segments. A subsequent refinement introduces a set of planar or curved geometric primitives and hierarchically merges adjacent regions based on their joint description length. A global classification and constraint parameter estimation combines the data-driven segmentation with high-level model knowledge. Therefore, we represent the surface structure with a graphical model and formulate factors based on likelihood as well as prior knowledge about parameter distributions and class probabilities. We infer the most probable setting of surface and relation classes with belief propagation and estimate an optimal surface parametrization with constraints induced by inter-regional relations. The process is specifically designed to work on noisy data with outliers and a few exceptional freeform regions not describable with geometric primitives. It yields full 3D surface structures with watertightly connected surface primitives of different types. The performance of the proposed framework is experimentally evaluated on various data sets. On small synthetically generated meshes we analyze the accuracy of the estimated surface parameters, the sensitivity w.r.t. various properties of the input data and w.r.t. model assumptions as well as the computational complexity. Additionally we demonstrate the flexibility w.r.t. different acquisition techniques on real data sets. The proposed method turns out to be accurate, reasonably fast and little sensitive to defects in the data or imprecise model assumptions.Künstliche Oberflächenstrukturen aus triangulierten Punktwolken Ein Ziel der Photogrammetrie ist die Rekonstruktion der Form und Größe von Objekten, die mit Kameras, 3D-Laserscannern und anderern räumlichen Erfassungssystemen aufgenommen wurden. Während viele Aufnahmetechniken innerhalb von Sekunden triangulierte Punktwolken mit Millionen von Punkten liefern, ist deren Interpretation gewöhnlicherweise dem Nutzer überlassen. Besonders bei der Rekonstruktion künstlicher Objekte (i.S.v. engl. man-made = „von Menschenhand gemacht“ ist man an der zugrunde liegenden Oberflächenstruktur interessiert, welche nicht inhärent in den Daten enthalten ist. Diese umfasst die geometrische Form des Objekts, z.B. quaderförmig oder zylindrisch, als auch die zugehörigen Oberflächenparameter, z.B. Breite, Höhe oder Radius. Die Anwendungen sind vielfältig und reichen von industriellen Fertigungskontrollen über architektonische Raumaufmaße bis hin zu großmaßstäbigen Stadtmodellen. Das Ziel dieser Arbeit ist es, solche Oberflächenstrukturen automatisch aus triangulierten Punktwolken von künstlichen Objekten abzuleiten. Sie sind definiert als ein Verbund ebener und gekrümmter geometrischer Primitive. Modellwissen über typische Primitive und Relationen zwischen Paaren von ihnen soll die Rekonstruktion positiv beeinflussen. Nachdem wir ein parametrisiertes Modell für künstliche Oberflächenstrukturen formuliert haben, entwickeln wir ein Rekonstruktionsverfahren mit drei Verarbeitungsschritten: Im Rahmen einer schnellen Vorsegmentierung, die lokale Oberflächeneigenschaften berücksichtigt, teilen wir die gegebene vermaschte Oberfläche in ebene Regionen. Unter Verwendung eines Schemas zur Modellauswahl, das auf der Minimierung der Beschreibungslänge beruht, ist diese Oberflächensegmentierung unabhängig von Kontrollparametern und liefert automatisch eine optimale Anzahl an Regionen. Eine anschließende Verbesserung führt eine Menge von ebenen und gekrümmten geometrischen Primitiven ein und fusioniert benachbarte Regionen hierarchisch basierend auf ihrer gemeinsamen Beschreibungslänge. Eine globale Klassifikation und bedingte Parameterschätzung verbindet die datengetriebene Segmentierung mit hochrangigem Modellwissen. Dazu stellen wir die Oberflächenstruktur in Form eines graphischen Modells dar und formulieren Faktoren basierend auf der Likelihood sowie auf apriori Wissen über die Parameterverteilungen und Klassenwahrscheinlichkeiten. Wir leiten die wahrscheinlichste Konfiguration von Flächen- und Relationsklassen mit Hilfe von Belief-Propagation ab und schätzen eine optimale Oberflächenparametrisierung mit Bedingungen, die durch die Relationen zwischen benachbarten Primitiven induziert werden. Der Prozess ist eigens für verrauschte Daten mit Ausreißern und wenigen Ausnahmeregionen konzipiert, die nicht durch geometrische Primitive beschreibbar sind. Er liefert wasserdichte 3D-Oberflächenstrukturen mit Oberflächenprimitiven verschiedener Art. Die Leistungsfähigkeit des vorgestellten Verfahrens wird an verschiedenen Datensätzen experimentell evaluiert. Auf kleinen, synthetisch generierten Oberflächen untersuchen wir die Genauigkeit der geschätzten Oberflächenparameter, die Sensitivität bzgl. verschiedener Eigenschaften der Eingangsdaten und bzgl. Modellannahmen sowie die Rechenkomplexität. Außerdem demonstrieren wir die Flexibilität bzgl. verschiedener Aufnahmetechniken anhand realer Datensätze. Das vorgestellte Rekonstruktionsverfahren erweist sich als genau, hinreichend schnell und wenig anfällig für Defekte in den Daten oder falsche Modellannahmen

    Progress Report : 1991 - 1994

    Get PDF

    Master of Science

    Get PDF
    thesisIntegrated circuits often consist of multiple processing elements that are regularly tiled across the two-dimensional surface of a die. This work presents the design and integration of high speed relative timed routers for asynchronous network-on-chip. It researches NoC's efficiency through simplicity by directly translating simple T-router, source-routing, single-flit packet to higher radix routers. This work is intended to study performance and power trade-offs adding higher radix routers, 3D topologies, Virtual Channels, Accurate NoC modeling, and Transmission line communication links. Routers with and without virtual channels are designed and integrated to arrayed communication networks. Furthermore, the work investigates 3D networks with diffusive RC wires and transmission lines on long wrap interconnects

    Assessing Contention Effects on MPI_Alltoall Communications

    Get PDF
    12 pagesInternational audienceOne of the most important collective communication patterns used in scientific applications is the complete exchange, also called All-to-All. Although efficient algorithms have been studied for specific networks, general solutions like those available in well-known MPI distributions (e.g. the MPI_Alltoall operation) are strongly influenced by the congestion of network resources. In this paper we present an integrated approach to model the performance of the All-to-All collective operation, which consists in identifying a contention signature that characterizes a given network environment, using it to augment a contention-free communication model. This approach, assessed by experimental results, allows an accurate prediction of the performance of the All-to-All operation over different network architectures with a small overhead

    Efficient bypass mechanisms for low latency networks on-chip

    Get PDF
    RESUMEN: La importancia de las redes en-chip en los procesadores multi-núcleo es cada vez mayor. Los routers con baipás son una solución eficiente para reducir la latencia de estas redes. Existen dos tipos de redes con baipás: single-hop y multi-hop. Las redes con baipás single-hop minimizan la latencia individual de cada router al asignar los recursos del router con antelación a la recepción de los paquetes. Las redes con baipás multi-hop, conocidas como SMART, permiten que los paquetes atraviesen múltiples routers en un único ciclo. La primera propuesta de esta tesis es Non-Empty Buffer Bypass (NEBB), un mecanismo que incrementa la utilización del baipás de tipo single-hop, eliminando la necesidad de usar canales virtuales. Para redes con baipás multi-hop propone SMART++ y S-SMART++. SMART++ elimina la necesidad de SMART de usar una gran cantidad de canales virtuales para aprovechar el ancho de banda de la red, permitiendo el diseño de configuraciones de bajo coste. S-SMART++ hace uso de la asignación de recursos de forma especulativa para preparar el baipás de tipo multi-hop. Este mecanismo reduce la latencia y su dependencia con la longitud máxima de los saltos de tipo multi-hop, aspecto clave para su viabilidad en diseños reales. La contribución final es un conjunto de herramientas de código abierto llamada Bypass Simulation Toolset (BST) compuesto por versiones extendidas de BookSim y OpenSMART, una API para integrar BookSim en otros simuladores y una serie de scripts para facilitar el diseño y evaluación de este tipo de redes.ABSTRACT: Networks on-Chip (NoCs) are becoming more important in many-core processors as the number of cores grows. Bypass routers are an efficient solution that skips pipeline stages. There are two types of bypass mechanisms: single-hop and multi-hop bypass. Single-hop bypass minimizes the router delay by skipping allocation stages in each hop. Multi-hop bypass, called SMART, minimizes the effective number of hops by traversing multiple routers in a single cycle. The first proposal of this dissertation is Non-Empty Buffer Bypass (NEBB) for single-hop bypass, which increases the bypass utilization without requiring VCs to match traditional bypass routers. It proposes SMART++ and S-SMART++ for multi-hop bypass. SMART++ removes the requirement of using multiple VCs of SMART to exploit the bandwidth of the network, enabling low-cost configurations. S-SMART++ relies on speculative allocation to set up multi-hop bypass paths. Thus, it reduces latency and its dependency with the maximum length of multi-hops, relaxing the requirements to integrate multi-hop bypass in real designs. The final contribution is an open-source set of tools to simulate bypass NoCs called Bypass Simulation Toolset (BST) conformed by extended versions of BookSim and OpenSMART, an API to integrate BookSim in other simulators, and scripts to simplify the designing and evaluation of such NoCs.This work was supported by the Spanish Ministry of Science, Innovation and Universities, FPI grant BES-2017-079971, and contracts TIN2010-21291-C02-02, TIN2013- 46957-C2-2-P, TIN2015-65316-P, TIN2016-76635-C2-2-R (AEI/FEDER, UE) and TIC PID2019-105660RB-C22; the European HiPEAC Network of Excellence; the European Community's Seventh Framework Programme (FP7/2007-2013), under the Mont-Blanc 1 and 2 projects (grant agreements n 288777 and 610402); the European Union's Horizon 2020 research and innovation programme under the Mont-Blanc 3 project (grant agreement nº 671697). Bluespec Inc. provided access to Bluespec tools
    corecore