
    Guided rewriting and constraint satisfaction for parallel GPU code generation

    Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators depend either on user input to choose among hard-coded optimisations or on automated exploration of the implementation search space. The former lacks extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, where a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique to find a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewritings. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings. The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with the vendor-provided handwritten kernels of the ARM Compute Library and with the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with the state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR lends itself well to user-guided and automatic rewriting for high-performance code generation.
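
    To make the idea of automatically generated tuning constraints concrete, the following is a minimal, hypothetical Python sketch (not taken from the thesis): the problem size, device limits and constraint set are assumptions, and the point is only that inferred constraints prune the tuning space to valid combinations before any costly exploration.

        from itertools import product

        # Assumed problem size and device limits (illustrative only).
        N = 1024               # matrix dimension
        MAX_WORKGROUP = 256    # max work-items per work-group
        LOCAL_MEM = 48 * 1024  # local memory budget in bytes

        def valid(tile_m, tile_n):
            # Constraints a compiler could infer from its functional patterns:
            return (N % tile_m == 0 and N % tile_n == 0        # tiles divide the problem
                    and tile_m * tile_n <= MAX_WORKGROUP       # scheduling limit
                    and 2 * 4 * tile_m * tile_n <= LOCAL_MEM)  # two float tiles fit locally

        # Only valid combinations are handed to the exploration phase.
        candidates = [(m, n) for m, n in product((8, 16, 32, 64, 128), repeat=2)
                      if valid(m, n)]
        print(len(candidates), "valid tuning points out of 25:", candidates)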

    ACiS: smart switches with application-level acceleration

    Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance has depended, first, on ever faster/denser CPUs, and then on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally served a single function (e.g., a network that transports data) to take on additional functionality (e.g., also operating on that data). More generally, integration enables compute-everywhere: not just in the CPU and accelerator, but also in the network and, more specifically, in the communication switches. In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose various flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline by introducing composable plugins that successively add ACiS capabilities. In the first work, we propose an inline accelerator for communication switches supporting user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we seek to remove this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that is scalable to very large systems. In the second work, we propose a look-aside accelerator for communication switches that is capable of processing packets at line rate; this method addresses functions requiring loops and state. The proposed in-switch accelerator is based on a RISC-V-compatible Coarse-Grained Reconfigurable Array (CGRA). To facilitate usability, we have developed a framework to compile user-provided C/C++ code to appropriate back-end instructions for configuring the accelerator. In the third work, we extend ACiS to support fused collectives and the combining of collectives with map operations, observing that there is an opportunity to fuse communication (collectives) with computation. Since the computation varies across applications, this ACiS support is made programmable. In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry information, monitor application behavior, and then use this information for control, decision-making, and coordination of nodes. We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, with a Graph Convolutional Network (GCN) application as a case study, an average speedup of 3.4x across five real-world datasets is achieved on 24 nodes compared to a CPU cluster without ACiS capabilities.
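
    To illustrate the semantics of the third contribution, fusing a map with a collective, here is a minimal, hypothetical Python sketch (no real switch or MPI involved; the function names are illustrative): each node contributes a locally mapped-and-reduced partial result, and the "switch" combines the partials in a single pass instead of a map phase followed by a separate collective.

        from functools import reduce

        def fused_map_allreduce(per_node_data, map_fn, reduce_fn):
            # Each node applies map_fn and reduces locally; the switch then
            # combines the partials and returns the result to every node.
            partials = [reduce(reduce_fn, map(map_fn, chunk)) for chunk in per_node_data]
            total = reduce(reduce_fn, partials)
            return [total] * len(per_node_data)  # "broadcast" back to all nodes

        nodes = [[1, 2, 3], [4, 5], [6]]
        print(fused_map_allreduce(nodes, lambda x: x * x, lambda a, b: a + b))
        # -> [91, 91, 91]: the sum of squares delivered to every node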

    Heterogeneous Acceleration for 5G New Radio Channel Modelling Using FPGAs and GPUs

    The abstract is in the attachment.

    SoC-based FPGA architecture for image analysis and other highly demanding applications

    Nowadays, the development of algorithms focuses on performance-efficient and energy-efficient computations. Technologies such as the field programmable gate array (FPGA) and the FPGA-based system on chip (FPGA/SoC) have shown their ability to accelerate compute-intensive applications while reducing power consumption, owing to their high parallelism and architectural reconfigurability. Currently, design cycles for FPGA/SoC are time-consuming, owing to the complexity of the architecture. Therefore, to bridge the gap between applications and FPGA/SoC architectures and to obtain efficient hardware designs for image analysis and other highly demanding applications using high-level synthesis (HLS) tools, two complementary strategies are considered: ad-hoc techniques and performance estimation. Regarding ad-hoc techniques, three highly demanding applications were accelerated through HLS tools: a pulse-shape discriminator for cosmic rays, automatic pest classification, and re-ranking for information retrieval, emphasizing the benefits when such applications are combined with compression techniques while targeting FPGA/SoC devices. Furthermore, a comprehensive performance estimator for hardware acceleration is proposed in this thesis to effectively predict resource utilization and latency on FPGA/SoC, building a bridge between the application and architectural domains. The tool integrates analytical models for performance prediction and a design space explorer (DSE) engine that provides high-level insights to hardware developers, composed of two independent sub-engines: a DSE based on single-objective optimization and a DSE based on evolutionary multi-objective optimization.
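
    As a rough illustration of coupling an analytical performance model with a single-objective DSE engine, consider this minimal Python sketch; the model coefficients, design knobs (unroll factor, memory partitioning) and LUT budget are invented for illustration and are not the thesis' models.

        import itertools

        def predict(unroll, partition):
            # Toy analytical model: more parallelism lowers latency but costs resources.
            latency = 1e6 / (unroll * partition) + 500 * unroll  # estimated cycles
            luts = 2000 * unroll + 1500 * partition              # estimated LUT usage
            return latency, luts

        LUT_BUDGET = 50_000
        feasible = [(u, p) for u, p in itertools.product((1, 2, 4, 8, 16), (1, 2, 4, 8))
                    if predict(u, p)[1] <= LUT_BUDGET]
        best = min(feasible, key=lambda cfg: predict(*cfg)[0])  # single objective: latency
        print("best design point:", best, "-> (latency, LUTs):", predict(*best))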

    Efficient Hardware Implementation of Deep Learning Networks Based on the Convolutional Neural Network

    Image classification, speech processing, autonomous driving, and medical diagnosis have made the adoption of Deep Neural Networks (DNNs) mainstream. Many deep networks such as AlexNet, GoogleNet, ResidualNet, MobileNet, YOLOv3 and Transformers have achieved immense success and popularity. However, implementing these deep and complex networks in hardware is a challenging feat. The growing demand for DNN applications in mobile devices and data centers has led researchers to explore application-specific hardware accelerators for DNNs. There have been numerous hardware- and software-based solutions to improve DNN throughput, latency, performance and accuracy. Any solution for hardware acceleration needs to optimize in a space confined by these metrics. Hardware acceleration of DNNs is a highly effective and viable solution for running them on mobile devices, making the power of DNNs available at the edge in a compact and power-efficient form factor. In this thesis, we introduce a novel architecture that uses a generalized method called Single Input Partial Product 2-Dimensional Convolution (SIPP2D Convolution), which calculates a 2-D convolution in a fast and expedient manner. We present the exploration designs that culminated in SIPP2D and emphasize its benefits. The SIPP2D architecture prevents the re-fetching of input weights for the calculation of partial products. It can calculate the output for any input size and kernel size with low memory traffic while maintaining low latency and high throughput compared to other popular techniques. In addition to being compatible with any input and kernel size, the SIPP2D architecture can be modified to support any allowable stride. We describe the data flow and algorithmic modifications to SIPP2D that extend its capabilities to accommodate multi-stride convolutions. Supporting multi-stride convolutions is an essential feature addition to the SIPP2D architecture, increasing its versatility and network-agnostic character for convolutional DNNs. Along with architectural explorations, we have also performed research in the area of model optimization. It is widely understood that any change at the algorithmic level of the network pays significant dividends at the hardware level. Compression and optimization techniques such as pruning and quantization help reduce the size of the model while maintaining accuracy at an acceptable level. Thus, by combining techniques such as channel pruning with SIPP2D, we can further boost its performance. In this thesis, we examine the performance of channel-pruned SIPP2D compared to other compressed models. Traditionally, quantization of weights and inputs is used to reduce memory transfer and power consumption. However, quantizing the outputs of layers can be a challenge, since the output of each layer changes with the input. In our research, we quantize the output of each layer for AlexNet and VGGNet-16 to analyze the effect this has on accuracy. We use the Signal-to-Quantization-Noise Ratio (SQNR) to empirically determine the integer length (IL) as well as the fractional length (FL) for the fixed-point precision that yields the lowest SQNR and highest accuracy. Based on our observations, we report that accuracy is sensitive to the fractional length as well as the integer length. For AlexNet, we observe a deterioration in accuracy as the word length decreases.
The Top-5 accuracy drops from 77% for floating-point precision to 56% for a WL of 12 and an FL of 8. The results are similar in the case of VGGNet-16: its Top-5 accuracy decreases from 82% for floating point to 30% for a WL of 12 and an FL of 8. Beyond the effect of a small word length, we observe the accuracy to be highly dependent on both the integer length and the fractional length. We have also analyzed the accuracy loss that remains after retraining post-quantization, using polynomial fitting to relate the fractional length to the drop in accuracy still sustained after retraining a quantized network. In summary, the combination of the enhanced SIPP2D architecture with compression techniques such as channel pruning and quantization is highly advantageous and conducive to widespread adoption. The SIPP2D architecture, with its flexible data flow and algorithmic modifications to support multi-stride convolutions, offers a powerful and versatile framework for deep neural networks.
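
    To make the fixed-point terminology concrete, here is a minimal, hypothetical Python sketch of quantizing a layer's outputs at word length WL = IL + FL and measuring SQNR; the data, the signed fixed-point convention and the sweep are illustrative, not the thesis' experimental setup.

        import numpy as np

        def quantize(x, il, fl):
            # Signed fixed point (one common convention): il integer bits, fl fractional.
            scale = 2.0 ** fl
            lo, hi = -2.0 ** il, 2.0 ** il - 1.0 / scale
            return np.clip(np.round(x * scale) / scale, lo, hi)

        def sqnr_db(x, xq):
            # Signal-to-Quantization-Noise Ratio in decibels.
            return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))

        rng = np.random.default_rng(0)
        acts = rng.standard_normal(10_000) * 2.0  # stand-in for a layer's outputs
        WL = 12
        for il in range(1, WL):                   # sweep the IL/FL split at fixed WL
            fl = WL - il
            print(f"IL={il:2d} FL={fl:2d} SQNR={sqnr_db(acts, quantize(acts, il, fl)):6.2f} dB")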

    Optimization of scientific algorithms in heterogeneous systems and accelerators for high performance computing

    Today, general-purpose computing on GPUs is one of the basic pillars of high-performance computing. Although hundreds of applications have been accelerated on GPUs, there are still scientific algorithms that have received little attention. The motivation of this thesis has therefore been to investigate the possibility of significantly accelerating a set of such algorithms on GPUs. First, an optimized implementation was obtained of CAVLC (Context-Adaptive Variable-Length Coding), the video and image compression algorithm that is the most widely used entropy-coding method in the H.264 video coding standard. The speedup over the best previous implementation is between 2.5x and 5.4x. This solution can serve as the entropy component of software H.264 encoders and can be used in video and image compression systems for formats other than H.264, such as medical images. Second, GUD-Canny was developed, an unsupervised and distributed Canny edge detector. The system resolves the main limitations of existing Canny implementations: the bottleneck caused by the hysteresis process and the use of fixed hysteresis thresholds. A given image is divided into a set of sub-images and, for each of them, a pair of hysteresis thresholds is computed in an unsupervised way using the Medina-Carnicer method. The detector meets the real-time requirement, taking on average 0.35 ms to detect the edges of a 512x512 image. Third, an optimized implementation of the VLE (Variable-Length Encoding) data compression method was produced, which is on average 2.6x faster than the best previous implementation. This solution also includes a new inter-block scan method that can be used to accelerate the scan operation itself and other algorithms, such as stream compaction. For the scan operation, a speedup of 1.62x is achieved when the proposed method is used instead of the one in the best previous VLE implementation. The thesis concludes with a chapter on future lines of research that arise from its contributions.
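
    The role the scan primitive plays in VLE can be shown with a minimal Python sketch (the codebook and data are invented): an exclusive prefix sum over per-symbol code lengths gives each symbol's output bit offset, after which every codeword can be written independently, which is what makes the encoder parallelizable on a GPU.

        import numpy as np

        # Illustrative prefix-free codebook: symbol -> (codeword bits, length).
        codebook = {0: ("0", 1), 1: ("10", 2), 2: ("110", 3), 3: ("111", 3)}

        symbols = np.array([0, 3, 1, 0, 2, 3, 0])
        lengths = np.array([codebook[int(s)][1] for s in symbols])
        offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))  # exclusive scan

        # With offsets known, each symbol's write is independent of the others.
        stream = "".join(codebook[int(s)][0] for s in symbols)
        print("bit offsets:", offsets.tolist())
        print("bitstream: ", stream)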

    A multi-level functional IR with rewrites for higher-level synthesis of accelerators

    Specialised accelerators deliver orders of magnitude higher energy-efficiency than general-purpose processors. Field Programmable Gate Arrays (FPGAs) have become the substrate of choice, because the ever-changing nature of modern workloads, such as machine learning, demands reconfigurability. However, they are notoriously hard to program directly using Hardware Description Languages (HDLs). Traditional High-Level Synthesis (HLS) tools improve productivity, but come with their own problems. They often produce sub-optimal designs and programmers are still required to write hardware-specific code, thus development cycles remain long. This thesis proposes Shir, a higher-level synthesis approach for high-performance accelerator design with a hardware-agnostic programming entry point, a multi-level Intermediate Representation (IR), a compiler and rewrite rules for optimisation. First, a novel, multi-level functional IR structure for accelerator design is described. The IRs operate on different levels of abstraction, cleanly separating different hardware concerns. They enable the expression of different forms of parallelism and standard memory features, such as asynchronous off-chip memories or synchronous on-chip buffers, as well as arbitration of such shared resources. Exposing these features at the IR level is essential for achieving high performance. Next, mechanical lowering procedures are introduced to automatically compile a program specification through Shir’s functional IRs until low-level HDL code for FPGA synthesis is emitted. Each lowering step gradually adds implementation details. Finally, this thesis presents rewrite rules for automatic optimisations around parallelisation, buffering and data reshaping. Reshaping operations pose a challenge to functional approaches in particular. They introduce overheads that compromise performance or even prevent the generation of synthesisable hardware designs altogether. This fundamental issue is solved by the application of rewrite rules. The viability of this approach is demonstrated by running matrix multiplication and 2D convolution on an Intel Arria 10 FPGA. A limited design space exploration is conducted, confirming the ability of the IR to exploit various hardware features. Using rewrite rules for optimisation, it is possible to generate high-performance designs that are competitive with highly tuned OpenCL implementations and that outperform hardware-agnostic OpenCL code. The performance impact of the optimisations is further evaluated showing that they are essential to achieving high performance, and in many cases also necessary to produce hardware that fits the resource constraints.
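
    To give a flavour of rewrite-rule-driven optimisation over a functional IR, here is a minimal, hypothetical Python sketch of one classic rule, map fusion, which eliminates an intermediate buffer; the node names are invented stand-ins, not Shir's actual IR.

        from dataclasses import dataclass
        from typing import Any, Callable

        @dataclass(frozen=True)
        class Input:
            name: str

        @dataclass(frozen=True)
        class Map:
            f: Callable[[Any], Any]
            arg: Any

        def fuse_maps(expr):
            # Rewrite rule, applied bottom-up: Map(f, Map(g, xs)) -> Map(f . g, xs).
            if isinstance(expr, Map):
                inner = fuse_maps(expr.arg)
                if isinstance(inner, Map):
                    f, g = expr.f, inner.f
                    return Map(lambda x: f(g(x)), inner.arg)
                return Map(expr.f, inner)
            return expr

        prog = Map(lambda x: x + 1, Map(lambda x: x * 2, Input("xs")))
        fused = fuse_maps(prog)
        print(isinstance(fused.arg, Input))  # True: one pass, no intermediate buffer

    In a hardware setting, removing the intermediate array is what allows the two stages to be lowered to a single streaming pipeline instead of two passes through a buffer.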

    Multiscale visualization approaches for Volunteered Geographic Information and Location-based Social Media

    Today, “zoomable” maps are a state-of-the-art way to explore the world, available to anyone with Internet access. However, the process of creating this kind of visualization has been only loosely investigated and documented. With an increasing amount of available data, interactive maps have nevertheless become an integral approach to visualizing and exploring big datasets and user-generated data. OpenStreetMap and online platforms such as Twitter and Flickr offer application programming interfaces (APIs) that expose geographic information, and they are well-known examples of this visualization challenge. In addition, an increasing number of public administrations collect and publish open datasets, which makes the task of visualization even more relevant. This dissertation deals with the visualization of user-generated geodata as multiscale maps. The basics of today’s multiscale maps, their history, technologies, and possibilities, are explored and abstracted. This work introduces two new multiscale-focused visualization approaches for point data from volunteered geographic information (VGI) and location-based social media (LBSM). The first contribution is a visualization methodology for spatially referenced information in the form of point geometries with nominally scaled data from social media such as Twitter or Flickr. Typical for this data is a high number of social media posts in different categories, where a post corresponds to a point in a specific category. Due to their sheer quantity and similar characteristics, the posts appear generic rather than unique. This type of dataset can be explored using the new method of micro diagrams, which visualizes the dataset at multiple scales and resolutions. The data is aggregated into small grid cells, and the numerical proportions are shown with small diagrams whose colors depict the categories; at small sizes the diagrams visually merge into heterogeneous areas. The diagram sizes allow the user to estimate the overall number of aggregated points in a grid cell. A different visualization approach, based on selection, is proposed for more distinctive points, considered points of interest (POIs). The goal is to identify the more locally relevant points in a dataset: points that are more important than the other points in their neighborhood when compared by a numerical attribute. The measure, derived from topographic isolation and called discrete isolation, is the distance from one point to the nearest point with a higher attribute value. Using this measure, the most important points can easily be selected by choosing a minimum distance, producing a homogeneous spatial distribution of the selected points within the dataset. The two newly developed approaches are applied to multiscale mapping by constructing example workflows that produce multiscale maps. The publicly available multiscale mapping workflows OpenMapTiles and OpenStreetMap Carto, both using OpenStreetMap data, are systematically explored and analyzed. The result is a general workflow for multiscale map production and a short overview of the toolchain software. In particular, the generalization approaches in the example projects are discussed and classified within cartographic theory on the basis of the literature. The workflow is demonstrated by building a raster tile service for the micro diagrams and a vector tile service for the discrete isolation, both usable with just a web browser.
In conclusion, these new approaches for point data from VGI and LBSM allow a better qualitative visualization of geodata. While analyzing vast global datasets remains challenging, exploring and analyzing the hidden patterns in the data is fruitful. Creating this degree of visualization and producing maps at multiple scales is a complicated task, and the workflows and tools provided in this thesis will make map production on a worldwide scale easier.
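
    Since the discrete-isolation measure is defined precisely (the distance from a point to the nearest point with a higher attribute value), it can be sketched in a few lines of Python; the point set, the attribute and the threshold below are invented for illustration.

        import numpy as np

        def discrete_isolation(xy, value):
            # For each point: distance to the nearest point with a higher attribute.
            iso = np.full(len(xy), np.inf)  # the global maximum keeps infinite isolation
            for i in range(len(xy)):
                higher = value > value[i]
                if higher.any():
                    iso[i] = np.linalg.norm(xy[higher] - xy[i], axis=1).min()
            return iso

        rng = np.random.default_rng(0)
        pts = rng.uniform(0, 100, size=(200, 2))   # point coordinates
        score = rng.uniform(size=200)              # e.g., a POI's popularity
        iso = discrete_isolation(pts, score)
        selected = pts[iso >= 15]                  # minimum distance spreads POIs evenly
        print(f"{len(selected)} of {len(pts)} points selected")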

    Structured parallelism discovery with hybrid static-dynamic analysis and evaluation technique

    Parallel computer architectures have dominated the computing landscape for the past two decades, a trend that is only expected to continue and intensify with increasing specialization and heterogeneity. This creates huge pressure across the software stack to produce programming languages, libraries, frameworks and tools that efficiently exploit the capabilities of parallel computers, not only for new software, but also for revitalizing existing sequential code. Automatic parallelization, despite decades of research, has had limited success in transforming sequential software to take advantage of efficient parallel execution. This thesis investigates three approaches that use commutativity analysis as the enabler for parallelization, which has the potential to overcome the limitations of traditional techniques. We introduce the concept of liveness-based commutativity for sequential loops. We first examine a practical analysis utilizing liveness-based commutativity in a symbolic execution framework. Symbolic execution represents input values as groups of constraints, consequently deriving the output as a function of the input and enabling the identification of further program properties. We employ this feature to develop an analysis that discerns commutativity properties between loop iterations. We study the application of this approach on loops taken from real-world programs in the OLDEN and NAS Parallel Benchmark (NPB) suites, and identify its limitations and related overheads. Informed by these findings, we develop Dynamic Commutativity Analysis (DCA), a new technique that leverages profiling information from program execution with specific input sets. Using profiling information, we track liveness information and detect loop commutativity by examining the code’s live-out values. We evaluate DCA against almost 1400 loops of the NPB suite, finding 86% of them parallelizable. Comparing our results against dependence-based methods, we match the detection efficacy of two dynamic approaches and outperform three static ones. Additionally, DCA automatically detects parallelism in loops which iterate over Pointer-Linked Data Structures (PLDSs), taken from a wide range of benchmarks used in the literature, where all other techniques we considered failed. Parallelizing the discovered loops, our methodology achieves an average speedup of 3.6× across NPB (and up to 55×) and up to 36.9× for the PLDS-based loops on a 72-core host. We also demonstrate that our methodology, despite relying on specific input values for profiling each program, is able to correctly identify parallelism that is valid for all potential input sets. Lastly, we develop a methodology that utilizes liveness-based commutativity, as implemented in DCA, to detect latent loop parallelism in the shape of patterns. Our approach applies a series of transformations that enable multiple applications of DCA over the generated multi-loop code section and matches its loop commutativity outcomes against the expected criteria for each pattern. Applying our methodology to sets of sequential loops, we are able to identify well-known parallel patterns (i.e., maps, reductions and scans). This extends the scope of parallelism detection to loops, such as those performing scan operations, which cannot be determined as parallelizable by simply evaluating liveness-based commutativity conditions on their original form.
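
    The core intuition of liveness-based commutativity can be sketched dynamically in a few lines of Python (a toy in the spirit of DCA, not its implementation): run the loop's iterations in permuted orders and check whether the live-out values still match the sequential result.

        import random

        def commutative_live_outs(loop_body, iters, make_state, live_out, trials=10):
            # Dynamic, input-specific test: if every sampled iteration order
            # reproduces the sequential live-out values, report the loop as
            # (likely) commutative, hence a parallelization candidate.
            reference = make_state()
            for i in iters:
                loop_body(reference, i)
            expected = live_out(reference)
            for _ in range(trials):
                order = list(iters)
                random.shuffle(order)
                state = make_state()
                for i in order:
                    loop_body(state, i)
                if live_out(state) != expected:
                    return False
            return True

        # A reduction: every iteration writes the same accumulator (a dependence
        # for classical analyses), yet its live-out value is order-independent.
        ok = commutative_live_outs(
            loop_body=lambda s, i: s.update(acc=s["acc"] + i * i),
            iters=range(100),
            make_state=lambda: {"acc": 0},
            live_out=lambda s: s["acc"],
        )
        print("commutative live-outs:", ok)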

    Automated cache optimisations of stencil computations for partial differential equations

    This thesis focuses on numerical methods that solve partial differential equations. Our focal point is the finite difference method, which solves partial differential equations by approximating derivatives with explicit finite differences. These partial differential equation solvers consist of stencil computations on structured grids. Stencils for computing real-world practical applications are patterns often characterised by many memory accesses and non-trivial arithmetic expressions, leading to high computational costs compared to the simple stencils used in much prior proof-of-concept work. In addition, the loop nests that express stencils on structured grids may often be complicated. This work is highly motivated by a specific domain of stencil computations where one of the challenges is operations that are not aligned to the structured grid ("off-the-grid" operations). These operations update neighbouring grid points through scatter and gather operations via non-affine memory accesses, such as A[B[i]]. In addition to this challenge, these practical stencils often include many computation fields (requiring multiple grid copies to be stored), complex data dependencies and imperfect loop nests. In this work, we aim to increase the performance of stencil kernel execution by studying automated cache-memory-dependent optimisations for stencil computations. This work consists of two core parts with their respective contributions. The first part of our work tries to reduce the data movement in stencil computations of practical interest. Data movement is a dominant factor affecting the performance of high-performance computing applications, and it has long been a target of optimisations due to its impact on execution time and energy consumption. This thesis tries to relieve this cost by applying temporal blocking optimisations, also known as time-tiling, to stencil computations. Temporal blocking is a well-known technique to enhance data reuse in stencil computations. However, it is rarely used in practical applications, appearing instead in theoretical examples that prove its efficacy; applying temporal blocking to scientific simulations is more complex. More specifically, in this work we focus on the application context of seismic and medical imaging. In this area, we often encounter scatter and gather operations due to signal sources and receivers at arbitrary locations in the computational domain. These operations make the application of temporal blocking challenging. We present an approach to overcome this challenge and successfully apply temporal blocking. In the second part of our work, we extend the first part into an automated approach targeting a wide range of simulations modelled with partial differential equations. Since temporal blocking is error-prone, tedious to apply by hand and highly complex to assimilate theoretically and practically, we are motivated to automate its application and automatically generate code that benefits from it. We discuss algorithmic approaches and present a generalised compiler pipeline to automate the application of temporal blocking. These passes are written in the Devito compiler and are used to accelerate the computation of stencil kernels in areas such as seismic and medical imaging, computational fluid dynamics and machine learning. Devito (www.devitoproject.org) is a Python package to implement optimised stencil computations (e.g., finite differences, image processing, machine learning) from high-level symbolic problem definitions.
Devito builds on SymPy (www.sympy.org) and employs automated code generation and just-in-time compilation to execute optimised computational kernels on several computer platforms, including CPUs, GPUs, and clusters thereof. We show how we automate temporal blocking code generation without user intervention and often achieve better time-to-solution. We enable domain-specific optimisation through compiler passes and offer temporal blocking gains from a high-level symbolic abstraction. These automated optimisations benefit various computational kernels for solving real-world application problems.
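
    To show what time-tiling means in the simplest possible setting, here is a minimal, hypothetical Python sketch of temporal blocking for a 1-D 3-point stencil (overlapped spatial tiles with a halo as wide as the time block); it is a didactic stand-in, not Devito's generated code.

        import numpy as np

        def step(u):
            # One explicit step of a 3-point smoothing stencil; end points stay fixed.
            v = u.copy()
            v[1:-1] = 0.25 * u[:-2] + 0.5 * u[1:-1] + 0.25 * u[2:]
            return v

        def naive(u, nt):
            for _ in range(nt):
                u = step(u)   # one full sweep of the grid per time step
            return u

        def time_tiled(u, nt, tile=64, bt=8):
            # Advance `bt` steps at a time on each spatial tile, reading a halo
            # of width `bt`, so the tile stays in cache across the time block.
            n = u.size
            for t0 in range(0, nt, bt):
                steps = min(bt, nt - t0)
                new = u.copy()
                for x0 in range(0, n, tile):
                    x1 = min(x0 + tile, n)
                    lo, hi = max(0, x0 - steps), min(n, x1 + steps)
                    local = u[lo:hi].copy()
                    for _ in range(steps):
                        local = step(local)
                    # Only the tile interior is valid after `steps` local steps.
                    new[x0:x1] = local[x0 - lo:x0 - lo + (x1 - x0)]
                u = new
            return u

        u0 = np.sin(np.linspace(0, np.pi, 256))
        assert np.allclose(naive(u0, 32), time_tiled(u0, 32))
        print("temporal blocking reproduces the naive sweep")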