1,657 research outputs found

    Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization

    Full text link
    Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for text corpora as two-dimensional scatter plots, reflecting semantic similarity between the documents and supporting corpus analysis. Although the choice of the topic model, the dimensionality reduction, and their underlying hyperparameters significantly impact the resulting layout, it is unknown which particular combinations result in high-quality layouts with respect to accuracy and perception metrics. To investigate the effectiveness of topic models and dimensionality reduction methods for the spatialization of corpora as two-dimensional scatter plots (or basis for landscape-type visualizations), we present a large-scale, benchmark-based computational evaluation. Our evaluation consists of (1) a set of corpora, (2) a set of layout algorithms that are combinations of topic models and dimensionality reductions, and (3) quality metrics for quantifying the resulting layout. The corpora are given as document-term matrices, and each document is assigned to a thematic class. The chosen metrics quantify the preservation of local and global properties and the perceptual effectiveness of the two-dimensional scatter plots. By evaluating the benchmark on a computing cluster, we derived a multivariate dataset with over 45 000 individual layouts and corresponding quality metrics. Based on the results, we propose guidelines for the effective design of text spatializations that are based on topic models and dimensionality reductions. As a main result, we show that interpretable topic models are beneficial for capturing the structure of text corpora. We furthermore recommend the use of t-SNE as a subsequent dimensionality reduction.Comment: To be published at IEEE VIS 2023 conferenc

    Resource and thermal management in 3D-stacked multi-/many-core systems

    Full text link
    Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited due to high latencies of interconnects and memory, high power consumption, and low manufacturing yield in traditional (2D) chips. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies over each other and using through-silicon-vias (TSVs) for on-chip communication, and thus, provides a large amount of on-chip resources and shortens communication latency. These benefits, however, are limited by challenges in high power densities and temperatures. 3D stacking also enables integrating heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with silicon-photonic network-on-chip (PNoC), which reduces on-chip communication latency significantly and provides higher bandwidth compared to electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices through micro-heaters, but at the cost of substantial power overhead. This thesis claims that unearthing the energy efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves energy efficiency of 3D-stacked systems via three major components of computing systems: cache, memory, and on-chip communication. We analyze characteristics of workloads in computation, memory usage, and communication, and present techniques that leverage these characteristics for energy-efficient computing. This thesis introduces 3D cache resource pooling, a cache design that allows for flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory. At the main memory level, we claim that utilizing heterogeneous memory modules and memory object level management significantly helps with energy efficiency. This thesis proposes a memory management scheme at a finer granularity: memory object level, and a page allocation policy to leverage the heterogeneity of available memory modules and cater to the diverse memory requirements of workloads. On the on-chip communication side, we introduce an approach to limit the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies coupled with an adaptive thermal tuning policy minimize the required thermal tuning power for PNoC, and in this way, help broader integration of PNoC. The thesis also introduces techniques in placement and floorplanning of optical devices to reduce optical loss and, thus, laser source power consumption.2018-03-09T00:00:00

    The design and construction of high performance garbage collectors

    Get PDF
    Garbage collection is a performance-critical component of modern language implementations. The performance of a garbage collector depends in part on major algorithmic decisions, but also significantly on implementation details and techniques which are often incidental in the literature. In this dissertation I look in detail at the performance characteristics of garbage collection on modern architectures. My thesis is that a thorough understanding of the characteristics of the heap to be collected, coupled with measured performance of various design alternatives on a range of modern architectures provides insights that can be used to improve the performance of any garbage collection algorithm. The key contributions of this work are: 1) A new analysis technique (replay collection) for measuring the performance of garbage collection algorithms; 2) a novel technique for applying software prefetch to non-moving garbage collectors that achieves significant performance gains; and 3) a comprehensive analysis of object scanning techniques, cataloguing and comparing the performance of the known methods, and leading to a new technique that optimizes performance without significant cost to the runtime environment. These contributions are applicable to a wide range of garbage collectors, and can provide significant measurable speedups to a design point where each implementer in the past has had to trust intuition or their own benchmarking. The methodologies and implementation techniques contributed in this dissertation have the potential to make a significant improvement to the performance of every garbage collector

    Energy-Aware Data Management on NUMA Architectures

    Get PDF
    The ever-increasing need for more computing and data processing power demands for a continuous and rapid growth of power-hungry data center capacities all over the world. As a first study in 2008 revealed, energy consumption of such data centers is becoming a critical problem, since their power consumption is about to double every 5 years. However, a recently (2016) released follow-up study points out that this threatening trend was dramatically throttled within the past years, due to the increased energy efficiency actions taken by data center operators. Furthermore, the authors of the study emphasize that making and keeping data centers energy-efficient is a continuous task, because more and more computing power is demanded from the same or an even lower energy budget, and that this threatening energy consumption trend will resume as soon as energy efficiency research efforts and its market adoption are reduced. An important class of applications running in data centers are data management systems, which are a fundamental component of nearly every application stack. While those systems were traditionally designed as disk-based databases that are optimized for keeping disk accesses as low a possible, modern state-of-the-art database systems are main memory-centric and store the entire data pool in the main memory, which replaces the disk as main bottleneck. To scale up such in-memory database systems, non-uniform memory access (NUMA) hardware architectures are employed that face a decreased bandwidth and an increased latency when accessing remote memory compared to the local memory. In this thesis, we investigate energy awareness aspects of large scale-up NUMA systems in the context of in-memory data management systems. To do so, we pick up the idea of a fine-grained data-oriented architecture and improve the concept in a way that it keeps pace with increased absolute performance numbers of a pure in-memory DBMS and scales up on NUMA systems in the large scale. To achieve this goal, we design and build ERIS, the first scale-up in-memory data management system that is designed from scratch to implement a data-oriented architecture. With the help of the ERIS platform, we explore our novel core concept for energy awareness, which is Energy Awareness by Adaptivity. The concept describes that software and especially database systems have to quickly respond to environmental changes (i.e., workload changes) by adapting themselves to enter a state of low energy consumption. We present the hierarchically organized Energy-Control Loop (ECL), which is a reactive control loop and provides two concrete implementations of our Energy Awareness by Adaptivity concept, namely the hardware-centric Resource Adaptivity and the software-centric Storage Adaptivity. Finally, we will give an exhaustive evaluation regarding the scalability of ERIS as well as our adaptivity facilities

    Three pitfalls in Java performance evaluation

    Get PDF
    The Java programming language has known a remarkable growth over the last decade. This is partially due to the infrastructure required to run Java ap- plications on general purpose microprocessors: a Java virtual machine (VM). The VM ensures that Java applications are portable across different hardware platforms, because it shelters the applications from the underlying system. Hence the motto write once, run (almost) anywhere. Java applications are compiled to an intermediate form, called bytecode, and consist of a number of so-called class files. The virtual machine takes care of class loading, interpreting or compiling the bytecode to the native code of the underlying hardware platform, thread scheduling, garbage collection, etc. As such, during the execution of a Java application, the VM regularly intervenes to take care of housekeeping tasks and to optimise the application as it is executing. Furthermore, the specific implementation details of most virtual machines insert non-deterministic behaviour, not into the semantic part of the execution, but rather into the lower level execution. For example, to bring a Java application up to competitive speed with classical compiled programs written in languages such as C, the virtual machine needs to optimise Java bytecode. To limit the execution overhead, most virtual machines use a time sampling mechanism to determine the hot methods in the application. This introduces non-determinism, as over several runs, the methods are not always optimised at the same moment, nor is the set of optimised methods always the same. Other factors that introduce non-determinism are the thread scheduling, garbage collection, etc. It is readily seen that performance analysis of Java applications is not as simple as it seems at first, and warrants closer inspection. In this dissertation we are mainly interested in the behaviour of Java applications and their performance. In the course of this work, we uncovered three major pitfalls that were not taken into account by researchers when analysing Java performance prior to this work. We will briefly summarise the main achievements presented in this dissertation. The first pitfall we present involves the interaction between the virtual machine, the application and the input to the application. The performance for short running applications is shown to be mainly determined by the virtual machine. For longer running applications, this influence decreases, but remains tangible. We use statistical analysis, such as principal components analysis and cluster analysis (K-means and hierarchical clustering) to demonstrate and clarify the pitfall. By means of a large number of performance char- acteristics measured using hardware performance counters, five virtual machines and fourteen benchmarks with both a small and a large input size, we demonstrate that short running workloads are primarily clustered by virtual machines. Even for long running applications from the SPECjvm98 benchmark suite, the virtual machine still exerts a large influence on the observed behaviour at the microarchitectural level. This work has shown the need for both larger and longer running benchmarks than were available prior to it – this was (partially) met by the introduction of the DaCapo benchmark suite – as well as a careful consideration when setting up an experiment to avoid measuring the virtual machine, rather than the benchmark. Prior to this work, people were quite often using simulation with short running applications (to save time) for exploring Java performance. The second pitfall we uncover involves the analysis of performance numbers. During a survey of 50 papers published at premier conferences, such as OOPSLA, PLDI, CGO, ISMM and VEE, over the past seven years, we found that a variety of approaches are used, both for experimental design – for example, the input size, virtual machines, heap sizes, etc. – and, even more importantly, for data analysis – for example, using a best out of 3 performance number. New techniques are pitted against existing work using these prevalent approaches, and conclusions regarding their successfulness in beating prior state-of-the-art are based upon them. Given the fact that the execution of Java applications usually involves non-determinism in the virtual machine – for example, when determining which methods to optimise – it should come as no surprise that the lack of statistical rigour in these prevalent approaches leads to misleading or even incorrect conclusions. By this we mean that the conclusions are either not representative of what actually happens, or even contradict reality, as modelled in a statistical manner. To circumvent this pitfall, we propose a rigorous statistical approach that uses confidence intervals to both report and compare performance numbers. We also claim that sufficient experiments should be conducted to get a reliable performance measure. The non-determinism caused by the timer-based optimisation component in a virtual machine can be eliminated using so-called replay compilation. This technique will record a compilation plan during a first execution or profiling run of the application. During a second execution, the application is iterated twice: once to compile and optimise all methods found in the compilation plan, and a second time to perform the actual measurement. It turns out however that current practice of using either a single plan – corresponding to the best performing profiling run – or a combined plan choosing the methods that were optimised in, say, more than half the profiling runs, is no match for using multiple plans. The variability observed in the plans themselves is too large to capture in one of the current practices. Consequently, using multiple plans is definitely the better option. Moreover, this allows using a matched-pair approach in the data analysis, which results in tighter confidence intervals for the mean performance number. The third pitfall we examine is the usage of global performance numbers when tuning either an application or a virtual machine. We show that Java applications exhibit phase behaviour at the method level. This means that instances of the same method show more similarity to each other, behaviourwise, than to instances of other methods. A phase can then be identified as a set of sub-trees of the dynamic call-tree, with each sub-tree headed by the same method. We present an two-step algorithm that allows correlating hardware performance counter data in step 2 with the phases determined in step 1. The information obtained can be applied to show the programmer which methods perform worse than average, for example with respect to the number of cache misses they incur. In the dissertation, we pay particular attention to statistical rigour. For each pitfall, we use statistics to demonstrate its presence. Hopefully this work will encourage other researchers to use more rigour in their work as well

    Effects of hyperlinks on navigation in virtual environments

    No full text
    Hyperlinks introduce discontinuities of movement to 3-D virtual environments (VEs). Nine independent attributes of hyperlinks are defined and their likely effects on navigation in VEs are discussed. Four experiments are described in which participants repeatedly navigated VEs that were either conventional (i.e. obeyed the laws of Euclidean space), or contained hyperlinks. Participants learned spatial knowledge slowly in both types of environment, echoing the findings of previous studies that used conventional VEs. The detrimental effects on participants' spatial knowledge of using hyperlinks for movement were reduced when a time-delay was introduced, but participants still developed less accurate knowledge than they did in the conventional VEs. Visual continuity had a greater influence on participants' rate of learning than continuity of movement, and participants were able to exploit hyperlinks that connected together disparate regions of a VE to reduce travel time

    Study of Layout Techniques in Dynamic Logic Circuitry for Single Event Effect Mitigation

    Get PDF
    Dynamic logic circuits are highly suitable for high-speed applications, considering the fact that they have a smaller area and faster transition. However, their application in space or other radiation-rich environments has been significantly inhibited by their susceptibility to radiation effects. This work begins with the basic operations of dynamic logic circuits, elaborates upon the physics underlying their radiation vulnerability, and evaluates three techniques that harden dynamic logic from the layout: drain extension, pulse quenching, and a proposed method. The drain extension method adds an extra drain to the sensitive node in order to improve charge sharing, the pulse quenching scheme utilizes charge sharing by duplicating a component that offsets the transient pulse, and the proposed technique takes advantage of both. Domino buffers designed using these three techniques, along with a conventional design as reference, were modeled and simulated using a 3D TCAD tool. Simulation results confirm a significant reduction of soft error rate in the proposed technique and suggest a greater reduction with angled incidence. A 130 nm chip containing designed buffer and register chains was fabricated and tested with heavy ion irradiation. According to the experiment results, the proposed design achieved 30% soft error rate reduction, with 19%, 20%, and 10% overhead in speed, power, and area, respectively

    Ramasse-miettes générationnel et incémental gérant les cycles et les gros objets en utilisant des frames délimités

    Get PDF
    Ces dernières années, des recherches ont été menées sur plusieurs techniques reliées à la collection des déchets. Plusieurs découvertes centrales pour le ramassage de miettes par copie ont été réalisées. Cependant, des améliorations sont encore possibles. Dans ce mémoire, nous introduisons des nouvelles techniques et de nouveaux algorithmes pour améliorer le ramassage de miettes. En particulier, nous introduisons une technique utilisant des cadres délimités pour marquer et retracer les pointeurs racines. Cette technique permet un calcul efficace de l'ensemble des racines. Elle réutilise des concepts de deux techniques existantes, card marking et remembered sets, et utilise une configuration bidirectionelle des objets pour améliorer ces concepts en stabilisant le surplus de mémoire utilisée et en réduisant la charge de travail lors du parcours des pointeurs. Nous présentons aussi un algorithme pour marquer récursivement les objets rejoignables sans utiliser de pile (éliminant le gaspillage de mémoire habituel). Nous adaptons cet algorithme pour implémenter un ramasse-miettes copiant en profondeur et améliorer la localité du heap. Nous améliorons l'algorithme de collection des miettes older-first et sa version générationnelle en ajoutant une phase de marquage garantissant la collection de toutes les miettes, incluant les structures cycliques réparties sur plusieurs fenêtres. Finalement, nous introduisons une technique pour gérer les gros objets. Pour tester nos idées, nous avons conçu et implémenté, dans la machine virtuelle libre Java SableVM, un cadre de développement portable et extensible pour la collection des miettes. Dans ce cadre, nous avons implémenté des algorithmes de collection semi-space, older-first et generational. Nos expérimentations montrent que la technique du cadre délimité procure des performances compétitives pour plusieurs benchmarks. Elles montrent aussi que, pour la plupart des benchmarks, notre algorithme de parcours en profondeur améliore la localité et augmente ainsi la performance. Nos mesures de la performance générale montrent que, utilisant nos techniques, un ramasse-miettes peut délivrer une performance compétitive et surpasser celle des ramasses-miettes existants pour plusieurs benchmarks. ______________________________________________________________________________ MOTS-CLÉS DE L’AUTEUR : Ramasse-Miettes, Machine Virtuelle, Java, SableVM
    • …
    corecore