
    Unsupervised Algorithms for Learning Emergent Spatio-Temporal Correlations

    Many applications require the extraction of spatiotemporal correlations among dynamically emergent features of non-stationary distributions. In such applications it is not possible to obtain an a priori analytical characterization of the emergent distribution. This paper extends the Growing Cell Structures (GCS) network and presents two novel networks, GIST and GEST, which combine unsupervised feature extraction and Hebbian learning to track such emergent correlations. The networks were successfully tested on the challenging Data Mapping problem, using an execution-driven simulation of their hardware implementation. The simulation results demonstrate that the GIST and GEST networks successfully extract spatiotemporal correlation information among emergent features of previously unknown distributions, and indicate the feasibility of hardware implementation for online use. Of the two networks, GEST showed better performance in terms of network map stability, feature/correlation tracking ability, and the network sizes evolved.
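
    The combination of competitive feature extraction and Hebbian correlation tracking described above can be illustrated with a minimal sketch. The class and parameter names below (HebbianCorrelator, learning_rate, hebb_rate) are illustrative assumptions, not the GIST/GEST architecture: prototype units adapt toward incoming samples in a GCS-style competitive step, and a Hebbian weight matrix accumulates co-activation between the winning units of two feature streams.

```python
import numpy as np

class HebbianCorrelator:
    """Sketch: competitive prototypes per stream plus Hebbian co-activation weights.
    Illustrative only; not the GIST/GEST networks from the paper."""

    def __init__(self, n_units, dim, learning_rate=0.05, hebb_rate=0.01, decay=0.001):
        rng = np.random.default_rng(0)
        self.protos_a = rng.normal(size=(n_units, dim))  # prototypes for feature stream A
        self.protos_b = rng.normal(size=(n_units, dim))  # prototypes for feature stream B
        self.corr = np.zeros((n_units, n_units))         # Hebbian correlation weights
        self.lr, self.hebb, self.decay = learning_rate, hebb_rate, decay

    @staticmethod
    def _winner(protos, x):
        return int(np.argmin(np.linalg.norm(protos - x, axis=1)))

    def update(self, xa, xb):
        # Competitive step: move each stream's winning prototype toward its sample.
        ia, ib = self._winner(self.protos_a, xa), self._winner(self.protos_b, xb)
        self.protos_a[ia] += self.lr * (xa - self.protos_a[ia])
        self.protos_b[ib] += self.lr * (xb - self.protos_b[ib])
        # Hebbian step: strengthen the link between co-active winners, decay the rest.
        self.corr *= (1.0 - self.decay)
        self.corr[ia, ib] += self.hebb
        return ia, ib
```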

    GPU accelerated path tracing of massive scenes

    This article presents a solution to path tracing of massive scenes on multiple GPUs. Our approach analyzes the memory access pattern of a path tracer and defines how the scene data should be distributed across up to 16 GPUs with minimal effect on performance. The key concept is that the parts of the scene with the highest number of memory accesses are replicated on all GPUs. We propose two methods for maximizing the performance of path tracing when working with partially distributed scene data. Both methods work at the memory management level, so path tracer data structures do not have to be redesigned, making our approach applicable to other path tracers with only minor changes to their code. As a proof of concept, we have enhanced the open-source Blender Cycles path tracer. The approach was validated on scenes of sizes up to 169 GB. We show that only 1–5% of the scene data needs to be replicated to all machines for such large scenes. On smaller scenes we have verified that the performance is very close to rendering a fully replicated scene. In terms of scalability, we achieved a parallel efficiency of over 94% using up to 16 GPUs.
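
    The distribution policy described above can be sketched as a simple heuristic: profile per-chunk memory access counts, replicate the hottest chunks on every GPU within a small size budget, and partition the remaining cold chunks across GPUs. The function name, the replication budget, and the greedy placement below are assumptions for illustration, not the Blender Cycles implementation.

```python
def plan_scene_distribution(chunk_access_counts, chunk_sizes, n_gpus, replicate_fraction=0.05):
    """Sketch of an access-frequency-driven distribution plan (illustrative, not Cycles code).

    chunk_access_counts: dict chunk_id -> profiled number of memory accesses
    chunk_sizes:         dict chunk_id -> size in bytes
    Returns (replicated_chunks, {gpu_id: [chunk_ids]}).
    """
    total_size = sum(chunk_sizes.values())
    budget = replicate_fraction * total_size  # replicate only the hottest few percent

    # Hottest chunks first: these are replicated on all GPUs until the budget is spent.
    hot_order = sorted(chunk_access_counts, key=chunk_access_counts.get, reverse=True)
    replicated, used = [], 0
    for cid in hot_order:
        if used + chunk_sizes[cid] > budget:
            break
        replicated.append(cid)
        used += chunk_sizes[cid]

    # Remaining chunks are distributed: greedily assign each to the least-loaded GPU.
    loads = {g: 0 for g in range(n_gpus)}
    placement = {g: [] for g in range(n_gpus)}
    for cid in hot_order[len(replicated):]:
        g = min(loads, key=loads.get)
        placement[g].append(cid)
        loads[g] += chunk_sizes[cid]
    return replicated, placement
```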

    Integration of Tumor Mutation Burden and PD-L1 Testing in Routine Laboratory Diagnostics in Non-Small Cell Lung Cancer

    In recent years, non-small cell lung cancer (NSCLC) has evolved into a prime example for precision oncology, with multiple FDA-approved "precision" drugs. For the majority of NSCLC lacking targetable genetic alterations, immune checkpoint inhibition (ICI) has become standard of care in first-line treatment or beyond. PD-L1 tumor expression represents the only approved predictive biomarker for PD-L1/PD-1 checkpoint inhibition by therapeutic antibodies. Since PD-L1-negative or low-expressing tumors may also respond to ICI, factors beyond PD-L1 expression are likely to contribute. Tumor mutation burden (TMB) has emerged as a potential candidate; however, it is the most complex biomarker so far and might represent a challenge for routine diagnostics. We therefore established a hybrid capture (HC) next-generation sequencing (NGS) assay that covers all oncogenic driver alterations as well as TMB, and validated the TMB values by correlation with the assay (F1CDx) used for the CheckMate 227 study. Results of the first 417 consecutive patients analyzed in a routine clinical setting are presented. The data show that fast, reliable, comprehensive diagnostics including TMB and targetable alterations are obtained with a short turnaround time. Thus, even complex biomarkers can easily be implemented in routine practice to optimize treatment decisions for advanced NSCLC.
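
    Tumor mutation burden from a targeted NGS panel is commonly reported as the number of eligible somatic variants per megabase of covered coding territory, with filtering criteria that vary by assay. The sketch below is a generic illustration with assumed field names and thresholds, not the hybrid-capture pipeline validated in the paper.

```python
def tumor_mutation_burden(variants, covered_coding_bases, min_vaf=0.05, min_depth=100):
    """Generic TMB sketch: eligible somatic mutations per megabase of covered coding region.
    Field names (germline, vaf, depth, consequence) and thresholds are assumptions; real
    assays apply assay-specific filters (e.g. hotspot exclusion, germline subtraction)."""
    eligible = 0
    for v in variants:
        if v["germline"]:
            continue                      # count somatic variants only
        if v["vaf"] < min_vaf or v["depth"] < min_depth:
            continue                      # basic quality thresholds
        if v["consequence"] == "synonymous":
            continue                      # many panels count non-synonymous variants only
        eligible += 1
    return eligible / (covered_coding_bases / 1e6)   # mutations per megabase
```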

    CAS-DSM: A Compiler Assisted Software Distributed Shared Memory

    Traditional software Distributed Shared Memory (DSM) systems rely on virtual memory management mechanisms to detect accesses to shared memory locations and maintain their consistency. The resulting involvement of the OS kernel, and the significant overhead associated with it, can be avoided by careful compile-time analysis and code instrumentation. In this paper, we propose such a Compiler Assisted Software support approach (CAS-DSM). In the CAS-DSM implementation, involvement of the OS kernel is avoided by instrumenting the application code at the source level. The overhead caused by executing the instrumented code is reduced through several aggressive compile-time optimizations. Finally, we also address the issue of reducing certain overheads in the polling-based handling of incoming asynchronous messages. We used SUIF, a public-domain compiler tool, to implement the compile-time analysis, instrumentation, and optimizations, and we modified CVM, a publicly available software DSM, to support the instrumentation inserted by the compiler. A detailed performance evaluation of CAS-DSM is reported using a set of SPLASH/SPLASH-2 parallel application benchmarks on a distributed-memory IBM SP-2 machine. CAS-DSM achieved moderate to good performance improvements for most of the applications compared to the original CVM implementation. Reducing the polling-based overheads improves the performance of CAS-DSM significantly, resulting in an overall improvement of 12–52% over the original CVM implementation.
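
    The core idea of source-level instrumentation can be sketched in a few lines: before each shared read or write, the compiler inserts an explicit consistency check, so no page-fault trap (and hence no kernel involvement) is needed. The sketch below is a language-agnostic illustration in Python with assumed names (check_access, fetch_page, create_write_notice), not the SUIF/CVM instrumentation described in the paper, which operates on C source.

```python
# Conceptual sketch of compiler-inserted consistency checks before shared accesses.
# All names are illustrative; the actual CAS-DSM instrumentation targets C code and CVM.

PAGE_SIZE = 4096
page_state = {}   # page id -> "invalid" | "read" | "write"

def fetch_page(page):
    ...  # placeholder: obtain a consistent copy of the page from its home node

def create_write_notice(page):
    ...  # placeholder: twin/diff bookkeeping for later propagation of writes

def check_access(addr, mode):
    """Check inserted by the compiler before a shared access, replacing VM fault traps."""
    page = addr // PAGE_SIZE
    state = page_state.get(page, "invalid")
    if mode == "read" and state == "invalid":
        fetch_page(page)
        page_state[page] = "read"
    elif mode == "write" and state != "write":
        if state == "invalid":
            fetch_page(page)
        create_write_notice(page)
        page_state[page] = "write"

def instrumented_sum(base_addr, n, memory):
    """What an instrumented loop body could look like after source-level transformation."""
    total = 0
    for i in range(n):
        addr = base_addr + i * 8
        check_access(addr, "read")   # inserted check; a real compiler would hoist/batch these
        total += memory[addr]
    return total
```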

    Reactive NUMA: A design for unifying S-COMA and CC-NUMA

    This paper proposes and evaluates a new approach to directory-based cache coherence protocols called Reactive NUMA (R-NUMA). An R-NUMA system combines a conventional CC-NUMA coherence protocol with the more recent Simple-COMA (S-COMA) protocol. What makes R-NUMA novel is the way it dynamically reacts to program and system behavior to switch between CC-NUMA and S-COMA and exploit the best aspects of both protocols. This reactive behavior allows each node in an R-NUMA system to independently choose the best protocol for a particular page, thus providing much greater performance stability than either CC-NUMA or S-COMA alone. Our evaluation is both qualitative and quantitative. We first show the theoretical result that R-NUMA's worst-case performance is bounded within a small constant factor (i.e., two to three times) of the best of CC-NUMA and S-COMA. We then use detailed execution-driven simulation to show that, in practice, R-NUMA usually performs better than either a pure CC-NUMA or pure S-COMA protocol, and no more than 57% worse than the best of CC-NUMA and S-COMA, for our benchmarks and base system assumptions.
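
    The reactive per-page policy can be sketched as a simple refetch counter: a page starts under CC-NUMA (remote blocks cached in hardware), and once the number of refetches of previously fetched blocks for that page crosses a threshold, the node switches the page to S-COMA and backs it with a local page-sized replica. The class name, threshold, and bookkeeping below are illustrative assumptions consistent with the description above, not the paper's protocol state machine.

```python
class ReactivePagePolicy:
    """Sketch of R-NUMA-style per-page protocol selection (illustrative only)."""

    def __init__(self, refetch_threshold=64):
        self.threshold = refetch_threshold
        self.refetches = {}        # page -> count of remote refetches of already-seen blocks
        self.s_coma_pages = set()  # pages this node has relocated to local S-COMA storage

    def on_remote_miss(self, page, was_refetch):
        """Called when a block of `page` must be fetched from its remote home node."""
        if page in self.s_coma_pages:
            return "local_replica"            # already backed by a local S-COMA page
        if was_refetch:                       # block was fetched before and evicted: count it
            self.refetches[page] = self.refetches.get(page, 0) + 1
            if self.refetches[page] >= self.threshold:
                self.s_coma_pages.add(page)   # switch this page to S-COMA on this node
                return "switch_to_s_coma"
        return "cc_numa_fetch"
```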

    Empirical and Statistical Application Modeling Using on -Chip Performance Monitors.

    To analyze the performance of applications and architectures, both programmers and architects desire formal methods to explain anomalous behavior. To this end, we present several methods that use the non-intrusive performance-monitoring hardware only recently available on microprocessors to provide further explanations of observed behavior. All of the methods attempt to characterize and explain the instruction-level parallelism achieved by codes on different architectures. We also present a prototype tool that automates the analysis process in order to exploit the advantages of the proposed empirical and statistical methods. The empirical, statistical, and hybrid methods are discussed and explained, with case-study results provided. These methods add to the set of tools available to programmers and architects for understanding the performance of scientific applications. Specifically, the models and tools presented provide new methods for evaluating and categorizing application performance. The empirical memory model quantifies the hierarchical memory performance of applications by inferring the latencies incurred by codes after the effects of latency-hiding techniques are realized. The instruction-level model and its extensions model on-chip performance analytically, giving insight into inherent performance bottlenecks in superscalar architectures. The statistical model and its hybrid extension provide further ways of categorizing codes via their statistical variations. The PTERA performance tool automates the use of performance counters across platforms, making the modeling process easier still. These methods offer alternatives for performance modeling and categorization that were not previously available, exploiting the inherent modeling capabilities of the performance monitors on commodity processors for scientific applications.
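
    An empirical memory model of the kind described above can be sketched from hardware counter values: combine per-level hit counts with assumed per-level latencies to estimate the average memory latency a code actually experiences. The counter names and latency figures below are assumptions for illustration, not the paper's model or the PTERA tool.

```python
def effective_memory_latency(counters, latencies=None):
    """Sketch: average cycles per load implied by cache-level hit counters.
    Counter keys and per-level latency values are illustrative assumptions."""
    if latencies is None:
        latencies = {"l1_hit": 2, "l2_hit": 12, "mem_access": 150}   # cycles (assumed)
    loads = counters["loads"]
    l1_hits = counters["l1_hits"]
    l2_hits = counters["l2_hits"]
    mem_accesses = loads - l1_hits - l2_hits      # loads served by main memory
    total_cycles = (l1_hits * latencies["l1_hit"]
                    + l2_hits * latencies["l2_hit"]
                    + mem_accesses * latencies["mem_access"])
    return total_cycles / loads                   # average observed latency per load

# Example: counters sampled over a code region
print(effective_memory_latency({"loads": 1_000_000, "l1_hits": 950_000, "l2_hits": 40_000}))
```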

    Protocol optimizations for the CRL distributed shared memory system

    Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 1996. Includes bibliographical references (pp. 173–175). By Sandeep K. Gupta.

    Machine learning for biological network inference
