1,070 research outputs found

    Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation

    Full text link
    This paper seeks to address dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples used in image classification fail to identify the important samples. To address this issue, we propose a data pruning method that takes into consideration the training dynamics on target regions using a Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining an effective data pruning approach for dense labeling problems. Our solution can be used as a strong yet simple baseline to select important examples for medical image segmentation with combined data sources. Comment: Accepted by ICML workshops 202
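    The abstract does not spell out how the DAD score is computed, but the name suggests averaging per-sample Dice over training epochs and ranking samples by it. The sketch below is a minimal illustration under that assumption (the choice to keep low-DAD, hard-to-fit samples is also an assumption, not taken from the paper):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Soft Dice score between a binary prediction mask and a target mask."""
    inter = np.sum(pred * target)
    return (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def dynamic_average_dice(dice_history):
    """dice_history: (n_epochs, n_samples) per-sample Dice recorded each epoch.
    DAD is taken here as the mean Dice across training epochs."""
    return np.asarray(dice_history).mean(axis=0)

def prune_dataset(dice_history, keep_fraction=0.8):
    """Keep the samples whose training dynamics mark them as important.
    Here we assume the consistently hard (low-DAD) samples are the ones worth keeping."""
    dad = dynamic_average_dice(dice_history)
    n_keep = int(len(dad) * keep_fraction)
    return np.argsort(dad)[:n_keep]  # indices of the lowest-DAD samples
```

    In practice one would record `dice_history` during normal training and retrain only on the selected indices.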

    Dynamically allocating processor resources between nearby and distant ILP

    Get PDF
    Journal Article. Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. The proposed microarchitecture achieves an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.
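    The warming effect described above can be illustrated with a toy simulation: when the primary thread misses, a run-ahead "future thread" pre-touches upcoming addresses inside a larger window, so later primary accesses hit. All structure sizes and the warming model here are deliberate simplifications, not the paper's actual microarchitecture:

```python
class Cache:
    """Unbounded set-of-lines cache: enough to show warming, nothing more."""
    def __init__(self):
        self.lines = set()

    def access(self, addr):
        hit = addr in self.lines
        self.lines.add(addr)
        return hit

def run(trace, window, future_thread=True):
    """Count primary-thread misses over an address trace, with or without
    a future thread that runs ahead `window` accesses during each stall."""
    cache = Cache()
    misses = 0
    for i, addr in enumerate(trace):
        if not cache.access(addr):
            misses += 1
            if future_thread:
                # The future thread jumps ahead inside a larger instruction
                # window and warms the cache while the primary thread stalls.
                for ahead in trace[i + 1 : i + 1 + window]:
                    cache.access(ahead)
    return misses
```

    On a repeating trace, the future thread converts all but the first cold miss into hits, which is the intuition behind the reported speedups.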

    Algorithmic and infrastructural software development for cryo electron tomography

    Get PDF
    Many Cryo Electron Microscopy (cryoEM) software packages have accumulated significant technical debt over the years, resulting in overcomplicated codebases that are costly to maintain and that slow down development. In this thesis, we advocate for the development of open-source cryoEM core libraries as a solution to this debt, with the ultimate goal of improving the developer and user experience. First, a brief summary of cryoEM is presented, with an emphasis on projection algorithms and tomography. Second, the requirements of modern and future cryoEM image processing are discussed. Third, a new experimental cryoEM core library written in modern C++ is introduced. This library prioritises performance and code reusability, and is designed around a few core functions which offer an efficient model to manipulate multidimensional arrays at an index-wise and element-wise level. C++ template metaprogramming allowed us to develop modular and transparent compute backends that provide strong CPU and GPU performance, unified in an easy-to-use interface. Fourth, new projection algorithms are described, notably a grid-driven approach to accurately insert and sample central slices in 3-dimensional (3D) Fourier space. A Fourier-based fused backward-forward projection, further improving the computational efficiency and accuracy of reprojections, is also presented. Fifth, and as part of our efforts to test and showcase the library, we have started to implement a tilt-series alignment package that gathers existing and new techniques into an automated pipeline. The current program first estimates the per-tilt translations and specimen stage rotation using a coarse alignment based on cosine stretching. It then fits the Thon rings of each tilt image as part of a global optimization to estimate the specimen inclination.
    Finally, we are using our Fourier-based fused reprojection to efficiently refine the per-tilt translations, and are starting to explore ways to refine the per-tilt stage rotations as well.
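    Central-slice insertion is the core operation behind direct Fourier reconstruction: the 2D Fourier transform of a projection is a plane through the origin of the 3D Fourier volume. The sketch below shows the simplest possible nearest-grid-point version; the thesis's grid-driven scheme and its interpolation and weighting details are not reproduced here:

```python
import numpy as np

def insert_central_slice(volume, weights, slice_ft, rotation):
    """Nearest-grid-point insertion of a 2D Fourier slice into a 3D volume.

    volume, weights : (n, n, n) accumulation arrays
    slice_ft        : (n, n) Fourier transform of one projection
    rotation        : (3, 3) orientation of the projection
    """
    n = slice_ft.shape[0]
    c = n // 2
    ys, xs = np.mgrid[:n, :n]
    # Coordinates of the central slice (the z = 0 plane), centred on the origin.
    coords = np.stack([xs - c, ys - c, np.zeros_like(xs)], axis=-1)
    rotated = coords @ rotation.T
    idx = np.round(rotated + c).astype(int)
    ok = np.all((idx >= 0) & (idx < n), axis=-1)       # drop out-of-grid points
    x, y, z = idx[ok, 0], idx[ok, 1], idx[ok, 2]
    np.add.at(volume, (z, y, x), slice_ft[ys[ok], xs[ok]])  # unbuffered add
    np.add.at(weights, (z, y, x), 1.0)                      # for later normalisation
    return volume, weights
```

    After inserting every tilt, dividing `volume` by `weights` (where non-zero) and inverse-transforming yields the reconstruction; sampling a slice back out gives the reprojection used for alignment.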

    Exploring the Suitability of Existing Tools for Constructing Executable Java Slices

    Get PDF
    Java is a managed programming language, and Java programs are executed in a virtual machine (VM) environment. Java VMs not only execute the primary application program, but also perform several other auxiliary tasks at runtime. Some of these auxiliary tasks, including program profiling and security checks, are typically performed inline with the program execution, stalling the application's progress and slowing program execution. We hypothesize that the execution of individual inline auxiliary tasks does not need knowledge of the entire previous program state. In such cases, it might be possible to abstract individual or sets of auxiliary tasks, along with the code statements that compute the required program state, and execute them in a separate thread. The results of such abstracted auxiliary tasks can then be made available to the main program before they are needed, eliminating the execution-time stall and improving runtime program efficiency. However, to test this hypothesis we need access to robust tools to determine and generate the program slice required for each auxiliary task. The goal of this thesis is to study existing Java slicers, test their ability to generate correct executable slices, and evaluate their suitability for this project. Additionally, we also aim to compare the size of the static slice with the minimal dynamic program slice for each task. This comparison will allow us to determine the quality of the static slicing algorithm for Java programs, and provide us with knowledge to enhance the slicing algorithm, if possible. To our knowledge, one of the most robust static Java slicer implementations available publicly is the Indus Java Analysis Framework developed at Kansas State University. We also found the latest dynamic Java slicer, which was developed at Saarland University. For this thesis we study these two state-of-the-art Java slicers and evaluate their suitability for our parallelization project.
    We found that although the Indus slicer is a very efficient and robust tool for the tasks it was originally intended to perform (debugging), the slicer routinely fails to produce correct executable slices. Moreover, the code base has several dependences on older (and now defunct) libraries and also needs all source code to be compiled with obsolete JDK compilers. After a detailed study of the Indus static slicer we conclude that massive changes to the Indus code base may be necessary to bring it up to date and make it suitable for our project. We also found that the Saarland dynamic Java slicer is able to correctly produce dynamic program slices and has few dependences on other tools. Unfortunately, this slicer frequently runs out of memory for longer (standard benchmark-grade) programs, and will also need to be updated for our tasks.
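    The dynamic slicing task the Saarland tool performs can be shown in miniature: walk an execution trace backwards from the slicing criterion and keep only the statements it transitively depends on. Real slicers operate on Java bytecode and track control dependences too; this toy version uses data dependences over `(line, defined_var, used_vars)` tuples:

```python
def dynamic_slice(trace, criterion):
    """Backward dynamic slice over an execution trace.

    trace     : executed statements in order, each (line_no, defined_var, used_vars)
    criterion : variable whose final value the slice must explain
    Control dependences are ignored here for brevity.
    """
    needed = {criterion}
    slice_lines = []
    for line_no, target, uses in reversed(trace):
        if target in needed:
            needed.discard(target)   # this definition satisfies the demand...
            needed.update(uses)      # ...and creates demand for its inputs
            slice_lines.append(line_no)
    return sorted(slice_lines)
```

    For the auxiliary-task parallelization above, such a slice is exactly the code that would be hoisted into the helper thread to precompute the task's required state.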

    SDN Access Control for the Masses

    Full text link
    The evolution of Software-Defined Networking (SDN) has so far been predominantly geared towards defining and refining the abstractions on the forwarding and control planes. However, despite a maturing south-bound interface and a range of proposed network operating systems, the network management application layer is yet to be specified and standardized. In particular, the access control mechanisms that could be exposed to network applications are currently poorly defined. Available mechanisms allow only rudimentary control and lack procedures to partition resource access across multiple dimensions. We address this by extending the SDN north-bound interface to provide control over shared resources to the key stakeholders of network infrastructure: network providers, operators and application developers. We introduce a taxonomy of SDN access models, describe a comprehensive design for SDN access control and implement the proposed solution as an extension of the ONOS network controller intent framework.
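    Partitioning resource access "across multiple dimensions" amounts to a policy indexed by stakeholder role, resource, and action. The sketch below shows that shape; the role and resource names are invented for illustration and are not taken from the paper's taxonomy or the ONOS implementation:

```python
# Policy table: role -> resource -> set of permitted actions.
# Names are hypothetical; a real north-bound interface would also scope
# permissions by network partition, flow space, etc.
POLICY = {
    "provider":  {"topology": {"read", "write"}, "flow_rules": {"read", "write"}},
    "operator":  {"topology": {"read"},          "flow_rules": {"read", "write"}},
    "developer": {"topology": {"read"},          "flow_rules": {"read"}},
}

def authorize(role, resource, action):
    """Return True iff the stakeholder role may perform `action` on `resource`.
    Unknown roles and resources default to deny."""
    return action in POLICY.get(role, {}).get(resource, set())
```

    A controller extension would consult such a table before admitting an application's intent, rather than granting every north-bound caller full control.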

    Technical Design Report for PANDA Electromagnetic Calorimeter (EMC)

    Get PDF
    This document presents the technical layout and the envisaged performance of the Electromagnetic Calorimeter (EMC) for the PANDA target spectrometer. The EMC has been designed to meet the physics goals of the PANDA experiment. The performance figures are based on extensive prototype tests and radiation hardness studies. The document shows that the EMC is ready for construction up to the front-end electronics interface.

    Damage to left frontal regulatory circuits produces greater positive emotional reactivity in frontotemporal dementia.

    Get PDF
    Positive emotions foster social relationships and motivate thought and action. Dysregulation of positive emotion may give rise to debilitating clinical symptomatology such as mania, risk-taking, and disinhibition. Neuroanatomically, there is extensive evidence that the left hemisphere of the brain, and the left frontal lobe in particular, plays an important role in positive emotion generation. Although prior studies have found that left frontal injury decreases positive emotion, it is not clear whether selective damage to left frontal emotion regulatory systems can actually increase positive emotion. We measured happiness reactivity in 96 patients with frontotemporal dementia (FTD), a neurodegenerative disease that targets emotion-relevant neural systems and causes alterations in positive emotion (i.e., euphoria and jocularity), and in 34 healthy controls. Participants watched a film clip designed to elicit happiness and a comparison film clip designed to elicit sadness while their facial behavior, physiological reactivity, and self-reported emotional experience were monitored. Whole-brain voxel-based morphometry (VBM) analyses revealed that atrophy in predominantly left hemisphere fronto-striatal emotion regulation systems including left ventrolateral prefrontal cortex, orbitofrontal cortex, anterior insula, and striatum was associated with greater happiness facial behavior during the film (pFWE < .05). Atrophy in left anterior insula and bilateral frontopolar cortex was also associated with higher cardiovascular reactivity (i.e., heart rate and blood pressure) but not self-reported positive emotional experience during the happy film (p < .005, uncorrected). No regions emerged as being associated with greater sadness reactivity, which suggests that left-lateralized fronto-striatal atrophy is selectively associated with happiness dysregulation. 
    Whereas previous models have proposed that left frontal injury decreases positive emotional responding, we argue that selective disruption of left hemisphere emotion-regulating systems can impair the ability to suppress positive emotions such as happiness.

    Energy-Efficient Acceleration of Asynchronous Programs.

    Full text link
    Asynchronous or event-driven programming has become the dominant programming model in the last few years. In this model, computations are posted as events to an event queue from where they get processed asynchronously by the application. A huge fraction of computing systems built today use asynchronous programming. All the Web 2.0 JavaScript applications (e.g., Gmail, Facebook) use asynchronous programming. There are now more than two million mobile applications available between the Apple App Store and Google Play, all written using asynchronous programming. Distributed servers (e.g., Twitter, LinkedIn, PayPal) built using actor-based languages (e.g., Scala) and platforms such as node.js rely on asynchronous events for scalable communication. Internet-of-Things (IoT), embedded systems, sensor networks, desktop GUI applications, etc., all rely on the asynchronous programming model. Despite the ubiquity of asynchronous programs, their unique execution characteristics have been largely ignored by conventional processor architectures, which remain heavily optimized for synchronous programs. Asynchronous programs are characterized by short events executing varied tasks. This results in a large instruction footprint with little cache locality, severely degrading cache performance. Also, event execution has few repeatable patterns, causing poor branch prediction. This thesis proposes novel processor optimizations that exploit the unique execution characteristics of asynchronous programs to improve performance and energy efficiency. These optimizations are designed to make the underlying hardware aware of discrete events and thereafter exploit the latent Event-Level Parallelism present in these applications. Through speculative pre-execution of future events, cache addresses and branch outcomes are recorded and later used for improving cache and branch predictor performance.
    A hardware instruction prefetcher specialized for asynchronous programs is also proposed as a comparative design direction. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120780/1/gauravc_1.pd
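    The event-queue structure that enables this speculation is easy to model: events are independent handlers drawn from a queue, so the hardware (or here, a toy event loop) can pre-execute the next queued event purely to warm front-end state. The "predictor" below is just a set of seen handlers; this is a deliberately simplified stand-in for the thesis's mechanism:

```python
from collections import deque

def run_events(queue, speculate=True):
    """Process an event queue, counting handler cold starts.

    seen_handlers stands in for i-cache / branch predictor state; a handler
    not yet seen pays a cold-start penalty, as short varied events do.
    """
    seen_handlers = set()
    cold_starts = 0
    pending = deque(queue)
    while pending:
        handler = pending.popleft()
        if handler not in seen_handlers:
            cold_starts += 1
        seen_handlers.add(handler)
        if speculate and pending:
            # Pre-execute the next event only to warm front-end state;
            # its architectural results are discarded.
            seen_handlers.add(pending[0])
    return cold_starts
```

    Because events are discrete and queued, the next unit of work is visible ahead of time; that visibility is what conventional, synchronous-program-oriented front ends fail to exploit.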

    Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

    Get PDF
    The shift of the microprocessor industry towards multicore architectures has placed a huge burden on programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics externally. This thesis develops a novel approach for thinking about parallelism, by casting the problem of parallelization in terms of instruction criticality. Using this approach, parallelism in a program region is readily identified when certain conditions about fetch-criticality are satisfied by the region. The thesis formalizes this approach by developing a criticality-driven model of task-based parallelization. The model can accurately predict the parallelism that would be exposed by potential task choices by capturing a wide set of sources of parallelism as well as costs to parallelization. The criticality-driven model enables the development of two key components for Implicit Parallelization: a task selection policy, and a bottleneck analysis tool. The task selection policy can partition a single-threaded program into tasks that will profitably execute concurrently on a multicore architecture in spite of the costs associated with enforcing data-dependences and with task-related actions. The bottleneck analysis tool gives feedback to the programmers about data-dependences that limit parallelism. In particular, there are several "accidental dependences" that can be easily removed with large improvements in parallelism. These tools combine into a systematic methodology for performance tuning in Implicit Parallelization. Finally, armed with the criticality-driven model, the thesis revisits several architectural design decisions, and finds several encouraging ways forward to increase the scope of Implicit Parallelization. Unpublished; not peer reviewed.
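    The criticality framing boils down to a classic computation: in a dependence DAG, a statement's criticality is the longest dependence chain through it, and the best possible parallel time is the critical-path length. The sketch below computes that; the graph representation and unit costs are illustrative, not the thesis's full model:

```python
def critical_path(deps, cost):
    """Longest-path lengths in a dependence DAG.

    deps : node -> list of predecessor nodes (its data dependences)
    cost : node -> latency of that node
    Returns (critical-path length, per-node earliest finish times).
    """
    finish = {}

    def earliest_finish(n):
        if n not in finish:
            finish[n] = cost[n] + max(
                (earliest_finish(p) for p in deps.get(n, [])), default=0
            )
        return finish[n]

    for n in cost:
        earliest_finish(n)
    return max(finish.values()), finish
```

    Comparing the critical-path length with the sequential sum of costs bounds the achievable parallelism, which is how a bottleneck tool can point at the "accidental dependences" whose removal shortens the critical path most.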