Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation
This paper addresses dense labeling problems in which a significant
fraction of the dataset can be pruned without sacrificing much accuracy. We
observe that, on standard medical image segmentation benchmarks, the loss
gradient norm-based metrics of individual training examples applied in image
classification fail to identify the important samples. To address this issue,
we propose a data pruning method that takes into account the training
dynamics on target regions, captured by a Dynamic Average Dice (DAD) score. To the best
of our knowledge, we are among the first to address data importance in
dense labeling tasks in the field of medical image analysis, making the
following contributions: (1) investigating the underlying causes with rigorous
empirical analysis, and (2) determining an effective data pruning approach in
dense labeling problems. Our solution can be used as a strong yet simple
baseline to select important examples for medical image segmentation with
combined data sources. Comment: Accepted by ICML workshops 202
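The pruning idea above can be sketched in a few lines. Note this is an illustrative reconstruction, not the paper's implementation: the function names, the plain averaging over epochs, and the keep-lowest-DAD heuristic are all assumptions here; the paper defines DAD precisely.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def dynamic_average_dice(dice_history):
    """dice_history: shape (epochs, n_samples), per-epoch Dice score of
    each training example on its target region. Returns per-sample mean,
    i.e. a simple stand-in for a DAD-style training-dynamics score."""
    return np.asarray(dice_history).mean(axis=0)

def prune(dice_history, keep_fraction=0.7):
    """Keep the examples the model finds hardest (lowest average Dice),
    on the assumption that consistently easy examples are redundant."""
    dad = dynamic_average_dice(dice_history)
    n_keep = int(len(dad) * keep_fraction)
    return np.argsort(dad)[:n_keep]  # indices of retained samples
```

The key departure from gradient-norm metrics used in classification is that the score is computed on the segmentation target region across epochs, rather than from a single per-example loss gradient.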
Dynamically allocating processor resources between nearby and distant ILP
Journal Article
Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredictions early. The proposed microarchitecture achieves an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.
Algorithmic and infrastructural software development for cryo electron tomography
Many Cryo Electron Microscopy (cryoEM) software packages have accumulated significant technical debts over the years, resulting in overcomplicated codebases that are costly to maintain and that slow down development. In this thesis, we advocate for the development of open-source cryoEM core libraries as a solution to this debt and with the ultimate goal of improving the developer and user experience.
First, a brief summary of cryoEM is presented, with an emphasis on projection algorithms and tomography. Second, the requirements of modern and future cryoEM image processing are discussed. Third, a new experimental cryoEM core library written in modern C++ is introduced. This library prioritises performance and code reusability, and is designed around a few core functions that offer an efficient model to manipulate multidimensional arrays at an index-wise and element-wise level. C++ template metaprogramming allowed us to develop modular and transparent compute backends that provide great CPU and GPU performance, unified in an easy-to-use interface. Fourth, new projection algorithms will be described, notably a grid-driven approach to accurately insert and sample central slices in 3-dimensional (3d) Fourier space. A Fourier-based fused backward-forward projection, further improving the computational efficiency and accuracy of reprojections, will also be presented. Fifth, and as part of our efforts to test and showcase the library, we have started to implement a tilt series alignment package that gathers existing and new techniques into an automated pipeline. The current program first estimates the per-tilt translations and specimen stage rotation using a coarse alignment based on cosine stretching. It then fits the Thon rings of each tilt image as part of a global optimization to estimate the specimen inclination. Finally, we are using our Fourier-based fused reprojection to efficiently refine the per-tilt translations, and are starting to explore ways that would allow us to refine the per-tilt stage rotations.
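The central-slice operations described above rest on the Fourier slice theorem: the 2D Fourier transform of a projection of a volume equals a central slice of the volume's 3D Fourier transform. A minimal numpy illustration of that identity (not the library's C++ implementation) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
vol = rng.standard_normal((8, 8, 8))

# Real-space projection: integrate the volume along the z axis.
projection = vol.sum(axis=2)

# Fourier-space equivalent: the kz = 0 central slice of the 3D FT.
central_slice = np.fft.fftn(vol)[:, :, 0]

# The two agree, which is what lets reconstruction insert and sample
# central slices in Fourier space instead of projecting in real space.
assert np.allclose(np.fft.fft2(projection), central_slice)
```

Grid-driven insertion and fused backward-forward projection then come down to how accurately those slices are interpolated onto and off the 3D Fourier grid.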
Exploring the Suitability of Existing Tools for Constructing Executable Java Slices
Java is a managed programming language, and Java programs are executed in a virtual machine (VM) environment. Java VMs not only execute the primary application program, but also perform several other auxiliary tasks at runtime. Some of these auxiliary tasks, including program profiling and security checks, are typically performed inline with the program execution, stalling the application's progress and slowing program execution. We hypothesize that the execution of individual inline auxiliary tasks does not need knowledge of the entire previous program state. In such cases, it might be possible to abstract individual or sets of auxiliary tasks, along with the code statements that compute the required program state, and execute them in a separate thread. The results of such abstracted auxiliary tasks can then be made available to the main program before they are needed, eliminating the execution time stall and improving runtime program efficiency. However, to test this hypothesis we need access to robust tools to determine and generate the program slice required for each auxiliary task. The goal of this thesis is to study existing Java slicers, test their ability to generate correct executable slices, and evaluate their suitability for this project. Additionally, we also aim to compare the size of the static slice with the minimal dynamic program slice for each task. This comparison will allow us to determine the quality of the static slicing algorithm for Java programs, and provide us with knowledge to enhance the slicing algorithm, if possible. To our knowledge, one of the most robust static Java slicer implementations available publicly is the Indus Java Analysis Framework developed at Kansas State University. We also found the latest dynamic Java slicer, which was developed at Saarland University. For this thesis we study these two state-of-the-art Java slicers and evaluate their suitability for our parallelization project.
We found that although the Indus slicer is a very efficient and robust tool for the tasks it was originally intended to perform (debugging), the slicer routinely fails to produce correct executable slices. Moreover, the code base has several dependences on older (and now defunct) libraries and also needs all source code to be compiled with obsolete JDK compilers. After a detailed study of the Indus static slicer we conclude that massive changes to the Indus code base may be necessary to bring it up to date and make it suitable for our project. We also found that the Saarland dynamic Java slicer is able to correctly produce dynamic program slices and has few dependences on other tools. Unfortunately, this slicer frequently runs out of memory for longer (standard benchmark-grade) programs, and will also need to be updated for our tasks.
SDN Access Control for the Masses
The evolution of Software-Defined Networking (SDN) has so far been
predominantly geared towards defining and refining the abstractions on the
forwarding and control planes. However, despite a maturing south-bound
interface and a range of proposed network operating systems, the network
management application layer is yet to be specified and standardized. The
access control mechanisms currently exposed to network applications are
poorly defined: they allow only rudimentary control and lack procedures to
partition resource access across multiple dimensions.
We address this by extending the SDN north-bound interface to provide control
over shared resources to key stakeholders of network infrastructure: network
providers, operators and application developers. We introduce a taxonomy of SDN
access models, describe a comprehensive design for SDN access control and
implement the proposed solution as an extension of the ONOS network controller
intent framework.
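The kind of multi-dimensional partitioning of resource access argued for above can be illustrated with a toy rule store keyed on (stakeholder, action, resource) triples. This is a hypothetical sketch for exposition only; the stakeholder and resource names are invented, and none of this reflects the ONOS intent-framework API.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    stakeholder: str   # e.g. a provider, operator, or application
    action: str        # e.g. "read-topology", "install-intent"
    resource: str      # e.g. a switch, port, or flow-table partition

class AccessControl:
    """Toy north-bound access control: permissions are explicit grants
    across all three dimensions, so access can be partitioned per
    stakeholder, per action, and per resource independently."""
    def __init__(self):
        self._rules = set()

    def grant(self, stakeholder, action, resource):
        self._rules.add(Rule(stakeholder, action, resource))

    def allowed(self, stakeholder, action, resource):
        return Rule(stakeholder, action, resource) in self._rules

acl = AccessControl()
acl.grant("app:lb", "install-intent", "switch:s1")
assert acl.allowed("app:lb", "install-intent", "switch:s1")
assert not acl.allowed("app:lb", "install-intent", "switch:s2")
```

The contrast with existing north-bound interfaces is that here no stakeholder holds blanket control: every grant is scoped along every dimension at once.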
Technical Design Report for PANDA Electromagnetic Calorimeter (EMC)
This document presents the technical layout and the envisaged performance of the Electromagnetic Calorimeter (EMC) for the
PANDA target spectrometer. The EMC has been designed to meet the physics goals of the PANDA experiment. The performance figures are based on extensive prototype tests and radiation hardness studies. The document shows that the EMC is ready for construction up to the front-end electronics interface.
Damage to left frontal regulatory circuits produces greater positive emotional reactivity in frontotemporal dementia.
Positive emotions foster social relationships and motivate thought and action. Dysregulation of positive emotion may give rise to debilitating clinical symptomatology such as mania, risk-taking, and disinhibition. Neuroanatomically, there is extensive evidence that the left hemisphere of the brain, and the left frontal lobe in particular, plays an important role in positive emotion generation. Although prior studies have found that left frontal injury decreases positive emotion, it is not clear whether selective damage to left frontal emotion regulatory systems can actually increase positive emotion. We measured happiness reactivity in 96 patients with frontotemporal dementia (FTD), a neurodegenerative disease that targets emotion-relevant neural systems and causes alterations in positive emotion (i.e., euphoria and jocularity), and in 34 healthy controls. Participants watched a film clip designed to elicit happiness and a comparison film clip designed to elicit sadness while their facial behavior, physiological reactivity, and self-reported emotional experience were monitored. Whole-brain voxel-based morphometry (VBM) analyses revealed that atrophy in predominantly left hemisphere fronto-striatal emotion regulation systems including left ventrolateral prefrontal cortex, orbitofrontal cortex, anterior insula, and striatum was associated with greater happiness facial behavior during the film (pFWE < .05). Atrophy in left anterior insula and bilateral frontopolar cortex was also associated with higher cardiovascular reactivity (i.e., heart rate and blood pressure) but not self-reported positive emotional experience during the happy film (p < .005, uncorrected). No regions emerged as being associated with greater sadness reactivity, which suggests that left-lateralized fronto-striatal atrophy is selectively associated with happiness dysregulation. 
Whereas previous models have proposed that left frontal injury decreases positive emotional responding, we argue that selective disruption of left hemisphere emotion regulating systems can impair the ability to suppress positive emotions such as happiness.
Energy-Efficient Acceleration of Asynchronous Programs.
Asynchronous or event-driven programming has
become the dominant programming model in the last few years. In this
model, computations are posted as events to an event queue from where
they get processed asynchronously by the application. A huge fraction
of computing systems built today use asynchronous programming. All the Web 2.0 JavaScript applications (e.g., Gmail, Facebook) use asynchronous programming. There are now more than two million mobile applications available between the Apple App Store and Google Play, which are all written using asynchronous programming. Distributed servers (e.g., Twitter, LinkedIn, PayPal) built using actor-based languages (e.g., Scala) and platforms such as node.js rely on asynchronous events for scalable communication. Internet-of-Things (IoT), embedded systems, sensor networks, desktop GUI applications, etc., all rely on the asynchronous programming model.
Despite the ubiquity of asynchronous programs, their unique execution
characteristics have been largely ignored by conventional processor
architectures, which have remained heavily optimized for synchronous programs. Asynchronous programs are characterized by short events executing varied tasks. This results in a large instruction footprint with little cache locality, severely degrading cache performance. Also, event execution has few repeatable patterns, causing poor branch prediction.
This thesis proposes novel processor optimizations exploiting the unique execution characteristics of asynchronous programs for performance and energy efficiency. These optimizations are designed to make the underlying hardware aware of discrete events and thereafter exploit the latent Event-Level Parallelism present in these applications. Through speculative pre-execution of future events, cache addresses and branch outcomes are recorded and later used for improving cache and branch predictor performance. A hardware instruction prefetcher specialized for asynchronous programs is also proposed as a comparative design direction.
PhD thesis, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120780/1/gauravc_1.pd
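The execution model described above, where computations are posted as events to a queue and processed one at a time, can be sketched in a few lines (a generic illustration of the programming model, not the thesis's hardware proposal):

```python
from collections import deque

class EventLoop:
    """Minimal single-threaded event loop: callbacks are posted to a
    queue and each runs to completion, in posting order."""
    def __init__(self):
        self._queue = deque()

    def post(self, callback, *args):
        self._queue.append((callback, args))

    def run(self):
        while self._queue:
            callback, args = self._queue.popleft()
            callback(*args)  # each short event performs a varied task

log = []
loop = EventLoop()
loop.post(log.append, "handle-click")
loop.post(log.append, "network-reply")
loop.run()
# log is now ["handle-click", "network-reply"]
```

Because consecutive events (here, a UI click and a network reply) run unrelated code, the instruction stream hops between large, disjoint footprints, which is exactly the cache- and branch-predictor-hostile behavior the thesis targets.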
Visual Mental Imagery Activates Topographically Organized Visual Cortex: PET Investigations
Cerebral blood flow was measured using positron emission tomography (PET) in three experiments while subjects performed mental imagery or analogous perceptual tasks. In Experiment 1, the subjects either visualized letters in grids and decided whether an X mark would have fallen on each letter if it were actually in the grid, or they saw letters in grids and decided whether an X mark fell on each letter. A region identified as part of area 17 by the Talairach and Tournoux (1988) atlas, in addition to other areas involved in vision, was activated more in the mental imagery task than in the perception task. In Experiment 2, the identical stimuli were presented in imagery and baseline conditions, but subjects were asked to form images only in the imagery condition; the portion of area 17 that was more active in the imagery condition of Experiment 1 was also more activated in imagery than in the baseline condition, as was part of area 18. Subjects also were tested with degraded perceptual stimuli, which caused visual cortex to be activated to the same degree in imagery and perception. In both Experiments 1 and 2, however, imagery selectively activated the extreme anterior part of what was identified as area 17, which is inconsistent with the relatively small size of the imaged stimuli. These results, then, suggest that imagery may have activated another region just anterior to area 17. In Experiment 3, subjects were instructed to close their eyes and evaluate visual mental images of upper case letters that were formed at a small size or large size. The small mental images engendered more activation in the posterior portion of visual cortex, and the large mental images engendered more activation in anterior portions of visual cortex. This finding is strong evidence that imagery activates topographically mapped cortex. The activated regions were also consistent with their being localized in area 17. 
Finally, additional results were consistent with the existence of two types of imagery, one that rests on allocating attention to form a pattern and one that rests on activating stored visual memories.
Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism
The shift of the microprocessor industry towards multicore architectures has
placed a huge burden on programmers by requiring explicit parallelization
for performance. Implicit Parallelization is an alternative that could ease this
burden by parallelizing applications "under the covers" while
maintaining sequential semantics externally. This thesis develops a novel
approach for thinking about parallelism, by casting the problem of
parallelization in terms of instruction criticality. Using this approach,
parallelism in a program region is readily identified when certain conditions
about fetch-criticality are satisfied by the region. The thesis formalizes this
approach by developing a criticality-driven model of task-based
parallelization. The model can accurately predict the parallelism that would be
exposed by potential task choices by capturing a wide set of sources of
parallelism as well as costs to parallelization.
The criticality-driven model enables the development of two key components for
Implicit Parallelization: a task selection policy, and a bottleneck analysis
tool. The task selection policy can partition a single-threaded program into
tasks that will profitably execute concurrently on a multicore architecture in
spite of the costs associated with enforcing data-dependences and with
task-related actions. The bottleneck analysis tool gives feedback to the
programmers about data-dependences that limit parallelism. In particular, there
are several "accidental dependences" that can be easily removed with large
improvements in parallelism. These tools combine into a systematic methodology
for performance tuning in Implicit Parallelization. Finally, armed with the
criticality-driven model, the thesis revisits several architectural design
decisions, and finds several encouraging ways forward to increase the scope of
Implicit Parallelization. (Unpublished; not peer reviewed.)
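The criticality view of parallelism can be illustrated with a toy dependence graph. This is a generic critical-path calculation, not the thesis's model: instruction names and latencies are invented, and the thesis's model additionally accounts for fetch-criticality and parallelization costs.

```python
from functools import lru_cache

# Toy dependence graph: node -> (latency, list of dependences).
graph = {
    "a": (1, []),
    "b": (2, ["a"]),
    "c": (3, ["a"]),
    "d": (1, ["b", "c"]),
}

@lru_cache(maxsize=None)
def finish_time(node):
    """Earliest completion time given unlimited parallel resources."""
    latency, deps = graph[node]
    return latency + max((finish_time(d) for d in deps), default=0)

critical_path = max(finish_time(n) for n in graph)   # a -> c -> d = 5
total_work = sum(lat for lat, _ in graph.values())   # 7
parallelism = total_work / critical_path             # = 1.4
```

Work off the critical path ("b" above can overlap with "c") is exactly what a task selection policy tries to place in a concurrent task, and a dependence that lengthens the critical path is what the bottleneck analysis tool reports to the programmer.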