
    Quick and practical run-time evaluation of multiple program optimizations

    This article aims to make iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. For that purpose, we propose a low-overhead phase detection scheme geared toward fast optimization space pruning, using code instrumentation and versioning implemented in a production compiler. Our approach is driven by simplicity and practicality. We show that a simple phase detection scheme can be sufficient for optimization space pruning. We also show it is possible to search for complex optimizations at run-time without resorting to sophisticated dynamic compilation frameworks. Beyond iterative optimization, our approach also enables one to quickly design self-tuned applications. Considering 5 representative SpecFP2000 benchmarks, our approach speeds up iterative search for the best program optimizations by a factor of 32 to 962. Phase prediction is 99.4% accurate on average, with an overhead of only 2.6%. The resulting self-tuned implementations bring an average speed-up of 1.4.
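
    As a rough illustration of the idea, the sketch below detects a stable phase by timing an instrumented code section until recent measurements agree, then times each candidate optimization inside that phase. The function names, callables, and thresholds are hypothetical stand-ins for the paper's compiler-level instrumentation and versioning, not its actual implementation.

```python
# Minimal sketch of run-time phase detection for optimization evaluation.
# All names are illustrative; the paper's instrumentation lives inside a
# production compiler, not in Python.
import time

def run_instrumented_interval(version):
    """Stand-in for one instrumented code section executed with a given
    optimized code version; returns its wall-clock time."""
    start = time.perf_counter()
    version()  # execute the candidate version once
    return time.perf_counter() - start

def evaluate_versions(versions, baseline, tolerance=0.05, window=3):
    """Wait until the last `window` baseline timings agree within
    `tolerance` (a stable phase), then time each candidate inside it."""
    history = []
    while True:
        history.append(run_instrumented_interval(baseline))
        recent = history[-window:]
        if len(recent) == window and max(recent) - min(recent) <= tolerance * min(recent):
            break  # performance is stable: we are inside a phase
    base = min(recent)
    # One timed interval per version suffices inside a stable phase,
    # instead of one full program run per candidate optimization.
    return {v.__name__: base / run_instrumented_interval(v) for v in versions}
```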

    Proceedings Virtual Imaging Trials in Medicine 2024

    This submission comprises the proceedings of the 1st Virtual Imaging Trials in Medicine conference, organized by Duke University on April 22-24, 2024. The listed authors serve as the program directors for this conference. The VITM conference is a pioneering summit uniting experts from academia, industry, and government in the fields of medical imaging and therapy to explore the transformative potential of in silico virtual trials and digital twins in revolutionizing healthcare. The proceedings are categorized by the respective days of the conference: Monday presentations, Tuesday presentations, and Wednesday presentations, followed by the abstracts for the posters presented on Monday and Tuesday.

    Understanding and Optimizing Flash-based Key-value Systems in Data Centers

    Flash-based key-value systems are widely deployed in today’s data centers to provide high-speed data processing services. These systems deploy flash-friendly data structures, such as slabs and Log-Structured Merge (LSM) trees, on flash-based Solid State Drives (SSDs) and provide efficient solutions in caching and storage scenarios. As data centers evolve rapidly, plenty of challenges and opportunities for future optimizations arise. In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspective of workloads, software, and hardware. We first propose an online compression scheme, called SlimCache, which exploits the unique characteristics of key-value workloads to virtually enlarge the cache space, increase the hit ratio, and improve cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters in addition to hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, 99th-percentile tail latency, convergence time, real-time system throughput, and the iteration process. Last but not least, we conduct an in-depth, comprehensive measurement study of flash-optimized key-value stores on recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in the current key-value store design and present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also provides system implications for auto-tuning key-value systems on flash-based SSDs and optimizing them on revolutionary 3D XPoint based SSDs.
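
    To make the auto-tuning setting concrete, here is a minimal sketch of one multi-objective tuning loop over a handful of LSM-tree parameters. The parameter names are modeled on common RocksDB-style options, `run_benchmark` is a hypothetical hook that deploys a configuration and measures it, and plain random search with a Pareto filter stands in for the five optimizers the dissertation actually compares.

```python
# Illustrative multi-objective tuning loop for an LSM-tree key-value store.
import random

# A small, assumed slice of the >50-parameter configuration space.
SPACE = {
    "write_buffer_size_mb": [16, 32, 64, 128, 256],
    "max_background_jobs": [2, 4, 8, 16],
    "level0_file_num_compaction_trigger": [2, 4, 8],
    "block_cache_size_mb": [64, 256, 1024],
}

def sample_config():
    return {k: random.choice(v) for k, v in SPACE.items()}

def pareto_front(results):
    """Keep configs not dominated in (higher throughput, lower p99)."""
    front = []
    for cfg, (thr, p99) in results:
        dominated = any(t >= thr and l <= p99 and (t, l) != (thr, p99)
                        for _, (t, l) in results)
        if not dominated:
            front.append((cfg, (thr, p99)))
    return front

def tune(run_benchmark, iterations=50):
    """run_benchmark(config) -> (throughput_ops, p99_latency_ms)."""
    results = [(cfg, run_benchmark(cfg))
               for cfg in (sample_config() for _ in range(iterations))]
    return pareto_front(results)
```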

    A Parametric Approach for Efficient Speech Storage, Flexible Synthesis and Voice Conversion

    During the past decades, many areas of speech processing have benefited from the vast increases in the available memory sizes and processing power. For example, speech recognizers can be trained with enormous speech databases, and high-quality speech synthesizers can generate new speech sentences by concatenating speech units retrieved from a large inventory of speech data. However, even in today's world of ever-increasing memory sizes and computational resources, there are still many embedded application scenarios for speech processing techniques where the memory capacities and the processor speeds are very limited. Thus, there is still a clear demand for solutions that can operate with limited resources, e.g., on low-end mobile devices. This thesis introduces a new segmental parametric speech codec referred to as the VLBR codec. The novel proprietary sinusoidal speech codec designed for efficient speech storage is capable of achieving relatively good speech quality at compression ratios beyond those offered by standardized speech coding solutions, i.e., at bitrates of approximately 1 kbps and below. The efficiency of the proposed coding approach is based on model simplifications, mode-based segmental processing, and the method of adaptive downsampling and quantization. The coding efficiency is further improved using a novel flexible multi-mode matrix quantizer structure and enhanced dynamic codebook reordering. The compression is also facilitated using a new perceptual irrelevancy removal method. The VLBR codec is also applied to text-to-speech synthesis. In particular, the codec is utilized for the compression of unit selection databases and for the parametric concatenation of speech units. It is also shown that the efficiency of the database compression can be further enhanced using speaker-specific retraining of the codec. Moreover, the computational load is significantly decreased using a new compression-motivated scheme for very fast and memory-efficient calculation of concatenation costs, based on techniques and implementations used in the VLBR codec. Finally, the VLBR codec and the related speech synthesis techniques are complemented with voice conversion methods that allow modifying the perceived speaker identity, which in turn enables, e.g., cost-efficient creation of new text-to-speech voices. The VLBR-based voice conversion system combines compression with the popular Gaussian mixture model based conversion approach. Furthermore, a novel method is proposed for converting the prosodic aspects of speech. The performance of the VLBR-based voice conversion system is also enhanced using a new approach for mode selection and through explicit control of the degree of voicing. The solutions proposed in the thesis together form a complete system that can be utilized in different ways and configurations. The VLBR codec itself can be utilized, e.g., for efficient compression of audio books, and the speech synthesis related methods can be used for reducing the footprint and the computational load of concatenative text-to-speech synthesizers to levels required in some embedded applications. The VLBR-based voice conversion techniques can be used to complement the codec both in storage applications and in connection with speech synthesis. It is also possible to utilize only the voice conversion functionality, e.g., in games or other entertainment applications.
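
    As a point of reference for the conversion approach, the following sketch implements the classic GMM-based spectral mapping that systems like this build on: each source frame is mapped through a mixture of per-component linear regressions. The trained parameters (mixture weights, means, covariances, cross-covariances) are assumed inputs; this is the textbook mapping, not the proprietary VLBR pipeline itself.

```python
# Classic GMM-based spectral conversion of one feature frame.
import numpy as np
from scipy.stats import multivariate_normal

def convert_frame(x, w, mu_x, mu_y, Sigma_xx, Sigma_yx):
    """Map a source frame x toward the target speaker:
    y = sum_m p(m|x) * (mu_y[m] + Sigma_yx[m] Sigma_xx[m]^-1 (x - mu_x[m]))
    where p(m|x) is the posterior over the M mixture components."""
    M = len(w)
    resp = np.array([w[m] * multivariate_normal.pdf(x, mu_x[m], Sigma_xx[m])
                     for m in range(M)])
    resp /= resp.sum()  # posterior p(m | x)
    y = np.zeros_like(mu_y[0])
    for m in range(M):
        # Per-component linear regression toward the target space.
        y += resp[m] * (mu_y[m] + Sigma_yx[m] @ np.linalg.solve(Sigma_xx[m], x - mu_x[m]))
    return y
```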

    Effective memory management for mobile environments

    Smartphones, tablets, and other mobile devices exhibit vastly different constraints compared to regular or classic computing environments like desktops, laptops, or servers. Mobile devices run dozens of so-called “apps” hosted by independent virtual machines (VMs). All these VMs run concurrently, and each VM deploys purely local heuristics to organize resources like memory, performance, and power. Such a design causes conflicts across all layers of the software stack, calling for the evaluation of VMs and of optimization techniques specific to mobile frameworks. In this dissertation, we study the design of managed runtime systems for mobile platforms. More specifically, we deepen the understanding of interactions between garbage collection (GC) and system layers. We develop tools to monitor the memory behavior of Android-based apps and to characterize GC performance, leading to the development of new techniques for memory management that address energy constraints, time performance, and responsiveness. We implement a GC-aware frequency scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS), just-in-time (JIT) compilation, and across established dimensions of heap memory size and concurrency. Finally, we provision GC as a global service that collects statistics from all running VMs and then makes an informed decision that optimizes across all of them (and not just locally), and across all layers of the stack. Our evaluation illustrates the power of such a central coordination service and garbage collection mechanism in improving memory utilization, throughput, and adaptability to user activities. In fact, our techniques aim at a sweet spot, where total on-chip energy is reduced (20–30%) with minimal impact on throughput and responsiveness (5–10%). The simplicity and efficacy of our approach reach well beyond the usual optimization techniques.
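
    A conceptual sketch of the GC-aware frequency-scaling idea is shown below: boost the CPU frequency while a collection runs so pauses finish sooner, then drop back to save energy. The sysfs path is the standard Linux cpufreq `userspace` interface; the GC begin/end hooks are assumptions about what a runtime such as ART would need to export, not an existing API.

```python
# Conceptual GC-aware frequency governor (requires root and the
# "userspace" cpufreq governor to be active for scaling_setspeed).
CPUFREQ = "/sys/devices/system/cpu/cpu{}/cpufreq/scaling_setspeed"

def set_freq_khz(cpu, khz):
    with open(CPUFREQ.format(cpu), "w") as f:
        f.write(str(khz))

class GCAwareGovernor:
    def __init__(self, cpus, low_khz=800_000, high_khz=2_000_000):
        self.cpus, self.low, self.high = cpus, low_khz, high_khz

    def on_gc_begin(self):
        # Hypothetical hook invoked by the runtime when a collection starts:
        # run the pause at high frequency to shorten it.
        for c in self.cpus:
            set_freq_khz(c, self.high)

    def on_gc_end(self):
        # Return to a low-power operating point for mutator execution.
        for c in self.cpus:
            set_freq_khz(c, self.low)
```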

    Boron neutron capture therapy treatment planning improvements

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Nuclear Engineering, 1998. Includes bibliographical references. The Boron Neutron Capture Therapy (BNCT) treatment planning process used by the Harvard/MIT team for their clinical Phase I trials is very time consuming. If BNCT proves to be a successful treatment, this process must be made more efficient. Since the Monte Carlo treatment planning calculations were the most time-consuming aspect of the treatment planning process, requiring more than thirty-six hours for scoping calculations of three to five beams and final calculations for two beams, they were targeted for improvement. Three approaches were used to reduce the calculation times. A statistical uncertainty analysis was performed on dose rates and showed that fewer particles could not be used while still meeting the uncertainty requirements in the region of interest. Unused features were removed and assumptions specific to the Harvard/MIT BNCT treatment planning calculations were hard-wired into MCNP by Los Alamos personnel, resulting in a thirty percent decrease in runtimes. MCNP was also installed in parallel on the treatment planning computers, allowing a speedup of roughly the number of computers linked together in parallel. After these enhancements were made, the final executable, MCNPBNCT, was tested by comparing its calculated dose rates against those of the previously used executable, MCNPNEHD. Since the dose rates were in close agreement, MCNPBNCT was adopted. The final runtime improvement, linking two 200 MHz Pentium Pro computers for a single-beam scoping run, reduced the wall-clock runtime from two hours thirty minutes to fifty-nine minutes. It is anticipated that the addition of ten 900 MHz CPUs will further reduce this calculation to three minutes, giving the medical physicist or radiation oncologist the freedom to use an iterative approach to try different radiation beam orientations to optimize treatment. Additional aspects of the treatment planning process were improved. The previously unrecognized phenomenon of peak dose movement during irradiation, and its potential for overdosing the subject, was identified, and a method of predicting its occurrence was developed. The calculated dose rates were also used to create dose-volume histograms and volume-averaged doses. These data suggest an alternative method for categorizing subjects, rather than by peak tissue dose. by John Timothy Goorley. S.M.
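
    The quoted runtimes follow a simple model: Monte Carlo uncertainty shrinks as one over the square root of the particle count (so fewer particles directly cost accuracy), and independent particle histories parallelize almost linearly across processors. A back-of-the-envelope check, where the parallel efficiency and clock-speed scaling factors are assumptions rather than measured values:

```python
# Rough model behind the quoted BNCT planning runtimes.
import math

def relative_error(n_particles, k=1.0):
    # Monte Carlo statistical uncertainty ~ 1/sqrt(N):
    # halving the error requires four times the particles.
    return k / math.sqrt(n_particles)

def parallel_runtime(serial_min, n_cpus, speed_ratio=1.0, efficiency=0.95):
    # Independent histories split almost linearly across CPUs.
    return serial_min / (n_cpus * speed_ratio * efficiency)

serial = 150.0                                   # 2 h 30 min, one 200 MHz CPU
print(parallel_runtime(serial, 2))               # ~79 min modeled; 59 min was measured
print(parallel_runtime(serial, 10, 900 / 200))   # ~3.5 min, matching the 3-minute projection
```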

    Unified Role Assignment Framework For Wireless Sensor Networks

    Wireless sensor networks are made possible by continuing improvements in embedded sensor, VLSI, and wireless radio technologies. Currently, one of the important challenges in sensor networks is the design of a systematic network management framework that allows localized and collaborative resource control uniformly across all application services, such as sensing, monitoring, tracking, data aggregation, and routing. Research in wireless sensor networks is currently oriented toward a cross-layer network abstraction that supports appropriate fine- or coarse-grained resource controls for energy efficiency. In that regard, we have designed a unified role-based service paradigm for wireless sensor networks. We pursue this by first developing a Role-Based Hierarchical Self-Organization (RBHSO) protocol that organizes a connected dominating set (CDS) of nodes called dominators. This is done by hierarchically selecting nodes that possess cumulatively high energy, connectivity, and sensing capabilities in their local neighborhood. The RBHSO protocol then assigns specific tasks, such as sensing, coordination, and routing, to appropriate dominators that end up playing a certain role in the network. Roles, though abstract and implicit, expose role-specific resource controls by way of role assignment and scheduling. Based on this concept, we have designed a Unified Role-Assignment Framework (URAF) to model application services as roles played by local in-network sensor nodes, with sensor capabilities used as rules for role identification. The URAF abstracts domain-specific role attributes by three models: the role energy model, the role execution time model, and the role service utility model. The framework then generalizes resource management for services by providing abstractions for controlling the composition of a service in terms of roles, its assignment, reassignment, and scheduling. To the best of our knowledge, a generic role-based framework that provides a simple and unified network management solution for wireless sensor networks has not been proposed previously.
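
    To make the dominator-selection step concrete, here is a toy local election in the spirit of RBHSO: each node compares a cumulative capability score (energy, connectivity, sensing) against its one-hop neighbors and declares itself a dominator if it wins. The weights and flat data structures are illustrative assumptions, not the protocol's actual message exchange or hierarchy construction.

```python
# Toy dominator election by cumulative capability score.

def score(node, w_energy=0.5, w_conn=0.3, w_sense=0.2):
    # Weighted sum of the three capabilities RBHSO ranks on;
    # the weights here are arbitrary placeholders.
    return (w_energy * node["energy"]
            + w_conn * node["degree"]
            + w_sense * node["sensors"])

def elect_dominators(nodes, neighbors):
    """nodes: id -> attribute dict; neighbors: id -> set of adjacent ids.
    A node becomes a dominator if it has the best score in its closed
    neighborhood, so every node is covered by some dominator (a seed
    for the connected dominating set)."""
    dominators = set()
    for nid in nodes:
        hood = neighbors[nid] | {nid}
        # Ties broken deterministically by node id.
        if max(hood, key=lambda m: (score(nodes[m]), m)) == nid:
            dominators.add(nid)
    return dominators
```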

    Integration of dynamic table translations into dynamic trajectory radiotherapy and mixed photon-electron beam radiotherapy

    Radiotherapy aims at delivering a lethal dose of radiation to tumor cells while sparing the surrounding healthy tissue and organs. Highly specialized devices, such as C-arm linear accelerators (linacs), have been developed for external beam radiotherapy, delivering high-energy photon and electron beams. Over the last decades, several improvements in photon beam radiotherapy, such as the introduction of the photon multileaf collimator (pMLC), enabled intensity-modulated radiotherapy (IMRT), resulting in improved target conformality compared to 3D conformal techniques. Volumetric modulated arc therapy (VMAT) improves the delivery efficiency while maintaining the dosimetric plan quality of IMRT by using dynamic gantry rotation during beam-on. In addition to the dynamic gantry rotation, the table and the collimator can also rotate dynamically during beam-on; this is used in a technique called dynamic trajectory radiotherapy (DTRT). Furthermore, the table can also translate dynamically in three directions, enabling non-isocentric DTRT. However, the potential of dynamic table translations for radiotherapy on a C-arm linac is unexplored. Thus, in this thesis, treatment techniques including dynamic table translations are developed, and potential use cases are shown. A treatment planning process (TPP) for non-isocentric DTRT is developed to create treatment plans with photon beams including dynamic gantry, collimator, and table rotation and dynamic table translation. The intensity modulation optimization of the TPP is based on a hybrid column generation and simulated annealing direct aperture optimization algorithm. The TPP is used to create non-isocentric DTRT plans, and several potential use cases for non-isocentric DTRT are demonstrated: while maintaining treatment plan quality, the delivery efficiency is improved by using non-isocentric DTRT instead of multi-isocentric IMRT for craniospinal irradiation. Extending the source-to-target distance in DTRT plans reduces the risk of collision between the gantry and the patient or table and enables additional beam directions, which could be exploited to improve the dosimetric treatment plan quality compared to isocentric DTRT. In contrast to photon beam radiotherapy, electron treatments are still applied using patient-specific cut-outs placed in an applicator. By using the pMLC for electron beam collimation instead of the cut-outs, efficient electron beam treatments are possible. Further, the use of the pMLC facilitates mixed photon-electron beam radiotherapy (MBRT). An MBRT technique using pMLC-collimated electron arcs instead of electron beams with a static gantry angle is developed, resulting in improved delivery efficiency while maintaining the dosimetric plan quality of MBRT plans using electron beams with a static gantry angle. One of the challenges of DTRT on C-arm linacs is accurately predicting potential collisions between the gantry, the patient, and the table during treatment planning. Thus, a collision prediction tool is developed, which is able to predict possible collision interlocks; the tool was successfully validated against measurements. The created treatment plans for non-isocentric DTRT and MBRT were shown to be accurately deliverable on a C-arm linac, and for several treatment plans, the dosimetric accuracy was successfully validated using film measurements. In conclusion, this thesis demonstrates the benefits of dynamic table translations in photon and electron beam radiotherapy. With the demonstrated benefits of improved dosimetric treatment plan quality and delivery efficiency and reduced collision risk, dynamic table translations may further facilitate the use of MBRT and DTRT treatment techniques in clinics in the future.
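
    A toy version of the collision-prediction idea is sketched below: gantry and table are reduced to simple geometric primitives and clearance is checked at each control point of a planned trajectory. All dimensions and the two-primitive geometry are placeholders; a real tool models the machine head, table, and patient surface in far finer detail.

```python
# Toy gantry/table clearance check along a planned trajectory.
import math

def gantry_head_position(gantry_deg, radius_cm=45.0):
    """Gantry head location in the transverse plane (x lateral, y up),
    rotating on a circle around the isocenter."""
    a = math.radians(gantry_deg)
    return (radius_cm * math.sin(a), radius_cm * math.cos(a))

def collides(gantry_deg, table_pos_cm, table_halfwidth_cm=25.0,
             table_top_cm=-10.0, clearance_cm=5.0):
    """table_pos_cm: (lateral, vertical) dynamic table translation."""
    gx, gy = gantry_head_position(gantry_deg)
    tx, ty = table_pos_cm
    near_lateral = abs(gx - tx) < table_halfwidth_cm + clearance_cm
    below_top = gy < ty + table_top_cm + clearance_cm
    return near_lateral and below_top

def check_trajectory(control_points):
    """control_points: list of (gantry_deg, (lat_cm, vert_cm)) samples.
    Returns the indices where a collision interlock would trigger."""
    return [i for i, (g, t) in enumerate(control_points) if collides(g, t)]
```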