
    Learning Scheduling Algorithms for Data Processing Clusters

    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load.
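At a high level, the learned-policy-over-DAGs idea in this abstract can be sketched as follows. This is a minimal illustrative stand-in, not Decima's method: the per-stage features, the linear scorer (in place of Decima's graph neural network), and all names are assumptions for the sake of the example.

```python
# Hypothetical sketch: a learned policy scores the runnable stages of job
# DAGs, and the scheduler dispatches the top-scoring stage to a free
# executor. Features and the linear scorer are illustrative only.

def stage_features(stage):
    # Toy per-stage features: remaining work and number of downstream stages.
    return [stage["work"], stage["children"]]

def policy_score(features, weights):
    # A linear stand-in for a learned (e.g. graph neural network) scorer.
    return sum(w * f for w, f in zip(weights, features))

def pick_stage(runnable, weights):
    # Greedily dispatch the highest-scoring runnable stage.
    scores = [policy_score(stage_features(s), weights) for s in runnable]
    return runnable[max(range(len(runnable)), key=lambda i: scores[i])]

runnable = [
    {"name": "map", "work": 10.0, "children": 2},
    {"name": "reduce", "work": 4.0, "children": 0},
]
weights = [-0.1, -0.5]  # negative weights prefer short stages (SJF-like)
print(pick_stage(runnable, weights)["name"])  # picks "reduce"
```

In the real system the weights are not hand-set but trained with RL against an objective such as average job completion time.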

    A method for high-energy, low-dose mammography using edge illumination x-ray phase-contrast imaging

    Since the breast is one of the most radiosensitive organs, mammography is arguably the area where lowering radiation dose is of the utmost importance. Phase-based x-ray imaging methods can provide opportunities in this sense, since they do not require x-rays to be stopped in tissue for image contrast to be generated. Therefore, x-ray energies can be considerably increased compared to those usually exploited in conventional mammography. In this article we show how a novel, optimized approach can lead to considerable dose reductions. This was achieved by matching the edge-illumination phase method, which reaches very high angular sensitivity even at high x-ray energies, to an appropriate image processing algorithm and to a virtually noise-free detection technology capable of reaching almost 100% efficiency at the same energies. Importantly, while proof-of-concept was obtained at a synchrotron, the method has potential for translation to conventional sources.

    Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants

    Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and discovery of variants associated with traits of interest for the population. To allow for the use of imputed genotypes for genomic analyses, accuracy of imputation must be high. Accuracy of imputation is influenced by multiple factors, such as size and composition of the reference group, and the allele frequency of variants included. Understanding the use of imputed WGSs prior to the generation of the reference population is important, as accurate imputation might be more focused, for instance, on common or on rare variants. The aim of this study was to present and evaluate new methods to select animals for sequencing relying on a previously genotyped population. The Genetic Diversity Index method optimizes the number of unique haplotypes in the future reference population, while the Highly Segregating Haplotype selection method targets haplotype alleles found throughout the majority of the population of interest. First the WGSs of a dairy cattle population were simulated. The simulated sequences mimicked the linkage disequilibrium level and the variants’ frequency distribution observed in currently available Holstein sequences. Then, reference populations of different sizes, in which animals were selected using both novel methods proposed here as well as two other methods presented in previous studies, were created. Finally, accuracies of imputation obtained with different reference populations were compared against each other. The novel methods were found to have overall accuracies of imputation of more than 0.85. Accuracies of imputation of rare variants reached values above 0.50. 
In conclusion, if imputed sequences are to be used to discover novel associations between variants and traits of interest in the population, animals carrying novel information should be selected and, consequently, the Genetic Diversity Index method proposed here may be used. If sequences are to be used to impute the overall genotyped population, a reference population consisting of carriers of common haplotypes, selected using the proposed Highly Segregating Haplotype method, is recommended.
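The diversity-driven selection described above can be sketched as a greedy coverage procedure: repeatedly pick the animal whose haplotypes add the most alleles not yet represented in the reference set. This is an illustrative assumption about how such an index could be optimized, not the paper's actual Genetic Diversity Index algorithm.

```python
# Hypothetical greedy sketch in the spirit of diversity-based reference
# selection: maximize the number of unique haplotype alleles covered.
# Data structures and the greedy rule are illustrative assumptions.

def select_reference(animals, n_select):
    # animals: name -> set of haplotype alleles carried by that animal.
    covered = set()
    chosen = []
    remaining = dict(animals)
    for _ in range(n_select):
        # Pick the animal contributing the most alleles not yet covered.
        best = max(remaining, key=lambda a: len(remaining[a] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered

animals = {
    "A": {"h1", "h2"},
    "B": {"h2", "h3", "h4"},
    "C": {"h1"},
}
chosen, covered = select_reference(animals, 2)
print(chosen)  # ['B', 'A']: B adds 3 new alleles, then A adds h1
```

A Highly Segregating Haplotype variant would instead rank candidates by how common their haplotypes are in the target population rather than by novelty.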

    Detection of post-therapeutic effects in breast carcinoma using hard X-Ray index of refraction computed tomography - A feasibility study

    Objectives: Neoadjuvant chemotherapy is the state-of-the-art treatment in advanced breast cancer. A correct visualization of the post-therapeutic tumor size is of high prognostic relevance. X-ray phase-contrast computed tomography (PC-CT) has been shown to provide improved soft-tissue contrast at a resolution formerly restricted to histopathology, at low doses. This study aimed at assessing ex vivo the potential use of PC-CT for visualizing the effects of neoadjuvant chemotherapy on breast carcinoma. Materials and Methods: The analysis was performed on two ex-vivo formalin-fixed mastectomy samples containing an invasive carcinoma, removed from two patients treated with neoadjuvant chemotherapy. Images were matched with corresponding histological slices. The visibility of typical post-therapeutic tissue changes was assessed and compared to results obtained with conventional clinical imaging modalities. Results: PC-CT depicted the different tissue types with an excellent correlation to histopathology. Post-therapeutic tissue changes were correctly visualized and the residual tumor mass could be detected. PC-CT outperformed clinical imaging modalities in the detection of chemotherapy-induced tissue alterations, including post-therapeutic tumor size. Conclusions: PC-CT might become a unique diagnostic tool in the prediction of tumor response to neoadjuvant chemotherapy. PC-CT might also assist during histopathological diagnosis, offering a high-resolution and high-contrast virtual histological tool for the accurate delineation of tumor boundaries.

    Cloud-scale VM Deflation for Running Interactive Applications On Transient Servers

    Transient computing has become popular in public cloud environments for running delay-insensitive batch and data processing applications at low cost. Since transient cloud servers can be revoked at any time by the cloud provider, they are considered unsuitable for running interactive applications such as web services. In this paper, we present VM deflation as an alternative mechanism to server preemption for reclaiming resources from transient cloud servers under resource pressure. Using real traces from top-tier cloud providers, we show the feasibility of using VM deflation as a resource reclamation mechanism for interactive applications in public clouds. We show how current hypervisor mechanisms can be used to implement VM deflation and present cluster deflation policies for resource management of transient and on-demand cloud VMs. Experimental evaluation of our deflation system on a Linux cluster shows that microservice-based applications can be deflated by up to 50% with negligible performance overhead. Our cluster-level deflation policies allow overcommitment levels as high as 50%, with less than a 1% decrease in application throughput, and can enable cloud platforms to increase revenue by 30%. Comment: To appear at ACM HPDC 202
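The idea of reclaiming resources without preemption can be sketched with a toy proportional policy. The 50% bound on how much a VM can shed comes from the abstract; the proportional split itself and all names are illustrative assumptions, not the paper's actual cluster policy.

```python
# Minimal sketch of cluster-level VM deflation under resource pressure:
# shrink each transient VM's allocation proportionally, never below 50%
# of its original size, until the needed resources are reclaimed.
# The proportional rule is an illustrative assumption.

def deflate(vms, reclaim_needed):
    # vms: name -> allocated cores (or GB of memory).
    total_reclaimable = sum(alloc * 0.5 for alloc in vms.values())
    # Fraction of each VM's reclaimable half that must be taken.
    frac = min(1.0, reclaim_needed / total_reclaimable)
    return {name: alloc - alloc * 0.5 * frac for name, alloc in vms.items()}

vms = {"web": 8.0, "api": 4.0}
print(deflate(vms, 3.0))  # reclaims 3 cores: web -> 6.0, api -> 3.0
```

In a real system the shrinking step would be carried out through hypervisor mechanisms (e.g. CPU capping or memory ballooning) rather than a dictionary update.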

    Evolution of Vertebrate Transient Receptor Potential Vanilloid 3 Channels: Opposite Temperature Sensitivity between Mammals and Western Clawed Frogs

    Transient Receptor Potential (TRP) channels serve as temperature receptors in a wide variety of animals and must have played crucial roles in thermal adaptation. The TRP vanilloid (TRPV) subfamily contains several temperature receptors with different temperature sensitivities. The TRPV3 channel is known to be highly expressed in skin, where it is activated by warm temperatures and serves as a sensor to detect ambient temperatures near the body temperature of homeothermic animals such as mammals. Here we performed comprehensive comparative analyses of the TRPV subfamily in order to understand its evolutionary process; we identified novel TRPV genes and also characterized the evolutionary flexibility of TRPV3 during vertebrate evolution. We cloned the TRPV3 channel from the western clawed frog Xenopus tropicalis to understand the functional evolution of the TRPV3 channel. The amino acid sequences of the N- and C-terminal regions of the TRPV3 channel were highly diversified from those of other terrestrial vertebrate TRPV3 channels, although central portions were well conserved. In a heterologous expression system, several mammalian TRPV3 agonists did not activate the TRPV3 channel of the western clawed frog. Moreover, the frog TRPV3 channel did not respond to heat stimuli; instead, it was activated by cold temperatures. The temperature threshold for activation was about 16 °C, slightly below the lower temperature limit for the western clawed frog. Given that the TRPV3 channel is expressed in skin, its likely role is to detect noxious cold temperatures. Thus, the western clawed frog and mammals acquired opposite temperature sensitivities of the TRPV3 channel in order to detect environmental temperatures suitable for their respective species, indicating that temperature receptors can dynamically change properties to adapt to different thermal environments during evolution.

    Towards Efficient and Scalable Data-Intensive Content Delivery: State-of-the-Art, Issues and Challenges

    This chapter presents the authors’ work for the Case Study entitled “Delivering Social Media with Scalability” within the framework of High-Performance Modelling and Simulation for Big Data Applications (cHiPSet) COST Action 1406. We identify some core research areas and give an outline of the publications we produced within the framework of the aforementioned Action. The ease of user content generation within social media platforms, e.g. check-in information, multimedia data, etc., along with the proliferation of Global Positioning System (GPS)-enabled, always-connected capture devices, leads to data streams of unprecedented volume and a radical change in information sharing. Social data streams raise a variety of practical challenges: derivation of real-time meaningful insights from effectively gathered social information; a paradigm shift for content distribution that leverages contextual data associated with user preferences, geographical characteristics, and devices in general; etc. In this article we present the methodology we followed, the results of our work, and the outline of a comprehensive survey that depicts the state of the art and organizes challenges concerning social media streams and the infrastructure of the data centers supporting efficient access to data streams in terms of content distribution, data diffusion, data replication, energy efficiency, and network infrastructure. The challenges of enabling better provisioning of social media data were identified based on the context of users accessing these resources. The existing literature has been systematized, and the main research points and industrial efforts in the area were identified and analyzed. In our work within the framework of the Action, we proposed potential solutions addressing the problems of the area and described how these fit into the general ecosystem.

    Design and Implementation of a Framework for Software-Defined Middlebox Networking

    Increasingly, middleboxes are being deployed as software components and, with the advent of software-defined networking, can be deployed at arbitrary locations. However, existing approaches for controlling the operations of middleboxes continue to be rudimentary and ad hoc. As such, a variety of dynamic network control scenarios that are crucial to enhancing the security, availability, and performance of enterprise applications cannot be realized today. In this paper, we ask: what is the right way to exercise unified control over the actions of middleboxes that enables sophisticated dynamic network control scenarios? Inspired by SDN, we argue that a software-defined middlebox networking (SDMBN) framework, which provides fine-grained, programmatic control over all middlebox state in concert with control over the network, is the answer to this question. Thus, we present the design and implementation of OpenMB. OpenMB consists of slightly modified middleboxes that expose a southbound API for importing/exporting middlebox state, a middlebox controller that implements a northbound API to define how state can be accessed or placed, and scenario-specific control applications that orchestrate middlebox and network changes in tandem.
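The southbound import/export idea can be sketched as follows. The abstract only says that middleboxes expose state import/export to a controller; the class, method names, and the per-flow state layout below are assumptions for illustration, not OpenMB's actual interface.

```python
# Hypothetical sketch of a southbound state API in the SDMBN style:
# a controller exports per-flow state from one middlebox instance and
# imports it into another, e.g. when migrating flows. All names here
# are illustrative assumptions, not OpenMB's real API.

class Middlebox:
    def __init__(self, name):
        self.name = name
        self.state = {}  # per-flow state, e.g. keyed by (src_ip, dst_port)

    def export_state(self, flow_keys):
        # Serialize state for the given flows so the controller can move it.
        return {k: self.state[k] for k in flow_keys if k in self.state}

    def import_state(self, chunk):
        # Install state received from another middlebox instance.
        self.state.update(chunk)

# Controller-driven migration of one flow's state between two NAT instances.
src, dst = Middlebox("nat1"), Middlebox("nat2")
src.state[("10.0.0.1", 80)] = {"mapped_port": 5001}
dst.import_state(src.export_state([("10.0.0.1", 80)]))
print(dst.state[("10.0.0.1", 80)]["mapped_port"])  # 5001
```

The point of such an API is that the controller can coordinate this state move with network-level rerouting, so in-flight flows keep their middlebox state after a path change.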