561 research outputs found

    MULTIVARIATE REGRESSION APPLIED TO THE PERFORMANCE OPTIMIZATION OF A COUNTERCURRENT ULTRACENTRIFUGE - A PRELIMINARY STUDY

    In this work, the least-squares methodology with a covariance matrix is applied to fit a data curve and obtain a performance function for the separative power U of an ultracentrifuge as a function of experimentally controlled variables. The experimental data come from 173 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties in the measurements of the independent variables are propagated into the experimental separative power values, yielding an input covariance matrix for the experimental data. The process control variables that significantly influence the U values are chosen to give information on the behaviour of the ultracentrifuge when submitted to several levels of feed flow F and cut. After validating the goodness of fit of the model, a residual analysis is carried out to verify the assumed hypotheses concerning randomness and independence and, mainly, the existence of residual heteroscedasticity with respect to any regression model variable. Response curves relating the separative power to the control variables F and the cut are produced to compare the fitted model with the experimental data and, finally, to calculate their optimized values
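The fitting procedure the abstract describes can be sketched as generalized least squares, where the input covariance matrix of the experimental data weights the fit. The quadratic response-surface model, the variable ranges, and the synthetic data below are illustrative assumptions, not the authors' actual setup; only the variable names (F for feed flow, a cut variable, U for separative power) follow the abstract.

```python
import numpy as np

def gls_fit(X, y, V):
    """Estimate beta minimising (y - X@beta)^T V^-1 (y - X@beta),
    where V is the covariance matrix of the observations y."""
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    # Parameter covariance, usable for goodness-of-fit and residual analysis
    cov_beta = np.linalg.inv(X.T @ Vinv @ X)
    return beta, cov_beta

# Synthetic "experiments": U modelled as a quadratic surface in F and cut
rng = np.random.default_rng(0)
F = rng.uniform(1.0, 5.0, 173)          # hypothetical feed-flow levels
cut = rng.uniform(0.2, 0.8, 173)        # hypothetical cut levels
X = np.column_stack([np.ones_like(F), F, cut, F**2, cut**2, F * cut])
true_beta = np.array([0.5, 1.2, 2.0, -0.15, -1.8, 0.3])

# Diagonal input covariance from (small) per-experiment variances
V = np.diag(rng.uniform(1e-4, 1e-3, 173))
y = X @ true_beta + rng.normal(0.0, np.sqrt(np.diag(V)))

beta, cov_beta = gls_fit(X, y, V)
```

After the fit, residuals `y - X @ beta` can be inspected for randomness, independence, and heteroscedasticity against each regressor, as the abstract describes.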

    Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management

    As users of big data applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the pay-as-you-go model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs - systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the checkpointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures. Copyright © 2013 ACM
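The recovery mechanism described in the abstract - periodic checkpoints of explicit operator state, upstream buffering, and replay of unprocessed tuples - can be sketched in miniature. This is a toy single-process illustration of the idea, not the authors' system; the counting operator and the two-tuple checkpoint interval are arbitrary choices.

```python
import copy

class CountOperator:
    """A stateful operator whose state (a counter dict) is externalised
    so the system can checkpoint and restore it."""
    def __init__(self, state=None):
        self.state = state if state is not None else {}
    def process(self, tup):
        key, val = tup
        self.state[key] = self.state.get(key, 0) + val
    def checkpoint(self):
        return copy.deepcopy(self.state)

# The "upstream" buffers tuples until a checkpoint covers them
upstream_buffer, last_checkpoint = [], {}
op = CountOperator()
stream = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

for i, tup in enumerate(stream):
    upstream_buffer.append(tup)
    op.process(tup)
    if (i + 1) % 2 == 0:              # periodic checkpoint
        last_checkpoint = op.checkpoint()
        upstream_buffer.clear()       # these tuples are now covered

# Simulated failure: restore the checkpoint on a "new VM" and replay
# the tuples the checkpoint did not cover.
recovered = CountOperator(copy.deepcopy(last_checkpoint))
for tup in upstream_buffer:
    recovered.process(tup)
```

Because replayed tuples are exactly those not covered by the last checkpoint, the recovered operator reaches the same state as the failed one, so query results are unaffected.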

    Making State Explicit for Imperative Big Data Processing

    Data scientists often implement machine learning algorithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the performance and scalability of specialised data-parallel processing frameworks. Our goal is to execute imperative Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java programs without compromising scalability, and how to recover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG). By explicitly separating data from mutable state, SDGs have specific features to enable this translation: to ensure scalability, distributed state can be partitioned across nodes if computation can occur entirely in parallel; if this is not possible, partial state gives nodes local instances for independent computation, which are reconciled according to application semantics. For fault tolerance, large in-memory state is checkpointed asynchronously without global coordination. We show that the performance of SDGs for several imperative online applications matches that of existing data-parallel processing frameworks
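The two state abstractions mentioned in the abstract - partitioned state, split by key across nodes for fully parallel computation, and partial state, kept as independent local instances that are reconciled by application semantics - can be illustrated with a minimal sketch. This is a hypothetical single-process analogue (the SDG framework itself targets Java programs); the key-hashing scheme and the sum-based merge function are illustrative assumptions.

```python
NODES = 3

# Partitioned state: each key lives on exactly one node, so updates to
# different keys proceed entirely in parallel.
partitions = [{} for _ in range(NODES)]

def update_partitioned(key, delta):
    shard = partitions[hash(key) % NODES]
    shard[key] = shard.get(key, 0) + delta

# Partial state: every node keeps its own local instance (here a local
# count), and a merge function reconciles them when a global value is
# needed - the reconciliation is application-specific.
local_counts = [0] * NODES

def update_partial(node, delta):
    local_counts[node] += delta

def reconcile():
    return sum(local_counts)   # application-specific merge (here: sum)

for i in range(9):
    update_partitioned(f"user{i % 4}", 1)   # routed by key
    update_partial(i % NODES, 1)            # applied locally, merged later
```

Partitioned state avoids any cross-node coordination on the update path but requires a clean keying of the computation; partial state tolerates arbitrary local updates at the cost of an explicit reconciliation step.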

    Scalable and Fault-tolerant Stateful Stream Processing.

    As users of "big data" applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the "pay-as-you-go" model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs—systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the checkpointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures

    Diurnal regulation of RNA polymerase III transcription is under the control of both the feeding-fasting response and the circadian clock.

    RNA polymerase III (Pol III) synthesizes short noncoding RNAs, many of which are essential for translation. Accordingly, Pol III activity is tightly regulated with cell growth and proliferation by factors such as MYC, RB1, TRP53, and MAF1. MAF1 is a repressor of Pol III transcription whose activity is controlled by phosphorylation; in particular, it is inactivated through phosphorylation by the TORC1 kinase complex, a sensor of nutrient availability. Pol III regulation is thus sensitive to environmental cues, yet a diurnal profile of Pol III transcription activity is so far lacking. Here, we first use gene expression arrays to measure mRNA accumulation during the diurnal cycle in the livers of (1) wild-type mice, (2) arrhythmic knockout mice, (3) mice fed at regular intervals during both night and day, and (4) mice lacking the Maf1 gene, and so provide a comprehensive view of the changes in cyclic mRNA accumulation occurring in these different systems. We then show that Pol III occupancy of its target genes rises before the onset of the night, stays high during the night, when mice normally ingest food and when translation is known to be increased, and decreases in daytime. Whereas higher Pol III occupancy during the night reflects a MAF1-dependent response to feeding, the rise of Pol III occupancy before the onset of the night reflects a circadian clock-dependent response. Thus, Pol III transcription during the diurnal cycle is regulated both in response to nutrients and by the circadian clock, which allows anticipatory Pol III transcription

    Multiple imputations applied to the DREAM3 phosphoproteomics challenge: a winning strategy.

    DREAM is an initiative that allows researchers to assess how well their methods or approaches can describe and predict networks of interacting molecules [1]. Each year, recently acquired datasets are released to predictors ahead of publication. Researchers typically have about three months to predict the masked data or network of interactions, using any predictive method. Predictions are assessed prior to an annual conference where the best predictions are unveiled and discussed. Here we present the strategy we used to make a winning prediction for the DREAM3 phosphoproteomics challenge. We used Amelia II, a multiple imputation software method developed by Gary King, James Honaker and Matthew Blackwell [2] in the context of the social sciences, to predict the 476 out of 4624 measurements that had been masked for the challenge. To choose the best possible multiple imputation parameters for the challenge, we evaluated how transforming the data and varying the imputation parameters affected the ability to predict additionally masked data. We discuss the accuracy of our findings and show that multiple imputation applied to this dataset is a powerful method to accurately estimate the missing data. We postulate that multiple imputation methods might become an integral part of experimental design, as a means to achieve cost savings or to increase the number of samples that can be handled for a given cost
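The core idea of multiple imputation - generate several completed datasets by drawing plausible values for the missing entries, then combine the results - can be sketched with a much-simplified analogue. This is not Amelia II (which fits a multivariate normal model by EM with bootstrapping); the per-column normal sampling, the averaging step (a simplification of Rubin's combining rules), and the toy data below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def multiple_impute(data, mask, m=5):
    """data: 2-D array with NaN at missing entries; mask: True where
    values are missing. Returns a point estimate averaged over m
    independently imputed completions."""
    completed = []
    for _ in range(m):
        filled = data.copy()
        for j in range(data.shape[1]):
            obs = data[~mask[:, j], j]          # observed values in column j
            n_missing = int(mask[:, j].sum())
            # Draw missing entries from a normal fitted to the observed column
            filled[mask[:, j], j] = rng.normal(obs.mean(), obs.std(ddof=1),
                                               n_missing)
        completed.append(filled)
    return np.mean(completed, axis=0)

# Toy dataset with roughly 10% of entries masked
truth = rng.normal(5.0, 1.0, size=(100, 4))
mask = rng.random(truth.shape) < 0.1
observed = truth.copy()
observed[mask] = np.nan

estimate = multiple_impute(observed, mask)
```

In the challenge setting, the same evaluate-on-additionally-masked-data loop the abstract describes would be used to tune the transformation and imputation parameters before predicting the truly masked measurements.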