
    Heuristics for periodical batch job scheduling in a MapReduce computing framework

    Task scheduling has a significant impact on the performance of the MapReduce computing framework. In this paper, a scheduling problem of periodical batch jobs with makespan minimization is considered. The problem is modeled as a general two-stage hybrid flow shop scheduling problem with schedule-dependent setup times. The new model incorporates the data locality of tasks and is formulated as an integer program. Three heuristics are developed to solve the problem, and an improvement policy based on data locality is presented to enhance them. A lower bound on the makespan is derived. 150 instances are randomly generated from data distributions drawn from a real cluster. The parameters involved in the methods are set according to different cluster setups. The proposed heuristics are compared over different numbers of jobs and cluster setups. Computational results show that the performance of the methods is highly dependent on both the number of jobs and the cluster setup. The proposed improvement policy is effective, and the impact of the input data distribution on the policy is analyzed and tested. This work is supported by the National Natural Science Foundation of China (No. 61272377) and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20120092110027). Ruben Ruiz is partially supported by the Spanish Ministry of Economy and Competitiveness under the project "RESULT - Realistic Extended Scheduling Using Light Techniques" (No. DPI2012-36243-C02-01), partially financed with FEDER funds. Xiaoping Li; Tianze Jiang; Ruiz García, R. (2016). Heuristics for periodical batch job scheduling in a MapReduce computing framework. Information Sciences. 326:119-133. https://doi.org/10.1016/j.ins.2015.07.040
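    The abstract mentions a lower bound on the makespan for the two-stage hybrid flow shop. As a minimal sketch (not the bound derived in the paper, which also accounts for setup times and data locality), the classical machine-load lower bound for two stages of identical parallel machines can be computed as follows; the function name and arguments are illustrative.

```python
def makespan_lower_bound(p1, p2, m1, m2):
    """Classical machine-load lower bound for a two-stage hybrid flow shop.

    p1, p2 : stage-1 / stage-2 processing times, one entry per job
    m1, m2 : number of identical parallel machines at each stage
    Illustrative only; the paper derives a tighter bound with setups.
    """
    # Stage-1 load bound: all stage-1 work shared by the m1 machines,
    # plus the smallest stage-2 time that must still follow the last job.
    lb1 = sum(p1) / m1 + min(p2)
    # Stage-2 load bound: no stage-2 machine can start before the
    # smallest stage-1 processing time has elapsed.
    lb2 = min(p1) + sum(p2) / m2
    return max(lb1, lb2)
```

    Any feasible schedule's makespan is at least this value, which makes the bound useful as a baseline when comparing heuristics.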

    Kernel-Based Tests for Likelihood-Free Hypothesis Testing

    Given n observations from two balanced classes, consider the task of labeling an additional m inputs that are known to all belong to one of the two classes. Special cases of this problem are well known: with complete knowledge of the class distributions (n = ∞) the problem is solved optimally by the likelihood-ratio test; when m = 1 it corresponds to binary classification; and when m ≈ n it is equivalent to two-sample testing. The intermediate settings occur in the field of likelihood-free inference, where labeled samples are obtained by running forward simulations and the unlabeled sample is collected experimentally. In recent work it was discovered that there is a fundamental trade-off between m and n: increasing the data sample size m reduces the amount n of training/simulation data needed. In this work we (a) introduce a generalization where the unlabeled samples come from a mixture of the two classes -- a case often encountered in practice; (b) study the minimax sample complexity for non-parametric classes of densities under maximum mean discrepancy (MMD) separation; and (c) investigate the empirical performance of kernels parameterized by neural networks on two tasks: detection of the Higgs boson and detection of planted DDPM-generated images amidst CIFAR-10 images. For both problems we confirm the existence of the theoretically predicted asymmetric m vs. n trade-off. Comment: 36 pages, 6 figures
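    The separation criterion above is the maximum mean discrepancy. As a minimal sketch, the standard unbiased estimator of the squared MMD with a Gaussian RBF kernel can be computed from two samples as follows; the kernel bandwidth sigma is an assumed free parameter, not a value from the paper.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    # Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples X (n x d) and Y (m x d)."""
    n, m = len(X), len(Y)
    # Within-sample terms exclude the diagonal i == j (unbiasedness).
    kxx = sum(rbf(X[i], X[j], sigma) for i in range(n)
              for j in range(n) if i != j) / (n * (n - 1))
    kyy = sum(rbf(Y[i], Y[j], sigma) for i in range(m)
              for j in range(m) if i != j) / (m * (m - 1))
    kxy = sum(rbf(X[i], Y[j], sigma) for i in range(n)
              for j in range(m)) / (n * m)
    return kxx + kyy - 2 * kxy
```

    For well-separated samples the cross term kxy vanishes and the estimate approaches 2; for identical distributions it concentrates around 0, which is what an MMD-separation condition exploits.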

    Density estimation using the perceptron

    We propose a new density estimation algorithm. Given n i.i.d. samples from a distribution belonging to a class of densities on R^d, our estimator outputs any density in the class whose "perceptron discrepancy" with the empirical distribution is at most O(sqrt(d/n)). The perceptron discrepancy between two distributions is defined as the largest difference in mass that they place on any halfspace of R^d. It is shown that this estimator achieves expected total variation distance to the truth that is almost minimax optimal over the class of densities with bounded Sobolev norm and over Gaussian mixtures. This suggests that regularity of the prior distribution could be an explanation for the efficiency of the ubiquitous step in machine learning that replaces optimization over large function spaces with simpler parametric classes (e.g. in the discriminators of GANs). We generalize the above to show that replacing the "perceptron discrepancy" with the generalized energy distance of Székely-Rizzo further improves the total variation loss. The generalized energy distance between empirical distributions is easily computable and differentiable, which makes it especially useful for fitting generative models. To the best of our knowledge, it is the first example of a distance with such properties for which there are minimax statistical guarantees. Comment: 47 pages
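    The claim that the energy distance between empirical distributions is easily computable can be illustrated directly. This is a sketch of the plain Székely-Rizzo energy distance from pairwise Euclidean distances (the paper uses a generalized version); it is differentiable in the sample coordinates, which is what makes it attractive for fitting generative models.

```python
import numpy as np

def energy_distance(X, Y):
    """Empirical energy distance between samples X (n x d) and Y (m x d):
    2 E||X - Y|| - E||X - X'|| - E||Y - Y'||, with expectations replaced
    by means over all pairs."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)

    def mean_dist(A, B):
        # Mean pairwise Euclidean distance between rows of A and rows of B.
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return d.mean()

    return 2 * mean_dist(X, Y) - mean_dist(X, X) - mean_dist(Y, Y)
```

    Every operation here is a smooth function of the sample coordinates (away from coincident points), so gradients can flow through it when the samples are produced by a generator network.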

    XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques

    Reinforcement Learning (RL) has demonstrated substantial potential across diverse fields, yet understanding its decision-making process, especially in real-world scenarios where rationality and safety are paramount, is an ongoing challenge. This paper delves into Explainable RL (XRL), a subfield of Explainable AI (XAI) aimed at unravelling the complexities of RL models. Our focus rests on state-explaining techniques, a crucial subset of XRL methods, as they reveal the underlying factors influencing an agent's actions at any given time. Despite their significant role, the lack of a unified evaluation framework hinders assessment of their accuracy and effectiveness. To address this, we introduce XRL-Bench, a unified standardized benchmark tailored for the evaluation and comparison of XRL methods, encompassing three main modules: standard RL environments, explainers based on state importance, and standard evaluators. XRL-Bench supports both tabular and image data for state explanation. We also propose TabularSHAP, an innovative and competitive XRL method. We demonstrate the practical utility of TabularSHAP in real-world online gaming services and offer an open-source benchmark platform for the straightforward implementation and evaluation of XRL methods. Our contributions facilitate the continued progression of XRL technology. Comment: 10 pages, 5 figures
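    A state-explaining technique assigns an importance score to each state feature for the agent's chosen action. As a generic illustration of the idea (not TabularSHAP itself, which is SHAP-based), here is a permutation-style importance score for a policy over tabular states; the function name and interface are assumptions.

```python
import numpy as np

def state_feature_importance(policy, states, n_repeats=10, seed=0):
    """Permutation-style importance of each state feature for a policy.

    policy : callable mapping a batch of states (N, d) -> action indices (N,)
    states : (N, d) array of observed states
    Importance of feature j = average fraction of actions that change
    when column j is shuffled across the batch.
    """
    rng = np.random.default_rng(seed)
    states = np.asarray(states, float)
    base = policy(states)                 # actions on the unperturbed states
    d = states.shape[1]
    imp = np.zeros(d)
    for j in range(d):
        for _ in range(n_repeats):
            perm = states.copy()
            perm[:, j] = rng.permutation(perm[:, j])  # break feature j only
            imp[j] += np.mean(policy(perm) != base)
    return imp / n_repeats
```

    A feature the policy ignores scores exactly zero, while a feature that drives the action choice scores close to the fraction of states whose action flips, giving a simple evaluator-friendly baseline for an XRL benchmark.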

    TopMSV: A Web-Based Tool for Top-Down Mass Spectrometry Data Visualization

    Top-down mass spectrometry (MS) investigates intact proteoforms for proteoform identification, characterization, and quantification. Data visualization plays an essential role in top-down MS data analysis because proteoform identification and characterization often involve manual data inspection to determine the molecular masses of highly charged ions and to validate unexpected alterations in identified proteoforms. While many software tools have been developed for MS data visualization, there is still a lack of web-based visualization software designed for top-down MS. Here, we present TopMSV, a web-based tool for top-down MS data processing and visualization. TopMSV provides interactive views of top-down MS data using a web browser. It integrates software tools for spectral deconvolution and proteoform identification and uses the analysis results of these tools to annotate top-down MS data.

    Roles of thermal energy storage technology for carbon neutrality

    Abstract In order to achieve global carbon neutrality by the middle of the 21st century, efficient utilization of fossil fuels is highly desired in diverse energy sectors such as industry, transportation, buildings, and life science. In the energy utilization infrastructure, about 75% of fossil fuel consumption is used to provide and maintain heat, and more than 60% of the input energy is discharged to the environment as waste heat. Various low-grade waste heat recovery technologies have been developed to increase energy efficiency. However, because of the spatial and temporal mismatch between the demand for and supply of thermal energy, much of the waste thermal energy is difficult to recover. Thermal energy storage (TES) technologies in the forms of sensible, latent, and thermochemical heat storage have been developed to relieve this mismatch between energy supply and demand. Diverse TES systems with high storage density, long duration, durability, and low cost have been developed in recent years. These technologies are vital for the efficient utilization of low-grade waste heat and are expected to help build a low- or zero-carbon-emission society. This paper reviews thermal storage technologies for low-carbon power generation, low-carbon transportation, low-carbon buildings, and low-carbon life science; in addition, carbon capture, utilization, and storage are also considered for carbon emission reduction. Conclusions and perspectives are presented after discussing the specific technologies. This study is expected to provide a reference for TES technologies in achieving a zero-carbon future.
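    The sensible and latent storage forms mentioned above follow two textbook relations: Q = m·c·ΔT for sensible heat and Q = m·L for latent heat of a phase change. A minimal worked comparison is sketched below; the material properties (water's specific heat, a paraffin PCM's latent heat of roughly 200 kJ/kg) are assumed illustrative values, not figures from this review.

```python
def sensible_heat(mass_kg, c_J_per_kgK, dT_K):
    """Sensible thermal energy stored: Q = m * c * dT (joules)."""
    return mass_kg * c_J_per_kgK * dT_K

def latent_heat(mass_kg, L_J_per_kg):
    """Latent thermal energy stored in a phase change: Q = m * L (joules)."""
    return mass_kg * L_J_per_kg

# 1000 kg of water heated by 40 K (c ≈ 4186 J/(kg·K), standard value)
# versus 1000 kg of a paraffin PCM melting (L ≈ 200 kJ/kg, assumed).
q_sensible = sensible_heat(1000, 4186, 40)  # ≈ 167 MJ
q_latent = latent_heat(1000, 200e3)         # = 200 MJ
```

    The comparison shows why latent storage can match a 40 K sensible swing while cycling at a nearly constant temperature, which is one reason phase-change systems are attractive for low-grade waste heat.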

    Characterization of a Mass-Produced SiPM at Liquid Nitrogen Temperature for CsI Neutrino Coherent Detectors

    A Silicon Photomultiplier (SiPM) is a sensor that can detect light signals below the single-photon level. In order to study the properties of neutrinos at a low detection threshold and with a low-radioactivity experimental background, a low-temperature CsI neutrino coherent scattering detector is designed to be read out by SiPM sensors. At the working temperature of liquid nitrogen, the SiPM produces less thermal noise and the CsI crystals yield more light. The breakdown voltage (Vbd) and dark count rate (DCR) of the SiPM at liquid nitrogen temperature are two key parameters for coherent scattering detection. In this paper, a low-temperature test is conducted on the mass-produced ON Semiconductor J-Series SiPM. We design a cryogenic system for cooling the SiPM to liquid nitrogen temperature, and the changes in operating voltage and dark noise from room temperature to liquid nitrogen temperature are measured in detail. The results show that the SiPM works at liquid nitrogen temperature, and the dark count rate drops by six orders of magnitude from room temperature (120 kHz/mm²) to liquid nitrogen temperature (0.1 Hz/mm²).
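    The "six orders of magnitude" figure can be checked directly from the two rates quoted in the abstract:

```python
import math

# Dark count rate drop from room temperature to liquid nitrogen
# temperature, using the two rates reported in the abstract.
dcr_room = 120e3   # Hz/mm^2  (120 kHz/mm^2 at room temperature)
dcr_ln2 = 0.1      # Hz/mm^2  (at liquid nitrogen temperature)

# log10 of the ratio gives the number of orders of magnitude: ~6.1.
orders = math.log10(dcr_room / dcr_ln2)
```
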

    Petroleum Geology and Exploration of Deep-Seated Volcanic Condensate Gas Reservoir around the Penyijingxi Sag in the Junggar Basin

    Many types of volcanic rock oil and gas reservoirs have been found in China, showing great petroleum exploration potential. Volcanic reservoirs are also one of the key exploration fields in the Junggar Basin and are mainly concentrated in the middle and shallow layers, while no breakthrough had yet been made in the deep volcanic rock natural gas fields. Based on a comprehensive analysis of core observation, single-well analysis, reservoir description, and source rock evaluation, combined with seismic data and time-frequency electromagnetic technology, multiple volcanic rock exploration targets were identified, and industrial oil and gas flow was obtained in well SX16 in the Penyijingxi Sag, western Junggar Basin. It is believed that the deep Permian source rocks have relatively high natural gas generation potential and that the volcanic breccia usually provides large reservoir space, while the mudstone of the Upper Wuerhe Formation served as the caprock. The success of exploration well SX16 represents a major breakthrough in natural gas exploration in the Penyijingxi Sag, which has essential guiding significance for the exploration of deep volcanic rocks and large-scale gas exploration in the Junggar Basin.