Heuristics for periodical batch job scheduling in a MapReduce computing framework
Task scheduling has a significant impact on the performance of the MapReduce computing
framework. In this paper, a scheduling problem of periodical batch jobs with makespan minimization
is considered. The problem is modeled as a general two-stage hybrid flow shop
scheduling problem with schedule-dependent setup times. The new model incorporates the
data locality of tasks and is formulated as an integer program. Three heuristics are developed
to solve the problem and an improvement policy based on data locality is presented to enhance
the methods. A lower bound of the makespan is derived. 150 instances are randomly
generated from data distributions drawn from a real cluster. The parameters involved in the
methods are set according to different cluster setups. The proposed heuristics are compared
over different numbers of jobs and cluster setups. Computational results show that the performance
of the methods is highly dependent on both the number of jobs and the cluster setups.
The proposed improvement policy is effective and the impact of the input data distribution on
the policy is analyzed and tested. This work is supported by the National Natural Science Foundation of China (No. 61272377) and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20120092110027). Ruben Ruiz is partially supported by the Spanish Ministry of Economy and Competitiveness under the project "RESULT - Realistic Extended Scheduling Using Light Techniques" (No. DPI2012-36243-C02-01), partially financed with FEDER funds. Xiaoping Li; Tianze Jiang; Ruiz García, R. (2016). Heuristics for periodical batch job scheduling in a MapReduce computing framework. Information Sciences, 326:119-133. https://doi.org/10.1016/j.ins.2015.07.040
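The abstract above describes list-type heuristics enhanced by a data-locality improvement policy. A minimal sketch of the general idea — greedy list scheduling in which a task placed on the node already holding its input block incurs no setup (transfer) time — is given below. The data structures and the single `transfer_time` parameter are illustrative assumptions, not the paper's actual model or heuristics:

```python
# Sketch of locality-aware greedy list scheduling for map tasks.
# Hypothetical interfaces; NOT the paper's exact heuristics.

def schedule(tasks, nodes, data_location, transfer_time):
    """Assign each task to the node that finishes it earliest, adding a
    setup (data transfer) cost when the task's input block is not
    stored on the chosen node."""
    finish = {n: 0.0 for n in nodes}      # next free time of each node
    assignment = {}
    for task, proc_time in tasks:         # tasks: [(name, processing time)]
        best = min(
            nodes,
            key=lambda n: finish[n]
            + (0.0 if data_location[task] == n else transfer_time)
            + proc_time,
        )
        setup = 0.0 if data_location[task] == best else transfer_time
        assignment[task] = best
        finish[best] += setup + proc_time
    return assignment, max(finish.values())   # makespan = latest finish
```

Both tasks below prefer the node holding their data even though it is busier, because the transfer penalty outweighs the wait.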
Kernel-Based Tests for Likelihood-Free Hypothesis Testing
Given $n$ observations from two balanced classes, consider the task of
labeling an additional $m$ inputs that are known to all belong to \emph{one} of
the two classes. Special cases of this problem are well-known: with complete
knowledge of class distributions ($n=\infty$) the problem is solved optimally
by the likelihood-ratio test; when $m=1$ it corresponds to binary
classification; and when $m\approx n$ it is equivalent to two-sample testing.
The intermediate settings occur in the field of likelihood-free inference,
where labeled samples are obtained by running forward simulations and the
unlabeled sample is collected experimentally. In recent work it was discovered
that there is a fundamental trade-off between $m$ and $n$: increasing the data
sample $m$ reduces the amount of training/simulation data needed. In this
work we (a) introduce a generalization where unlabeled samples come from a
mixture of the two classes -- a case often encountered in practice; (b) study
the minimax sample complexity for non-parametric classes of densities under
\textit{maximum mean discrepancy} (MMD) separation; and (c) investigate the
empirical performance of kernels parameterized by neural networks on two tasks:
detection of the Higgs boson and detection of planted DDPM generated images
amidst CIFAR-10 images. For both problems we confirm the existence of the
theoretically predicted asymmetric $m$ vs $n$ trade-off. Comment: 36 pages, 6 figures
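The MMD separation mentioned above is defined relative to a kernel. As a rough illustration — using a plain Gaussian kernel on scalars, not the neural-network-parameterized kernels the paper studies — the standard plug-in estimate of squared MMD between two samples can be sketched as:

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) plug-in estimate of squared MMD:
    E[k(X,X')] + E[k(Y,Y')] - 2 E[k(X,Y)] over all sample pairs."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(kernel(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy
```

Identical samples give squared MMD zero; well-separated samples approach the kernel's maximum separation of 2.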
Density estimation using the perceptron
We propose a new density estimation algorithm. Given $n$ i.i.d. samples from
a distribution belonging to a class of densities on $\mathbb{R}^d$, our
estimator outputs any density in the class whose ''perceptron discrepancy''
with the empirical distribution is near-minimal. The perceptron
discrepancy between two distributions is defined as the largest difference in
mass that they place on any halfspace of $\mathbb{R}^d$. It is shown that this
estimator achieves expected total variation distance to the truth that is
almost minimax optimal over the class of densities with bounded Sobolev norm
and Gaussian mixtures. This suggests that regularity of the prior distribution
could be an explanation for the efficiency of the ubiquitous step in machine
learning that replaces optimization over large function spaces with simpler
parametric classes (e.g. in the discriminators of GANs).
We generalize the above to show that replacing the ''perceptron discrepancy''
with the generalized energy distance of Sz\'ekely-Rizzo further improves total
variation loss. The generalized energy distance between empirical distributions
is easily computable and differentiable, thus making it especially useful for
fitting generative models. To the best of our knowledge, it is the first
example of a distance with such properties for which there are minimax
statistical guarantees. Comment: 47 pages
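The claim that the energy distance between empirical distributions is easily computable can be illustrated directly. The sketch below implements the classical Székely-Rizzo energy distance for one-dimensional samples; the generalized version studied in the paper is not reproduced here:

```python
def energy_distance(xs, ys):
    """Classical Székely-Rizzo energy distance between two 1-D samples:
    2*E|X-Y| - E|X-X'| - E|Y-Y'|, expectations over all sample pairs."""
    def mean_abs(a, b):
        # average absolute difference over all pairs drawn from a and b
        return sum(abs(u - v) for u in a for v in b) / (len(a) * len(b))
    return 2 * mean_abs(xs, ys) - mean_abs(xs, xs) - mean_abs(ys, ys)
```

Being a finite sum of absolute differences, the quantity is also differentiable almost everywhere in the sample positions, which is what makes it convenient for fitting generative models by gradient descent.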
XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques
Reinforcement Learning (RL) has demonstrated substantial potential across
diverse fields, yet understanding its decision-making process, especially in
real-world scenarios where rationality and safety are paramount, is an ongoing
challenge. This paper delves into Explainable RL (XRL), a subfield of
Explainable AI (XAI) aimed at unravelling the complexities of RL models. Our
focus rests on state-explaining techniques, a crucial subset within XRL
methods, as they reveal the underlying factors influencing an agent's actions
at any given time. Despite their significant role, the lack of a unified
evaluation framework hinders assessment of their accuracy and effectiveness. To
address this, we introduce XRL-Bench, a unified standardized benchmark tailored
for the evaluation and comparison of XRL methods, encompassing three main
modules: standard RL environments, explainers based on state importance, and
standard evaluators. XRL-Bench supports both tabular and image data for state
explanation. We also propose TabularSHAP, an innovative and competitive XRL
method. We demonstrate the practical utility of TabularSHAP in real-world
online gaming services and offer an open-source benchmark platform for the
straightforward implementation and evaluation of XRL methods. Our contributions
facilitate the continued progression of XRL technology. Comment: 10 pages, 5 figures
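The abstract does not spell out XRL-Bench's evaluators, but a common fidelity-style check for state-explaining methods perturbs the features an explainer ranks most important and measures how often the policy's greedy action changes. The function below is a hypothetical illustration of that idea, not XRL-Bench's actual metric:

```python
def fidelity_at_k(policy, states, importances, k, baseline=0.0):
    """Fraction of states whose action changes after replacing the k
    features ranked most important with a baseline value.
    Higher values suggest a more faithful explainer."""
    changed = 0
    for state, imp in zip(states, importances):
        # indices of the k features the explainer deems most important
        top = sorted(range(len(state)), key=lambda i: imp[i], reverse=True)[:k]
        perturbed = [baseline if i in top else v for i, v in enumerate(state)]
        if policy(perturbed) != policy(state):
            changed += 1
    return changed / len(states)
```

An explainer that ranks the truly decision-relevant feature first scores 1.0 here, while one that ranks an irrelevant feature first scores 0.0.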
TopMSV: A Web-Based Tool for Top-Down Mass Spectrometry Data Visualization
Top-down mass spectrometry (MS) investigates intact proteoforms for proteoform identification, characterization, and quantification. Data visualization plays an essential role in top-down MS data analysis because proteoform identification and characterization often involve manual data inspection to determine the molecular masses of highly charged ions and validate unexpected alterations in identified proteoforms. While many software tools have been developed for MS data visualization, there is still a lack of web-based visualization software designed for top-down MS. Here, we present TopMSV, a web-based tool for top-down MS data processing and visualization. TopMSV provides interactive views of top-down MS data using a web browser. It integrates software tools for spectral deconvolution and proteoform identification and uses the analysis results of these tools to annotate top-down MS data.
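Determining the molecular masses of highly charged ions, as mentioned above, rests on a standard relation between the observed m/z, the charge state, and the neutral mass (for positive-mode protonated ions). This is textbook mass spectrometry arithmetic, not TopMSV's internal code:

```python
PROTON_MASS = 1.00728  # Da, mass of a proton

def neutral_mass(mz, charge):
    """Neutral mass of a protonated ion observed at the given m/z and
    charge state: M = z * (m/z) - z * m_proton."""
    return charge * mz - charge * PROTON_MASS
```

For example, a 10 kDa protein carrying ten protons appears near m/z 1001, which is why deconvolution to neutral mass is needed before proteoforms can be compared.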
Roles of thermal energy storage technology for carbon neutrality
In order to achieve global carbon neutrality by the middle of the 21st century, efficient utilization of fossil fuels is highly desired in diverse energy sectors such as industry, transportation, buildings, and life science. In the energy utilization infrastructure, about 75% of fossil fuel consumption is used to provide and maintain heat, and more than 60% of the input energy is discharged to the environment as waste heat. Various low-grade waste heat recovery technologies have been developed to increase energy efficiency. However, because of the spatial and temporal mismatch between the demand for and supply of thermal energy, much of the waste heat is difficult to recover. Thermal energy storage (TES) technologies, in the forms of sensible, latent, and thermochemical heat storage, have been developed to relieve this mismatch between energy supply and demand. Diverse TES systems with high storage density, long duration, durability, and low cost have been developed in recent years. These technologies are vital for the efficient utilization of low-grade waste heat and are expected to help build a low- or zero-carbon-emission society. This paper reviews thermal storage technologies for low-carbon power generation, transportation, buildings, and life science; in addition, carbon capture, utilization, and storage are considered for carbon emission reduction. Conclusions and perspectives are offered after discussing the specific technologies. This study is expected to provide a reference for TES technologies in achieving a zero-carbon future.
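The sensible and latent storage forms mentioned above follow simple energy balances: Q = m·cp·ΔT for sensible heat and Q = m·L for latent heat of fusion. A back-of-envelope comparison with illustrative material constants (not values from the paper):

```python
def sensible_heat(mass_kg, cp_j_per_kg_k, delta_t_k):
    """Sensible storage: Q = m * cp * dT, in joules."""
    return mass_kg * cp_j_per_kg_k * delta_t_k

def latent_heat(mass_kg, fusion_j_per_kg):
    """Latent storage: Q = m * L, in joules."""
    return mass_kg * fusion_j_per_kg

# Illustrative constants: water cp ~ 4186 J/(kg*K); a paraffin-type
# phase-change material with L ~ 200 kJ/kg.
water_q = sensible_heat(1.0, 4186.0, 50.0)   # 1 kg water heated by 50 K
paraffin_q = latent_heat(1.0, 200_000.0)     # 1 kg PCM melting isothermally
```

The comparison shows why latent storage is attractive: the phase change stores energy comparable to a 50 K sensible swing, but at a nearly constant temperature.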
Characterization of a Mass-Produced SiPM at Liquid Nitrogen Temperature for CsI Neutrino Coherent Detectors
The Silicon Photomultiplier (SiPM) is a sensor that can detect low-light signals at the single-photon level. In order to study the properties of neutrinos at a low detection threshold and with a low radioactive experimental background, a low-temperature CsI neutrino coherent scattering detector is designed to be read out by SiPM sensors. Lower thermal noise of the SiPM and a higher light yield of CsI crystals can be obtained at liquid nitrogen temperature. The breakdown voltage (Vbd) and dark count rate (DCR) of the SiPM at liquid nitrogen temperature are two key parameters for coherent scattering detection. In this paper, a low-temperature test is conducted on the mass-produced ON Semiconductor J-Series SiPM. We design a cryogenic system for cooling the SiPM to liquid nitrogen temperature, and the changes in operating voltage and dark noise from room temperature to liquid nitrogen temperature are measured in detail. The results show that the SiPM works at liquid nitrogen temperature and that the dark count rate drops by six orders of magnitude, from 120 kHz/mm² at room temperature to 0.1 Hz/mm² at liquid nitrogen temperature.
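The quoted reduction from 120 kHz/mm² at room temperature to 0.1 Hz/mm² at liquid nitrogen temperature can be checked to be about six orders of magnitude:

```python
import math

room_dcr = 120e3   # Hz/mm^2 at room temperature (from the abstract)
ln2_dcr = 0.1      # Hz/mm^2 at liquid nitrogen temperature
orders = math.log10(room_dcr / ln2_dcr)
# log10(1.2e6) is about 6.08, i.e. roughly six orders of magnitude
```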
Petroleum Geology and Exploration of Deep-Seated Volcanic Condensate Gas Reservoir around the Penyijingxi Sag in the Junggar Basin
Many types of volcanic oil and gas reservoirs have been found in China, showing great petroleum exploration potential. Volcanic reservoirs are also one of the key exploration fields in the Junggar Basin and are mainly concentrated in the middle and shallow layers, while deep volcanic natural gas fields have not yet seen a breakthrough. Based on a comprehensive analysis of core observation, single-well analysis, reservoir description, and source rock evaluation, combined with seismic data and time-frequency electromagnetic technology, multiple volcanic rock exploration targets were identified, and industrial oil and gas flow was obtained in well SX16 of the Penyijingxi Sag, western Junggar Basin. It is believed that the deep Permian source rocks have relatively high natural gas generation potential and that volcanic breccia usually provides large reservoir space, with the mudstone of the Upper Wuerhe Formation serving as the caprock. The success of exploration well SX16 marks a major breakthrough in natural gas exploration in the Penyijingxi Sag, which has essential guiding significance for the exploration of deep volcanic rocks and large-scale gas exploration in the Junggar Basin.