Automated data pre-processing via meta-learning
A data mining algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around.
In fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there is a staggeringly large number of alternatives, and inexperienced users become overwhelmed.
We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning.
Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result
of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
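The recommendation step described above can be illustrated with a short meta-learning sketch, assuming a handful of hypothetical dataset meta-features and a scikit-learn classifier as the meta-learner; the feature set, the transformation labels and the recommend_transformation helper are illustrative, not the authors' implementation.

```python
# Sketch: recommend a pre-processing transformation from dataset meta-features.
# Assumes scikit-learn; meta-features and labels are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: meta-features of a previously seen dataset, e.g.
# [n_instances, n_features, fraction_categorical, mean_skewness]
meta_features = np.array([
    [1000, 20, 0.1, 0.3],
    [ 500, 50, 0.8, 1.2],
    [5000, 10, 0.0, 2.5],
])
# Label: transformation that improved the downstream algorithm on that dataset.
best_transform = np.array(["standardize", "one_hot_encode", "log_transform"])

meta_learner = RandomForestClassifier(n_estimators=100, random_state=0)
meta_learner.fit(meta_features, best_transform)

def recommend_transformation(new_dataset_meta_features):
    """Predict which pre-processing step is likely to help on a new dataset."""
    return meta_learner.predict([new_dataset_meta_features])[0]

print(recommend_transformation([2000, 15, 0.05, 2.0]))
```

In practice the meta-learner would be trained separately for each data mining algorithm and on many more datasets than shown here.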
Numerical Implementation of lepton-nucleus interactions and its effect on neutrino oscillation analysis
We discuss the implementation of the nuclear model based on realistic nuclear
spectral functions in the GENIE neutrino interaction generator. Besides
improving on the Fermi gas description of the nuclear ground state, our scheme
involves a new prescription for selection, meant to efficiently enforce
energy-momentum conservation. The results of our simulations, validated through
comparison to electron scattering data, have been obtained for a variety of
target nuclei, ranging from carbon to argon, and cover the kinematical region
in which quasi-elastic scattering is the dominant reaction mechanism. We also
analyse the influence of the adopted nuclear model on the determination of
neutrino oscillation parameters.
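The energy-momentum conservation step can be pictured with a toy rejection-sampling sketch: draw nucleon momentum and removal energy from a spectral-function-like distribution and keep only kinematically allowed configurations. The Gaussian distributions, the fixed momentum transfer and the acceptance window are illustrative assumptions, not the GENIE prescription.

```python
# Toy sketch: sample nucleon momentum |p| and removal energy E from a simple
# spectral-function-like distribution and enforce energy conservation at a
# quasi-elastic vertex.  All distributions and cuts are illustrative only.
import numpy as np

M_N = 0.939      # nucleon mass [GeV]
rng = np.random.default_rng(0)

def sample_pE(n):
    """Draw (|p|, E) pairs from a toy mean-field-like distribution."""
    p = np.abs(rng.normal(0.0, 0.2, n))        # nucleon momentum [GeV]
    E = np.abs(rng.normal(0.025, 0.010, n))    # removal energy [GeV]
    return p, E

def energy_transfer(p, E, q=0.5):
    """Energy transfer that puts the outgoing nucleon on its mass shell,
    for momentum transfer q and a random angle between p and q."""
    cos_theta = rng.uniform(-1.0, 1.0, np.shape(p))
    p_plus_q = np.sqrt(p**2 + q**2 + 2.0 * p * q * cos_theta)
    return np.sqrt(M_N**2 + p_plus_q**2) - (M_N - E)

# Rejection step: keep only configurations with a physical energy transfer
# below the available lepton energy (assumed to be 1 GeV in this sketch).
p, E = sample_pE(100000)
omega = energy_transfer(p, E)
accepted = (omega > 0.0) & (omega < 1.0)
print(f"acceptance fraction: {accepted.mean():.3f}")
```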
Determining appropriate approaches for using data in feature selection
Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. using either ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. The reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and the effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features, varied numbers of irrelevant features and instances, and different levels of added noise. The results indicate that the PART approach is more effective in reducing the bias when the size of a dataset is small, but starts to lose its advantage as the dataset size increases.
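As a minimal illustration of the reliability measure, the sketch below computes an average Tanimoto index over the feature subsets selected on different samples of the data; the function names and example subsets are illustrative, not the paper's exact protocol.

```python
# Sketch: stability of a feature-selection method measured with an average
# Tanimoto index over feature subsets selected on resampled data.
# Function names and the example subsets are illustrative assumptions.
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of selected features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def average_tanimoto_index(selections):
    """Mean pairwise Tanimoto similarity over a list of feature subsets."""
    pairs = list(combinations(selections, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Example: feature subsets selected on three different samples of a dataset.
selections = [{"f1", "f2", "f5"}, {"f1", "f2", "f7"}, {"f2", "f5", "f7"}]
print(average_tanimoto_index(selections))   # closer to 1 means more stable
```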
Development of a quality assurance process for the SoLid experiment
The SoLid experiment has been designed to search for an oscillation pattern induced by a light sterile neutrino state, utilising the BR2 reactor of SCK•CEN in Belgium.
The detector leverages a new hybrid technology, utilising two distinct scintillators in a cubic array, creating a highly segmented detector volume. A combination of 5 cm cubic polyvinyltoluene cells, with ⁶LiF:ZnS(Ag) sheets on two faces of each cube, facilitates reconstruction of the neutrino signals. Whilst the high granularity provides a powerful toolset to discriminate backgrounds, the segmentation itself also represents a challenge in terms of homogeneity and calibration for a consistent detector response. The search for this light sterile neutrino implies a sensitivity to distortions of around O(10)% in the energy spectrum of reactor ν̄e. Hence, a very good neutron detection efficiency, light yield and homogeneous detector response are critical for data validation. The minimal requirements for the SoLid physics program are a light yield and a neutron detection efficiency larger than 40 PA/MeV/cube and 50%, respectively. In order to guarantee these minimal requirements, the collaboration developed a rigorous quality assurance process for all 12800 cubic cells of the detector. To carry out the quality assurance process, an automated calibration system called CALIPSO was designed and constructed. CALIPSO provides precise, automatic placement of radioactive sources in front of each cube of a given detector plane (16 x 16 cubes). A combination of Na-22, Cf-252 and AmBe gamma and neutron sources was used by CALIPSO during the quality assurance process. First, the scanning identified defective components, allowing for repair during the initial construction of the SoLid detector. Second, a full analysis of the calibration data revealed initial estimates for the light yield of over 60 PA/MeV and a neutron reconstruction efficiency of 68%, validating the SoLid physics requirements.
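A minimal sketch of the screening step, assuming the stated requirements of more than 40 PA/MeV/cube light yield and more than 50% neutron detection efficiency; the per-cube data layout and helper name are illustrative, not the CALIPSO analysis code.

```python
# Sketch: flag detector cubes that miss the stated SoLid requirements of
# > 40 PA/MeV/cube light yield and > 50% neutron detection efficiency.
# The data layout and the example values are illustrative assumptions.
LIGHT_YIELD_MIN = 40.0       # pixel avalanches per MeV per cube
NEUTRON_EFF_MIN = 0.50       # neutron reconstruction efficiency

def failing_cubes(calibration_results):
    """Return cube identifiers whose calibration misses either requirement.

    calibration_results: dict mapping cube id -> (light_yield, neutron_eff).
    """
    return [
        cube_id
        for cube_id, (light_yield, neutron_eff) in calibration_results.items()
        if light_yield <= LIGHT_YIELD_MIN or neutron_eff <= NEUTRON_EFF_MIN
    ]

# Example: three cubes from one 16 x 16 detector plane.
results = {(0, 0): (62.1, 0.68), (0, 1): (38.5, 0.71), (0, 2): (65.0, 0.49)}
print(failing_cubes(results))   # -> [(0, 1), (0, 2)]
```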
Algebraic Comparison of Partial Lists in Bioinformatics
The outcome of a functional genomics pipeline is usually a partial list of
genomic features, ranked by their relevance in modelling biological phenotype
in terms of a classification or regression model. Due to resampling protocols
or just within a meta-analysis comparison, instead of one list it is often the
case that sets of alternative feature lists (possibly of different lengths) are
obtained. Here we introduce a method, based on the algebraic theory of
symmetric groups, for studying the variability between lists ("list stability")
in the case of lists of unequal length. We provide algorithms evaluating
stability for lists embedded in the full feature set or just limited to the
features occurring in the partial lists. The method is demonstrated first on
synthetic data in a gene filtering task and then for finding gene profiles on a
recent prostate cancer dataset.
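The notion of list stability can be illustrated with a simple stand-in: an average pairwise Canberra-style distance between partial rankings, with absent features placed at the bottom of the list. This is only a proxy for the symmetric-group construction of the paper; the helper names and example lists are illustrative.

```python
# Stand-in sketch for comparing partial ranked feature lists of unequal
# length: average pairwise rank distance over the full feature set.
# This is an illustrative proxy, not the symmetric-group method of the paper.
from itertools import combinations

def rank_map(ranked_list, universe_size):
    """Map each feature to its rank; absent features get rank universe_size + 1."""
    ranks = {feature: i + 1 for i, feature in enumerate(ranked_list)}
    return lambda feature: ranks.get(feature, universe_size + 1)

def list_distance(list_a, list_b, universe):
    """Canberra-style distance between two partial rankings of the universe."""
    ra, rb = rank_map(list_a, len(universe)), rank_map(list_b, len(universe))
    return sum(abs(ra(f) - rb(f)) / (ra(f) + rb(f)) for f in universe)

def list_stability(lists, universe):
    """Mean pairwise distance: smaller values mean more stable lists."""
    pairs = list(combinations(lists, 2))
    return sum(list_distance(a, b, universe) for a, b in pairs) / len(pairs)

universe = [f"g{i}" for i in range(10)]            # full feature set
lists = [["g1", "g3", "g2"], ["g1", "g2"], ["g3", "g1", "g4", "g2"]]
print(list_stability(lists, universe))
```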
Determination of muon momentum in the MicroBooNE LArTPC using an improved model of multiple Coulomb scattering
We discuss a technique for measuring a charged particle's momentum by means
of multiple Coulomb scattering (MCS) in the MicroBooNE liquid argon time
projection chamber (LArTPC). This method does not require the full particle
ionization track to be contained inside of the detector volume as other track
momentum reconstruction methods do (range-based momentum reconstruction and
calorimetric momentum reconstruction). We motivate use of this technique,
describe a tuning of the underlying phenomenological formula, quantify its
performance on fully contained beam-neutrino-induced muon tracks both in
simulation and in data, and quantify its performance on exiting muon tracks in
simulation. Using simulation, we have shown that the standard Highland formula
should be re-tuned specifically for scattering in liquid argon, which
significantly improves the bias and resolution of the momentum measurement.
With the tuned formula, we find agreement between data and simulation for
contained tracks, with a small bias in the momentum reconstruction and with
resolutions that vary as a function of track length, improving from about 10%
for the shortest (one meter long) tracks to 5% for longer (several meter)
tracks. For simulated exiting muons with at least one meter of track contained,
we find a similarly small bias, and a resolution which is less than 15% for
muons with momentum below 2 GeV/c. Above 2 GeV/c, results are given as a first
estimate of the MCS momentum measurement capabilities of MicroBooNE for high
momentum exiting tracks.
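The underlying phenomenological relation can be sketched with the standard Highland formula (13.6 MeV constant), which the paper re-tunes for liquid argon; the segment length, radiation length and the brute-force inversion below are illustrative assumptions rather than the analysis' full estimator.

```python
# Sketch: momentum estimate from multiple Coulomb scattering using the
# standard (untuned) Highland formula; the MicroBooNE analysis re-tunes the
# constant for liquid argon.  Segment length, radiation length and the
# simple scan-based inversion are illustrative assumptions.
import numpy as np

X0_LAR = 14.0        # liquid-argon radiation length [cm] (approximate)
M_MU = 105.66        # muon mass [MeV/c^2]

def highland_sigma(p_mev, segment_cm=14.0):
    """RMS scattering angle [mrad] of a muon of momentum p over one segment."""
    energy = np.sqrt(p_mev**2 + M_MU**2)
    beta = p_mev / energy
    ratio = segment_cm / X0_LAR
    return 1000.0 * (13.6 / (p_mev * beta)) * np.sqrt(ratio) * (1.0 + 0.038 * np.log(ratio))

def mcs_momentum(measured_sigma_mrad, segment_cm=14.0):
    """Invert the Highland relation by a simple scan over momentum hypotheses."""
    hypotheses = np.linspace(100.0, 3000.0, 2901)           # MeV/c
    predicted = highland_sigma(hypotheses, segment_cm)
    return hypotheses[np.argmin(np.abs(predicted - measured_sigma_mrad))]

# Example: an RMS segment-to-segment scattering angle of 14 mrad.
print(f"estimated momentum: {mcs_momentum(14.0):.0f} MeV/c")
```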
Long-Baseline Neutrino Facility (LBNF) and Deep Underground Neutrino Experiment (DUNE) Conceptual Design Report Volume 2: The Physics Program for DUNE at LBNF
The Physics Program for the Deep Underground Neutrino Experiment (DUNE) at
the Fermilab Long-Baseline Neutrino Facility (LBNF) is described.
Conditional Neural Relational Inference for Interacting Systems
In this work, we want to learn to model the dynamics of similar yet distinct
groups of interacting objects. These groups follow some common physical laws
but exhibit specificities that are captured through a vectorial description.
We develop a model that allows us to do conditional generation from any such
group given its vectorial description. Unlike previous work on learning
dynamical systems, which can only do trajectory completion and requires part
of the trajectory dynamics to be provided as input at generation time, we
generate using only the conditioning vector, with no access to trajectories
at generation time. We evaluate our model in the setting of modeling human
gait and, in particular, pathological human gait.
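A minimal PyTorch-style sketch of generating a trajectory from a conditioning vector alone; the encoder/decoder choice, layer sizes and names are illustrative assumptions, not the architecture proposed in the paper.

```python
# Sketch: generate a trajectory conditioned only on a group-description vector.
# The architecture (linear conditioning + GRU decoder) and all sizes are
# illustrative assumptions, not the model proposed in the paper.
import torch
import torch.nn as nn

class ConditionalTrajectoryGenerator(nn.Module):
    def __init__(self, cond_dim=8, hidden_dim=64, state_dim=6):
        super().__init__()
        self.init_state = nn.Linear(cond_dim, hidden_dim)  # condition -> initial hidden state
        self.decoder = nn.GRU(state_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, state_dim)

    def forward(self, condition, steps=50):
        batch = condition.shape[0]
        h = torch.tanh(self.init_state(condition)).unsqueeze(0)   # (1, batch, hidden)
        x = torch.zeros(batch, 1, self.readout.out_features)      # start token
        outputs = []
        for _ in range(steps):
            out, h = self.decoder(x, h)
            x = self.readout(out)            # next state, fed back autoregressively
            outputs.append(x)
        return torch.cat(outputs, dim=1)     # (batch, steps, state_dim)

# Example: generate gait-like trajectories for two condition vectors.
model = ConditionalTrajectoryGenerator()
cond = torch.randn(2, 8)
print(model(cond, steps=10).shape)   # torch.Size([2, 10, 6])
```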
A Proposal for a Three Detector Short-Baseline Neutrino Oscillation Program in the Fermilab Booster Neutrino Beam
A Short-Baseline Neutrino (SBN) physics program of three LAr-TPC detectors
located along the Booster Neutrino Beam (BNB) at Fermilab is presented. This
new SBN Program will deliver a rich and compelling physics opportunity,
including the ability to resolve a class of experimental anomalies in neutrino
physics and to perform the most sensitive search to date for sterile neutrinos
at the eV mass-scale through both appearance and disappearance oscillation
channels. Using data sets of 6.6e20 protons on target (P.O.T.) in the LAr1-ND
and ICARUS T600 detectors plus 13.2e20 P.O.T. in the MicroBooNE detector, we
estimate that a search for muon neutrino to electron neutrino appearance can be
performed with ~5 sigma sensitivity for the LSND allowed (99% C.L.) parameter
region. In this proposal for the SBN Program, we describe the physics analysis,
the conceptual design of the LAr1-ND detector, the design and refurbishment of
the T600 detector, the necessary infrastructure required to execute the
program, and a possible reconfiguration of the BNB target and horn system to
improve its performance for oscillation searches.
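The appearance and disappearance channels referred to above are commonly described, in the two-flavour short-baseline approximation, by P = sin²2θ · sin²(1.27 Δm² L / E) with Δm² in eV², L in km and E in GeV; the baselines and parameter point in the sketch below are illustrative values, not the proposal's sensitivity calculation.

```python
# Sketch: two-flavour short-baseline oscillation probability
#   P = sin^2(2*theta) * sin^2(1.27 * dm2 * L / E)
# with dm2 in eV^2, L in km and E in GeV.  The example baselines and the
# parameter point are illustrative values for this sketch.
import numpy as np

def oscillation_probability(sin2_2theta, dm2_ev2, L_km, E_gev):
    """Appearance (or disappearance deficit) probability in the 3+1 short-baseline limit."""
    return sin2_2theta * np.sin(1.27 * dm2_ev2 * L_km / E_gev) ** 2

# Example: a 1 eV^2 sterile state, appearance amplitude 0.003, 0.7 GeV neutrinos,
# evaluated at a near (0.1 km) and a far (0.6 km) baseline.
for L in (0.1, 0.6):
    p = oscillation_probability(0.003, 1.0, L, 0.7)
    print(f"L = {L:.1f} km: P(numu -> nue) = {p:.2e}")
```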
Indication for the disappearance of reactor electron antineutrinos in the Double Chooz experiment
The Double Chooz Experiment presents an indication of reactor electron
antineutrino disappearance consistent with neutrino oscillations. A ratio of
0.944 ± 0.016 (stat) ± 0.040 (syst) observed to predicted events was
obtained in 101 days of running at the Chooz Nuclear Power Plant in France,
with two 4.25 GW reactors. The results were obtained from a single 10
m³ fiducial volume detector located 1050 m from the two reactor cores. The
reactor antineutrino flux prediction used the Bugey4 measurement as an anchor
point. The deficit can be interpreted as an indication of a non-zero value of
the still unmeasured neutrino mixing parameter sin²(2θ₁₃). Analyzing both the rate
of the prompt positrons and their energy spectrum, we find sin²(2θ₁₃) = 0.086 ±
0.041 (stat) ± 0.030 (syst), or, at 90% CL, 0.015 < sin²(2θ₁₃) < 0.16.
