Photometric redshifts with Quasi Newton Algorithm (MLPQNA). Results in the PHAT1 contest
Context. Since the advent of modern multiband digital sky surveys,
photometric redshifts (photo-z's) have become relevant if not crucial to many
fields of observational cosmology, from the characterization of cosmic
structures, to weak and strong lensing. Aims. We describe an application to an
astrophysical context, namely the evaluation of photometric redshifts, of
MLPQNA, a machine learning method based on Quasi Newton Algorithm. Methods.
Theoretical methods for photo-z's evaluation are based on the interpolation of
a priori knowledge (spectroscopic redshifts or SED templates) and represent an
ideal comparison ground for neural networks based methods. The MultiLayer
Perceptron with Quasi Newton learning rule (MLPQNA) described here is a
computationally efficient implementation of neural networks, exploited here for
the first time to solve regression problems in an astrophysical context, and is
offered to the community through the DAMEWARE (DAta Mining & Exploration Web
Application REsource) infrastructure. Results. The PHAT contest (Hildebrandt et
al. 2010) provides a standard dataset to test old and new methods for
photometric redshift evaluation and with a set of statistical indicators which
allow a straightforward comparison among different methods. The MLPQNA model
was applied to the whole PHAT1 dataset of 1984 objects after an optimization
of the model using the 515 available spectroscopic redshifts as the training
set. When applied to the PHAT1 dataset, MLPQNA obtains the
best bias accuracy (0.0006) and very competitive accuracies in terms of scatter
(0.056) and outlier percentage (16.3%), scoring as the second most effective
empirical method among those that have so far participated in the contest.
MLPQNA shows better generalization capabilities than most other empirical
methods, especially in the presence of underpopulated regions of the Knowledge
Base.
Comment: Accepted for publication in Astronomy & Astrophysics; 9 pages, 2
figures
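The quasi-Newton training rule at the core of MLPQNA is not packaged outside DAMEWARE, but the same idea can be sketched with scikit-learn's MLPRegressor and its L-BFGS solver, which belongs to the quasi-Newton family. The photometry and redshifts below are synthetic stand-ins, not PHAT1 data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for 5-band photometric magnitudes and redshifts.
X = rng.uniform(18.0, 25.0, size=(500, 5))
z_spec = 0.1 * (X[:, 0] - X[:, 1]) + 0.05 * X[:, 2] - 0.8
z_spec = np.clip(z_spec + rng.normal(0, 0.01, 500), 0, None)

# L-BFGS is a quasi-Newton method, the same family of learning rule as MLPQNA.
model = MLPRegressor(hidden_layer_sizes=(20,), solver="lbfgs",
                     max_iter=2000, random_state=0)
model.fit(X[:400], z_spec[:400])

# Evaluate with the bias/scatter statistics used in PHAT-style comparisons.
z_phot = model.predict(X[400:])
bias = np.mean(z_phot - z_spec[400:])
scatter = np.std(z_phot - z_spec[400:])
print(f"bias={bias:.4f} scatter={scatter:.4f}")
```

The train/test split mirrors the paper's setup of optimizing on the spectroscopic subset before predicting on the full photometric sample.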
An empirical learning-based validation procedure for simulation workflow
Simulation workflow is a top-level model for the design and control of the
simulation process. It connects multiple simulation components with time and
interaction restrictions to form a complete simulation system. Before the
construction and evaluation of the component models, the validation of
upper-layer simulation workflow is of the utmost importance in a simulation
system. However, methods specifically for validating simulation workflows are
very limited. Many of the existing validation techniques are domain-dependent,
with cumbersome questionnaire design and expert scoring. Therefore, this paper
presents an empirical learning-based validation procedure to implement a
semi-automated evaluation for simulation workflow. First, representative
features of general simulation workflow and their relations with validation
indices are proposed. The calculation process of workflow credibility based on
Analytic Hierarchy Process (AHP) is then introduced. In order to make full use
of the historical data and implement more efficient validation, four learning
algorithms, including back propagation neural network (BPNN), extreme learning
machine (ELM), evolving neo-fuzzy neuron (eNFN), and fast incremental Gaussian mixture
model (FIGMN), are introduced for constructing the empirical relation between
the workflow credibility and its features. A case study on a landing-process
simulation workflow is established to test the feasibility of the proposed
procedure. The experimental results also provide a useful overview of
state-of-the-art learning algorithms for the credibility evaluation of
simulation models.
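The AHP step mentioned above follows the standard priority-weight derivation: take the principal eigenvector of a pairwise comparison matrix over the validation indices, normalize it, and check the consistency ratio. A minimal sketch with hypothetical comparison values (not the paper's):

```python
import numpy as np

# Hypothetical pairwise comparison matrix for three validation indices
# (a_ij = importance of index i relative to index j, Saaty's 1-9 scale).
A = np.array([[1.0,   3.0, 5.0],
              [1/3.0, 1.0, 2.0],
              [1/5.0, 1/2.0, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)            # principal eigenvalue
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                           # normalized priority weights

# Consistency ratio CI/RI, with Saaty's random index RI = 0.58 for n = 3.
n = A.shape[0]
ci = (eigvals[k].real - n) / (n - 1)
cr = ci / 0.58
print("weights:", np.round(w, 3), "CR:", round(cr, 3))

# Workflow credibility as the weighted sum of per-index scores (scores assumed).
scores = np.array([0.9, 0.7, 0.8])
credibility = float(w @ scores)
```

A CR below 0.1 is the conventional threshold for accepting the comparison matrix as consistent.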
A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification
Reliable application of machine learning-based decision systems in the wild
is one of the major challenges currently investigated by the field. A large
portion of established approaches aims to detect erroneous predictions by means
of assigning confidence scores. This confidence may be obtained by either
quantifying the model's predictive uncertainty, learning explicit scoring
functions, or assessing whether the input is in line with the training
distribution. Curiously, while these approaches all claim to address the same
eventual goal of detecting failures of a classifier upon real-life application,
they currently constitute largely separated research fields with individual
evaluation protocols, which either exclude a substantial part of relevant
methods or ignore large parts of relevant failure sources. In this work, we
systematically reveal current pitfalls caused by these inconsistencies and
derive requirements for a holistic and realistic evaluation of failure
detection. To demonstrate the relevance of this unified perspective, we present
a large-scale empirical study for the first time enabling benchmarking
confidence scoring functions w.r.t. all relevant methods and failure sources.
The revelation of a simple softmax response baseline as the overall best
performing method underlines the drastic shortcomings of current evaluation in
the abundance of publicized research on confidence scoring. Code and trained
models are at https://github.com/IML-DKFZ/fd-shifts
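The study's headline finding is that the plain maximum softmax response baseline performs best overall. That baseline amounts to ranking predictions by their top softmax probability and expecting failures to concentrate at low confidence; a minimal sketch on toy logits (the values are illustrative, not from the paper):

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy logits for 4 samples over 3 classes, with ground-truth labels.
logits = np.array([[4.0, 0.5, 0.2],
                   [1.1, 1.0, 0.9],
                   [0.1, 3.0, 0.2],
                   [2.5, 1.0, 0.8]])
labels = np.array([0, 2, 1, 0])

probs = softmax(logits)
preds = probs.argmax(axis=1)
confidence = probs.max(axis=1)       # maximum softmax response (MSR)

# Rank samples by confidence: the misclassified sample ends up last.
order = np.argsort(-confidence)
correct = (preds == labels)[order]
print("conf (desc):", np.round(confidence[order], 3))
print("correct:    ", correct)
```

Sorting by MSR like this is the basis of risk-coverage style evaluation: rejecting the lowest-confidence samples first should remove failures before correct predictions.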
Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models
In the financial industry, credit scoring is a fundamental element, shaping
access to credit and determining the terms of loans for individuals and
businesses alike. Traditional credit scoring methods, however, often grapple
with challenges such as narrow knowledge scope and isolated evaluation of
credit tasks. Our work posits that Large Language Models (LLMs) have great
potential for credit scoring tasks, with strong generalization ability across
multiple tasks. To systematically explore LLMs for credit scoring, we propose
the first open-source comprehensive framework. We curate a novel benchmark
covering 9 datasets with 14K samples, tailored for credit assessment and for a
critical examination of potential biases within LLMs, as well as novel
instruction tuning data with over 45K samples. We then propose the first Credit
and Risk
Assessment Large Language Model (CALM) by instruction tuning, tailored to the
nuanced demands of various financial risk assessment tasks. We evaluate CALM,
and existing state-of-the-art (SOTA) open-source and closed-source LLMs on the built
benchmark. Our empirical results illuminate the capability of LLMs to not only
match but surpass conventional models, pointing towards a future where credit
scoring can be more inclusive, comprehensive, and unbiased. We contribute to
the industry's transformation by sharing our pioneering instruction-tuning
datasets, credit and risk assessment LLM, and benchmarks with the research
community and the financial industry.
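As an illustration of the kind of bias examination the benchmark targets, one common group-fairness check is the demographic parity difference between approval rates across groups. The decisions and group labels below are hypothetical, and this is not necessarily the metric used in the paper:

```python
import numpy as np

# Hypothetical approval decisions (1 = approve) and a binary group attribute.
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

rate_g0 = preds[group == 0].mean()   # approval rate, group 0
rate_g1 = preds[group == 1].mean()   # approval rate, group 1
dpd = abs(rate_g0 - rate_g1)         # demographic parity difference
print(f"approval rates: {rate_g0:.1f} vs {rate_g1:.1f}, DPD = {dpd:.1f}")
```

A DPD of zero means both groups are approved at the same rate; larger values flag a disparity worth investigating.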
Confidence Bands for ROC Curves: Methods and an Empirical Study
In this paper we study techniques for generating
and evaluating confidence bands on ROC curves. ROC
curve evaluation is rapidly becoming a commonly used evaluation
metric in machine learning, although evaluating ROC
curves has thus far been limited to studying the area under
the curve (AUC) or generation of one-dimensional confidence
intervals by freezing one variable—the false-positive rate, or
threshold on the classification scoring function. Researchers in
the medical field have long been using ROC curves and have
many well-studied methods for analyzing such curves, including
generating confidence intervals as well as simultaneous
confidence bands. In this paper we introduce these techniques
to the machine learning community and show their empirical
fitness on the Covertype data set—a standard machine learning
benchmark from the UCI repository. We show how some
of these methods work remarkably well, others are too loose,
and that existing machine learning methods for generation
of 1-dimensional confidence intervals do not translate well to
generation of simultanous bands—their bands are too tight.NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc
Orientation-dependent backbone-only residue pair scoring functions for fixed backbone protein design
Background. Empirical scoring functions have proven useful in protein structure modeling. Most such scoring functions depend on protein side chain conformations. However, backbone-only scoring functions do not require computationally intensive structure optimization and so are well suited to protein design, which requires fast score evaluation. Furthermore, scoring functions that account for the distinctive relative position and orientation preferences of residue pairs are expected to be more accurate than those that depend only on the separation distance.
Results. Residue pair scoring functions for fixed backbone protein design were derived using only backbone geometry. Unlike previous studies that used spherical harmonics to fit 2D angular distributions, Gaussian Mixture Models were used to fit the full 3D (position only) and 6D (position and orientation) distributions of residue pairs. The performance of the 1D (residue separation only), 3D, and 6D scoring functions was compared by their ability to identify correct threading solutions for a non-redundant benchmark set of protein backbone structures. The threading accuracy was found to steadily increase with increasing dimension, with the 6D scoring function achieving the highest accuracy. Furthermore, the 3D and 6D scoring functions were shown to outperform side chain-dependent empirical potentials from three other studies. Next, two computational methods that take advantage of the speed and pairwise form of these new backbone-only scoring functions were investigated. The first is a procedure that exploits available sequence data by averaging scores over threading solutions for homologs. This was evaluated by applying it to the challenging problem of identifying interacting transmembrane alpha-helices and was found to further improve prediction accuracy.
The second is a protein design method for determining the optimal sequence for a backbone structure by applying Belief Propagation optimization using the 6D scoring functions. The sensitivity of this method to backbone structure perturbations was compared with that of fixed-backbone all-atom modeling by determining the similarities between optimal sequences for two different backbone structures within the same protein family. The results showed that the design method using 6D scoring functions was more robust to small variations in backbone structure than the all-atom design method.
Conclusions. Backbone-only residue pair scoring functions that account for all six relative degrees of freedom are the most accurate, and including the scores of homologs further improves the accuracy in threading applications. The 6D scoring function outperformed several side chain-dependent potentials while avoiding time-consuming and error-prone side chain structure prediction. These scoring functions are particularly useful as an initial filter in protein design problems before applying all-atom modeling.
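The core modeling step, fitting a Gaussian Mixture Model to the spatial distribution of a residue-pair type and scoring candidate geometries by likelihood, can be sketched with scikit-learn on synthetic 3D displacement vectors (the data and component count here are illustrative, not the paper's):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic stand-in for 3D displacement vectors of one residue-pair type,
# drawn from two preferred packing geometries.
obs = np.vstack([rng.normal([5.0, 0.0, 0.0], 0.5, (300, 3)),
                 rng.normal([0.0, 6.0, 1.0], 0.5, (300, 3))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(obs)

# Score candidate placements by log-likelihood under the fitted density;
# higher means more consistent with the observed pair geometry.
near = gmm.score_samples(np.array([[5.0, 0.0, 0.0]]))[0]
far = gmm.score_samples(np.array([[20.0, 20.0, 20.0]]))[0]
print(f"near-mode score {near:.2f}, off-distribution score {far:.2f}")
```

In a threading or design loop, such per-pair log-likelihoods would be summed over all residue pairs to score a full sequence placement.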