A new innovative method for model efficiency performance
In every branch of scientific research, model predictions must be calibrated and validated against measurement records. The literature offers a myriad of formulations, empirical expressions, algorithms and software for model efficiency assessment. In general, model predictions are curve-fitting procedures resting on a set of assumptions that many studies do not treat with care: only a single-value comparison between measurements and predictions is considered, and the researcher then decides on the model efficiency from that alone. Among the classical statistical efficiency formulations, the most widely used are the bias (BI), mean square error (MSE), correlation coefficient (CC) and Nash-Sutcliffe efficiency (NSE) procedures, all of which are embedded within the visual inspection and numerical analysis (VINAM) square graph, a measurements-versus-predictions scatter diagram. The VINAM provides a set of verbal interpretations and subsequent numerical improvements embracing all the previous statistical efficiency formulations. The fundamental criterion in the VINAM is the 1:1 (45-degree) main diagonal, along which all visual, philosophical, logical, rational and mathematical procedures converge for model validation. The application of the VINAM approach is presented for artificial neural network (ANN) and adaptive network-based fuzzy inference system (ANFIS) model predictions.
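The classical efficiency measures named in the abstract are standard formulas; a minimal sketch of computing them for a measurement/prediction pair (the VINAM graph itself is not reproduced here) might look like:

```python
# Sketch of the classical model-efficiency metrics mentioned above:
# bias (BI), mean square error (MSE), correlation coefficient (CC)
# and Nash-Sutcliffe efficiency (NSE).
import numpy as np

def efficiency_metrics(measured, predicted):
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    bias = np.mean(predicted - measured)                        # BI
    mse = np.mean((predicted - measured) ** 2)                  # MSE
    cc = np.corrcoef(measured, predicted)[0, 1]                 # CC
    nse = 1.0 - np.sum((measured - predicted) ** 2) / np.sum(
        (measured - np.mean(measured)) ** 2)                    # NSE
    return {"BI": bias, "MSE": mse, "CC": cc, "NSE": nse}

# A perfect model lies on the 1:1 (45-degree) diagonal of the
# measurements-versus-predictions scatter diagram used by VINAM,
# giving BI = 0, MSE = 0, CC = 1 and NSE = 1.
m = np.array([1.0, 2.0, 3.0, 4.0])
print(efficiency_metrics(m, m))
```

Points scattered off the diagonal degrade NSE and CC, which is why the 1:1 line serves as the fundamental VINAM criterion.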
Of `Cocktail Parties' and Exoplanets
The characterisation of ever smaller and fainter extrasolar planets requires
an intricate understanding of one's data and the analysis techniques used.
Correcting the raw data at the 10^-4 level of accuracy in flux is one of the
central challenges. This can be difficult for instruments that do not feature a
calibration plan for such high precision measurements. Here, it is not always
obvious how to de-correlate the data using auxiliary information of the
instrument and it becomes paramount to know how well one can disentangle
instrument systematics from one's data, given nothing but the data itself. We
propose a non-parametric machine learning algorithm, based on the concept of
independent component analysis, to de-convolve the systematic noise and all
non-Gaussian signals from the desired astrophysical signal. Such a `blind'
signal de-mixing is commonly known as the `Cocktail Party problem' in
signal-processing. Given multiple simultaneous observations of the same
exoplanetary eclipse, as in the case of spectrophotometry, we show that we can
often disentangle systematic noise from the original light curve signal without
the use of any complementary information of the instrument. In this paper, we
explore these signal extraction techniques using simulated data and two data
sets observed with the Hubble-NICMOS instrument. Another important application
is the de-correlation of the exoplanetary signal from time-correlated stellar
variability. Using data obtained by the Kepler mission we show that the desired
signal can be de-convolved from the stellar noise using a single time series
spanning several eclipse events. Such non-parametric techniques can provide
important confirmations of the existent parametric corrections reported in the
literature, and their associated results. Additionally, they can substantially
improve the precision of exoplanetary light curve analysis in the future.
Comment: ApJ accepted
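The paper's own non-parametric algorithm is not reproduced here, but the blind "cocktail party" demixing it builds on can be illustrated with off-the-shelf independent component analysis; the sketch below uses scikit-learn's FastICA on toy eclipse and systematic-noise signals, both of which are synthetic assumptions:

```python
# Toy blind source separation: several simultaneous "observations" of the
# same eclipse are linear mixtures of an astrophysical signal and a
# non-Gaussian systematic; ICA recovers the components without any
# auxiliary instrument information.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0.0, 1.0, 2000)
astro = 1.0 - 0.01 * np.exp(-((t - 0.5) / 0.05) ** 2)   # toy eclipse dip
systematic = 0.005 * np.sign(np.sin(40 * np.pi * t))    # toy non-Gaussian noise
sources = np.c_[astro, systematic]

# Multiple simultaneous observations = different linear mixtures, as with
# the spectrophotometric channels of a single eclipse event.
mixing = np.array([[1.0, 0.6], [0.8, 1.2], [1.1, 0.3]])
observed = sources @ mixing.T

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(observed)  # estimated independent components
```

ICA returns the components up to sign, scale and permutation, so in practice the astrophysical component still has to be identified and renormalized afterwards.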
Accounting for Calibration Uncertainties in X-ray Analysis: Effective Areas in Spectral Fitting
While considerable advances have been made in accounting for statistical
uncertainties in astronomical analyses, systematic instrumental uncertainties
have been generally ignored. This can be crucial to a proper interpretation of
analysis results because instrumental calibration uncertainty is a form of
systematic uncertainty. Ignoring it can underestimate error bars and introduce
bias into the fitted values of model parameters. Accounting for such
uncertainties currently requires extensive case-specific simulations if using
existing analysis packages. Here we present general statistical methods that
incorporate calibration uncertainties into spectral analysis of high-energy
data. We first present a method based on multiple imputation that can be
applied with any fitting method, but is necessarily approximate. We then
describe a more exact Bayesian approach that works in conjunction with a Markov
chain Monte Carlo based fitting. We explore methods for improving computational
efficiency, and in particular detail a method of summarizing calibration
uncertainties with a principal component analysis of samples of plausible
calibration files. This method is implemented using recently codified Chandra
effective area uncertainties for low-resolution spectral analysis and is
verified using both simulated and actual Chandra data. Our procedure for
incorporating effective area uncertainty is easily generalized to other types
of calibration uncertainties.
Comment: 61 pages double spaced, 8 figures, accepted for publication in Ap
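The PCA summarization step described above can be sketched in a few lines: a library of plausible calibration curves is reduced to a mean curve plus a handful of principal deviation components, from which new plausible curves can be drawn cheaply. The curves below are synthetic stand-ins, not real Chandra effective-area files:

```python
# Summarizing calibration uncertainty with PCA: build a mean curve and
# principal deviation components from a library of plausible calibration
# curves, then draw new plausible curves from that low-rank summary.
import numpy as np

rng = np.random.default_rng(1)
n_curves, n_energy = 200, 50
base = np.linspace(100.0, 400.0, n_energy)            # nominal effective area
wiggles = rng.normal(size=(n_curves, 3)) @ rng.normal(size=(3, n_energy))
library = base + wiggles                              # plausible calibrations

mean_curve = library.mean(axis=0)
deviations = library - mean_curve
# SVD of the deviations gives the principal components of calibration error.
U, s, Vt = np.linalg.svd(deviations, full_matrices=False)
n_keep = 3
components = Vt[:n_keep]                              # principal deviation shapes
scales = s[:n_keep] / np.sqrt(n_curves)               # typical coefficient sizes

def draw_effective_area(rng):
    """Draw a plausible effective-area curve from the PCA summary."""
    coeffs = rng.normal(size=n_keep) * scales
    return mean_curve + coeffs @ components
```

Sampling from the summary inside an MCMC fit is far cheaper than storing or re-reading the full library of calibration files, which is the computational gain the paper exploits.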
A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology
The widespread availability of high-dimensional biological data has made the
simultaneous screening of numerous biological characteristics a central
statistical problem in computational biology. While the dimensionality of such
datasets continues to increase, the problem of teasing out the effects of
biomarkers in studies measuring baseline confounders while avoiding model
misspecification remains only partially addressed. Efficient estimators
constructed from data adaptive estimates of the data-generating distribution
provide an avenue for avoiding model misspecification; however, in the context
of high-dimensional problems requiring simultaneous estimation of numerous
parameters, standard variance estimators have proven unstable, resulting in
unreliable Type-I error control under standard multiple testing corrections. We
present the formulation of a general approach for applying empirical Bayes
shrinkage approaches to asymptotically linear estimators of parameters defined
in the nonparametric model. The proposal applies existing shrinkage estimators
to the estimated variance of the influence function, allowing for increased
inferential stability in high-dimensional settings. A methodology for
nonparametric variable importance analysis for use with high-dimensional
biological datasets with modest sample sizes is introduced and the proposed
technique is demonstrated to be robust in small samples even when relying on
data adaptive estimators that eschew parametric forms. Use of the proposed
variance moderation strategy in constructing stabilized variable importance
measures of biomarkers is demonstrated by application to an observational study
of occupational exposure. The result is a data adaptive approach for robustly
uncovering stable associations in high-dimensional data with limited sample
sizes
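The variance-moderation idea can be sketched concretely: per-biomarker variance estimates of the influence function are shrunk toward a common prior value before test statistics are formed. The prior degrees of freedom `d0` and the median-based prior variance below are illustrative assumptions; empirical Bayes methods such as limma estimate them from the data instead:

```python
# Minimal sketch of empirical-Bayes-style variance moderation: shrink
# unstable per-feature variance estimates toward a shared prior variance,
# stabilizing inference when many parameters are estimated at once.
import numpy as np

def moderated_variances(sample_vars, df, d0=4.0, s0_sq=None):
    """Shrink sample variances toward a prior variance s0_sq.

    df is the degrees of freedom of each sample variance; d0 is the
    (assumed) prior degrees of freedom controlling shrinkage strength.
    """
    sample_vars = np.asarray(sample_vars, dtype=float)
    if s0_sq is None:
        s0_sq = float(np.median(sample_vars))  # crude stand-in for a fitted prior
    return (d0 * s0_sq + df * sample_vars) / (d0 + df)

raw = np.array([0.01, 0.5, 2.0, 10.0])   # unstable raw variance estimates
mod = moderated_variances(raw, df=10)
# Extreme variances are pulled toward the prior; the ordering is preserved.
```

Dividing each estimated effect by the square root of its moderated variance yields the stabilized test statistics whose small-sample robustness the abstract describes.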
On the complexity of curve fitting algorithms
We study a popular algorithm for fitting polynomial curves to scattered data
based on the least squares with gradient weights. We show that sometimes this
algorithm admits a substantial reduction of complexity, and, furthermore, find
precise conditions under which this is possible. It turns out that this is,
indeed, possible when one fits circles but not ellipses or hyperbolas.
Comment: 8 pages, no figures
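One concrete reason circles are special: under the algebraic parameterization x² + y² + Dx + Ey + F = 0, circle fitting collapses to a *linear* least-squares problem (the classical Kåsa fit, shown below as an illustration rather than the paper's gradient-weighted algorithm), whereas general ellipses and hyperbolas do not admit such a reduction:

```python
# Kasa circle fit: x^2 + y^2 + D*x + E*y + F = 0 is linear in (D, E, F),
# so the fit reduces to one linear least-squares solve.
import numpy as np

def fit_circle(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    A = np.c_[x, y, np.ones_like(x)]
    b = -(x**2 + y**2)
    D, E, F = np.linalg.lstsq(A, b, rcond=None)[0]
    cx, cy = -D / 2.0, -E / 2.0            # recovered center
    r = np.sqrt(cx**2 + cy**2 - F)         # recovered radius
    return cx, cy, r

# Points sampled exactly on a circle of center (3, -1) and radius 2:
theta = np.linspace(0, 2 * np.pi, 30, endpoint=False)
cx, cy, r = fit_circle(3 + 2 * np.cos(theta), -1 + 2 * np.sin(theta))
```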
The Sloan Digital Sky Survey Quasar Lens Search. I. Candidate Selection Algorithm
We present an algorithm for selecting a uniform sample of gravitationally
lensed quasar candidates from low-redshift (0.6<z<2.2) quasars brighter than
i=19.1 that have been spectroscopically identified in the SDSS. Our algorithm
uses morphological and color selections that are intended to identify small-
and large-separation lenses, respectively. Our selection algorithm only relies
on parameters that the SDSS standard image processing pipeline generates,
allowing easy and fast selection of lens candidates. The algorithm has been
tested against simulated SDSS images, which adopt distributions of field and
quasar parameters taken from the real SDSS data as input. Furthermore, we take
differential reddening into account. We find that our selection algorithm is
almost complete down to separations of 1'' and flux ratios of 10^-0.5. The
algorithm selects both double and quadruple lenses. At a separation of 2'',
doubles and quads are selected with similar completeness, and above (below) 2''
the selection of quads is better (worse) than for doubles. Our morphological
selection identifies a non-negligible fraction of single quasars; to remove
these we fit images of candidates with a model of two point sources and reject
those with unusually small image separations and/or large magnitude differences
between the two point sources. We estimate the efficiency of our selection
algorithm to be at least 8% at image separations smaller than 2'', comparable
to that of radio surveys. The efficiency declines as the image separation
increases, because of larger contamination from stars. We also present the
magnification factor of lensed images as a function of the image separation,
which is needed for accurate computation of magnification bias.
Comment: 15 pages, 17 figures, 4 tables, accepted for publication in A
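The final rejection step described above (fit two point sources, then discard fits with unusually small separations or large magnitude differences) can be sketched as a simple cut; the thresholds below are illustrative assumptions, not the survey's adopted values:

```python
# Toy version of the single-quasar rejection cut: keep a candidate only if
# the fitted two-point-source model has a plausible image separation and
# magnitude difference. Thresholds are hypothetical, chosen to echo the
# abstract's 1'' separations and 10^-0.5 flux ratios
# (a flux ratio of 10^-0.5 corresponds to 2.5 * 0.5 = 1.25 mag).
def keep_candidate(sep_arcsec, mag1, mag2,
                   min_sep=0.7, max_dmag=1.25):
    """Return True if the two-point-source fit looks like a plausible lens."""
    dmag = abs(mag1 - mag2)  # magnitude difference between the two images
    return sep_arcsec >= min_sep and dmag <= max_dmag
```

Candidates failing the cut are interpreted as single quasars whose fit produced an implausibly tight or lopsided pair of point sources.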