    A new innovative method for model efficiency performance

    In every field of scientific research, model predictions must be calibrated and validated to assess how well they represent the recorded measurements. The literature offers a myriad of formulations, empirical expressions, algorithms and software for model efficiency assessment. In general, model predictions are curve-fitting procedures resting on a set of assumptions that many studies do not examine carefully; instead, only a single-value comparison between measurements and predictions is considered, and the researcher then judges model efficiency from that comparison alone. Among the classical statistical efficiency formulations, the most widely used are the bias (BI), mean square error (MSE), correlation coefficient (CC) and Nash-Sutcliffe efficiency (NSE) procedures, all of which are embedded within the visual inspection and numerical analysis (VINAM) square graph, a scatter diagram of measurements versus predictions. The VINAM provides a set of verbal interpretations and subsequent numerical improvements that embrace all the previous statistical efficiency formulations. The fundamental criterion in the VINAM is the 1:1 (45-degree) main diagonal, along which all visual, science-philosophical, logical, rational and mathematical procedures converge for model validation. The application of the VINAM approach is presented for artificial neural network (ANN) and adaptive network-based fuzzy inference system (ANFIS) model predictions.
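
    The classical efficiency measures named in the abstract have standard definitions; the short Python sketch below (not taken from the paper, and using made-up sample data) computes BI, MSE, CC and NSE and notes the role of the 1:1 diagonal that a perfect model would fall on.

```python
import numpy as np

def efficiency_measures(obs, pred):
    """Classical model-efficiency measures named in the abstract."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    bi = np.mean(pred - obs)                     # bias (BI)
    mse = np.mean((pred - obs) ** 2)             # mean square error (MSE)
    cc = np.corrcoef(obs, pred)[0, 1]            # correlation coefficient (CC)
    nse = 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)  # Nash-Sutcliffe efficiency (NSE)
    return {"BI": bi, "MSE": mse, "CC": cc, "NSE": nse}

# On a measurements-versus-predictions scatter plot, a perfect model places every
# point on the 1:1 (45-degree) diagonal: BI = 0, MSE = 0, CC = 1, NSE = 1.
measurements = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical record
predictions = np.array([1.1, 1.9, 3.2, 3.8, 5.1])    # hypothetical model output
print(efficiency_measures(measurements, predictions))
```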

    Of `Cocktail Parties' and Exoplanets

    The characterisation of ever smaller and fainter extrasolar planets requires an intricate understanding of one's data and the analysis techniques used. Correcting the raw data at the 10^-4 level of accuracy in flux is one of the central challenges. This can be difficult for instruments that do not feature a calibration plan for such high precision measurements. Here, it is not always obvious how to de-correlate the data using auxiliary information about the instrument, and it becomes paramount to know how well one can disentangle instrument systematics from one's data, given nothing but the data itself. We propose a non-parametric machine learning algorithm, based on the concept of independent component analysis, to de-convolve the systematic noise and all non-Gaussian signals from the desired astrophysical signal. Such `blind' signal de-mixing is commonly known as the `Cocktail Party problem' in signal processing. Given multiple simultaneous observations of the same exoplanetary eclipse, as in the case of spectrophotometry, we show that we can often disentangle systematic noise from the original light curve signal without the use of any complementary information about the instrument. In this paper, we explore these signal extraction techniques using simulated data and two data sets observed with the Hubble-NICMOS instrument. Another important application is the de-correlation of the exoplanetary signal from time-correlated stellar variability. Using data obtained by the Kepler mission, we show that the desired signal can be de-convolved from the stellar noise using a single time series spanning several eclipse events. Such non-parametric techniques can provide important confirmations of the existing parametric corrections reported in the literature and their associated results. Additionally, they can substantially improve the precision of exoplanetary light curve analyses in the future. Comment: ApJ accepted.
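
    As a rough illustration of the blind-source-separation idea (not the authors' specific non-parametric algorithm), the sketch below uses scikit-learn's generic FastICA to demix several simulated, simultaneous observations of the same eclipse mixed with a non-Gaussian systematic; all signals and mixing weights are invented for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA   # generic ICA, standing in for the authors' method

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)

# Made-up "astrophysical" signal: a shallow box-shaped eclipse, plus a
# non-Gaussian ramp standing in for instrument systematics.
eclipse = 1.0 - 0.01 * ((t > 0.4) & (t < 0.6))
systematic = 0.005 * ((t * 10.0) % 1.0)

# Five simultaneous observations of the same eclipse (e.g. spectral channels),
# each an unknown mixture of the two underlying signals plus photon-like noise.
mixing = rng.uniform(0.5, 1.5, size=(5, 2))
observations = np.column_stack([eclipse, systematic]) @ mixing.T
observations += 1e-4 * rng.standard_normal(observations.shape)

# Blind de-mixing: recover statistically independent components from the
# observations alone, with no auxiliary instrument information.
ica = FastICA(n_components=2, random_state=0)
components = ica.fit_transform(observations)   # columns: recovered source signals
```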

    Accounting for Calibration Uncertainties in X-ray Analysis: Effective Areas in Spectral Fitting

    While considerable advances have been made in accounting for statistical uncertainties in astronomical analyses, systematic instrumental uncertainties have generally been ignored. This can be crucial to a proper interpretation of analysis results because instrumental calibration uncertainty is a form of systematic uncertainty: ignoring it can lead to underestimated error bars and bias in the fitted values of model parameters. Accounting for such uncertainties currently requires extensive case-specific simulations if using existing analysis packages. Here we present general statistical methods that incorporate calibration uncertainties into spectral analysis of high-energy data. We first present a method based on multiple imputation that can be applied with any fitting method, but is necessarily approximate. We then describe a more exact Bayesian approach that works in conjunction with Markov chain Monte Carlo based fitting. We explore methods for improving computational efficiency, and in particular detail a method of summarizing calibration uncertainties with a principal component analysis of samples of plausible calibration files. This method is implemented using recently codified Chandra effective area uncertainties for low-resolution spectral analysis and is verified using both simulated and actual Chandra data. Our procedure for incorporating effective area uncertainty is easily generalized to other types of calibration uncertainties. Comment: 61 pages, double spaced, 8 figures, accepted for publication in Ap
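
    A minimal sketch of the principal-component summary described above, assuming a pre-computed library of plausible effective-area curves stored in a hypothetical file `effective_area_samples.txt` (one curve per row, common energy grid); replicate curves drawn this way could then be cycled through a fit or an MCMC run.

```python
import numpy as np

# Hypothetical input: a library of plausible effective-area curves, one per row,
# all tabulated on the same energy grid (shape: n_samples x n_energies).
area_samples = np.loadtxt("effective_area_samples.txt")   # assumed file name

mean_area = area_samples.mean(axis=0)
deviations = area_samples - mean_area

# PCA via an SVD of the deviations; keep the handful of leading components
# that carry most of the calibration variance.
_, singular_values, components = np.linalg.svd(deviations, full_matrices=False)
n_keep = 8
scales = singular_values[:n_keep] / np.sqrt(area_samples.shape[0] - 1)

def draw_effective_area(rng):
    """One replicate effective-area curve: the mean curve plus a randomly
    (normally) weighted combination of the leading principal components."""
    weights = rng.standard_normal(n_keep) * scales
    return mean_area + weights @ components[:n_keep]

rng = np.random.default_rng(1)
replicate = draw_effective_area(rng)   # e.g. redrawn at each step of an MCMC run
```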

    A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology

    The widespread availability of high-dimensional biological data has made the simultaneous screening of numerous biological characteristics a central statistical problem in computational biology. While the dimensionality of such datasets continues to increase, the problem of teasing out the effects of biomarkers in studies measuring baseline confounders, while avoiding model misspecification, remains only partially addressed. Efficient estimators constructed from data adaptive estimates of the data-generating distribution provide an avenue for avoiding model misspecification; however, in the context of high-dimensional problems requiring simultaneous estimation of numerous parameters, standard variance estimators have proven unstable, resulting in unreliable Type-I error control under standard multiple testing corrections. We present a general approach for applying empirical Bayes shrinkage to asymptotically linear estimators of parameters defined in the nonparametric model. The proposal applies existing shrinkage estimators to the estimated variance of the influence function, allowing for increased inferential stability in high-dimensional settings. A methodology for nonparametric variable importance analysis for use with high-dimensional biological datasets with modest sample sizes is introduced, and the proposed technique is demonstrated to be robust in small samples even when relying on data adaptive estimators that eschew parametric forms. Use of the proposed variance moderation strategy in constructing stabilized variable importance measures of biomarkers is demonstrated by application to an observational study of occupational exposure. The result is a data adaptive approach for robustly uncovering stable associations in high-dimensional data with limited sample sizes.
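
    The core idea, shrinking the estimated variance of each parameter's influence function toward a common value before forming test statistics, can be sketched as follows; the pooled prior and fixed prior weight below are placeholders rather than the paper's empirical Bayes estimates.

```python
import numpy as np

def moderated_variances(influence_fns, prior_df=4.0):
    """Shrink per-parameter variance estimates toward a pooled value.

    influence_fns : array, shape (n_parameters, n_observations); the estimated
        influence function of each parameter evaluated at each observation.
    prior_df : placeholder prior "degrees of freedom"; the paper estimates the
        prior by empirical Bayes (limma-style) rather than fixing it like this.
    """
    n = influence_fns.shape[1]
    raw_var = influence_fns.var(axis=1, ddof=1) / n    # sampling variance of each estimate
    prior_var = np.median(raw_var)                     # crude pooled prior variance
    d = n - 1
    # Weighted compromise between each raw variance and the pooled prior,
    # the same functional form used by moderated t-statistics.
    return (d * raw_var + prior_df * prior_var) / (d + prior_df)

# A moderated test statistic for parameter j is then its estimate divided by the
# square root of its moderated variance.
```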

    On the complexity of curve fitting algorithms

    We study a popular algorithm for fitting polynomial curves to scattered data, based on least squares with gradient weights. We show that this algorithm sometimes admits a substantial reduction of complexity and, furthermore, find precise conditions under which this is possible. It turns out that this is indeed possible when one fits circles, but not ellipses or hyperbolas. Comment: 8 pages, no figures.
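
    The abstract does not spell out the reduction, but a well-known concrete instance of a gradient-weighted algebraic fit for circles is the Pratt-type fit, which collapses the weighted least-squares problem into a 4x4 generalized eigenvalue problem; the sketch below illustrates that kind of reduction and is offered as an example of the technique, not as the paper's exact construction.

```python
import numpy as np
from scipy.linalg import eig

def pratt_circle_fit(x, y):
    """Gradient-weighted (Pratt-type) algebraic circle fit.

    Minimises sum (A*z + B*x + C*y + D)^2 with z = x^2 + y^2, subject to the
    constraint B^2 + C^2 - 4*A*D = 1, which turns the fit into a 4x4
    generalized eigenvalue problem -- the sort of complexity reduction that is
    available for circles but not for ellipses or hyperbolas.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = x ** 2 + y ** 2
    Z = np.column_stack([z, x, y, np.ones_like(x)])
    M = Z.T @ Z / len(x)                          # data (moment) matrix
    N = np.array([[0.0, 0.0, 0.0, -2.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [-2.0, 0.0, 0.0, 0.0]])         # constraint matrix
    vals, vecs = eig(M, N)
    vals = np.real(vals)
    idx = np.argmin(np.where(vals > 0, vals, np.inf))   # smallest positive eigenvalue
    A, B, C, D = np.real(vecs[:, idx])
    cx, cy = -B / (2.0 * A), -C / (2.0 * A)              # circle centre
    r = np.sqrt(B ** 2 + C ** 2 - 4.0 * A * D) / (2.0 * abs(A))  # circle radius
    return cx, cy, r
```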

    The Sloan Digital Sky Survey Quasar Lens Search. I. Candidate Selection Algorithm

    We present an algorithm for selecting a uniform sample of gravitationally lensed quasar candidates from low-redshift (0.6<z<2.2) quasars brighter than i=19.1 that have been spectroscopically identified in the SDSS. Our algorithm uses morphological and color selections that are intended to identify small- and large-separation lenses, respectively. Our selection algorithm relies only on parameters that the SDSS standard image processing pipeline generates, allowing easy and fast selection of lens candidates. The algorithm has been tested against simulated SDSS images, which adopt distributions of field and quasar parameters taken from the real SDSS data as input. Furthermore, we take differential reddening into account. We find that our selection algorithm is almost complete down to separations of 1'' and flux ratios of 10^-0.5. The algorithm selects both double and quadruple lenses. At a separation of 2'', doubles and quads are selected with similar completeness, and above (below) 2'' the selection of quads is better (worse) than for doubles. Our morphological selection identifies a non-negligible fraction of single quasars; to remove these, we fit images of candidates with a model of two point sources and reject those with unusually small image separations and/or large magnitude differences between the two point sources. We estimate the efficiency of our selection algorithm to be at least 8% at image separations smaller than 2'', comparable to that of radio surveys. The efficiency declines as the image separation increases, because of larger contamination from stars. We also present the magnification factor of lensed images as a function of the image separation, which is needed for accurate computation of magnification bias. Comment: 15 pages, 17 figures, 4 tables, accepted for publication in A
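
    The final screening step of the candidate selection (fitting each candidate with two point sources and rejecting implausible fits) can be summarised schematically as below; the data structure and threshold values are hypothetical placeholders, not the cuts adopted in the paper.

```python
from dataclasses import dataclass

@dataclass
class TwoPointSourceFit:
    """Result of fitting a candidate image with a model of two point sources."""
    separation_arcsec: float   # angular separation of the two fitted components
    delta_mag: float           # magnitude difference between the two components

def keep_candidate(fit: TwoPointSourceFit,
                   min_separation_arcsec: float = 1.0,    # placeholder threshold
                   max_delta_mag: float = 1.25) -> bool:  # placeholder (flux ratio ~10^-0.5)
    """Reject candidates whose two-point-source fit gives an unusually small image
    separation and/or an unusually large magnitude difference, as in the final
    screening step described in the abstract."""
    if fit.separation_arcsec < min_separation_arcsec:
        return False
    if abs(fit.delta_mag) > max_delta_mag:
        return False
    return True

# Example: a candidate with a 1.8'' separation and 0.6 mag difference is kept.
print(keep_candidate(TwoPointSourceFit(separation_arcsec=1.8, delta_mag=0.6)))
```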