    Potentiometric Selectivities of Ionophore-Doped Ion-Selective Membranes: Concurrent Presence of Primary Ion or Interfering Ion Complexes of Multiple Stoichiometries

    The selectivities of ionophore-doped ion-selective electrode (ISE) membranes are controlled by the stability and stoichiometry of the complexes between the ionophore, L, and the target and interfering ions (I^(z_I) and J^(z_J), respectively). Well-accepted models predict how these selectivities can be optimized by selection of ideal ionophore-to-ionic site ratios, considering complex stoichiometries and ion charges. These models were developed for systems in which the target and interfering ions each form complexes of only one stoichiometry. However, for a few ISEs, the concurrent presence of two primary ion complexes of different stoichiometries, such as IL^(z_I) and IL_2^(z_I), was reported. Indeed, similar systems were probably often overlooked and are, in fact, more common than the exclusive formation of complexes of higher stoichiometry unless the ionophore is used in excess. Importantly, misinterpreted stoichiometries misguide the design of new ionophores and are likely to result in the formulation of ISE membranes with inferior selectivities. We show here that the presence of two or more complexes of different stoichiometries for a given ion may be inferred experimentally from careful interpretation of the potentiometric selectivities as a function of the ionophore-to-ionic site ratio, or from calculations of complex concentrations using experimentally determined complex stabilities. Concurrent formation of JL^(z_J) and JL_2^(z_J) complexes of an interfering ion is shown here to shift the ionophore-to-ionic site ratio that provides the highest selectivities. Formation of IL_(n-1)^(z_I) and IL_n^(z_I) complexes of a primary ion is less of a concern because an optimized membrane typically contains an excess of ionophore, but lower than expected selectivities may be observed if the stepwise complex formation constant, K_(IL_n), is not sufficiently large and the ionophore-to-ionic site ratio does not markedly exceed n.
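
    The speciation calculation mentioned above, inferring the concurrent presence of IL^(z_I) and IL_2^(z_I) complexes from experimentally determined stabilities, reduces to solving two mass balances. A minimal sketch, assuming hypothetical overall stability constants and total concentrations rather than any values from the paper:

```python
# Sketch: concentrations of IL and IL2 complexes from overall stability
# constants, via the ion and ionophore mass balances. All constants below
# are hypothetical, chosen only to show that both stoichiometries coexist.
from scipy.optimize import brentq

beta1 = 1e8    # overall stability [IL]  / ([I][L])   -- hypothetical value
beta2 = 1e14   # overall stability [IL2] / ([I][L]^2) -- hypothetical value
L_tot, I_tot = 1.0e-2, 0.5e-2   # total ionophore and ion, mol/L (hypothetical)

def free_ion(L):
    # Ion mass balance: I_tot = [I] * (1 + beta1 [L] + beta2 [L]^2)
    return I_tot / (1 + beta1 * L + beta2 * L**2)

def ionophore_balance(L):
    # Ionophore mass balance: L_tot = [L] + [IL] + 2 [IL2]
    I = free_ion(L)
    return L + beta1 * I * L + 2 * beta2 * I * L**2 - L_tot

L = brentq(ionophore_balance, 1e-15, L_tot)   # bracket the free-ionophore root
I = free_ion(L)
IL, IL2 = beta1 * I * L, beta2 * I * L**2
print(f"[IL] = {IL:.2e} M, [IL2] = {IL2:.2e} M  (both stoichiometries coexist)")
```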

    Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels

    Monte Carlo algorithms often aim to draw from a distribution π by simulating a Markov chain with transition kernel P such that π is invariant under P. However, there are many situations in which it is impractical or impossible to draw from the transition kernel P. For instance, this is the case with massive datasets, where it is prohibitively expensive to calculate the likelihood, and with intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace P by an approximation P̂. Using theory from the stability of Markov chains, we explore a variety of situations where it is possible to quantify how 'close' the chain given by the transition kernel P̂ is to the chain given by P. We apply these results to several examples from spatial statistics and network analysis.
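
    A minimal sketch of the idea, assuming a toy Gaussian model rather than the paper's examples: the exact log-likelihood over a massive dataset is replaced by a subsample estimate, so the Metropolis-Hastings acceptance step effectively uses an approximate kernel P̂ instead of P.

```python
# Sketch: random-walk Metropolis with an approximate transition kernel,
# obtained by estimating the log-likelihood from a random subsample of
# the data instead of the full dataset. Illustrative Gaussian model only.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=100_000)  # "massive" dataset
m = 1_000                                            # subsample size

def approx_loglik(theta):
    # Unbiased subsample estimate of the full-data Gaussian log-likelihood
    # (constants independent of theta are dropped; they cancel in the ratio).
    sub = rng.choice(data, size=m, replace=False)
    return len(data) * np.mean(-0.5 * (sub - theta) ** 2)

theta, ll = 0.0, approx_loglik(0.0)
samples = []
for _ in range(5_000):
    prop = theta + 0.05 * rng.normal()               # random-walk proposal
    ll_prop = approx_loglik(prop)
    if np.log(rng.uniform()) < ll_prop - ll:         # noisy acceptance ratio
        theta, ll = prop, ll_prop
    samples.append(theta)
print("posterior mean ~", np.mean(samples[1_000:]))  # close to 2.0
```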

    On the combination of omics data for prediction of binary outcomes

    Enrichment of predictive models with new biomolecular markers is an important task in high-dimensional omic applications. Increasingly, clinical studies include several sets of such omics markers available for each patient, measuring different levels of biological variation. As a result, one of the main challenges in predictive research is the integration of different sources of omic biomarkers for the prediction of health traits. We review several approaches for the combination of omic markers in the context of binary outcome prediction, all based on double cross-validation and regularized regression models. We evaluate their performance in terms of calibration and discrimination, and we compare it with that of single-omic-source predictions. We illustrate the methods through the analysis of two real datasets. First, we consider the combination of two fractions of proteomic mass spectrometry for the calibration of a diagnostic rule for the detection of early-stage breast cancer. Second, we consider transcriptomics and metabolomics as predictors of obesity, using data from the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study, a population-based cohort from Finland.
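
    A minimal sketch of one such combination scheme under double cross-validation, on synthetic data; the late-integration (stacking) step, the block sizes, and the penalties are illustrative assumptions, not necessarily the approach the authors favour:

```python
# Sketch: combining two omic blocks for binary prediction with ridge-penalized
# logistic regression and double (nested) cross-validation. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV, LogisticRegression
from sklearn.model_selection import cross_val_predict, StratifiedKFold

rng = np.random.default_rng(1)
n = 200
X1 = rng.normal(size=(n, 500))    # e.g. transcriptomics block (hypothetical)
X2 = rng.normal(size=(n, 100))    # e.g. metabolomics block (hypothetical)
y = (X1[:, 0] + X2[:, 0] + rng.normal(size=n) > 0).astype(int)

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
meta = np.column_stack([
    # Inner CV (inside LogisticRegressionCV) tunes each block's penalty;
    # outer CV yields out-of-fold probabilities, avoiding optimistic bias.
    cross_val_predict(LogisticRegressionCV(Cs=10, penalty="l2", max_iter=5000),
                      X, y, cv=outer, method="predict_proba")[:, 1]
    for X in (X1, X2)
])
combiner = LogisticRegression().fit(meta, y)  # stack the two block predictions
print("combination weights per omic source:", combiner.coef_)
```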

    Selection of tuning parameters in bridge regression models via Bayesian information criterion

    We consider bridge linear regression modeling, which can produce either a sparse or a non-sparse model. A crucial point in the model-building process is the selection of the adjusted parameters, namely a regularization parameter and a tuning parameter of the bridge penalty. The choice of these adjusted parameters can be viewed as a model selection and evaluation problem. We propose a model selection criterion for evaluating bridge regression models from a Bayesian perspective. This criterion enables us to select the adjusted parameters objectively. We investigate the effectiveness of the proposed modeling strategy through numerical examples.
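
    A minimal sketch of the selection procedure on synthetic data: fit bridge estimates over a grid of the regularization parameter lam and tuning parameter q, and pick the pair minimizing a BIC-type score. The smooth |b|^q approximation and the degrees-of-freedom proxy are illustrative simplifications, not the paper's criterion:

```python
# Sketch: grid search over the bridge penalty lam * sum(|b_j|^q), scored by
# a BIC-type criterion. |b|^q is smoothed as (b^2 + eps)^(q/2) so a generic
# optimizer can be used; df is approximated by counting nonzero coefficients.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0, 0, 1.0, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.normal(size=n)

def bridge_fit(lam, q, eps=1e-8):
    obj = lambda b: np.sum((y - X @ b) ** 2) + lam * np.sum((b**2 + eps) ** (q / 2))
    return minimize(obj, np.zeros(p), method="L-BFGS-B").x

best = None
for q in (0.5, 1.0, 1.5, 2.0):            # tuning parameter grid
    for lam in np.logspace(-1, 2, 10):    # regularization parameter grid
        b = bridge_fit(lam, q)
        rss = np.sum((y - X @ b) ** 2)
        df = np.sum(np.abs(b) > 1e-3)     # crude effective-dimension proxy
        bic = n * np.log(rss / n) + df * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, lam, q)
print(f"BIC-selected lam = {best[1]:.3g}, q = {best[2]}")
```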

    GLMMLasso: An Algorithm for High-Dimensional Generalized Linear Mixed Models Using L1-Penalization

    We propose an L1-penalized algorithm for fitting high-dimensional generalized linear mixed models (GLMMs), which can be viewed as an extension of generalized linear models to clustered observations. This Lasso-type approach for GLMMs should be used mainly as a variable screening method to reduce the number of variables below the sample size. We then suggest refitting by maximum likelihood based on the selected variables only; this is an effective correction for problems stemming from the variable screening procedure, which are more severe with GLMMs. We illustrate the performance of our algorithm on simulated as well as real data examples. Supplemental materials are available online, and the algorithm is implemented in the R package glmmixedlasso.
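
    A minimal sketch of the screen-then-refit idea on synthetic data. For brevity the random-effects part of the GLMM is omitted, so this shows only the Lasso screening and maximum-likelihood refitting steps, not the glmmixedlasso algorithm itself:

```python
# Sketch: an L1-penalized fit screens the variable set down below n, then an
# unpenalized maximum-likelihood refit uses only the selected variables,
# correcting the shrinkage bias introduced by the screening step.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p = 150, 400                         # p > n: high-dimensional setting
X = rng.normal(size=(n, p))
y = (X[:, :3] @ [1.5, -1.0, 1.0] + rng.normal(size=n) > 0).astype(int)

# Step 1: L1-penalized fit as a variable screen (C is illustrative).
screen = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(screen.coef_.ravel())
print("screened down to", selected.size, "variables")

# Step 2: unpenalized ML refit on the selected variables only.
refit = LogisticRegression(penalty=None, max_iter=5000).fit(X[:, selected], y)
print("refitted coefficients:", np.round(refit.coef_.ravel(), 2))
```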

    Conditional variable importance for random forests

    Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n, large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables. We identify two mechanisms responsible for this finding: (i) a preference for the selection of correlated predictors in the tree building process and (ii) an additional advantage for correlated predictor variables induced by the unconditional permutation scheme that is employed in the computation of the variable importance measure. Based on these considerations we develop a new, conditional permutation scheme for the computation of the variable importance measure. The resulting conditional variable importance is shown to reflect the true impact of each predictor variable more reliably than the original marginal approach.
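
    A minimal sketch of the conditional permutation idea on synthetic data: the predictor of interest is permuted only within strata of a covariate it is correlated with, so the permutation preserves the correlation structure. Binning one covariate stands in for the partition that the fitted trees supply in the authors' proposal:

```python
# Sketch: conditional vs. marginal permutation importance. x2 is strongly
# correlated with x1 but has no effect on y; marginal permutation overstates
# its importance, while permuting x2 within strata of x1 largely removes it.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # corr(x1, x2) = 0.9
y = 2 * x1 + rng.normal(size=n)                           # only x1 matters
X = np.column_stack([x1, x2])

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
base_mse = np.mean((rf.predict(X) - y) ** 2)  # training data for brevity;
                                              # out-of-bag data in practice

def perm_importance(j, conditioning=None, bins=5):
    Xp = X.copy()
    if conditioning is None:                  # marginal permutation
        Xp[:, j] = rng.permutation(Xp[:, j])
    else:                                     # permute within strata only
        cuts = np.quantile(X[:, conditioning], np.linspace(0, 1, bins + 1)[1:-1])
        strata = np.digitize(X[:, conditioning], cuts)
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            Xp[idx, j] = rng.permutation(Xp[idx, j])
    return np.mean((rf.predict(Xp) - y) ** 2) - base_mse

print("marginal importance of x2:   ", perm_importance(1))
print("conditional importance of x2:", perm_importance(1, conditioning=0))
```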

    Prediction intervals for future BMI values of individual children - a non-parametric approach by quantile boosting

    Background: The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children, based on a recent German birth cohort study with n = 2007 children, is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age. Methods: We avoid distributional assumptions by directly modelling the borders of PIs with additive quantile regression, estimated by boosting. We invoke the concept of conditional coverage to assess the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. Results: The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Conclusions: Quantile boosting is a promising approach for constructing PIs with correct conditional coverage in a non-parametric way. It is particularly suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection, and can even account for longitudinal data structures.
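
    A minimal sketch of quantile-based prediction intervals on synthetic, age-dependently skewed data; scikit-learn's tree-based gradient boosting with the quantile loss stands in here for the additive quantile boosting used in the paper:

```python
# Sketch: a 90% prediction interval built from two quantile-boosted models
# (5th and 95th percentiles). The synthetic "BMI" has gamma-distributed
# errors whose skewness grows with age, so the intervals widen with age.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
age = rng.uniform(2, 10, size=2000)
bmi = 15 + 0.3 * age + rng.gamma(2.0, 0.5 + 0.1 * age)  # skewed, age-dependent

X = age.reshape(-1, 1)
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, bmi)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, bmi)

new = np.array([[4.0], [8.0]])
for a, lo, hi in zip(new.ravel(), lower.predict(new), upper.predict(new)):
    print(f"age {a}: 90% PI = [{lo:.1f}, {hi:.1f}]")   # wider at higher age
```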

    Bias in random forest variable importance measures: Illustrations, sources and a solution

    BACKGROUND: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. RESULTS: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on the one hand, and effects induced by bootstrap sampling with replacement on the other. CONCLUSION: We propose to employ an alternative implementation of random forests that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. The suggested method can therefore be applied straightforwardly by scientists in bioinformatics research.
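
    The cardinality bias described above is easy to reproduce. A minimal sketch on null data, where neither predictor carries any signal, using a standard impurity-based random forest importance (the paper's remedy is unbiased trees with subsampling without replacement, as implemented in R's party package):

```python
# Sketch: impurity-based importance on null data. The response is independent
# of both predictors, yet the 100-category variable receives a far larger
# share of the importance than the binary one, illustrating the bias.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
n = 500
X = np.column_stack([
    rng.integers(0, 2, n),      # binary predictor
    rng.integers(0, 100, n),    # 100-category predictor (treated as numeric)
])
y = rng.integers(0, 2, n)       # response independent of both predictors

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
print("impurity importances (binary vs 100-category):", rf.feature_importances_)
```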

    A boosting method for maximizing the partial area under the ROC curve

    Background: The receiver operating characteristic (ROC) curve is a fundamental tool for assessing the discriminant performance of not only a single marker but also a score function combining multiple markers. The area under the ROC curve (AUC) for a score function measures the intrinsic ability of the score function to discriminate between controls and cases. Recently, the partial AUC (pAUC) has received more attention than the AUC, because a suitable range of the false positive rate can be focused on according to the clinical situation. However, existing pAUC-based methods handle only a few markers and do not take nonlinear combinations of markers into consideration. Results: We have developed a new statistical method that focuses on the pAUC, based on a boosting technique. The markers are combined component-wise to maximize the pAUC in the boosting algorithm, using natural cubic splines or decision stumps (single-level decision trees) according to whether the markers are continuous or discrete. We show that the resulting score plots are useful for understanding how each marker is associated with the outcome variable. We compare the performance of the proposed boosting method with that of other existing methods and demonstrate its utility using real data sets. The method achieves much better discrimination performance, in the sense of the pAUC, in both simulation studies and real data analysis. Conclusions: The proposed method addresses how to combine the markers after a pAUC-based filtering procedure in a high-dimensional setting. Hence, it provides a consistent way of analyzing data based on the pAUC, from marker selection to marker combination, for discrimination problems. The method can capture not only linear but also nonlinear associations between the outcome variable and the markers; such nonlinearity is known to be necessary in general for the maximization of the pAUC. The method also puts importance on classification accuracy as well as interpretability of the association, by offering simple and smooth resultant score plots for each marker.
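
    A minimal sketch of componentwise stump boosting toward a partial AUC, on synthetic markers; the step size, number of rounds, and threshold grid are illustrative simplifications of the paper's algorithm, and scikit-learn's standardized pAUC (roc_auc_score with max_fpr) serves as the objective:

```python
# Sketch: greedy componentwise boosting of decision stumps, selecting at each
# round the marker/threshold pair whose stump most improves the standardized
# pAUC over the low false-positive region (FPR <= 0.1).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n, p = 400, 8
X = rng.normal(size=(n, p))
# Outcome depends linearly on marker 0 and nonlinearly on marker 1.
y = (0.8 * X[:, 0] + 0.8 * np.tanh(X[:, 1]) + rng.normal(size=n) > 0).astype(int)

pauc = lambda s: roc_auc_score(y, s, max_fpr=0.1)  # standardized partial AUC
score, step = np.zeros(n), 0.2

for _ in range(20):                                # boosting rounds
    _, j, t = max(
        (pauc(score + step * (X[:, j] > t)), j, t)
        for j in range(p)
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75])
    )
    score += step * (X[:, j] > t)                  # add the selected stump

print("training pAUC (FPR <= 0.1):", round(pauc(score), 3))
```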