Potentiometric Selectivities of Ionophore-Doped Ion-Selective Membranes: Concurrent Presence of Primary Ion or Interfering Ion Complexes of Multiple Stoichiometries
The selectivities of ionophore-doped ion-selective electrode (ISE) membranes are controlled by the stability and stoichiometry of the complexes between the ionophore, L, and the target and interfering ions ($I^{z_i}$ and $J^{z_j}$, respectively). Well-accepted models predict how these selectivities can be optimized by selection of ideal ionophore-to-ionic site ratios, considering complex stoichiometries and ion charges. These models were developed for systems in which the target and interfering ions each form complexes of only one stoichiometry. However, for a few ISEs, the concurrent presence of two primary ion complexes of different stoichiometries, such as $IL^{z_i}$ and $IL_2^{z_i}$, was reported. Indeed, similar systems were probably often overlooked and are, in fact, more common than the exclusive formation of complexes of higher stoichiometry unless the ionophore is used in excess. Importantly, misinterpreted stoichiometries misguide the design of new ionophores and are likely to result in the formulation of ISE membranes with inferior selectivities. We show here that the presence of two or more complexes of different stoichiometries for a given ion may be inferred experimentally from careful interpretation of the potentiometric selectivities as a function of the ionophore-to-ionic site ratio or from calculations of complex concentrations using experimentally determined complex stabilities. Concurrent formation of $JL^{z_j}$ and $JL_2^{z_j}$ complexes of an interfering ion is shown here to shift the ionophore-to-ionic site ratio that provides the highest selectivities. Formation of $IL_{n-1}^{z_i}$ and $IL_n^{z_i}$ complexes of a primary ion is less of a concern because an optimized membrane typically contains an excess of ionophore, but lower than expected selectivities may be observed if the stepwise complex formation constant, $K_{IL_n}$, is not sufficiently large and the ionophore-to-ionic site ratio does not markedly exceed $n$.
Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels
Monte Carlo algorithms often aim to draw from a distribution $\pi$ by
simulating a Markov chain with transition kernel $P$ such that $\pi$ is
invariant under $P$. However, there are many situations for which it is
impractical or impossible to draw from the transition kernel $P$. For instance,
this is the case with massive datasets, where it is prohibitively expensive to
calculate the likelihood, and is also the case for intractable likelihood models
arising from, for example, Gibbs random fields, such as those found in spatial
statistics and network analysis. A natural approach in these cases is to
replace $P$ by an approximation $\hat{P}$. Using theory from the stability of
Markov chains we explore a variety of situations where it is possible to
quantify how 'close' the chain given by the transition kernel $\hat{P}$ is to
the chain given by $P$. We apply these results to several examples from spatial
statistics and network analysis.
Comment: This version: results extended to non-uniformly ergodic Markov chains
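The notion of closeness between an exact kernel and an approximate one can be illustrated on a toy finite state space. The following is only a minimal sketch under stated assumptions: the three-state target, the uniform proposal, and the fixed multiplicative `noise` on the acceptance ratio are all hypothetical, and the perturbation is deterministic rather than random as it would be in a real noisy algorithm. It compares the stationary distributions of the two chains in total variation distance and does not reproduce the bounds derived in the paper.

```python
def step(dist, kernel):
    """Push a row-vector distribution through one step of a transition kernel."""
    n = len(kernel)
    return [sum(dist[i] * kernel[i][j] for i in range(n)) for j in range(n)]

def stationary(kernel, iters=2000):
    """Approximate the stationary distribution by repeated application of the kernel."""
    n = len(kernel)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = step(dist, kernel)
    return dist

def total_variation(p, q):
    """Total variation distance between two distributions on the same finite space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def metropolis_kernel(target, noise=None):
    """Metropolis kernel with a uniform proposal over the other states.

    noise[i][j] (hypothetical) multiplies the acceptance ratio for the move
    i -> j, standing in for an error in an estimated likelihood ratio.
    """
    n = len(target)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                ratio = target[j] / target[i]
                if noise is not None:
                    ratio *= noise[i][j]
                kernel[i][j] = min(1.0, ratio) / (n - 1)
        kernel[i][i] = 1.0 - sum(kernel[i])  # rejection mass stays at state i
    return kernel

target = [0.5, 0.3, 0.2]
exact = metropolis_kernel(target)
noise = [[1.1 if j > i else 0.9 for j in range(3)] for i in range(3)]
approx = metropolis_kernel(target, noise)

tv_exact = total_variation(stationary(exact), target)
tv_approx = total_variation(stationary(approx), stationary(exact))
print(tv_exact, tv_approx)
```

The exact kernel leaves the target invariant, so `tv_exact` is numerically zero, while `tv_approx` measures how far the perturbed chain's long-run behaviour drifts from it.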
On the combination of omics data for prediction of binary outcomes
Enrichment of predictive models with new biomolecular markers is an important
task in high-dimensional omic applications. Increasingly, clinical studies
include several sets of such omics markers available for each patient,
measuring different levels of biological variation. As a result, one of the
main challenges in predictive research is the integration of different sources
of omic biomarkers for the prediction of health traits. We review several
approaches for the combination of omic markers in the context of binary outcome
prediction, all based on double cross-validation and regularized regression
models. We evaluate their performance in terms of calibration and
discrimination and we compare their performance with respect to single-omic
source predictions. We illustrate the methods through the analysis of two real
datasets. On the one hand, we consider the combination of two fractions of
proteomic mass spectrometry for the calibration of a diagnostic rule for the
detection of early-stage breast cancer. On the other hand, we consider
transcriptomics and metabolomics as predictors of obesity using data from the
Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome
(DILGOM) study, a population-based cohort from Finland.
Selection of tuning parameters in bridge regression models via Bayesian information criterion
We consider bridge linear regression modeling, which can produce a sparse
or non-sparse model. A crucial point in the model building process is the
selection of adjusted parameters, including a regularization parameter and a
tuning parameter, in bridge regression models. The choice of the adjusted
parameters can be viewed as a model selection and evaluation problem. We
propose a model selection criterion for evaluating bridge regression models
from a Bayesian viewpoint. This selection criterion enables us to select the
adjusted parameters objectively. We investigate the effectiveness of our
proposed modeling strategy through some numerical examples.
Comment: 20 pages, 5 figures
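For orientation, the bridge estimator is commonly written as a penalized least-squares problem. The notation below is an assumption rather than the paper's own: $\lambda$ is taken to be the regularization parameter and $q$ the tuning parameter mentioned in the abstract.

```latex
\hat{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2
  + \lambda \sum_{j=1}^{p} |\beta_j|^{q} \right\}, \qquad \lambda \ge 0,\ q > 0
```

Setting $q = 1$ gives the lasso and $q = 2$ gives ridge regression. Values $q \le 1$ can set coefficients exactly to zero, whereas $q > 1$ only shrinks them, which is why the model may be sparse or non-sparse depending on the pair $(\lambda, q)$ that is selected.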
GLMMLasso: An Algorithm for High-Dimensional Generalized Linear Mixed Models Using L1-Penalization
We propose an L1-penalized algorithm for fitting high-dimensional generalized
linear mixed models. Generalized linear mixed models (GLMMs) can be viewed as
an extension of generalized linear models for clustered observations. This
Lasso-type approach for GLMMs should be mainly used as a variable screening
method to reduce the number of variables below the sample size. We then suggest
a refitting by maximum likelihood based on the selected variables only. This is
an effective correction to overcome problems stemming from the variable
screening procedure, which are more severe with GLMMs. We illustrate the
performance of our algorithm on simulated as well as on real data examples.
Supplemental materials are available online and the algorithm is implemented in
the R package glmmixedlasso.
Conditional variable importance for random forests
Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables. We identify two mechanisms responsible for this finding: (i) a preference for the selection of correlated predictors in the tree building process and (ii) an additional advantage for correlated predictor variables induced by the unconditional permutation scheme that is employed in the computation of the variable importance measure. Based on these considerations we develop a new, conditional permutation scheme for the computation of the variable importance measure. The resulting conditional variable importance is shown to reflect the true impact of each predictor variable more reliably than the original marginal approach.
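The difference between an unconditional and a conditional permutation scheme can be sketched without fitting a forest. This is a simplified illustration only: the fixed `predict` function, the toy data, and the hand-supplied `groups` argument are hypothetical, and the paper's actual procedure for choosing the conditioning strata is not reproduced here.

```python
import random

def mse(predict, X, y):
    """Mean squared error of a prediction function on rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permuted_column(X, col, rng, groups=None):
    """Copy X with column `col` shuffled, within each group if groups are given."""
    if groups is None:
        groups = [0] * len(X)  # one big group: the usual unconditional scheme
    X_new = [list(row) for row in X]
    by_group = {}
    for idx, g in enumerate(groups):
        by_group.setdefault(g, []).append(idx)
    for idxs in by_group.values():
        values = [X[i][col] for i in idxs]
        rng.shuffle(values)
        for i, v in zip(idxs, values):
            X_new[i][col] = v
    return X_new

def permutation_importance(predict, X, y, col, rng, groups=None):
    """Increase in error after permuting one column (optionally within groups)."""
    return mse(predict, permuted_column(X, col, rng, groups), y) - mse(predict, X, y)

# Toy data: x1 is an exact copy of x0, and the response depends on x0.
X = [[i % 2, i % 2] for i in range(40)]
y = [float(row[0]) for row in X]
predict = lambda row: float(row[1])  # a model that happens to use the proxy x1

marginal = permutation_importance(predict, X, y, 1, random.Random(0))
conditional = permutation_importance(
    predict, X, y, 1, random.Random(0), groups=[row[0] for row in X]
)
print(marginal, conditional)
```

Permuting `x1` freely breaks its link to the response and yields a positive importance, whereas permuting it only within levels of `x0` changes nothing, reflecting that `x1` carries no information beyond `x0`.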
Prediction intervals for future BMI values of individual children - a non-parametric approach by quantile boosting
Background: The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children based on a recent German birth cohort study with n = 2007 children is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age. Methods: We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. Results: The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Conclusions: Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is in particular suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties and can even account for longitudinal data structures
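Two ingredients of this approach are easy to state in code: the check (pinball) loss that quantile regression minimizes, and the empirical coverage of an interval whose borders are two estimated quantiles. The sketch below is a minimal illustration with hypothetical function names and toy numbers. It measures marginal coverage over a sample only, not the conditional coverage discussed in the abstract, and it contains no boosting step.

```python
def pinball_loss(y, q_pred, tau):
    """Check loss for quantile level tau: under-prediction costs tau, over-prediction 1 - tau."""
    u = y - q_pred
    return u * (tau - (1.0 if u < 0 else 0.0))

def empirical_coverage(y, lower, upper):
    """Fraction of observations falling inside their prediction interval."""
    inside = sum(1 for t, lo, hi in zip(y, lower, upper) if lo <= t <= hi)
    return inside / len(y)

# For the 0.9 quantile, predicting too low is penalized more than predicting too high.
loss_under = pinball_loss(3.0, 1.0, 0.9)
loss_over = pinball_loss(1.0, 3.0, 0.9)

# Toy intervals [0, 2] for four observations, two of which fall inside.
coverage = empirical_coverage([1, 2, 3, 4], [0, 0, 0, 0], [2, 2, 2, 2])
print(loss_under, loss_over, coverage)
```

A 90% prediction interval would, for example, use the fitted 0.05 and 0.95 quantiles as its lower and upper borders.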
Bias in random forest variable importance measures: Illustrations, sources and a solution
BACKGROUND: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. RESULTS: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on one hand, and effects induced by bootstrap sampling with replacement on the other hand. CONCLUSION: We propose to employ an alternative implementation of random forests, that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. 
The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore, the suggested method can be applied straightforwardly by scientists in bioinformatics research.
A boosting method for maximizing the partial area under the ROC curve
Background: The receiver operating characteristic (ROC) curve is a fundamental tool for assessing the discriminant performance of not only a single marker but also a score function combining multiple markers. The area under the ROC curve (AUC) for a score function measures the intrinsic ability of the score function to discriminate between controls and cases. Recently, the partial AUC (pAUC) has received more attention than the AUC, because it can focus on a suitable range of the false positive rate according to various clinical situations. However, existing pAUC-based methods handle only a few markers and do not take nonlinear combinations of markers into consideration. Results: We have developed a new statistical method that focuses on the pAUC based on a boosting technique. The markers are combined componentwise to maximize the pAUC in the boosting algorithm using natural cubic splines or decision stumps (single-level decision trees), according to the values of the markers (continuous or discrete). We show that the resulting score plots are useful for understanding how each marker is associated with the outcome variable. We compare the performance of the proposed boosting method with that of other existing methods, and demonstrate its utility using real data sets. As a result, we obtain much better discrimination performance in the sense of the pAUC in both simulation studies and real data analysis. Conclusions: The proposed method addresses how to combine the markers after a pAUC-based filtering procedure in a high-dimensional setting. Hence, it provides a consistent way of analyzing data based on the pAUC, from marker selection to marker combination, for discrimination problems. The method can capture not only linear but also nonlinear associations between the outcome variable and the markers; such nonlinearity is known to be necessary in general for maximizing the pAUC. The method also emphasizes classification accuracy as well as interpretability of the association, by offering simple and smooth score plots for each marker.
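The quantity being maximized can be made concrete with an empirical estimate of the pAUC over the false positive rate range from 0 to `fpr_max`. The sketch below is an evaluation helper only, not the boosting algorithm. The function name and toy scores are hypothetical, ties between case and control scores are counted as losses, and `fpr_max` times the number of controls is assumed to be a whole number.

```python
def partial_auc(case_scores, control_scores, fpr_max):
    """Empirical pAUC on FPR in [0, fpr_max]: case/control pairs won by the case,
    restricted to the highest-scoring controls, divided by all pairs."""
    n0, n1 = len(control_scores), len(case_scores)
    k = int(fpr_max * n0 + 1e-9)  # controls that lie inside the FPR range
    top_controls = sorted(control_scores, reverse=True)[:k]
    wins = sum(1 for c in top_controls for s in case_scores if s > c)
    return wins / (n0 * n1)

cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]

pauc_half = partial_auc(cases, controls, 0.5)  # restricted to FPR <= 0.5
auc_full = partial_auc(cases, controls, 1.0)   # reduces to the ordinary AUC
print(pauc_half, auc_full)
```

With `fpr_max = 0.5` the largest attainable value is 0.5, so the pAUC is often reported relative to that ceiling.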
- …