
    RIBAR and xRIBAR: methods for reproducible relative MS/MS-based label-free protein quantification

    Mass spectrometry-driven proteomics increasingly relies on quantitative analyses for biological discoveries. As a result, different methods and algorithms have been developed to perform relative or absolute quantification based on mass spectrometry data. Among the most popular quantification methods are the so-called label-free approaches, which require no special sample processing and can even be applied retroactively to existing data sets. Of these label-free methods, the MS/MS-based approaches are most often applied, mainly because of their inherent simplicity compared to MS-based methods. The main application of these approaches is the determination of relative protein amounts between different samples, expressed as protein ratios. However, as we demonstrate here, there are issues with the reproducibility across replicates of the protein ratio sets obtained from the various MS/MS-based label-free methods, indicating that the existing methods are not optimally robust. We therefore present two new methods (called RIBAR and xRIBAR) that use the available MS/MS data more effectively, achieving increased robustness. Both the accuracy and the precision of our novel methods are analyzed and compared to the existing methods to illustrate the increased robustness of our new methods over existing ones.
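    To illustrate the general idea behind MS/MS-based label-free quantification — not the RIBAR algorithm itself, whose details are in the paper — here is a minimal spectral-count ratio sketch. The per-run normalization, the pseudocount, and the sample data are assumptions made for the example:

```python
import math

def protein_ratios(counts_a, counts_b, pseudocount=1.0):
    """Relative protein ratios from MS/MS spectral counts in two samples.

    counts_a, counts_b: dicts mapping protein id -> spectral count.
    A pseudocount avoids division by zero for proteins absent in one run.
    """
    proteins = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + pseudocount * len(proteins)
    total_b = sum(counts_b.values()) + pseudocount * len(proteins)
    ratios = {}
    for p in proteins:
        # Normalise each count by the run total so runs of different
        # sequencing depth stay comparable, then take a log2 ratio.
        fa = (counts_a.get(p, 0) + pseudocount) / total_a
        fb = (counts_b.get(p, 0) + pseudocount) / total_b
        ratios[p] = math.log2(fa / fb)
    return ratios

# Hypothetical counts: P1 up in sample A, P3 only detected in sample B.
sample_a = {"P1": 40, "P2": 10, "P3": 0}
sample_b = {"P1": 20, "P2": 10, "P3": 5}
print(protein_ratios(sample_a, sample_b))
```

A log2 ratio above zero indicates a protein relatively more abundant in the first sample; the pseudocount keeps ratios finite but shrinks extreme values for low-count proteins.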

    Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

    Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI), which is, in principle, an objective and model-independent similarity measure. MI can be estimated by concatenating and zipping sequences, thereby yielding the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives, for animal mitochondrial DNA, estimates that are strikingly close to estimates obtained from the alignment-free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Because it is not additive, the normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes this issue. We test several versions of our MI-based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single-letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material
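    The "normalized compression distance" mentioned above has a standard definition, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is a compressed length standing in for Kolmogorov complexity. A minimal sketch follows; the choice of zlib as compressor and the toy sequences are assumptions for illustration:

```python
import zlib

def compressed_len(s: bytes) -> int:
    """Compressed length: a practical stand-in for Kolmogorov complexity."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two sequences."""
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

seq_a = b"ACGTACGTACGTACGTACGT" * 20
seq_b = b"ACGTACGTACGTACGTACGT" * 20   # identical to seq_a
seq_c = b"TTGACCGATCAGGTTCAAGG" * 20   # a different repeated motif
print(ncd(seq_a, seq_b), ncd(seq_a, seq_c))
```

Identical sequences compress well when concatenated, so their NCD is near zero; unrelated sequences share little structure and score higher. In practice the compressor's finite window and header overhead are exactly the "uncontrolled errors" the abstract refers to.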

    Evaluation of Motion Artifact Metrics for Coronary CT Angiography

    Purpose This study quantified the performance of coronary artery motion artifact metrics relative to human observer ratings. Motion artifact metrics have been used as part of motion correction and best‐phase selection algorithms for Coronary Computed Tomography Angiography (CCTA). However, the lack of ground truth makes it difficult to validate how well the metrics quantify the level of motion artifact. This study investigated five motion artifact metrics, including two novel metrics, using a dynamic phantom, clinical CCTA images, and an observer study that provided ground‐truth motion artifact scores from a series of pairwise comparisons. Method Five motion artifact metrics were calculated for the coronary artery regions on both phantom and clinical CCTA images: positivity, entropy, normalized circularity, Fold Overlap Ratio (FOR), and Low‐Intensity Region Score (LIRS). CT images were acquired of a dynamic cardiac phantom that simulated cardiac motion and contained six iodine‐filled vessels of varying diameter and with regions of soft plaque and calcifications. Scans were repeated with different gantry start angles. Images were reconstructed at five phases of the motion cycle. Clinical images were acquired from 14 CCTA exams with patient heart rates ranging from 52 to 82 bpm. The vessel and shading artifacts were manually segmented by three readers and combined to create ground‐truth artifact regions. Motion artifact levels were also assessed by readers using a pairwise comparison method to establish a ground‐truth reader score. The Kendall's Tau coefficients were calculated to evaluate the statistical agreement in ranking between the motion artifact metrics and reader scores. Linear regression between the reader scores and the metrics was also performed.
Results On phantom images, the Kendall's Tau coefficients of the five motion artifact metrics were 0.50 (normalized circularity), 0.35 (entropy), 0.82 (positivity), 0.77 (FOR), and 0.77 (LIRS), where higher Kendall's Tau signifies higher agreement. The FOR, LIRS, and transformed positivity (the fourth root of the positivity) were further evaluated in the study of clinical images. The Kendall's Tau coefficients of the selected metrics were 0.59 (FOR), 0.53 (LIRS), and 0.21 (transformed positivity). In the study of clinical data, a Motion Artifact Score, defined as the product of the FOR and LIRS metrics, further improved agreement with reader scores, with a Kendall's Tau coefficient of 0.65. Conclusion The metrics of FOR, LIRS, and the product of the two metrics provided the highest agreement in motion artifact ranking when compared to the readers, and the highest linear correlation to the reader scores. The validated motion artifact metrics may be useful for developing and evaluating methods to reduce motion in Coronary Computed Tomography Angiography (CCTA) images.
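    Kendall's Tau, used above to compare metric rankings against reader rankings, counts concordant versus discordant pairs. A minimal Tau-a sketch (no tie correction); the per-image metric values and reader scores below are invented for illustration:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's Tau-a: agreement in pairwise ordering of two score lists."""
    assert len(xs) == len(ys)
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1      # both lists order the pair the same way
        elif s < 0:
            discordant += 1      # the two lists disagree on this pair
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical metric values vs. reader motion-artifact scores for 5 images.
metric = [0.9, 0.7, 0.5, 0.3, 0.1]
reader = [5, 4, 2, 3, 1]
print(kendall_tau(metric, reader))
```

Here one of the ten pairs is ordered differently by the two lists, so Tau = (9 − 1)/10 = 0.8; a value of 1.0 would mean the metric reproduces the reader ranking exactly.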

    Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs

    Given a graph where vertices represent alternatives and arcs represent pairwise comparison data, the statistical ranking problem is to find a potential function, defined on the vertices, such that the gradient of the potential function agrees with the pairwise comparisons. Our goal in this paper is to develop a method for collecting data for which the least squares estimator for the ranking problem has maximal Fisher information. Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify data which maximizes the informativeness of the ranking. Under certain assumptions, the data collection problem decouples, reducing to a problem of finding multigraphs with large algebraic connectivity. This reduction of the data collection problem to graph-theoretic questions is one of the primary contributions of this work. As an application, we study the Yahoo! Movie user rating dataset and demonstrate that the addition of a small number of well-chosen pairwise comparisons can significantly increase the Fisher informativeness of the ranking. As another application, we study the 2011-12 NCAA football schedule and propose schedules with the same number of games which are significantly more informative. Using spectral clustering methods to identify highly-connected communities within the division, we argue that the NCAA could improve its notoriously poor rankings by simply scheduling more out-of-conference games. Comment: 31 pages, 10 figures, 3 tables
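    The inner ranking problem described above — a least-squares potential whose gradient matches the pairwise comparisons — reduces to solving the normal equations, whose matrix is the graph Laplacian. A small self-contained sketch, with the additive gauge freedom fixed by pinning one potential to zero (the comparison data are invented):

```python
def ls_ranking(n, comparisons):
    """Least-squares potential r on n items from pairwise comparisons.

    comparisons: list of (i, j, y) meaning we want r[j] - r[i] ≈ y.
    Solves the normal (graph Laplacian) equations L r = B^T y, pinning
    r[0] = 0 to remove the constant-shift null space of the Laplacian.
    """
    # Assemble L = B^T B (the Laplacian) and b = B^T y edge by edge.
    L = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, j, y in comparisons:
        L[i][i] += 1; L[j][j] += 1
        L[i][j] -= 1; L[j][i] -= 1
        b[i] -= y; b[j] += y
    # Replace the first equation with r[0] = 0 (gauge fixing).
    for k in range(n):
        L[0][k] = 0.0
    L[0][0], b[0] = 1.0, 0.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(L[row][col]))
        L[col], L[piv] = L[piv], L[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = L[row][col] / L[col][col]
            for k in range(col, n):
                L[row][k] -= f * L[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    r = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = b[row] - sum(L[row][k] * r[k] for k in range(row + 1, n))
        r[row] = s / L[row][row]
    return r

# Path graph 0-1-2 with consistent margins: the potential is [0, 1, 3].
print(ls_ranking(3, [(0, 1, 1.0), (1, 2, 2.0)]))
```

With inconsistent (cyclic) comparison data the same solve returns the least-squares compromise; the paper's point is that the Fisher information of this estimator is governed by the Laplacian's spectrum, hence the link to algebraic connectivity.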

    Significance Analysis for Pairwise Variable Selection in Classification

    The goal of this article is to select important variables that can distinguish one class of data from another. A marginal variable selection method ranks the marginal effects for classification of individual variables, and is a useful and efficient approach for variable selection. Our focus here is to consider the bivariate effect, in addition to the marginal effect. In particular, we are interested in those pairs of variables that can lead to accurate classification predictions when they are viewed jointly. To accomplish this, we propose a permutation test called the Significance test of Joint Effect (SigJEff). In the absence of joint effects in the data, SigJEff is similar or equivalent to many marginal methods. However, when joint effects exist, our method can significantly boost the performance of variable selection. Such joint effects can help to provide an additional, and sometimes dominating, advantage for classification. We illustrate and validate our approach using both a simulated example and a real glioblastoma multiforme data set, both of which provide promising results. Comment: 28 pages, 7 figures
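    A generic permutation test in the spirit of the approach above — this is a simplified stand-in, not the authors' exact SigJEff statistic — might score a variable pair by the separation of the class centroids in the bivariate space, then shuffle the class labels to calibrate significance:

```python
import random

def pair_separation(x1, x2, labels):
    """Bivariate statistic: distance between the two class centroids
    of the variable pair (x1, x2)."""
    g0 = [(a, b) for a, b, l in zip(x1, x2, labels) if l == 0]
    g1 = [(a, b) for a, b, l in zip(x1, x2, labels) if l == 1]
    m0 = (sum(a for a, _ in g0) / len(g0), sum(b for _, b in g0) / len(g0))
    m1 = (sum(a for a, _ in g1) / len(g1), sum(b for _, b in g1) / len(g1))
    return ((m0[0] - m1[0]) ** 2 + (m0[1] - m1[1]) ** 2) ** 0.5

def joint_effect_p_value(x1, x2, labels, n_perm=1000, seed=0):
    """Permutation p-value for the joint effect of the pair (x1, x2):
    how often does a random relabelling separate the classes as well?"""
    rng = random.Random(seed)
    observed = pair_separation(x1, x2, labels)
    hits = sum(
        pair_separation(x1, x2, rng.sample(labels, len(labels))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Invented data: two well-separated classes in the (x1, x2) plane.
x1 = [0.0, 0.2, 0.9, 1.1, 5.0, 5.2, 5.9, 6.1]
x2 = [0.1, 0.3, 0.8, 1.0, 5.1, 5.3, 5.8, 6.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(joint_effect_p_value(x1, x2, labels))
```

Shuffling labels destroys any real class structure while preserving the joint distribution of the pair, so the fraction of shuffles that match the observed separation directly estimates the p-value.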

    Influence of Context on Item Parameters in Forced-Choice Personality Assessments

    A fundamental assumption in computerized adaptive testing (CAT) is that item parameters are invariant with respect to context – the items surrounding the administered item. This assumption, however, may not hold in forced-choice (FC) assessments, where explicit comparisons are made between items included in the same block. We empirically examined the influence of context on item parameters by comparing parameter estimates from two FC instruments. The first instrument was composed of blocks of three items, whereas in the second, the context was manipulated by adding one item to each block, resulting in blocks of four. The item parameter estimates were highly similar. However, a small number of significant deviations were observed, confirming the importance of context when designing adaptive FC assessments. Two patterns of such deviations were identified, and methods to reduce their occurrence in an FC CAT setting were proposed. It was shown that with a small proportion of violations of the parameter invariance assumption, score estimation remained stable.

    The Scaling of the Redshift Power Spectrum: Observations from the Las Campanas Redshift Survey

    In a recent paper we studied the redshift power spectrum $P^S(k,\mu)$ in three CDM models with the help of high-resolution simulations. Here we apply the method to the largest available redshift survey, the Las Campanas Redshift Survey (LCRS). The basic model expresses $P^S(k,\mu)$ as a product of three factors, $P^S(k,\mu)=P^R(k)(1+\beta\mu^2)^2 D(k,\mu)$, where $\mu$ is the cosine of the angle between the wave vector and the line of sight. The damping function $D$, for the range of scales accessible to an accurate analysis of the LCRS, is well approximated by the Lorentz factor $D=[1+{1\over 2}(k\mu\sigma_{12})^2]^{-1}$. We have investigated different values for $\beta$ ($\beta=0.4$, 0.5, 0.6), and measured $P^R(k)$ and $\sigma_{12}(k)$ from $P^S(k,\mu)$ for different values of $\mu$. The velocity dispersion $\sigma_{12}(k)$ is nearly constant from $k=0.5$ to $3\,h\,{\rm Mpc}^{-1}$; the average value for this range is $510\pm 70\,{\rm km\,s^{-1}}$. The power spectrum $P^R(k)$ decreases with $k$ approximately as $k^{-1.7}$ for $k$ between 0.1 and $4\,h\,{\rm Mpc}^{-1}$. The statistical significance of the results, and the error bars, are found with the help of mock samples constructed from a large set of high-resolution simulations. A flat, low-density ($\Omega_0=0.2$) CDM model can give a good fit to the data if a scale-dependent special bias scheme is used, which we have called the cluster-under-weighted bias (Jing et al.). Comment: accepted for publication in MNRAS, 20 pages with 7 figures
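    The quoted model can be written down directly. A sketch evaluating $P^S(k,\mu)=P^R(k)(1+\beta\mu^2)^2 D(k,\mu)$ with the Lorentz damping and the measured $\sigma_{12}\approx 510$ km/s; converting $\sigma_{12}$ to $h^{-1}$ Mpc via $H_0 = 100\,h$ km/s/Mpc is a conventional assumption made here:

```python
H0_HMPC = 100.0  # km/s per h^-1 Mpc: converts velocities to distances

def lorentz_damping(k, mu, sigma12_kms):
    """D(k, mu) = [1 + (k mu sigma12)^2 / 2]^{-1}, with sigma12
    converted from km/s to h^-1 Mpc so that k*sigma12 is dimensionless."""
    s = sigma12_kms / H0_HMPC
    return 1.0 / (1.0 + 0.5 * (k * mu * s) ** 2)

def redshift_power(k, mu, p_real, beta=0.5, sigma12_kms=510.0):
    """P^S(k, mu) = P^R(k) (1 + beta mu^2)^2 D(k, mu)."""
    return p_real * (1.0 + beta * mu ** 2) ** 2 * lorentz_damping(k, mu, sigma12_kms)

# Transverse to the line of sight (mu = 0) the real-space power is
# recovered exactly; along it (mu = 1) small scales are strongly damped.
print(redshift_power(1.0, 0.0, 1000.0))  # -> 1000.0
print(redshift_power(1.0, 1.0, 1000.0))
```

The two limits make the physics visible: the Kaiser-like $(1+\beta\mu^2)^2$ boost enhances line-of-sight power on large scales, while the pairwise velocity dispersion $\sigma_{12}$ suppresses it at high $k\mu$.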

    Effect of ambient temperature on the human tear film


    Correlation-Compressed Direct Coupling Analysis

    Learning Ising or Potts models from data has become an important topic in statistical physics and computational biology, with applications to predictions of structural contacts in proteins and other areas of biological data analysis. The corresponding inference problems are challenging since the normalization constant (partition function) of the Ising/Potts distributions cannot be computed efficiently on large instances. Different ways to address this issue have hence given rise to a substantial methodological literature. In this paper we investigate how these methods could be used on much larger datasets than studied previously. We focus on a central aspect: in practice these inference problems are almost always severely under-sampled, and the operational result is almost always a small set of leading (largest) predictions. We therefore explore an approach where the data is pre-filtered based on empirical correlations, which can be computed directly even for very large problems. Inference is only used on the much smaller instance in a subsequent step of the analysis. We show that in several relevant model classes such a combined approach gives results of almost the same quality as the computationally much more demanding inference on the whole dataset. We also show that results on whole-genome epistatic couplings that were obtained in a recent computation-intensive study can be retrieved by the new approach. The method of this paper hence opens up the possibility to learn parameters describing pair-wise dependencies in whole genomes in a computationally feasible and expedient manner. Comment: 15 pages, including 11 figures
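    The pre-filtering step described above — keep only the most strongly correlated variable pairs and run the expensive inference on that reduced set — can be sketched in a few lines. The toy data and the use of plain Pearson correlation as the filter score are illustrative assumptions:

```python
from itertools import combinations
import math

def top_correlated_pairs(samples, n_keep):
    """Rank variable pairs by absolute empirical Pearson correlation.

    samples: list of rows (one per observation); columns are variables.
    Returns the n_keep pairs of column indices with the largest
    |correlation| -- only these would be passed on to the costly
    Ising/Potts inference step.
    """
    n = len(samples)
    cols = list(zip(*samples))
    means = [sum(c) / n for c in cols]
    # "or 1.0" guards against constant columns (zero standard deviation).
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]

    def corr(i, j):
        cov = sum((a - means[i]) * (b - means[j])
                  for a, b in zip(cols[i], cols[j])) / n
        return cov / (stds[i] * stds[j])

    pairs = combinations(range(len(cols)), 2)
    return sorted(pairs, key=lambda p: -abs(corr(*p)))[:n_keep]

# Columns 0 and 1 move together; column 2 is only weakly related.
data = [[1, 1, 0], [2, 2, 1], [3, 3, 0], [4, 4, 1]]
print(top_correlated_pairs(data, 1))
```

Because the correlations are computed in a single pass over the data, the filter scales to instances far beyond what full partition-function-based inference can handle, which is exactly the regime the paper targets.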