RIBAR and xRIBAR: methods for reproducible relative MS/MS-based label-free protein quantification
Mass spectrometry-driven proteomics increasingly relies on quantitative analyses for biological discoveries. As a result, different methods and algorithms have been developed to perform relative or absolute quantification based on mass spectrometry data. Among the most popular quantification methods are the so-called label-free approaches, which require no special sample processing and can even be applied retroactively to existing data sets. Of these label-free methods, the MS/MS-based approaches are most often applied, mainly because of their inherent simplicity as compared to MS-based methods. The main application of these approaches is the determination of relative protein amounts between different samples, expressed as protein ratios. However, as we demonstrate here, there are some issues with the reproducibility across replicates of the protein ratio sets obtained from the various MS/MS-based label-free methods, indicating that the existing methods are not optimally robust. We therefore present two new methods (called RIBAR and xRIBAR) that use the available MS/MS data more effectively, achieving increased robustness. Both the accuracy and the precision of our novel methods are analyzed and compared to the existing methods to illustrate the increased robustness of our new methods over existing ones.
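A minimal sketch of the kind of MS/MS-based (spectral counting) relative quantification discussed in the abstract above, not the RIBAR/xRIBAR algorithms themselves: protein ratios are formed from normalized MS/MS spectrum counts in two samples. The protein identifiers and counts below are hypothetical.

```python
# Sketch of MS/MS-based label-free relative quantification via spectral counting.
# This illustrates the general class of methods described above, NOT the
# RIBAR/xRIBAR algorithms; protein identifiers and counts are hypothetical.
from typing import Dict

def spectral_count_ratios(counts_a: Dict[str, int],
                          counts_b: Dict[str, int],
                          pseudocount: float = 1.0) -> Dict[str, float]:
    """Return protein ratios (sample A / sample B) from MS/MS spectrum counts.

    Counts are normalized by the total number of identified spectra in each
    sample so that differences in overall sampling depth cancel out; a small
    pseudocount avoids division by zero for proteins missing from one sample.
    """
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    proteins = set(counts_a) | set(counts_b)
    ratios = {}
    for p in proteins:
        norm_a = (counts_a.get(p, 0) + pseudocount) / total_a
        norm_b = (counts_b.get(p, 0) + pseudocount) / total_b
        ratios[p] = norm_a / norm_b
    return ratios

if __name__ == "__main__":
    sample_a = {"P12345": 40, "Q67890": 12, "O11111": 3}   # hypothetical counts
    sample_b = {"P12345": 20, "Q67890": 14}
    for protein, ratio in sorted(spectral_count_ratios(sample_a, sample_b).items()):
        print(f"{protein}: A/B ratio = {ratio:.2f}")
```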
Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies
Existing sequence alignment algorithms use heuristic scoring schemes which
cannot be used as objective distance metrics. Therefore one relies on measures
like the p- or log-det distances, or makes explicit, and often simplistic,
assumptions about sequence evolution. Information theory provides an
alternative in the form of mutual information (MI), which is, in principle, an
objective and model-independent similarity measure. MI can be estimated by
concatenating and zipping sequences, thereby yielding the "normalized
compression distance". So far this has produced promising results, but with
uncontrolled errors. We describe a simple approach to get robust estimates of
MI from global pairwise alignments. Using standard alignment algorithms, this
gives, for animal mitochondrial DNA, estimates that are strikingly close to
those obtained from the alignment-free methods mentioned above. Our main
result uses algorithmic (Kolmogorov) information theory, but we show that
similar results can also be obtained from Shannon theory. Because it is not
additive, the normalized compression distance is not an optimal metric for
phylogenetics, but we propose a simple modification that overcomes this lack
of additivity. We test several versions of our MI-based distance measures
on a large number of randomly chosen quartets and demonstrate that they all
perform better than traditional measures like the Kimura or log-det (resp.
paralinear) distances. Even a simplified version based on single letter Shannon
entropies, which can be easily incorporated in existing software packages, gave
superior results throughout the entire animal kingdom. We see the main virtue
of our approach, however, as more general: for example, it can also help to
judge the relative merits of different alignment algorithms by estimating the
significance of specific alignments.
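As a concrete illustration of the compression-based estimate mentioned in the abstract above, here is a minimal sketch of the normalized compression distance (NCD) between two sequences using zlib. The example sequences are made up, and a real analysis would use full mitochondrial genomes, a stronger compressor, and the alignment-based MI estimates the abstract describes.

```python
# Minimal sketch of the normalized compression distance (NCD) between two
# sequences, estimated by compressing them separately and concatenated.
# Example sequences are hypothetical; zlib is used only for illustration.
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, level=9))

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

if __name__ == "__main__":
    seq1 = b"ATGGCACTATTTAAAGCTCACCCA" * 40
    seq2 = b"ATGGCTCTTTTTAAAGCACATCCA" * 40
    print(f"NCD(seq1, seq2) = {ncd(seq1, seq2):.3f}")
```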
Evaluation of Motion Artifact Metrics for Coronary CT Angiography
Purpose
This study quantified the performance of coronary artery motion artifact metrics relative to human observer ratings. Motion artifact metrics have been used as part of motion correction and best-phase selection algorithms for Coronary Computed Tomography Angiography (CCTA). However, the lack of ground truth makes it difficult to validate how well the metrics quantify the level of motion artifact. This study investigated five motion artifact metrics, including two novel metrics, using a dynamic phantom, clinical CCTA images, and an observer study that provided ground-truth motion artifact scores from a series of pairwise comparisons.
Method
Five motion artifact metrics were calculated for the coronary artery regions on both phantom and clinical CCTA images: positivity, entropy, normalized circularity, Fold Overlap Ratio (FOR), and Low-Intensity Region Score (LIRS). CT images were acquired of a dynamic cardiac phantom that simulated cardiac motion and contained six iodine-filled vessels of varying diameter, with regions of soft plaque and calcifications. Scans were repeated with different gantry start angles. Images were reconstructed at five phases of the motion cycle. Clinical images were acquired from 14 CCTA exams with patient heart rates ranging from 52 to 82 bpm. The vessel and shading artifacts were manually segmented by three readers and combined to create ground-truth artifact regions. Motion artifact levels were also assessed by readers using a pairwise comparison method to establish a ground-truth reader score. Kendall's Tau coefficients were calculated to evaluate the statistical agreement in ranking between the motion artifact metrics and reader scores. Linear regression between the reader scores and the metrics was also performed.
Results
On phantom images, the Kendall's Tau coefficients of the five motion artifact metrics were 0.50 (normalized circularity), 0.35 (entropy), 0.82 (positivity), 0.77 (FOR), and 0.77 (LIRS), where a higher Kendall's Tau signifies higher agreement. The FOR, LIRS, and transformed positivity (the fourth root of the positivity) were further evaluated on the clinical images. The Kendall's Tau coefficients of the selected metrics were 0.59 (FOR), 0.53 (LIRS), and 0.21 (transformed positivity). On the clinical data, a Motion Artifact Score, defined as the product of the FOR and LIRS metrics, further improved agreement with reader scores, with a Kendall's Tau coefficient of 0.65.
Conclusion
The metrics of FOR, LIRS, and the product of the two provided the highest agreement in motion artifact ranking when compared to the readers, and the highest linear correlation to the reader scores. The validated motion artifact metrics may be useful for developing and evaluating methods to reduce motion artifacts in CCTA images.
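A brief sketch of how the combined metric and its agreement with readers could be computed, assuming per-image FOR and LIRS values and reader scores are already available (the arrays below are hypothetical): the Motion Artifact Score is the product of FOR and LIRS, and rank agreement is summarized with Kendall's Tau.

```python
# Sketch: combine FOR and LIRS into a Motion Artifact Score (their product)
# and measure rank agreement with reader scores via Kendall's Tau.
# The per-image metric values and reader scores below are hypothetical.
import numpy as np
from scipy.stats import kendalltau

fold_overlap_ratio = np.array([0.95, 0.80, 0.60, 0.85, 0.40])   # FOR per image
low_intensity_score = np.array([0.90, 0.75, 0.55, 0.80, 0.35])  # LIRS per image
reader_score = np.array([4.8, 3.9, 2.5, 4.1, 1.6])              # pairwise-comparison derived

motion_artifact_score = fold_overlap_ratio * low_intensity_score

tau, p_value = kendalltau(motion_artifact_score, reader_score)
print(f"Kendall's Tau (MAS vs. readers): {tau:.2f} (p = {p_value:.3g})")
```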
Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs
Given a graph where vertices represent alternatives and arcs represent
pairwise comparison data, the statistical ranking problem is to find a
potential function, defined on the vertices, such that the gradient of the
potential function agrees with the pairwise comparisons. Our goal in this paper
is to develop a method for collecting data for which the least squares
estimator for the ranking problem has maximal Fisher information. Our approach,
based on experimental design, is to view data collection as a bi-level
optimization problem where the inner problem is the ranking problem and the
outer problem is to identify data which maximizes the informativeness of the
ranking. Under certain assumptions, the data collection problem decouples,
reducing to a problem of finding multigraphs with large algebraic connectivity.
This reduction of the data collection problem to graph-theoretic questions is
one of the primary contributions of this work. As an application, we study the
Yahoo! Movie user rating dataset and demonstrate that the addition of a small
number of well-chosen pairwise comparisons can significantly increase the
Fisher informativeness of the ranking. As another application, we study the
2011-12 NCAA football schedule and propose schedules with the same number of
games which are significantly more informative. Using spectral clustering
methods to identify highly-connected communities within the division, we argue
that the NCAA could improve its notoriously poor rankings by simply scheduling
more out-of-conference games.
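The two core computations described in the abstract above can be sketched as follows, under simplifying assumptions (unit-weight comparisons on a small hypothetical graph): the least-squares potential comes from the graph Laplacian and the arc-incidence structure, and the informativeness proxy is the algebraic connectivity, i.e. the second-smallest Laplacian eigenvalue.

```python
# Sketch of least-squares ranking from pairwise comparisons and of the
# algebraic-connectivity proxy for how informative the comparison graph is.
# The comparison data (i, j, y_ij meaning "j beats i by y_ij") are hypothetical.
import numpy as np

comparisons = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 3.5), (2, 3, 0.5)]
n = 4

# Arc-vertex incidence matrix B and graph Laplacian L = B^T B (unit weights).
B = np.zeros((len(comparisons), n))
y = np.zeros(len(comparisons))
for row, (i, j, diff) in enumerate(comparisons):
    B[row, i], B[row, j] = -1.0, 1.0
    y[row] = diff
L = B.T @ B

# Least-squares potential: solve L r = B^T y (defined only up to a constant).
r, *_ = np.linalg.lstsq(L, B.T @ y, rcond=None)
r -= r.mean()
print("ranking potential:", np.round(r, 2))

# Algebraic connectivity: second-smallest eigenvalue of L; larger values
# indicate a better-connected (more informative) comparison graph.
eigvals = np.sort(np.linalg.eigvalsh(L))
print("algebraic connectivity:", round(float(eigvals[1]), 3))
```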
Significance Analysis for Pairwise Variable Selection in Classification
The goal of this article is to select important variables that can
distinguish one class of data from another. A marginal variable selection
method ranks the marginal effects for classification of individual variables,
and is a useful and efficient approach for variable selection. Our focus here
is to consider the bivariate effect, in addition to the marginal effect. In
particular, we are interested in those pairs of variables that can lead to
accurate classification predictions when they are viewed jointly. To accomplish
this, we propose a permutation test called Significance test of Joint Effect
(SigJEff). In the absence of joint effect in the data, SigJEff is similar or
equivalent to many marginal methods. However, when joint effects exist, our
method can significantly boost the performance of variable selection. Such
joint effects can help to provide additional, and sometimes dominating,
advantage for classification. We illustrate and validate our approach using
both a simulated example and a real glioblastoma multiforme data set, both of
which provide promising results.
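A hedged sketch of the kind of permutation test described in the abstract above (not the exact SigJEff statistic): the joint effect of a variable pair is scored by cross-validated classification accuracy, and significance is assessed by recomputing that score under permuted class labels. The data, classifier, and settings are illustrative.

```python
# Sketch of a permutation test for the joint classification effect of a
# variable pair. This illustrates the general idea of the abstract above;
# the statistic, data, and settings are illustrative, not the paper's.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def joint_effect_score(X_pair, y):
    """Cross-validated accuracy of a simple classifier on one variable pair."""
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    return cross_val_score(clf, X_pair, y, cv=5).mean()

def permutation_p_value(X, y, pair, n_perm=100):
    X_pair = X[:, list(pair)]
    observed = joint_effect_score(X_pair, y)
    null = [joint_effect_score(X_pair, rng.permutation(y)) for _ in range(n_perm)]
    p = (1 + sum(s >= observed for s in null)) / (1 + n_perm)
    return observed, p

if __name__ == "__main__":
    # Simulated data: variables 0 and 1 carry an XOR-like joint effect,
    # so neither is informative on its own.
    n = 200
    X = rng.normal(size=(n, 10))
    y = (np.sign(X[:, 0]) * np.sign(X[:, 1]) > 0).astype(int)
    score, p = permutation_p_value(X, y, pair=(0, 1))
    print(f"joint CV accuracy = {score:.2f}, permutation p-value = {p:.3f}")
```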
Influence of Context on Item Parameters in Forced-Choice Personality Assessments
A fundamental assumption in computerized adaptive testing (CAT) is that item parameters are invariant with respect to context – the items surrounding the administered item. This assumption, however, may not hold in forced-choice (FC) assessments, where explicit comparisons are made between items included in the same block. We empirically examined the influence of context on item parameters by comparing parameter estimates from two FC instruments. The first instrument was composed of blocks of three items, whereas in the second, the context was manipulated by adding one item to each block, resulting in blocks of four. The item parameter estimates were highly similar. However, a small number of significant deviations were observed, confirming the importance of context when designing adaptive FC assessments. Two patterns of such deviations were identified, and methods to reduce their occurrence in an FC CAT setting were proposed. It was shown that with a small proportion of violations of the parameter invariance assumption, score estimation remained stable.
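A rough sketch of the invariance check described above: parameter estimates for the same items calibrated in three-item and four-item blocks can be compared with a Wald-type z statistic built from their standard errors. The estimates, standard errors, and flagging threshold below are hypothetical, and the abstract does not specify the exact procedure used.

```python
# Sketch: flag items whose parameter estimates differ between two calibrations
# (blocks of three vs. blocks of four) using a Wald-type z statistic.
# All estimates and standard errors below are hypothetical.
import math

# (item_id, estimate in 3-item blocks, SE, estimate in 4-item blocks, SE)
items = [
    ("item_01",  1.10, 0.09,  1.05, 0.10),
    ("item_02",  0.62, 0.07,  0.95, 0.08),   # deviates noticeably
    ("item_03", -0.40, 0.11, -0.37, 0.12),
]

for item_id, est3, se3, est4, se4 in items:
    z = (est3 - est4) / math.sqrt(se3**2 + se4**2)
    flag = "DEVIATION" if abs(z) > 1.96 else "ok"
    print(f"{item_id}: z = {z:+.2f} -> {flag}")
```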
The Scaling of the Redshift Power Spectrum: Observations from the Las Campanas Redshift Survey
In a recent paper we have studied the redshift power spectrum $P^S(k,\mu)$ in three CDM models with the help of high resolution simulations. Here we apply the method to the largest available redshift survey, the Las Campanas Redshift Survey (LCRS). The basic model is to express $P^S(k,\mu)$ as a product of three factors, $P^S(k,\mu)=P^R(k)(1+\beta\mu^2)^2 D(k,\mu)$. Here $\mu$ is the cosine of the angle between the wave vector and the line of sight. The damping function $D(k,\mu)$, for the range of scales accessible to an accurate analysis of the LCRS, is well approximated by the Lorentz factor $D=[1+{1\over 2}(k\mu\sigma_{12})^2]^{-1}$. We have investigated different values for $\beta$ (including 0.5 and 0.6), and measured $P^R(k)$ and $\sigma_{12}(k)$ from $P^S(k,\mu)$ for different values of $\mu$. The velocity dispersion $\sigma_{12}$ is nearly constant up to $k = 3\,h\,{\rm Mpc}^{-1}$. The average value over this range is $510 \pm 70\,{\rm km\,s^{-1}}$. The power spectrum $P^R(k)$ decreases with $k$ approximately as a power law for $k$ between 0.1 and 4 $h\,{\rm Mpc}^{-1}$. The statistical significance of the results, and the error bars, are found with the help of mock samples constructed from a large set of high resolution simulations. A flat, low-density CDM model can give a good fit to the data if a scale-dependent special bias scheme is used, which we have called the cluster-under-weighted bias (Jing et al.).
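A small numerical sketch of the model quoted above, $P^S(k,\mu)=P^R(k)(1+\beta\mu^2)^2 D(k,\mu)$ with the Lorentz damping factor. The real-space power spectrum (a simple power law here), $\beta$, and $\sigma_{12}$ are placeholder values for illustration, not the LCRS measurements.

```python
# Sketch: evaluate the redshift-space power spectrum model
#   P^S(k, mu) = P^R(k) * (1 + beta * mu^2)^2 * D(k, mu),
#   D(k, mu)   = 1 / (1 + 0.5 * (k * mu * sigma_12)^2),
# quoted in the abstract above. Numerical values are placeholders.

def p_real(k, amplitude=1.0e4, slope=-1.5):
    """Placeholder real-space power spectrum: a simple power law in k."""
    return amplitude * k**slope

def lorentz_damping(k, mu, sigma_12):
    return 1.0 / (1.0 + 0.5 * (k * mu * sigma_12)**2)

def p_redshift(k, mu, beta=0.5, sigma_12=5.0):
    # sigma_12 must be in length units consistent with 1/k
    # (i.e. the velocity dispersion divided by H0).
    return p_real(k) * (1.0 + beta * mu**2)**2 * lorentz_damping(k, mu, sigma_12)

if __name__ == "__main__":
    k = 0.5  # wavenumber, placeholder units consistent with sigma_12
    for mu in (0.0, 0.5, 1.0):
        print(f"mu = {mu:.1f}: P^S/P^R = {p_redshift(k, mu) / p_real(k):.3f}")
```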
Correlation-Compressed Direct Coupling Analysis
Learning Ising or Potts models from data has become an important topic in
statistical physics and computational biology, with applications to predictions
of structural contacts in proteins and other areas of biological data analysis.
The corresponding inference problems are challenging since the normalization
constant (partition function) of the Ising/Potts distributions cannot be
computed efficiently on large instances. Different ways to address this issue
have hence given rise to a substantial methodological literature. In this paper
we investigate how these methods could be used on much larger datasets than
studied previously. We focus on a central aspect, that in practice these
inference problems are almost always severely under-sampled, and the
operational result is almost always a small set of leading (largest)
predictions. We therefore explore an approach where the data is pre-filtered
based on empirical correlations, which can be computed directly even for very
large problems. Inference is only used on the much smaller instance in a
subsequent step of the analysis. We show that in several relevant model classes
such a combined approach gives results of almost the same quality as the
computationally much more demanding inference on the whole dataset. We also
show that results on whole-genome epistatic couplings that were obtained in a
recent computation-intensive study can be retrieved by the new approach. The
method of this paper hence opens up the possibility of learning parameters
describing pairwise dependencies in whole genomes in a computationally
feasible and expedient manner.
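A minimal sketch of the pre-filtering idea described in the abstract above, under simplifying assumptions (binary data, plain Pearson correlations, and a placeholder downstream inference step): pairwise correlations are computed on the full variable set, only the positions involved in the top-scoring pairs are retained, and the expensive Ising/Potts inference would then run on that reduced set only.

```python
# Sketch of correlation-compressed inference: pre-filter variable pairs by
# empirical correlation strength, then run the (expensive) coupling inference
# only on the retained positions. Data and the downstream step are placeholders.
import numpy as np

rng = np.random.default_rng(1)
samples = rng.integers(0, 2, size=(500, 200)).astype(float)  # hypothetical binary data

# 1) Cheap pre-filter: absolute Pearson correlations between all column pairs.
corr = np.abs(np.corrcoef(samples, rowvar=False))
np.fill_diagonal(corr, 0.0)

# 2) Keep the top-N pairs and the set of positions they involve.
n_top = 50
flat = np.argsort(corr, axis=None)[::-1]
pairs = [np.unravel_index(idx, corr.shape) for idx in flat[: 2 * n_top]]
top_pairs = [(i, j) for i, j in pairs if i < j][:n_top]
positions = sorted({p for pair in top_pairs for p in pair})

# 3) Run the expensive inference only on the reduced data.
reduced = samples[:, positions]
print(f"reduced problem: {reduced.shape[1]} of {samples.shape[1]} positions retained")
# e.g. couplings = run_inference(reduced)  # placeholder for an Ising/Potts solver
```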