Modeling and Estimation for Real-Time Microarrays
Microarrays are used for collecting information about a large number of different genomic particles simultaneously. Conventional fluorescent-based microarrays acquire data after the hybridization phase. During this phase, the target analytes (e.g., DNA fragments) bind to the capturing probes on the array and, by the end of it, supposedly reach a steady state. Therefore, conventional microarrays attempt to detect and quantify the targets with a single data point taken in the steady state. On the other hand, a novel technique, the so-called real-time microarray, capable of recording the kinetics of hybridization in fluorescent-based microarrays has recently been proposed. The richness of the information obtained therein promises higher signal-to-noise ratio, smaller estimation error, and broader assay detection dynamic range compared to conventional microarrays. In this paper, we study the signal processing aspects of the real-time microarray system design. In particular, we develop a probabilistic model for real-time microarrays and describe a procedure for the estimation of target amounts therein. Moreover, leveraging system identification ideas, we propose a novel technique for the elimination of cross-hybridization. These are important steps toward developing optimal detection algorithms for real-time microarrays, and toward understanding their fundamental limitations.
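The cross-hybridization elimination step described above can be pictured as a linear unmixing problem: if a calibrated matrix captures how strongly each target binds to each probe spot, the true target amounts can be recovered from the observed spot signals. The following is a minimal sketch of that idea only, not the paper's actual system-identification procedure; the mixing matrix and all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cross-hybridization as linear mixing (an illustrative sketch, not the
# paper's actual system-identification procedure).  A[i, j]: hypothetical
# calibrated binding strength of target j at probe spot i; off-diagonal
# entries model cross-hybridization.  All numbers are invented.
A = np.array([[1.00, 0.15, 0.05],
              [0.10, 1.00, 0.20],
              [0.02, 0.08, 1.00]])

x_true = np.array([3.0, 0.5, 1.2])                 # true target amounts
y = A @ x_true + 0.01 * rng.standard_normal(3)     # observed spot signals

# Once A has been identified from calibration data, cross-hybridization is
# removed by solving the linear system in the least-squares sense.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(x_hat, 2))
```

With a well-conditioned mixing matrix, the least-squares estimate recovers the target amounts up to the measurement noise level.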
Modeling the kinetics of hybridization in microarrays
Conventional fluorescent-based microarrays acquire data after the hybridization phase. In this phase the target analytes (i.e., DNA fragments) bind to the capturing probes on the array and supposedly reach a steady state. Accordingly, microarray experiments essentially provide only a single, steady-state data point of the hybridization process. On the other hand, a novel technique (real-time microarrays) capable of recording the kinetics of hybridization in fluorescent-based microarrays has recently been proposed in [5]. The richness of the information obtained therein promises higher signal-to-noise ratio, smaller estimation error, and broader assay detection dynamic range compared to conventional microarrays. In the current paper, we develop a probabilistic model of the kinetics of hybridization and describe a procedure for the estimation of its parameters, which include the binding rate and target concentration. This probabilistic model is an important step toward developing optimal detection algorithms for microarrays that measure the kinetics of hybridization, and toward understanding their fundamental limitations.
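The single-exponential approach to steady state that distinguishes real-time from conventional readouts already appears in first-order (Langmuir-type) binding kinetics, a common simplification rather than the paper's probabilistic model; all rate constants below are invented:

```python
# Illustrative Langmuir-type hybridization kinetics (a common simplification,
# not the paper's probabilistic model).  x(t): captured targets at a spot,
# n_p: probe capacity, c: target concentration, k1/k2: binding/unbinding
# rates -- all values are invented for illustration.
k1, k2, c, n_p = 0.8, 0.1, 2.0, 1.0
dt, T = 1e-3, 10.0

x = 0.0
for _ in range(int(T / dt)):
    x += dt * (k1 * c * (n_p - x) - k2 * x)   # forward Euler step

# Closed-form steady state and time constant for this linear ODE:
x_ss = k1 * c * n_p / (k1 * c + k2)           # conventional arrays see only this
tau = 1.0 / (k1 * c + k2)                     # real-time arrays also observe tau
print(round(x, 3), round(x_ss, 3), round(tau, 3))
```

A conventional array records only the endpoint `x_ss`; a real-time array also observes the transient, whose time constant depends on the target concentration.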
Real-time DNA microarray analysis
We present a quantification method for affinity-based DNA microarrays based on real-time measurements of hybridization kinetics. This method, real-time DNA microarray analysis, enhances the detection dynamic range of conventional systems by being impervious to probe saturation in the capturing spots, washing artifacts, microarray spot-to-spot variations, and other non-idealities that affect signal amplitude. We demonstrate in both theory and practice that the time constant of target capturing in microarrays, as in all affinity-based biosensors, is inversely proportional to the concentration of the target analyte, which we subsequently use as the fundamental parameter to estimate the concentration of the analytes. Furthermore, to empirically validate the capabilities of this method in practical applications, we present a FRET-based assay that enables real-time detection in gene expression DNA microarrays.
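The core quantification idea above — the capture time constant shrinks as target concentration grows, so concentration can be read off a fitted time constant — can be sketched as follows. This uses a generic first-order binding model with invented rate constants, not the paper's assay:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic real-time capture trace y(t) = A * (1 - exp(-t/tau)).  The
# capture rate 1/tau grows linearly with target concentration c, so c can
# be read off a fitted time constant.  k1, k2, A and c_true are invented.
k1, k2, A = 0.5, 0.05, 1.0
c_true = 4.0
tau = 1.0 / (k1 * c_true + k2)
t = np.linspace(0.0, 2.0, 200)
y = A * (1.0 - np.exp(-t / tau)) + 1e-4 * rng.standard_normal(t.size)

# log(A - y) is linear in t with slope -1/tau: ordinary least-squares fit.
slope, _ = np.polyfit(t, np.log(np.clip(A - y, 1e-12, None)), 1)
tau_hat = -1.0 / slope
c_hat = (1.0 / tau_hat - k2) / k1             # invert tau(c) for concentration
print(round(c_hat, 2))
```

Because the estimate comes from the time constant rather than the signal amplitude, it is unaffected by amplitude-scaling non-idealities such as spot-to-spot variation, which is the property the abstract emphasizes.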
Effect Size Estimation and Misclassification Rate Based Variable Selection in Linear Discriminant Analysis
Supervised classification of biological samples based on genetic information (e.g., gene expression profiles) is an important problem in biostatistics. In order to find classification rules that are both accurate and interpretable, variable selection is indispensable. This article explores how an assessment of the individual importance of variables (effect size estimation) can be used to perform variable selection. I review recent effect size estimation approaches in the context of linear discriminant analysis (LDA) and propose a new effect size estimation method that is conceptually simple and at the same time computationally efficient. I then show how to use effect sizes to perform variable selection based on the misclassification rate, which is the data-independent expectation of the prediction error. Simulation studies and real data analyses illustrate that the proposed effect size estimation and variable selection methods are competitive. In particular, they lead to both compact and interpretable feature sets.
Comment: 21 pages, 2 figures
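The basic pipeline of effect-size-based variable selection can be sketched with a standardized-mean-difference effect size on toy data. This is a generic illustration of the idea, not the article's proposed estimator; all data sizes and values are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy two-class expression data: 40 samples x 100 genes; only the first 5
# genes carry a real group difference (all sizes here are illustrative).
n, p, k_informative = 40, 100, 5
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, p))
X[y == 1, :k_informative] += 3.0

# Effect size per gene: standardized mean difference (Cohen's-d style),
# using the pooled within-class standard deviation.
m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
s = np.sqrt((X[y == 0].var(0, ddof=1) + X[y == 1].var(0, ddof=1)) / 2)
d = np.abs(m1 - m0) / s

# Variable selection: keep the genes with the largest effect sizes.
selected = np.argsort(d)[::-1][:k_informative]
print(sorted(selected.tolist()))
```

Ranking by a per-variable effect size and truncating the list is what makes the resulting feature sets compact and interpretable; the article's contribution lies in how the effect sizes and the cutoff (via the misclassification rate) are estimated.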
Microarrays, Empirical Bayes and the Two-Groups Model
The classic frequentist theory of hypothesis testing developed by Neyman,
Pearson and Fisher has a claim to being the twentieth century's most
influential piece of applied mathematics. Something new is happening in the
twenty-first century: high-throughput devices, such as microarrays, routinely
require simultaneous hypothesis tests for thousands of individual cases, not at
all what the classical theory had in mind. In these situations empirical Bayes
information begins to force itself upon frequentists and Bayesians alike. The
two-groups model is a simple Bayesian construction that facilitates empirical
Bayes analysis. This article concerns the interplay of Bayesian and frequentist
ideas in the two-groups setting, with particular attention focused on Benjamini
and Hochberg's False Discovery Rate method. Topics include the choice and
meaning of the null hypothesis in large-scale testing situations, power
considerations, the limitations of permutation methods, significance testing
for groups of cases (such as pathways in microarray studies), correlation
effects, multiple confidence intervals and Bayesian competitors to the
two-groups model.
Comment: This paper is commented on in [arXiv:0808.0582], [arXiv:0808.0593], [arXiv:0808.0597], and [arXiv:0808.0599]; rejoinder in [arXiv:0808.0603]. Published at http://dx.doi.org/10.1214/07-STS236 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
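As a concrete reference point for the frequentist side of the discussion, Benjamini and Hochberg's step-up False Discovery Rate procedure can be written in a few lines; the p-values below are invented:

```python
import numpy as np

# Benjamini-Hochberg step-up FDR procedure: reject the k smallest
# p-values, where k is the largest index with p_(k) <= alpha * k / m.
def benjamini_hochberg(pvals, alpha=0.10):
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.2, 0.7, 0.9, 1.0]
rejected = benjamini_hochberg(pvals)
print(int(rejected.sum()))   # 6: the step-up rule rescues 0.039 and 0.041
```

Note the step-up character: 0.039 and 0.041 fail their own thresholds but are rejected because a larger p-value (0.06) passes at rank 6 — the behavior the two-groups model reinterprets in empirical Bayes terms.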
Variance component score test for time-course gene set analysis of longitudinal RNA-seq data
As gene expression measurement technology is shifting from microarrays to
sequencing, the statistical tools available for their analysis must be adapted
since RNA-seq data are measured as counts. Recently, it has been proposed to
tackle the count nature of these data by modeling log-count reads per million
as continuous variables, using nonparametric regression to account for their
inherent heteroscedasticity. Adopting such a framework, we propose tcgsaseq, a
principled, model-free and efficient top-down method for detecting longitudinal
changes in RNA-seq gene sets. Considering gene sets defined a priori, tcgsaseq
identifies those whose expression vary over time, based on an original variance
component score test accounting for both covariates and heteroscedasticity
without assuming any specific parametric distribution for the transformed
counts. We demonstrate that despite the presence of a nonparametric component,
our test statistic has a simple form and limiting distribution, and both may be
computed quickly. A permutation version of the test is additionally proposed
for very small sample sizes. Applied to both simulated data and two real
datasets, the proposed method is shown to exhibit very good statistical
properties, with an increase in stability and power when compared to the state-of-the-art methods ROAST, edgeR, and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available to the community in the R package tcgsaseq.
Comment: 23 pages, 6 figures, typo corrections & acceptance acknowledgement
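The overall workflow — transform counts to log-CPM, then test a predefined gene set for a time effect, with a permutation option for small samples — can be sketched on toy data. This is a simplified stand-in for illustration, not the tcgsaseq variance component score test itself; all sizes and effect values are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy longitudinal RNA-seq sketch: counts -> log-CPM, then a permutation
# test of a time effect for one gene set (a simplified stand-in for the
# variance component score test; sizes and effects are invented).
n_samples, n_genes = 30, 50
time = np.tile(np.arange(5), 6).astype(float)       # 6 subjects x 5 timepoints
counts = rng.poisson(50, size=(n_samples, n_genes)).astype(float)
counts[:, :10] *= (1.0 + 0.2 * time)[:, None]       # genes 0-9 drift with time

lib = counts.sum(1, keepdims=True)                  # library sizes
logcpm = np.log2(1e6 * (counts + 0.5) / lib)        # log counts per million

gene_set = np.arange(10)                            # gene set defined a priori
def stat(t):
    # sum over the set of squared covariances between expression and time
    tc = t - t.mean()
    return np.sum((logcpm[:, gene_set].T @ tc) ** 2)

obs = stat(time)
perm = np.array([stat(rng.permutation(time)) for _ in range(999)])
pval = (1 + np.sum(perm >= obs)) / 1000.0
print(pval)
```

The permutation version sidesteps distributional assumptions on the transformed counts, which is why the paper proposes it for very small sample sizes.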
A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory
Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics remains a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory that is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.
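The two ingredients the abstract highlights — a graph over genes encoding expression relationships, and a classifier that can reject samples from unseen classes instead of forcing a label — can be illustrated with a deliberately simple scheme. This is not the paper's actual algorithm; the edge definition, thresholds, and data below are all invented:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative graph-style classifier (not the paper's actual algorithm):
# each sample becomes a directed graph over genes, with an edge i -> j
# whenever gene i is expressed above gene j.  A class template keeps the
# edges shared by most training samples; a sample whose best template
# overlap is low is rejected rather than force-classified, which limits
# false positives on samples from unseen classes.
def sample_graph(x):
    return x[:, None] > x[None, :]            # adjacency over gene pairs

def template(X):
    return np.mean([sample_graph(x) for x in X], axis=0) > 0.8

def classify(x, templates, reject_below=0.9):
    g = sample_graph(x)
    scores = [np.mean(g[t]) if t.any() else 0.0 for t in templates]
    best = int(np.argmax(scores))
    return best if scores[best] >= reject_below else -1   # -1 = rejected

# Two toy classes with opposite expression profiles over 6 genes.
mu_a, mu_b = np.array([5, 4, 3, 2, 1, 0.0]), np.array([0, 1, 2, 3, 4, 5.0])
A = mu_a + 0.3 * rng.standard_normal((20, 6))
B = mu_b + 0.3 * rng.standard_normal((20, 6))
templates = [template(A), template(B)]

pred_a = classify(mu_a, templates)
pred_b = classify(mu_b, templates)
pred_out = classify(3.0 * rng.standard_normal(6), templates)  # unseen class
print(pred_a, pred_b, pred_out)
```

The rejection threshold is what distinguishes this setting from ordinary closed-world classification: an out-of-class sample rarely reproduces most edges of any template, so it falls below the threshold and is flagged rather than misassigned.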