Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines
DNA copy number and mRNA expression are widely used data types in cancer
studies, which together provide more insight than either does separately. Whereas in
existing literature the form of the relationship between these two types of
markers is fixed a priori, in this paper we model their association. We employ
piecewise linear regression splines (PLRS), which combine good interpretation
with sufficient flexibility to identify any plausible type of relationship. The
specification of the model leads to estimation and model selection in a
constrained, nonstandard setting. We provide methodology for testing the effect
of DNA on mRNA and choosing the appropriate model. Furthermore, we present a
novel approach to obtain reliable confidence bands for constrained PLRS, which
incorporates model uncertainty. The procedures are applied to colorectal and
breast cancer data. Common assumptions are found to be potentially misleading
for biologically relevant genes. More flexible models may bring more insight into
the interaction between the two markers. Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
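As a rough illustration of the model class named in this abstract, the sketch below fits an unconstrained piecewise linear regression spline by ordinary least squares. It is not the authors' procedure: the constraints, testing, model selection and confidence bands described above are omitted, and the function names, the single knot at 0.3 and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

def plrs_design(x, knots):
    """Design matrix with columns 1, x, and one hinge term max(x - k, 0) per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x]
    cols += [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_plrs(x, y, knots):
    """Unconstrained least-squares fit; returns [intercept, slope, slope change per knot]."""
    X = plrs_design(x, knots)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated (hypothetical) data: expression is flat until copy number 0.3, then rises.
rng = np.random.default_rng(0)
copy_number = np.linspace(-1.0, 1.0, 201)
true_expr = 0.5 + 2.0 * np.maximum(copy_number - 0.3, 0.0)
expression = true_expr + rng.normal(scale=0.05, size=copy_number.size)

coef = fit_plrs(copy_number, expression, knots=[0.3])
print("intercept, slope, slope change at knot:", coef)
```

Each hinge coefficient is the change in slope at its knot, which is what makes the fitted curve easy to interpret segment by segment.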
Normalized, Segmented or Called aCGH Data?
Array comparative genomic hybridization (aCGH) is a high-throughput lab technique to measure genome-wide chromosomal copy numbers. Data from aCGH experiments require extensive pre-processing, which consists of three steps: normalization, segmentation and calling. Each of these pre-processing steps yields a different data set: normalized data, segmented data, and called data. Publications using aCGH base their findings on data from all stages of the pre-processing. Hence, there is no consensus on which should be used for further down-stream analysis. Such a consensus is, however, important for correct reporting of findings and for comparison of results from different studies. We discuss several issues that should be taken into account when deciding on which data are to be used. We express the belief that called data are best used, but would welcome opposing views.
The spectral condition number plot for regularization parameter evaluation
Abstract: Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employs regularization to cope with the subsequent singularity of the sample covariance matrix. These estimators depend on a penalty parameter and choosing its value can be hard, in terms of being computationally unfeasible or tenable only for a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter assessment. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators
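The idea of tracking the spectral condition number across penalty values can be sketched numerically. The example below assumes the simplest ridge-type estimator, S + penalty * I, as one member of the class mentioned in the abstract. The function names, the penalty grid and the simulated data are illustrative assumptions, and the values are printed rather than plotted to keep the sketch dependency-free.

```python
import numpy as np

def ridge_covariance(S, penalty):
    """Assumed ridge-type estimator: sample covariance plus a scaled identity."""
    return S + penalty * np.eye(S.shape[0])

def condition_number_path(S, penalties):
    """Spectral condition number of S + penalty * I for each penalty value."""
    eigvals = np.clip(np.linalg.eigvalsh(S), 0.0, None)  # guard against tiny negative values
    d_min, d_max = eigvals[0], eigvals[-1]
    return np.array([(d_max + lam) / (d_min + lam) for lam in penalties])

# More variables than observations, so the sample covariance is singular.
rng = np.random.default_rng(1)
n, p = 10, 25
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)

penalties = np.logspace(-4, 2, 13)
kappa = condition_number_path(S, penalties)
for lam, k in zip(penalties, kappa):
    print(f"penalty={lam:10.4f}  condition number={k:12.2f}")
```

Plotting the condition number against the logarithm of the penalty then shows where further increases in the penalty stop buying much numerical stability.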
A nonparametric control chart based on the Mann-Whitney statistic
Nonparametric or distribution-free charts can be useful in statistical
process control problems when knowledge about the underlying process
distribution is limited or lacking. In this paper, a phase II Shewhart-type chart
is considered for location, based on reference data from phase I analysis and
the well-known Mann-Whitney statistic. Control limits are computed using
Lugannani-Rice-saddlepoint, Edgeworth, and other approximations along with
Monte Carlo estimation. The derivations take account of estimation and the
dependence from the use of a reference sample. An illustrative numerical
example is presented. The in-control performance of the proposed chart is shown
to be much superior to that of the classical Shewhart chart. Further
comparisons on the basis of some percentiles of the out-of-control conditional
run length distribution and the unconditional out-of-control ARL show that the
proposed chart is almost as good as the Shewhart chart for the normal
distribution, but is more powerful for a heavy-tailed distribution such as the
Laplace, or for a skewed distribution such as the Gamma. Interactive software,
enabling a complete implementation of the chart, is made available on a
website. Comment: Published at http://dx.doi.org/10.1214/193940307000000112 in the IMS
Collections (http://www.imstat.org/publications/imscollections.htm) by the
Institute of Mathematical Statistics (http://www.imstat.org)
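A minimal sketch of the charting statistic follows. It counts the (reference, test) pairs in which the test observation exceeds the reference observation, and estimates two-sided limits by Monte Carlo under an in-control model. This is a simplification and not the paper's derivation: the limits here come from the unconditional null distribution of a single comparison, ignoring the dependence created by reusing one reference sample. The function names, the sample sizes and the false alarm rate of 0.0027 are assumptions.

```python
import numpy as np

def mann_whitney_count(reference, test):
    """Number of (reference, test) pairs with reference value < test value."""
    reference = np.asarray(reference, dtype=float)[:, None]
    test = np.asarray(test, dtype=float)[None, :]
    return int(np.sum(reference < test))

def monte_carlo_limits(m, n, alpha=0.0027, n_sim=2000, seed=0):
    """Approximate two-sided limits from simulated in-control samples.

    For continuous data the null distribution of the statistic does not depend
    on the underlying distribution, so standard normal draws are used.
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(n_sim)
    for i in range(n_sim):
        sample = rng.normal(size=m + n)
        stats[i] = mann_whitney_count(sample[:m], sample[m:])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

m, n = 20, 5  # hypothetical reference and test sample sizes
lower, upper = monte_carlo_limits(m, n)

rng = np.random.default_rng(42)
reference = rng.normal(size=m)
shifted_test = rng.normal(loc=3.0, size=n)  # test sample with a large location shift
stat = mann_whitney_count(reference, shifted_test)
print("limits:", lower, upper, "statistic:", stat, "signal:", stat < lower or stat > upper)
```

With so few simulations the extreme quantiles are coarse. A practical implementation would use many more replications or the analytic approximations the abstract mentions.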
Better prediction by use of co-data: Adaptive group-regularized ridge regression
For many high-dimensional studies, additional information on the variables,
like (genomic) annotation or external p-values, is available. In the context of
binary and continuous prediction, we develop a method for adaptive
group-regularized (logistic) ridge regression, which makes structural use of
such 'co-data'. Here, 'groups' refer to a partition of the variables according
to the co-data. We derive empirical Bayes estimates of group-specific
penalties, which possess several nice properties: i) they are analytical; ii)
they adapt to the informativeness of the co-data for the data at hand; iii)
only one global penalty parameter requires tuning by cross-validation. In
addition, the method allows use of multiple types of co-data at little extra
computational effort.
We show that the group-specific penalties may lead to a larger distinction
between 'near-zero' and relatively large regression parameters, which
facilitates post-hoc variable selection. The method, termed GRridge, is
implemented in an easy-to-use R-package. It is demonstrated on two cancer
genomics studies, which both concern the discrimination of precancerous
cervical lesions from normal cervix tissues using methylation microarray data.
For both examples, GRridge clearly improves on the predictive performance of
ordinary logistic ridge regression and the group lasso. In addition, we show
that for the second study the relatively good predictive performance is
maintained when selecting only 42 variables. Comment: 15 pages, 2 figures. Supplementary Information available on first
author's web site
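The role of group-specific penalties can be illustrated with a small linear ridge example. This is a sketch under stated assumptions, not the GRridge package or its empirical Bayes estimation. The penalties are fixed by hand, the response is continuous rather than binary, and the function name, the group labels and the simulated data are hypothetical.

```python
import numpy as np

def group_ridge(X, y, groups, penalties):
    """Ridge regression in which each variable is penalized by its group's penalty.

    Solves (X'X + diag(lambda)) beta = X'y, where lambda_j = penalties[groups[j]].
    """
    lam = np.array([penalties[g] for g in groups], dtype=float)
    A = X.T @ X + np.diag(lam)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(2)
n, p = 50, 10
X = rng.normal(size=(n, p))
groups = np.array([0] * 5 + [1] * 5)   # hypothetical co-data: two groups of variables
beta_true = np.concatenate([np.ones(5), np.zeros(5)])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_equal = group_ridge(X, y, groups, {0: 1.0, 1: 1.0})   # ordinary ridge
beta_group = group_ridge(X, y, groups, {0: 1.0, 1: 1e6})   # heavy penalty on group 1
print("equal penalties:", np.round(beta_equal, 3))
print("group penalties:", np.round(beta_group, 3))
```

Penalizing the uninformative group heavily pushes its coefficients toward zero, which widens the gap between small and large coefficients and makes later variable selection easier.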
Gradient Descent in Materio
Deep learning, a multi-layered neural network approach inspired by the brain, has revolutionized machine learning. One of its key enablers has been backpropagation, an algorithm that computes the gradient of a loss function with respect to the weights in the neural network model, in combination with its use in gradient descent. However, the implementation of deep learning in digital computers is intrinsically wasteful, with energy consumption becoming prohibitively high for many applications. This has stimulated the development of specialized hardware, ranging from neuromorphic CMOS integrated circuits and integrated photonic tensor cores to unconventional, material-based computing systems. The learning process in these material systems, taking place, e.g., by artificial evolution or surrogate neural network modelling, is still a complicated and time-consuming process. Here, we demonstrate an efficient and accurate homodyne gradient extraction method for performing gradient descent on the loss function directly in the material system. We demonstrate the method in our recently developed dopant network processing units, where we readily realize all Boolean gates. This shows that gradient descent can in principle be fully implemented in materio using simple electronics, opening up the way to autonomously learning material systems
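The principle of homodyne gradient extraction can be sketched in simulation: each input is perturbed with a small sinusoid at its own frequency, and the output is demodulated against each reference sinusoid to recover the corresponding partial derivative. The code below is a numerical toy under stated assumptions, not the authors' hardware procedure. The function names, frequencies, amplitude, learning rate and test functions are all illustrative.

```python
import numpy as np

def homodyne_gradient(f, x, freqs, amplitude=1e-2, n_samples=1000):
    """Estimate the gradient of f at x by sinusoidal perturbation and demodulation.

    freqs must be distinct positive integers below n_samples / 2, so that the
    reference sinusoids are orthogonal over the sampling window.
    """
    x = np.asarray(x, dtype=float)
    t = 2.0 * np.pi * np.arange(n_samples) / n_samples
    refs = np.sin(np.outer(freqs, t))  # one reference signal per input
    outputs = np.array([f(x + amplitude * refs[:, k]) for k in range(n_samples)])
    # The mean of output * reference equals amplitude * gradient / 2 to first order.
    return (2.0 / amplitude) * (refs @ outputs) / n_samples

# Check against a function whose analytic gradient at (1, 2) is (8, 5).
f = lambda v: v[0] ** 2 + 3.0 * v[0] * v[1] + 2.0 * v[1]
grad = homodyne_gradient(f, [1.0, 2.0], freqs=[3, 5])
print("estimated gradient:", grad)

# Gradient descent on a toy loss using only the extracted gradients.
loss = lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2
x = np.zeros(2)
for _ in range(100):
    x = x - 0.1 * homodyne_gradient(loss, x, freqs=[3, 5])
print("minimizer found:", x)
```

Because each input has its own frequency, all partial derivatives are obtained from a single perturbed measurement run instead of one run per parameter.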
Magnetic effects at the interface between nonmagnetic oxides
The electronic reconstruction at the interface between two insulating oxides
can give rise to a highly-conductive interface. In analogy to this remarkable
interface-induced conductivity we show how, additionally, magnetism can be
induced at the interface between the otherwise nonmagnetic insulating
perovskites SrTiO3 and LaAlO3. A large negative magnetoresistance of the
interface is found, together with a logarithmic temperature dependence of the
sheet resistance. At low temperatures, the sheet resistance reveals magnetic
hysteresis. Magnetic ordering is a key issue in solid-state science and its
underlying mechanisms are still the subject of intense research. In particular,
the interplay between localized magnetic moments and the spin of itinerant
conduction electrons in a solid gives rise to intriguing many-body effects such
as Ruderman-Kittel-Kasuya-Yosida (RKKY) interactions, the Kondo effect, and
carrier-induced ferromagnetism in diluted magnetic semiconductors. The
conducting oxide interface now provides a versatile system to induce and
manipulate magnetic moments in otherwise nonmagnetic materials. Comment: Nature Materials, July issue
Comparative studies on the structure of an upland African stream ecosystem
Upland stream systems have been extensively investigated in Europe, North America and Australasia, and many of the central ideas concerning their function are based on these systems. One central paradigm, the river continuum concept, is ultimately derived from those North American streams whose catchments remain forested with native vegetation. Streams of the tropics may or may not fit the model. They have been little studied. The Amani Nature Reserve in the East Usambara Mountains of north-eastern Tanzania offers an opportunity to bring these naturally forested systems to the attention of the ecological community. This article describes a comparison made between two lengths of the River Dodwe in this area. The work was carried out by a group of postgraduate students from eighteen European and African countries with advice from five staff members, as part of a course organised by the Tropical Biology Association. Rigorous efforts were made to standardise techniques, in a situation where equipment and laboratory facilities were very basic, through a management structure and deliberate allocation of work to specialists in each area. The article offers a summary of invertebrate communities found in the stream and its biomass. Crabs seem to be the key organism in both sections of the stream.