
    Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines

    DNA copy number and mRNA expression are widely used data types in cancer studies; combined, they provide more insight than either does separately. Whereas in existing literature the form of the relationship between these two types of markers is fixed a priori, in this paper we model their association. We employ piecewise linear regression splines (PLRS), which combine good interpretability with sufficient flexibility to identify any plausible type of relationship. The specification of the model leads to estimation and model selection in a constrained, nonstandard setting. We provide methodology for testing the effect of DNA on mRNA and choosing the appropriate model. Furthermore, we present a novel approach to obtain reliable confidence bands for constrained PLRS, which incorporates model uncertainty. The procedures are applied to colorectal and breast cancer data. Common assumptions are found to be potentially misleading for biologically relevant genes. More flexible models may bring more insight into the interaction between the two markers. Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Normalized, Segmented or Called aCGH Data?

    Array comparative genomic hybridization (aCGH) is a high-throughput lab technique to measure genome-wide chromosomal copy numbers. Data from aCGH experiments require extensive pre-processing, which consists of three steps: normalization, segmentation and calling. Each of these pre-processing steps yields a different data set: normalized data, segmented data, and called data. Publications using aCGH base their findings on data from all stages of the pre-processing. Hence, there is no consensus on which should be used for further downstream analysis. Such a consensus is, however, important for correct reporting of findings and for comparison of results from different studies. We discuss several issues that should be taken into account when deciding which data are to be used. We express the belief that called data are best used, but would welcome opposing views.

    The spectral condition number plot for regularization parameter evaluation

    Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employs regularization to cope with the resulting singularity of the sample covariance matrix. These estimators depend on a penalty parameter, and choosing its value can be hard: existing approaches can be computationally infeasible or tenable only for a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter assessment. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators.
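
    The idea behind such a plot can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the simplest ridge-type form S + lambda*I, takes the eigenvalues of the sample covariance S as given, and uses a hypothetical spectrum and penalty grid. The function name `ridge_condition_numbers` is made up for this example.

```python
import math

def ridge_condition_numbers(eigenvalues, penalties):
    """Spectral condition number of S + lam*I for each penalty lam > 0.

    eigenvalues: eigenvalues of the sample covariance S (all >= 0).
    Assumption: the ridge estimator shifts every eigenvalue by lam.
    """
    d_max, d_min = max(eigenvalues), min(eigenvalues)
    return [(d_max + lam) / (d_min + lam) for lam in penalties]

# Hypothetical spectrum of a singular sample covariance
# (more variables than observations yields zero eigenvalues).
eigs = [4.0, 1.0, 0.0, 0.0]
penalties = [10.0 ** k for k in range(-4, 3)]
conds = ridge_condition_numbers(eigs, penalties)

# The plot would show log(condition number) against log(penalty).
for lam, c in zip(penalties, conds):
    print(f"log10(lambda)={math.log10(lam):+.0f}  log10(cond)={math.log10(c):.2f}")
```

    Under this assumption the condition number decreases monotonically as the penalty grows, so one would look for the region where the curve starts to flatten.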

    A nonparametric control chart based on the Mann-Whitney statistic

    Nonparametric or distribution-free charts can be useful in statistical process control problems when there is limited or no knowledge about the underlying process distribution. In this paper, a phase II Shewhart-type chart is considered for location, based on reference data from phase I analysis and the well-known Mann-Whitney statistic. Control limits are computed using Lugannani-Rice-saddlepoint, Edgeworth, and other approximations along with Monte Carlo estimation. The derivations take account of estimation and the dependence from the use of a reference sample. An illustrative numerical example is presented. The in-control performance of the proposed chart is shown to be much superior to the classical Shewhart $\bar{X}$ chart. Further comparisons on the basis of some percentiles of the out-of-control conditional run length distribution and the unconditional out-of-control ARL show that the proposed chart is almost as good as the Shewhart $\bar{X}$ chart for the normal distribution, but is more powerful for a heavy-tailed distribution such as the Laplace, or for a skewed distribution such as the Gamma. Interactive software, enabling a complete implementation of the chart, is made available on a website. Comment: Published at http://dx.doi.org/10.1214/193940307000000112 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)
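
    The charting statistic itself is easy to illustrate. The sketch below is not the paper's software: it assumes the statistic is the number of (reference, test) pairs in which the test observation exceeds the reference observation, and it takes the control limits as given inputs rather than deriving them by the approximations named above. The function names, the limits, and the data are all hypothetical.

```python
def mann_whitney_count(reference, test):
    """Number of pairs (x, y) with x in reference, y in test, and y > x."""
    return sum(1 for x in reference for y in test if y > x)

def out_of_control(reference, test, lower, upper):
    """Signal when the statistic falls outside (lower, upper).

    Assumption: lower and upper are precomputed control limits;
    how to obtain them is outside the scope of this sketch.
    """
    m = mann_whitney_count(reference, test)
    return m < lower or m > upper

# Hypothetical phase I reference sample and two phase II test samples.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
stable = [3.5, 0.5, 6.0]
shifted = [6.0, 7.0, 8.0]

print(mann_whitney_count(reference, stable))
print(mann_whitney_count(reference, shifted))
print(out_of_control(reference, shifted, lower=2, upper=13))
```

    Because the statistic depends only on the ordering of the observations, it does not require a distributional assumption for the process.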

    Better prediction by use of co-data: Adaptive group-regularized ridge regression

    For many high-dimensional studies, additional information on the variables, like (genomic) annotation or external p-values, is available. In the context of binary and continuous prediction, we develop a method for adaptive group-regularized (logistic) ridge regression, which makes structural use of such 'co-data'. Here, 'groups' refer to a partition of the variables according to the co-data. We derive empirical Bayes estimates of group-specific penalties, which possess several nice properties: i) they are analytical; ii) they adapt to the informativeness of the co-data for the data at hand; iii) only one global penalty parameter requires tuning by cross-validation. In addition, the method allows use of multiple types of co-data at little extra computational effort. We show that the group-specific penalties may lead to a larger distinction between 'near-zero' and relatively large regression parameters, which facilitates post-hoc variable selection. The method, termed GRridge, is implemented in an easy-to-use R package. It is demonstrated on two cancer genomics studies, which both concern the discrimination of precancerous cervical lesions from normal cervix tissues using methylation microarray data. For both examples, GRridge clearly improves on the predictive performance of ordinary logistic ridge regression and the group lasso. In addition, we show that for the second study the relatively good predictive performance is maintained when selecting only 42 variables. Comment: 15 pages, 2 figures. Supplementary Information available on first author's web site

    Gradient Descent in Materio

    Deep learning, a multi-layered neural network approach inspired by the brain, has revolutionized machine learning. One of its key enablers has been backpropagation, an algorithm that computes the gradient of a loss function with respect to the weights in the neural network model, in combination with its use in gradient descent. However, the implementation of deep learning in digital computers is intrinsically wasteful, with energy consumption becoming prohibitively high for many applications. This has stimulated the development of specialized hardware, ranging from neuromorphic CMOS integrated circuits and integrated photonic tensor cores to unconventional, material-based computing systems. Learning in these material systems, which takes place, e.g., by artificial evolution or surrogate neural network modelling, is still a complicated and time-consuming process. Here, we demonstrate an efficient and accurate homodyne gradient extraction method for performing gradient descent on the loss function directly in the material system. We demonstrate the method in our recently developed dopant network processing units, where we readily realize all Boolean gates. This shows that gradient descent can in principle be fully implemented in materio using simple electronics, opening up the way to autonomously learning material systems.

    Magnetic effects at the interface between nonmagnetic oxides

    The electronic reconstruction at the interface between two insulating oxides can give rise to a highly conductive interface. In analogy to this remarkable interface-induced conductivity we show how, additionally, magnetism can be induced at the interface between the otherwise nonmagnetic insulating perovskites SrTiO3 and LaAlO3. A large negative magnetoresistance of the interface is found, together with a logarithmic temperature dependence of the sheet resistance. At low temperatures, the sheet resistance reveals magnetic hysteresis. Magnetic ordering is a key issue in solid-state science and its underlying mechanisms are still the subject of intense research. In particular, the interplay between localized magnetic moments and the spin of itinerant conduction electrons in a solid gives rise to intriguing many-body effects such as Ruderman-Kittel-Kasuya-Yosida (RKKY) interactions, the Kondo effect, and carrier-induced ferromagnetism in diluted magnetic semiconductors. The conducting oxide interface now provides a versatile system to induce and manipulate magnetic moments in otherwise nonmagnetic materials. Comment: Nature Materials, July issue

    Comparative studies on the structure of an upland African stream ecosystem

    Upland stream systems have been extensively investigated in Europe, North America and Australasia, and many of the central ideas concerning their function are based on these systems. One central paradigm, the river continuum concept, is ultimately derived from those North American streams whose catchments remain forested with native vegetation. Streams of the tropics may or may not fit the model; they have been little studied. The Amani Nature Reserve in the East Usambara Mountains of north-eastern Tanzania offers an opportunity to bring these naturally forested systems to the attention of the ecological community. This article describes a comparison made between two lengths of the River Dodwe in this area. The work was carried out by a group of postgraduate students from eighteen European and African countries with advice from five staff members, as part of a course organised by the Tropical Biology Association. Rigorous efforts were made to standardise techniques, in a situation where equipment and laboratory facilities were very basic, through a management structure and deliberate allocation of work to specialists in each area. The article offers a summary of the invertebrate communities found in the stream and their biomass. Crabs seem to be the key organism in both sections of the stream.