    Analyzing Multiple-Probe Microarray: Estimation and Application of Gene Expression Indexes

    Gene expression index estimation is an essential step in analyzing multiple-probe microarray data, and various modeling methods have been proposed for it. Among them, a popular method proposed by Li and Wong (2001) is based on a multiplicative model, which is similar on the logarithmic scale to the additive model discussed in Irizarry et al. (2003a). Along this line, Hu et al. (2006) proposed a data transformation to improve expression index estimation, based on an ad hoc entropy criterion and a naive grid search. In this work, we re-examine the problem using a new profile likelihood-based transformation estimation approach that is more statistically elegant and computationally efficient. We demonstrate the applicability of the proposed method using a benchmark Affymetrix U95A spiked-in experiment. Moreover, we introduce a new multivariate expression index and use the empirical study to show its promise in improving model fit and the power to detect differential expression over the commonly used univariate expression index. As the other main contribution of the work, we discuss two practical issues commonly encountered in applying gene expression indexes: normalization and the summary statistic used for detecting differential expression. Our empirical study shows somewhat different findings from the MAQC project (MAQC, 2006).
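
As an illustration of the multiplicative model mentioned above, the following is a minimal sketch that fits y[i, j] ≈ theta[i] * phi[j] (array i, probe j) by alternating least squares. It is not the transformation-based estimator proposed in the abstract, and it omits the outlier handling of the original Li and Wong procedure. The function name, the scaling constraint sum(phi**2) = number of probes, and the toy data are assumptions made for the example. On the log scale the same structure becomes additive: log(theta[i] * phi[j]) = log(theta[i]) + log(phi[j]).

```python
import numpy as np


def fit_multiplicative_model(y, n_iter=100, tol=1e-10):
    """Fit y[i, j] ~ theta[i] * phi[j] by alternating least squares.

    y has shape (n_arrays, n_probes). The probe effects phi are rescaled so
    that sum(phi**2) == n_probes, which fixes the scale of theta.
    Assumes y is not identically zero.
    """
    y = np.asarray(y, dtype=float)
    n_probes = y.shape[1]
    phi = np.ones(n_probes)      # start from equal probe sensitivities
    theta = y.mean(axis=1)       # initial expression index per array
    for _ in range(n_iter):
        # Least-squares update of theta with phi held fixed.
        theta_new = y @ phi / (phi @ phi)
        # Least-squares update of phi with theta held fixed.
        phi = theta_new @ y / (theta_new @ theta_new)
        # Re-impose the scaling constraint without changing theta * phi.
        scale = np.sqrt(n_probes / (phi @ phi))
        phi = phi * scale
        theta_new = theta_new / scale
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, phi


# Toy, noise-free data (hypothetical values): 3 arrays, 4 probes.
theta_true = np.array([100.0, 200.0, 400.0])
phi_true = np.array([0.5, 1.0, 1.5, 0.8])
phi_true *= np.sqrt(phi_true.size / (phi_true @ phi_true))
y_demo = np.outer(theta_true, phi_true)

theta_hat, phi_hat = fit_multiplicative_model(y_demo)
```

With real probe-level data the fit would be approximate rather than exact, and the residuals y - outer(theta, phi) are what a model-fit comparison would examine.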

    Outliers in dynamic factor models

    Dynamic factor models have a wide range of applications in econometrics and applied economics. The basic motivation resides in their capability of reducing a large set of time series to only a few indicators (factors). If the number of time series is large compared to the available number of observations, then most of the information may be conveyed by the factors. In this way, low-dimensional models may be estimated for explaining and forecasting one or more time series of interest. It is desirable that outlier-free time series be available for estimation. In practice, outlying observations are likely to arise at unknown dates due, for instance, to unusual external events or gross data entry errors. Several methods for outlier detection in time series are available. Most methods, however, apply to univariate time series, while even methods designed for the multivariate framework do not include dynamic factor models explicitly. A method for discovering outlier occurrences in a dynamic factor model is introduced that is based on linear transforms of the observed data. Some strategies to separate outliers that add to the model from outliers within the common component are discussed. Applications to simulated and real data sets are presented to check the effectiveness of the proposed method. Comment: Published at http://dx.doi.org/10.1214/07-EJS082 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Viewpoints: A high-performance high-dimensional exploratory data analysis tool

    Scientific data sets continue to increase in both size and complexity. In the past, dedicated graphics systems at supercomputing centers were required to visualize large data sets, but as the price of commodity graphics hardware has dropped and its capability has increased, it is now possible, in principle, to view large complex data sets on a single workstation. To do this in practice, an investigator will need software that is written to take advantage of the relevant graphics hardware. The Viewpoints visualization package described herein is an example of such software. Viewpoints is an interactive tool for exploratory visual analysis of large, high-dimensional (multivariate) data. It leverages the capabilities of modern graphics boards (GPUs) to run on a single workstation or laptop. Viewpoints is minimalist: it attempts to do a small set of useful things very well (or at least very quickly) in comparison with similar packages today. Its basic feature set includes linked scatter plots with brushing, dynamic histograms, normalization, and outlier detection/removal. Viewpoints was originally designed for astrophysicists, but it has since been used in a variety of fields that range from astronomy, quantum chemistry, fluid dynamics, machine learning, bioinformatics, and finance to information technology server log mining. In this article, we describe the Viewpoints package and show examples of its usage. Comment: 18 pages, 3 figures, PASP in press, this version corresponds more closely to that to be publishe

    GLRT-based threshold detection-estimation performance improvement and application to uniform circular antenna arrays

    ©2006 IEEE. The problem of estimating the number of independent Gaussian sources and their parameters impinging upon an antenna array is addressed for scenarios that are problematic for standard techniques, namely, under "threshold conditions" (where subspace techniques such as MUSIC experience an abrupt and dramatic performance breakdown). We propose an antenna geometry-invariant method that adopts the generalized-likelihood-ratio test (GLRT) methodology, supported by a maximum-likelihood-ratio lower-bound analysis that allows erroneous solutions ("outliers") to be found and rectified. Detection-estimation performance in both uniform circular and linear antenna arrays is shown to be significantly improved compared with conventional techniques but limited by the performance-breakdown phenomenon that is intrinsic to all such maximum-likelihood (ML) techniques. Yuri I. Abramovich, Nicholas K. Spencer, and Alexei Y. Gorokho

    Spectral Mapping Reconstruction of Extended Sources

    Three-dimensional spectroscopy of extended sources is typically performed with dedicated integral field spectrographs. We describe a method of reconstructing full spectral cubes, with two spatial dimensions and one spectral dimension, from rastered spectral mapping observations employing a single slit in a traditional slit spectrograph. When the background and image characteristics are stable, as is often achieved in space, the use of traditional long slits for integral field spectroscopy can substantially reduce instrument complexity over dedicated integral field designs, without loss of mapping efficiency -- particularly compelling when a long-slit mode for single unresolved source follow-up is separately required. We detail a custom flux-conserving cube reconstruction algorithm, discuss issues of extended source flux calibration, and describe CUBISM, a tool that implements these methods for spectral maps obtained with the Spitzer Space Telescope's Infrared Spectrograph. Comment: 11 pages, 8 figures, accepted by PAS

    Latest results of the Tunka Radio Extension (ISVHECRI2016)

    The Tunka Radio Extension (Tunka-Rex) is an antenna array consisting of 63 antennas at the location of the TAIGA facility (Tunka Advanced Instrument for cosmic ray physics and Gamma Astronomy) in Eastern Siberia, near Lake Baikal. Tunka-Rex is triggered by the air-Cherenkov array Tunka-133 during clear and moonless winter nights and by the scintillator array Tunka-Grande during the remaining time. Tunka-Rex measures the radio emission from the same air showers as Tunka-133 and Tunka-Grande, but with a higher threshold of about 100 PeV. During the first stages of its operation, Tunka-Rex has proven that sparse radio arrays can measure air showers with an energy resolution of better than 15% and the depth of the shower maximum with a resolution of better than 40 g/cm². To improve and interpret our measurements, as well as to study systematic uncertainties due to interaction models, we perform radio simulations with CORSIKA and CoREAS. In this overview we present the setup of Tunka-Rex and discuss the achieved results and the prospects of mass-composition studies with radio arrays. Comment: proceedings of ISVHECRI2016 conferenc

    Multivariate classification of gene expression microarray data

    Gene expression data obtained from microarray analysis are used in many cases to classify cells. In this thesis, a probabilistic version of the Discriminant Partial Least Squares method (p-DPLS) is used to classify samples from their gene expression. p-DPLS is based on Bayes' rule for the posterior probability. Such classifiers are forced to always classify. To overcome this limitation, a reject option has been implemented. This option allows samples with a high risk of classification error (that is, ambiguous samples and outliers) to be rejected. The reject option combines criteria based on the x-residuals, the leverage, and the predicted values. In addition, a variable selection method is developed to choose the most relevant genes, since most of the genes analyzed with a microarray are irrelevant for the particular classification purpose and may confuse the classifier. Finally, DPLS is extended to multi-class classification by combining PLS with linear discriminant analysis.
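
The idea of a reject option built on Bayes' rule can be sketched as follows. This is a simplified illustration, not the thesis's p-DPLS implementation: it assumes each class is described by a Gaussian density over a single predicted value, and it rejects only on the posterior probability, leaving out the residual and leverage criteria. The function names, class parameters, priors, and the 0.9 threshold are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def classify_with_reject(y_pred, class_params, priors, min_posterior=0.9):
    """Return (label, posteriors); label is None when the sample is rejected.

    class_params maps each class label to an assumed (mean, sd) of the
    predicted value for that class; priors maps labels to prior probabilities.
    """
    # Bayes' rule: posterior is proportional to prior times class-conditional density.
    joint = {c: priors[c] * gaussian_pdf(y_pred, *class_params[c])
             for c in class_params}
    total = sum(joint.values())
    if total == 0.0:
        # Densities underflowed for every class: nothing supports a decision.
        return None, {c: 0.0 for c in joint}
    posteriors = {c: v / total for c, v in joint.items()}
    best = max(posteriors, key=posteriors.get)
    if posteriors[best] < min_posterior:
        return None, posteriors  # ambiguous sample: refuse to classify
    return best, posteriors


# Hypothetical two-class setup with predicted values centered at 0 and 1.
params = {"A": (0.0, 0.3), "B": (1.0, 0.3)}
priors = {"A": 0.5, "B": 0.5}

clear_label, _ = classify_with_reject(0.05, params, priors)
ambiguous_label, ambiguous_post = classify_with_reject(0.5, params, priors)
```

A posterior threshold of this kind flags samples that fall between classes. A sample far from every class can still receive a confident posterior, which is one plausible reason for also checking residuals and leverage, as the abstract describes.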