147 research outputs found

    IPAD: Stable Interpretable Forecasting with Knockoffs Inference

    Interpretability and stability are two important features desired in many contemporary big data applications arising in economics and finance. While the former is enjoyed to some extent by many existing forecasting approaches, the latter, in the sense of controlling the fraction of wrongly discovered features, which can greatly enhance interpretability, is still largely underdeveloped in econometric settings. To this end, in this paper we exploit the general framework of model-X knockoffs introduced recently in Candès, Fan, Janson and Lv (2018), which is nonconventional for reproducible large-scale inference in that the framework is completely free of the use of p-values for significance testing, and suggest a new method of intertwined probabilistic factors decoupling (IPAD) for stable interpretable forecasting with knockoffs inference in high-dimensional models. The recipe of the method is to construct the knockoff variables by assuming a latent factor model, widely used in economics and finance, for the association structure of covariates. Our method and work are distinct from the existing literature in four respects: we estimate the covariate distribution from data instead of assuming that it is known when constructing the knockoff variables; our procedure does not require any sample splitting; we provide theoretical justifications for the asymptotic false discovery rate control; and we establish the theory for the power analysis. Several simulation examples and a real data analysis further demonstrate that the newly suggested method has appealing finite-sample performance with desired interpretability and stability compared to some popularly used forecasting methods.
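
The p-value-free selection step that knockoff methods rely on can be sketched as follows. This is a minimal illustration of the generic knockoff+ threshold rule, not the IPAD construction of knockoff variables described in the abstract. The function names and the example statistics are hypothetical; `w` is assumed to hold one knockoff statistic per feature, where a large positive value is evidence for the original feature over its knockoff copy.

```python
import math

def knockoff_plus_threshold(w, q):
    """Smallest t > 0 whose estimated false discovery proportion is <= q."""
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Large negative statistics estimate the number of false discoveries;
        # the extra 1 is the conservative "knockoff+" correction.
        false_est = 1 + sum(1 for x in w if x <= -t)
        discoveries = max(1, sum(1 for x in w if x >= t))
        if false_est / discoveries <= q:
            return t
    return math.inf  # no threshold qualifies, so nothing is selected

def select_features(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]

# Hypothetical statistics for ten features, target FDR level 0.2.
w_example = [5, 4, 3, 2, -1, 0.5, 6, 7, 8, 9]
selected = select_features(w_example, 0.2)
```

With these made-up statistics the rule settles on the threshold 2 and keeps the eight features whose statistic is at least that large.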

    Deposition of Al2O3 coatings by High Velocity Suspension Flame Spraying (HVSFS): coating characteristics and effect of operating parameters

    High Velocity Suspension Flame Spraying (HVSFS) is an innovative thermal spray process in which the torch is fed with a suspension of very fine (micrometric or nanometric) particles dispersed in a liquid phase, allowing the deposition of dense, thin (<100 µm) ceramic coatings. To investigate the relationship between coating characteristics and suspension properties, several Al2O3-based coatings were deposited using suspensions of both micrometric and nanometric particles. Regardless of the process parameters selected, a suspension of sufficiently well-dispersed micrometric particles ensures higher deposition efficiency (>50%) and produces coatings consisting of well-overlapped, strongly cohesive lamellae, with higher hardness (≈1200 HV0.05) and lower roughness (Ra ≈ 1.3 µm) than coatings obtainable from nanoparticle suspensions. Although the coatings obtained from micrometric particle suspensions are also subject to higher tensile residual stresses (between 50 MPa and 100 MPa), their excellent density and cohesion make them much more resistant to sliding wear (studied with ball-on-disk tests) than Al2O3 coatings produced either from nanoparticle suspensions or by conventional thermal spray techniques.

    GeneBins: a database for classifying gene expression data, with application to plant genome arrays

    BACKGROUND: To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms. RESULTS: We developed a bioinformatics system to assign the probe set sequences of any organism to a hierarchical functional classification modelled on KEGG ontology. The GeneBins database currently supports the functional classification of expression data from four Affymetrix arrays: Arabidopsis thaliana, Oryza sativa, Glycine max and Medicago truncatula. An online analysis tool to identify relevant functions is also provided. CONCLUSION: GeneBins provides resources to interpret gene expression results from microarray experiments. It is available a

    Semi-supervised discovery of differential genes

    BACKGROUND: Various statistical scores have been proposed for evaluating the significance of genes that may exhibit differential expression between two or more controlled conditions. However, in many clinical studies, for example those that aim to detect clinical marker genes, the conditions have not necessarily been controlled well, and condition labels are sometimes hard to obtain due to physical, financial, and time costs. In such a situation, we can consider an unsupervised case where labels are not available, or a semi-supervised case where labels are available for only part of the sample set, rather than the well-studied supervised case where all samples have labels. RESULTS: We assume a latent variable model for the expression of active genes and apply the optimal discovery procedure (ODP) proposed by Storey (2005) to the model. Our latent variable model allows gene significance scores to be applied to unsupervised and semi-supervised cases. The ODP framework improves detectability by sharing the estimated parameters of null and alternative models of multiple tests over multiple genes. A theoretical consideration leads to two different interpretations of the latent variable: either it only implicitly affects the alternative model through the model parameters, or it is explicitly included in the alternative model. These interpretations correspond to two different implementations of ODP. By comparing the two implementations through experiments with simulation data, we have found that sharing the latent variable estimation is effective for increasing the detectability of truly active genes. We also show that the unsupervised and semi-supervised rating of genes, which takes into account the samples without condition labels, can improve detection of active genes in real gene discovery problems. CONCLUSION: The experimental results indicate that the ODP framework is effective for hypotheses including latent variables and is further improved by sharing the estimations of hidden variables over multiple tests.

    The distribution of genetic diversity in a Brassica oleracea gene bank collection related to the effects on diversity of regeneration, as measured with AFLPs

    The ex situ conservation of plant genetic resources in gene banks involves the selection of accessions to be conserved and the maintenance of these accessions for current and future users. Decisions concerning both these issues require knowledge about the distribution of genetic diversity within and between accessions sampled from the gene pool, but also about the changes in variation of these samples as a result of regenerations. These issues were studied in an existing gene bank collection of a cross-pollinating crop using a selection of groups of very similar Dutch white cabbage accessions, and additional groups of reference material representing the Dutch and the global white cabbage gene pool. Six accessions were sampled both before and after a standard regeneration. Thirty plants of each of 50 accessions plus 6 regeneration populations included in the study were characterised with AFLPs, using scores for 103 polymorphic bands. It was shown that the genetic changes resulting from standard gene bank regenerations, as measured by AFLPs, are of a magnitude comparable to the differences between some of the more similar accessions. The observed changes are mainly due to highly significant changes in allele frequencies for a few fragments, whereas for the majority of fragments the alleles occur in similar frequencies before and after regeneration. It is argued that, given the changes of accessions over generations, accessions that display similar levels of differentiation may be combined safely.

    Adaptive Strategy for the Statistical Analysis of Connectomes

    We study an adaptive statistical approach to analyze brain networks represented by brain connection matrices of interregional connectivity (connectomes). Our approach sits at a middle level between a global analysis and a single-connection analysis by considering subnetworks of the global brain network. These subnetworks represent either the inter-connectivity between two brain anatomical regions or the intra-connectivity within the same brain anatomical region. An appropriate summary statistic, which characterizes a meaningful feature of the subnetwork, is evaluated. Based on this summary statistic, a statistical test is performed to derive the corresponding p-value. Reformulating the problem in this way reduces the number of statistical tests in an orderly fashion based on our understanding of the problem. Considering the global testing problem, the p-values are corrected to control the rate of false discoveries. Finally, the procedure is followed by a local investigation within the significant subnetworks. We contrast this strategy with the one based on the individual measures in terms of power. We show that this strategy has great potential, in particular in cases where the subnetworks are well defined and the summary statistics are properly chosen. As an application example, we compare structural brain connection matrices of two groups of subjects with 22q11.2 deletion syndrome, distinguished by their IQ scores.
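
The correction step in the abstract above, where subnetwork p-values are adjusted to control the rate of false discoveries, can be illustrated with a short sketch. The abstract does not name the correction it uses; the Benjamini-Hochberg step-up procedure below is an assumption chosen because it is the most common FDR-controlling rule. The function name and the p-values are hypothetical, with one p-value standing for each subnetwork-level test.

```python
def benjamini_hochberg(p_values, q):
    """Indices of hypotheses rejected by the BH step-up rule at FDR level q."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is below q * rank / m.
        if p_values[i] <= q * rank / m:
            k_max = rank
    # Reject every hypothesis ranked at or below that largest rank.
    return sorted(order[:k_max])

# Hypothetical p-values, one per subnetwork test.
subnetwork_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(subnetwork_p, 0.05)
```

In the strategy described in the abstract, only the subnetworks returned by such a step would then be examined connection by connection.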

    Impact of RNA degradation on gene expression profiling

    BACKGROUND: Gene expression profiling is a highly sensitive technique which is used for profiling tumor samples for medical prognosis. RNA quality and degradation influence the analysis results of gene expression profiles. The impact of this influence on the profiles and its medical impact is not fully understood. As patient samples are very valuable for clinical studies, it is necessary to establish criteria for the RNA quality to be able to use these samples in later analysis. METHODS: To investigate the effects of RNA integrity on gene expression profiling, whole genome expression arrays were used. We used tumor biopsies from patients diagnosed with locally advanced rectal cancer. To simulate degradation, the isolated total RNA of all patients was subjected to heat-induced degradation in a time-dependent manner. Expression profiling was then performed and data were analyzed bioinformatically to assess the differences. RESULTS: The differences introduced by RNA degradation were largely outweighed by the biological differences between the patients. Only a relatively small number of probes (275 out of 41,000) show a significant effect due to degradation. The genes that show the strongest effect due to RNA degradation were, in particular, those with short mRNAs and probe positions near the 5' end. CONCLUSIONS: Degraded RNA from tumor samples (RIN > 5) can still be used to perform gene expression analysis. A much higher biological variance between patients is observed compared to the effect that is imposed by degradation of RNA. Nevertheless, there are genes, namely very short ones and those with the probe binding site close to the 5' end, that should be excluded from gene expression analysis when working with degraded RNA. These results are limited to the Agilent 44 k microarray platform and should be carefully interpreted when transferring to other settings.