
    On Graphical Models via Univariate Exponential Family Distributions

    Undirected graphical models, or Markov networks, are a popular class of statistical models, used in a wide variety of applications. Popular instances of this class include Gaussian graphical models and Ising models. In many settings, however, it might not be clear which subclass of graphical models to use, particularly for non-Gaussian and non-categorical data. In this paper, we consider a general sub-class of graphical models where the node-wise conditional distributions arise from exponential families. This allows us to derive multivariate graphical model distributions from univariate exponential family distributions, such as the Poisson, negative binomial, and exponential distributions. Our key contributions include a class of M-estimators to fit these graphical model distributions, and a rigorous statistical analysis showing that these M-estimators recover the true graphical model structure exactly, with high probability. We provide examples of genomic and proteomic networks learned via instances of our class of graphical models derived from Poisson and exponential distributions. Comment: Journal of Machine Learning Research

    On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

    Dysregulated microRNA (miRNA) expression is a well-established feature of human cancer. However, the role of specific miRNAs in determining cancer outcomes remains unclear. Using Level 3 expression data from the Cancer Genome Atlas (TCGA), we identified 61 miRNAs that are associated with overall survival in 469 ovarian cancers profiled by microarray (p<0.01). We also identified 12 miRNAs that are associated with survival when miRNAs were profiled in the same specimens using Next Generation Sequencing (miRNA-Seq) (p<0.01). Surprisingly, only 1 miRNA transcript is associated with ovarian cancer survival in both datasets. Our analyses indicate that this discrepancy is due to the fact that miRNA levels reported by the two platforms correlate poorly, even after correcting for potential issues inherent to signal detection algorithms. Further investigation is warranted

    Comprehensive evaluation of RNA-seq quantification methods for linearity

    Figure S3. Concordant analysis between rank of estimated quantifications and rank of measured abundance value at gene level (a) and isoform level (b). The fitted value in the y-axis is estimated from model D ∼ m×A + n×B + ε. Ranks were normalized by the number of quantifications in each plot. (PDF 5950 kb)
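The model in the caption, D ∼ m×A + n×B + ε, can be illustrated with an ordinary least-squares fit. This is only a minimal sketch: the caption does not say how the authors estimated m and n, so the fitting method, the function name `fit_mixture`, and all numbers below are assumptions made for illustration.

```python
import numpy as np

def fit_mixture(a, b, d):
    """Least-squares estimate of m and n in d ~ m*a + n*b (no intercept).

    a, b, d are 1-D arrays of quantifications for samples A, B and D.
    Returns (m, n, fitted), where fitted = m*a + n*b.
    """
    design = np.column_stack([a, b])
    coef, *_ = np.linalg.lstsq(design, d, rcond=None)
    fitted = design @ coef
    return coef[0], coef[1], fitted

# Made-up abundances; D is constructed as an exact 0.75/0.25 mixture.
a = np.array([10.0, 0.0, 5.0, 20.0, 8.0])
b = np.array([2.0, 12.0, 5.0, 1.0, 30.0])
d = 0.75 * a + 0.25 * b
m, n, fitted = fit_mixture(a, b, d)
```

On real data the residual ε would be nonzero, and the ranks of `fitted` could then be compared with the ranks of the measured abundances.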

    Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

    Background: Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results: We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12-17 bp), C. elegans (11-17 bp), A. thaliana (11-17 bp), S. cerevisiae (10-16 bp) and E. coli (9-15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However, chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy.
Conclusion: Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues
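The "sequence space coverage" figures quoted in this abstract are fractions of all 4^k possible k-mers that occur in a genome. The sketch below shows that calculation on a toy string. It is not the authors' pipeline: it counts a single strand only, skips windows containing characters other than A, C, G, T, and the function name `kmer_coverage` is hypothetical.

```python
def kmer_coverage(sequence: str, k: int) -> float:
    """Fraction of the 4**k possible DNA k-mers that occur in `sequence`.

    Single-strand count; windows with non-ACGT characters are ignored.
    """
    seq = sequence.upper()
    alphabet = set("ACGT")
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            seen.add(kmer)
    return len(seen) / 4 ** k

# Toy sequence, far too short to say anything about real genomes.
toy = "ACGTACGTTGCA"
coverage_k1 = kmer_coverage(toy, 1)  # all four bases occur
coverage_k2 = kmer_coverage(toy, 2)  # 8 of the 16 dinucleotides occur
```

For genome-scale input, a set of Python strings becomes memory-hungry at larger k; a bit array indexed by a 2-bit encoding of each k-mer would be one possible alternative.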

    The Adaptive Quadratic Linear Unit (AQuLU): Adaptive Non Monotonic Piecewise Activation Function

    The activation function plays a key role in influencing the performance and training dynamics of neural networks. Hundreds of activation functions, such as rectified linear units (ReLUs), are widely used, but most of them are applied to complex and large neural networks, which often have gradient explosion and vanishing gradient problems. By studying a variety of non-monotonic activation functions, we propose a method to construct a non-monotonic activation function, x·Φ(x), with Φ(x) ∈ [0, 1]. With the hardening treatment of Φ(x), we propose an adaptive non-monotonic segmented activation function, called the adaptive quadratic linear unit, abbreviated as AQuLU, which ensures the sparsity of the input data and improves training efficiency. In image classification based on different state-of-the-art neural network architectures, the performance of AQuLUs has significant advantages for more complex and deeper architectures with various activation functions. The ablation experimental study further validates the compatibility and stability of AQuLUs with different depths, complexities, optimizers, learning rates, and batch sizes. We thus demonstrate the high efficiency, robustness, and simplicity of AQuLUs.
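The construction x·Φ(x) with Φ(x) in [0, 1] can be illustrated with two well-known members of that family. The abstract does not give the AQuLU formula, so the sketch below does not implement AQuLU. It uses the sigmoid as Φ (which gives SiLU/Swish) and a piecewise-linear "hard" sigmoid as Φ (which gives hard-swish) as stand-ins for a smooth gate and a hardened gate. The helper name `gated` is hypothetical.

```python
import numpy as np

def gated(x, phi):
    """Activation of the form x * phi(x), where phi maps into [0, 1]."""
    return x * phi(x)

def sigmoid(x):
    # Smooth gate: x * sigmoid(x) is the SiLU / Swish activation.
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    # Piecewise-linear ("hardened") gate: x * hard_sigmoid(x) is hard-swish.
    return np.clip((x + 3.0) / 6.0, 0.0, 1.0)

x = np.linspace(-6.0, 6.0, 121)
silu = gated(x, sigmoid)
hard_swish = gated(x, hard_sigmoid)

# Both curves dip below zero for moderately negative x and then rise again,
# which is what makes them non-monotonic.
is_non_monotonic = bool(np.any(np.diff(hard_swish) < 0))
```

With the hard gate, the output is zero for x ≤ -3, quadratic on (-3, 3), and equal to x for x ≥ 3, so the activation is piecewise and cheap to evaluate.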

    Precipitation patterns and associated hydrological extremes in the Yangtze River basin, China, using TRMM/PR data and EOF analysis

    A decadal-scale study to retrieve the spatio-temporal precipitation patterns of the Yangtze River basin, China, using the Tropical Rainfall Measuring Mission, Precipitation Radar (TRMM/PR) data is presented. The empirical orthogonal function (EOF) based on monthly TRMM/PR data extracts several leading precipitation patterns, which are largely connected with physical implications at the basin scale. With the aid of gauge station data, the amplitudes of major principal components (PCs) were used to examine the generic relationships between precipitation variations and hydrological extremes (e.g., floods and droughts) during summer seasons over the past decade. The emergence of such major precipitation patterns clearly reveals the possible linkages with hydrological processes, and the oscillations in relation to the amplitude of major PCs are consistent with these observed hydrological extremes. Although the floods in some sections of the Yangtze River were, to some extent, tied to human activities, such as the removal of wetlands, the variations in major precipitation patterns are recognized as the primary driving force of the flow extremes associated with floods and droughts. The research findings indicate that long-distance hydro-meteorological signals of large-scale precipitation variations over such a large river basin can be successfully identified with the aid of EOF analysis. The retrieved precipitation patterns and their low-frequency jumps of amplitude in relation to PCs are valuable tools to help understand the association between the precipitation variations and the occurrence of hydrological extremes. Such a study can certainly aid in disaster mitigation and decision-making in water resource management.
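EOF analysis of a monthly field is commonly computed with a singular value decomposition of the time-by-space anomaly matrix. The sketch below shows that standard recipe on synthetic data. It is an illustration under stated assumptions, not the study's procedure: there is no area weighting or deseasonalizing, the data are random rather than TRMM/PR, and the function name `eof_analysis` is hypothetical.

```python
import numpy as np

def eof_analysis(field, n_modes):
    """EOF decomposition of a (n_time, n_space) array via SVD.

    Returns:
      eofs: (n_modes, n_space) spatial patterns (unit-norm rows)
      pcs: (n_time, n_modes) principal-component time series (amplitudes)
      variance_fraction: (n_modes,) share of total variance per mode
    """
    anomalies = field - field.mean(axis=0)  # remove the time mean per grid cell
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]
    pcs = u[:, :n_modes] * s[:n_modes]
    variance_fraction = s[:n_modes] ** 2 / np.sum(s ** 2)
    return eofs, pcs, variance_fraction

# Synthetic "precipitation": one spatial pattern with an annual cycle, plus noise.
rng = np.random.default_rng(0)
months = np.arange(120)                      # ten years of monthly data
pattern = np.sin(np.linspace(0, np.pi, 50))  # 50 hypothetical grid cells
seasonal = np.sin(2 * np.pi * months / 12)
field = np.outer(seasonal, pattern) + 0.05 * rng.standard_normal((120, 50))

eofs, pcs, variance_fraction = eof_analysis(field, n_modes=2)
```

In this toy case the first PC tracks the imposed annual cycle (up to sign), which mirrors how PC amplitudes are read against observed floods and droughts in the abstract.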

    A Review of Particle Removal Due to Thermophoretic Deposition

    Thermophoretic deposition is an important technique for particle removal. Under an appropriate temperature gradient, the thermophoretic force acting on the particles can remove them effectively. There have been many studies on the deposition mechanism of ultrafine particles under the action of thermophoresis. In this chapter, the development history and current state of research on the thermophoretic deposition of ultrafine particles are summarized, and the future direction of thermophoretic deposition is proposed.

    Using SPOT-VGT NDVI as a successive ecological indicator for understanding the environmental implications in the Tarim River Basin, China

    The resilience and vulnerability of the terrestrial ecosystem in the Tarim River Basin, Xinjiang, are critical to the sustainable development of the northwest region of China. To learn more about the causes of ecosystem evolution in this wide region, vegetation dynamics can be a surrogate indicator of environmental responses and human perturbations. This paper aims to use the inter-annual and intra-annual coefficient of variation (CoV) derived from the SPOT-VGT Normalized Difference Vegetation Index (NDVI) as an integrated measure of vegetation dynamics to address the environmental implications in response to climate change. To pin down the vegetation dynamics, the intra-annual CoV based on monthly NDVI values and the inter-annual CoV based on seasonally accumulated NDVI values were calculated. Such vegetation dynamics can then be associated with precipitation patterns extracted from the Tropical Rainfall Measuring Mission (TRMM) data and irrigation efforts reflecting the cross-linkages between human society and natural systems. Such a remote sensing analysis enables us to explore the complex vegetation dynamics in terms of distribution and evolution of the collective features of heterogeneity over local soil characteristics, climate change impacts, and anthropogenic activities at differing space and time scales. Findings clearly indicate that the vegetation changes had an obvious trend in some high mountainous areas as a result of climate change, whereas the vegetation changes in fluvial plains reflected the increasing evidence of human perturbations due to anthropogenic activities. Some possible environmental implications were finally elaborated from those cross-linkages between economic development and resources depletion in the context of sustainable development.
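The two CoV measures named in this abstract can be sketched for a single pixel as follows. This is a minimal illustration under assumptions: CoV is taken as population standard deviation divided by mean, the growing season is assumed to be April to October, and the NDVI values are synthetic. None of these choices are stated in the abstract.

```python
import numpy as np

def coefficient_of_variation(values, axis=None):
    """Population standard deviation divided by the mean along `axis`."""
    values = np.asarray(values, dtype=float)
    return np.std(values, axis=axis) / np.mean(values, axis=axis)

# Synthetic monthly NDVI for one pixel: rows are years, columns are months.
months = np.arange(12)
base_year = 0.3 + 0.2 * np.sin(np.pi * months / 11)
ndvi = np.stack([base_year, 1.1 * base_year, 0.9 * base_year])

# Intra-annual CoV: variability across the 12 monthly values of each year.
intra_annual_cov = coefficient_of_variation(ndvi, axis=1)

# Inter-annual CoV: variability across years of the seasonally accumulated NDVI.
growing_season = slice(3, 10)  # April..October, an assumed season definition
seasonal_totals = ndvi[:, growing_season].sum(axis=1)
inter_annual_cov = coefficient_of_variation(seasonal_totals)
```

Because CoV is scale-invariant, the three synthetic years share the same intra-annual value, while the inter-annual value reflects only the year-to-year scaling of the seasonal totals.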