4,221 research outputs found

    Random Feature Maps via a Layered Random Projection (LaRP) Framework for Object Classification

    The approximation of nonlinear kernels via linear feature maps has recently gained interest due to its applications in reducing the training and testing time of kernel-based learning algorithms. Current random projection methods avoid the curse of dimensionality by embedding the nonlinear feature space into a low-dimensional Euclidean space to create nonlinear kernels. We introduce a Layered Random Projection (LaRP) framework, where we model the linear kernels and nonlinearity separately for increased training efficiency. The proposed LaRP framework was assessed using the MNIST hand-written digits database and the COIL-100 object database, and showed notable improvement in object classification performance relative to other state-of-the-art random projection methods. Comment: 5 pages
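The general idea of approximating a nonlinear kernel with a linear feature map can be illustrated with random Fourier features for the RBF kernel. This is a minimal sketch of that well-known technique, not the LaRP framework itself, whose layered construction the abstract does not specify; the function name `rff_features` and the values of `gamma` and `n_features` are assumptions chosen for illustration.

```python
import numpy as np

def rff_features(X, n_features, gamma, rng):
    """Map X (n_samples, d) to random Fourier features whose inner products
    approximate the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases make a single cosine per frequency sufficient.
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))   # illustrative data, not MNIST or COIL-100
gamma = 0.1                    # assumed kernel width

Z = rff_features(X, n_features=5000, gamma=gamma, rng=rng)

# Exact RBF kernel matrix for comparison.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dists)

# A plain linear kernel on the random features approximates the RBF kernel.
K_approx = Z @ Z.T
max_err = np.abs(K_exact - K_approx).max()
print(f"largest entrywise error: {max_err:.3f}")
```

Once the data are mapped to `Z`, a linear classifier trained on `Z` stands in for a kernel machine, which is where the training and testing savings come from.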

    Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model

    We present a method for parallel block-sparse matrix-matrix multiplication on distributed memory clusters. By using a quadtree matrix representation, data locality is exploited without prior information about the matrix sparsity pattern. A distributed quadtree matrix representation is straightforward to implement due to our recent development of the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)]. The quadtree representation combined with the Chunks and Tasks model leads to favorable weak and strong scaling of the communication cost with the number of processes, as shown both theoretically and in numerical experiments. Matrices are represented by sparse quadtrees of chunk objects. The leaves in the hierarchy are block-sparse submatrices. Sparsity is dynamically detected by the matrix library and may occur at any level in the hierarchy and/or within the submatrix leaves. In case graphics processing units (GPUs) are available, both CPUs and GPUs are used for leaf-level multiplication work, thus making use of the full computing capacity of each node. The performance is evaluated for matrices with different sparsity structures, including examples from electronic structure calculations. Compared to methods that do not exploit data locality, our locality-aware approach reduces communication significantly, achieving essentially constant communication per node in weak scaling tests. Comment: 35 pages, 14 figures
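The quadtree recursion described in this abstract can be sketched serially as follows. This is only an illustration of the data structure under stated assumptions: it runs in one process, uses small dense NumPy leaves rather than block-sparse ones, and does not use the Chunks and Tasks library or GPUs. The names `build`, `add`, `multiply`, `to_dense` and the leaf size `LEAF` are hypothetical.

```python
import numpy as np

LEAF = 2  # assumed leaf block size; matrix size must be LEAF times a power of two

def build(A):
    """Build a quadtree from a dense square array; all-zero blocks become None."""
    if not A.any():
        return None                      # sparsity detected at this level
    n = A.shape[0]
    if n <= LEAF:
        return A.copy()                  # leaf submatrix
    h = n // 2
    return [build(A[:h, :h]), build(A[:h, h:]),
            build(A[h:, :h]), build(A[h:, h:])]

def add(X, Y):
    """Add two quadtrees of the same size, treating None as zero."""
    if X is None:
        return Y
    if Y is None:
        return X
    if isinstance(X, np.ndarray):
        return X + Y
    return [add(x, y) for x, y in zip(X, Y)]

def multiply(X, Y):
    """Multiply two quadtrees; zero branches are skipped without any work."""
    if X is None or Y is None:
        return None
    if isinstance(X, np.ndarray):
        return X @ Y                     # leaf-level multiplication
    a, b, c, d = X
    e, f, g, h = Y
    return [add(multiply(a, e), multiply(b, g)),
            add(multiply(a, f), multiply(b, h)),
            add(multiply(c, e), multiply(d, g)),
            add(multiply(c, f), multiply(d, h))]

def to_dense(T, n):
    """Expand a quadtree back into a dense n-by-n array."""
    if T is None:
        return np.zeros((n, n))
    if isinstance(T, np.ndarray):
        return T
    h = n // 2
    return np.block([[to_dense(T[0], h), to_dense(T[1], h)],
                     [to_dense(T[2], h), to_dense(T[3], h)]])

n = 8
A = np.zeros((n, n)); A[:2, :2] = 1.0; A[6:, 6:] = 2.0
B = np.zeros((n, n)); B[:2, :4] = 3.0; B[4:6, 4:6] = 5.0
C = to_dense(multiply(build(A), build(B)), n)
print(np.allclose(C, A @ B))
```

Because a `None` branch short-circuits the recursion, the work follows the sparsity pattern without that pattern being known in advance, which is the property the abstract relies on.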

    A topographic mechanism for arcing of dryland vegetation bands

    Banded patterns consisting of alternating bare soil and dense vegetation have been observed in water-limited ecosystems across the globe, often appearing along gently sloped terrain with the stripes aligned transverse to the elevation gradient. In many cases these vegetation bands are arced, with field observations suggesting a link between the orientation of arcing relative to the grade and the curvature of the underlying terrain. We modify the water transport in the Klausmeier model of water-biomass interactions, originally posed on a uniform hillslope, to qualitatively capture the influence of terrain curvature on the vegetation patterns. Numerical simulations of this modified model indicate that the vegetation bands change arcing-direction from convex-downslope when growing on top of a ridge to convex-upslope when growing in a valley. This behavior is consistent with observations from remote sensing data that we present here. Model simulations show further that whether bands grow on ridges, valleys, or both depends on the precipitation level. A survey of three banded vegetation sites, each with a different aridity level, indicates qualitatively similar behavior. Comment: 26 pages, 13 figures, 2 tables
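For reference, the unmodified Klausmeier model on a uniform hillslope is commonly written in the nondimensional form below. The symbols are assumptions of this note rather than the paper's notation: $w$ is water, $n$ is plant biomass, $a$ is precipitation, $m$ is plant mortality, and $v$ is the downhill water advection speed along $x$. The curvature-dependent water transport introduced in the paper is not reproduced here.

```latex
\begin{align}
  \frac{\partial w}{\partial t} &= a - w - w n^{2} + v \frac{\partial w}{\partial x}, \\
  \frac{\partial n}{\partial t} &= w n^{2} - m n
      + \left( \frac{\partial^{2}}{\partial x^{2}} + \frac{\partial^{2}}{\partial y^{2}} \right) n .
\end{align}
```

The advection term $v\,\partial w/\partial x$ is the water-transport term that assumes a uniform slope, so it is the natural place for a terrain-dependent modification.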

    Finding Banded Patterns in Data: The Banded Pattern Mining Algorithm


    Finding banded patterns in large data set using segmentation


    Atmospheric Circulation of Brown Dwarfs and Jupiter and Saturn-like Planets: Zonal Jets, Long-term Variability, and QBO-type Oscillations

    Brown dwarfs and directly imaged giant planets exhibit significant evidence for active atmospheric circulation, which induces a large-scale patchiness in the cloud structure that evolves significantly over time, as evidenced by infrared light curves and Doppler maps. These observations raise critical questions about the fundamental nature of the circulation, its time variability, and the overall relationship to the circulation on Jupiter and Saturn. Jupiter and Saturn themselves exhibit numerous robust zonal (east-west) jet streams at the cloud level; moreover, both planets exhibit long-term stratospheric oscillations involving perturbations of zonal wind and temperature that propagate downward over time on timescales of ~4 years (Jupiter) and ~15 years (Saturn). These oscillations, dubbed the Quasi Quadrennial Oscillation (QQO) for Jupiter and the Semi-Annual Oscillation (SAO) on Saturn, are thought to be analogous to the Quasi-Biennial Oscillation (QBO) on Earth, which is driven by upward propagation of equatorial waves from the troposphere. To investigate these issues, we here present global, three-dimensional, high-resolution numerical simulations of the flow in the stratified atmosphere--overlying the convective interior--of brown dwarfs and Jupiter-like planets. The effect of interior convection is parameterized by inducing small-scale, randomly varying perturbations in the radiative-convective boundary at the base of the model. In the simulations, the convective perturbations generate atmospheric waves and turbulence that interact with the rotation to produce numerous zonal jets. Moreover, the equatorial stratosphere exhibits stacked eastward and westward jets that migrate downward over time, exactly as occurs in the terrestrial QBO, Jovian QQO, and Saturnian SAO. 
This is the first demonstration of a QBO-like phenomenon in 3D numerical simulations of a giant planet. Comment: 27 pages, 15 figures, in press at ApJ; this is the revised (accepted) version, which includes a major new section providing detailed analysis of the types of wave modes present in the model, and characterizing the wave-mean-flow interactions by which they generate the QBO-like oscillation.

    A Reduction of the Dynamic Time Warping Distance to the Longest Increasing Subsequence Length


    Assisted Network Analysis in Cancer Genomics

    Cancer is a molecular disease. In the past two decades, we have witnessed a surge of high-throughput profiling in cancer research and corresponding development of high-dimensional statistical techniques. In this dissertation, the focus is on gene expression, which has played a uniquely important role in cancer research. Compared to some other types of molecular measurements, for example DNA changes, gene expressions are “closer” to cancer outcomes. In addition, processed gene expression data have good statistical properties, in particular, continuity. In the “early” cancer gene expression data analysis, attention has been on marginal properties such as mean and variance. Genes function in a coordinated way. As such, techniques that take a system perspective have been developed to also take into account the interconnections among genes. Among such techniques, graphical models, with lucid biological interpretations and satisfactory statistical properties, have attracted special attention. Graphical model-based analysis can not only lead to a deeper understanding of genes’ properties but also serve as a basis for other analyses, for example, regression and clustering. Cancer molecular studies usually have limited sizes. In the graphical model-based analysis, the number of parameters to be estimated gets squared. Combined together, they lead to a serious lack of information. The overarching goal of this dissertation is to conduct more effective graphical model analysis for cancer gene expression studies. One literature review and three methodological projects have been conducted. The overall strategy is to borrow strength from additional information so as to assist gene expression graphical model estimation. In the first chapter, the literature review is conducted. The methods developed in Chapter 2 and Chapter 4 take advantage of information on regulators of gene expressions (such as methylation, copy number variation, microRNA, and others). 
As they belong to the vertical data integration framework, we first provide a review of such data integration for gene expression data in Chapter 1. Additionally, graphical model-based analysis for gene expression data is reviewed. Research reported in this chapter has led to a paper published in Briefings in Bioinformatics. In Chapters 2-4, to accommodate the extreme complexity of information-borrowing for graphical models, three different approaches have been proposed. In Chapter 2, two graphical models, with a gene-expression-only one and a gene-expression-regulator one, are simultaneously considered. A biologically sensible hierarchy between the sparsity structures of these two networks is developed, which is the first of its kind. This hierarchy is then used to link the estimation of the two graphical models. This work has led to a paper published in Genetic Epidemiology. In Chapter 3, additional information is mined from published literature, for example, those deposited at PubMed. The consideration is that published studies have been based on many independent experiments and can contain valuable information on genes’ interconnections. The challenge is to recognize that such information can be partial or even wrong. A two-step approach, consisting of information-guided and information-incorporated estimations, is developed. This work has led to a paper published in Biometrics. In Chapter 4, we slightly shift attention and examine the difference in graphs, which has important implications for understanding cancer development and progression. Our strategy is to link changes in gene expression graphs with those in regulator graphs, which means additional information for estimation. It is noted that to make individual chapters stand-alone, there can be minor overlap in descriptions. All methodological developments in this research fit the advanced penalization paradigm, which has been popular for cancer gene expression and other molecular data analysis. 
This methodological coherence is highly desirable. For the methods described in Chapters 2-4, we have developed new penalized estimations which have lucid interpretations and can directly lead to variable selection (and so sparse and interpretable graphs). We have also developed effective computational algorithms and R code, which have been made publicly available at Dr. Shuangge Ma’s GitHub software repository. For the methods described in Chapters 2 and 3, statistical properties under ultrahigh dimensional settings and mild regularity conditions have been established, providing the proposed methods a uniquely strong ground. Statistical properties for the method developed in Chapter 4 are relatively straightforward and hence are omitted. For all the proposed methods, we have conducted extensive simulations, comparisons with the most relevant competitors, and data analysis. The practical advantage is fully established. Overall, this research has delivered a practically sensible information-incorporating strategy for improving graphical model-based analysis for cancer gene expression data, multiple highly competitive methods, R programs that can have broad utilization, and new findings for multiple cancer types.