
    Dimension Reduction in Nonparametric Discriminant Analysis

    A dimension reduction method for kernel discriminant analysis is presented, based on the concept of a dimension reduction subspace. Examples of its application are discussed.
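    The abstract does not spell out the classifier, so the following is only a hedged illustration of generic kernel discriminant analysis: one Gaussian kernel density estimate per class, combined with class priors through the Bayes rule, computed after projecting the predictors onto a subspace. The basis matrix `B`, the bandwidth, the synthetic data and the helper names (`kde_density`, `kernel_discriminant_predict`) are assumptions for illustration. Estimating the dimension reduction subspace itself, which is what the paper is about, is not shown; `B` is simply given.

```python
import numpy as np

def kde_density(train, query, bandwidth):
    """Isotropic Gaussian kernel density estimate fitted on `train` (n, d),
    evaluated at each row of `query` (m, d); returns an (m,) array."""
    n, d = train.shape
    diffs = (query[:, None, :] - train[None, :, :]) / bandwidth  # (m, n, d)
    sq_dist = np.sum(diffs ** 2, axis=2)                           # (m, n)
    norm_const = (2.0 * np.pi) ** (d / 2.0) * bandwidth ** d
    return np.exp(-0.5 * sq_dist).sum(axis=1) / (n * norm_const)

def kernel_discriminant_predict(X_train, y_train, X_query, B, bandwidth=0.3):
    """Bayes rule with per-class KDEs, computed in the reduced space Z = X @ B.
    B is a (p, k) basis of an assumed (hypothetical) dimension reduction subspace."""
    Z_train, Z_query = X_train @ B, X_query @ B
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Z_c = Z_train[y_train == c]
        prior = len(Z_c) / len(Z_train)
        scores.append(prior * kde_density(Z_c, Z_query, bandwidth))
    # Pick the class with the largest prior * estimated density.
    return classes[np.argmax(np.vstack(scores), axis=0)]

# Synthetic demo: the class depends only on the first of five predictors.
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(300, 5)), rng.normal(size=(200, 5))
y_train, y_test = (X_train[:, 0] > 0).astype(int), (X_test[:, 0] > 0).astype(int)

B_good = np.eye(5)[:, :1]    # project onto the informative first coordinate
B_noise = np.eye(5)[:, 1:2]  # project onto an irrelevant coordinate
pred_test = kernel_discriminant_predict(X_train, y_train, X_test, B_good)
acc = np.mean(pred_test == y_test)
acc_noise = np.mean(kernel_discriminant_predict(X_train, y_train, X_test, B_noise) == y_test)
print(f"accuracy with informative subspace: {acc:.2f}, with irrelevant one: {acc_noise:.2f}")
```

    The contrast between the two projections shows why the choice of subspace matters: the density estimates are only as discriminative as the directions they are computed on.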

    Mapping crime: Understanding Hotspots


    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and in applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they can learn models that reason about the properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, by reviewing the literature and comparing existing works both qualitatively and numerically. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing. Comment: 10 pages, 19 figures

    Machine Learning and Integrative Analysis of Biomedical Big Data

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those already associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.

    Group transformation and identification with kernel methods and big data mixed logistic regression

    Exploratory Data Analysis (EDA) is a crucial step in the life cycle of data analysis. Exploring data with effective methods reveals its main characteristics and provides guidance for model building. The goal of this thesis is to develop effective and efficient methods for data exploration in the regression setting. First, we propose using optimal group transformations as a general approach for exploring the relationship between the predictor variables X and the response Y. This approach can be viewed as an automatic procedure for identifying the best characteristic of P(Y|X) under which the relationship between Y and X can be fully explored. The emphasis on group transformations allows the approach to recover true group structures among the predictors. We also develop kernel methods for estimating the optimal group transformations based on cross-covariance and conditional covariance operators, and we establish the statistical consistency of the estimates. We refer to the proposed framework and approach as the Optimal Kernel Group Transformation (OKGT) method. Second, we define the true additive group structure for OKGT when the response transformation is known, and develop an effective penalized kernel regression method for identifying it. The procedure uses a novel penalty that we propose to control the complexity of additive group structures. This method is referred to as Additive Group Structure Identification (AGSI), and we also establish its selection consistency. Finally, we construct the Hierarchical Mixed Logistic Regression Model (HMLRM) and propose using it to explore heterogeneity in big data. By explicitly modeling the hidden layer, we individualize the calculation of the probability that a sample belongs to a subpopulation. When estimating the model parameters with the EM algorithm, we exploit the separability of the parameter space. To apply HMLRM to big data, we design a distributed algorithm for model estimation, implemented in Apache Spark.
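    As a rough, hedged illustration of fitting a mixed logistic regression with EM, the following sketch fits a two-component mixture of logistic regressions whose per-sample membership probability comes from a logistic "gate" on the same covariates. It is not the thesis's HMLRM: the two-component structure, the small ridge penalty (added only for numerical stability), the Newton-step M-step, the synthetic data and all helper names (`weighted_logistic_fit`, `em_mixture_logistic`, and so on) are assumptions, and the distributed Spark implementation is not reproduced. It does show the two ideas the abstract names: a posterior probability of subpopulation membership computed for each sample (E-step), and an M-step that separates into independent fits for the gate and for each component.

```python
import numpy as np

def sigmoid(t):
    # Clipping keeps np.exp from overflowing for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def weighted_logistic_fit(X, target, weights, beta, ridge=1e-3, n_newton=5):
    """Newton steps maximizing sum_i weights_i * Bernoulli loglik(target_i)
    minus (ridge / 2) * ||beta||^2; targets may be soft values in [0, 1]."""
    for _ in range(n_newton):
        p = sigmoid(X @ beta)
        grad = X.T @ (weights * (target - p)) - ridge * beta
        hess = X.T @ (X * (weights * p * (1.0 - p))[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def bernoulli_lik(X, y, beta):
    q = sigmoid(X @ beta)
    return np.where(y == 1, q, 1.0 - q)

def mixture_loglik(X, y, gate, experts):
    pi1 = sigmoid(X @ gate)  # per-sample probability of subpopulation 1
    mix = (1.0 - pi1) * bernoulli_lik(X, y, experts[0]) + pi1 * bernoulli_lik(X, y, experts[1])
    return float(np.sum(np.log(mix)))

def em_mixture_logistic(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    gate = np.zeros(X.shape[1])
    # Random expert starts break the symmetry between the two components.
    experts = [rng.normal(scale=0.5, size=X.shape[1]) for _ in range(2)]
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to subpopulation 1.
        pi1 = sigmoid(X @ gate)
        num1 = pi1 * bernoulli_lik(X, y, experts[1])
        num0 = (1.0 - pi1) * bernoulli_lik(X, y, experts[0])
        r1 = num1 / (num0 + num1)
        # M-step: the objective separates, so gate and experts are fitted independently.
        gate = weighted_logistic_fit(X, r1, np.ones(len(y)), gate)
        experts[0] = weighted_logistic_fit(X, y, 1.0 - r1, experts[0])
        experts[1] = weighted_logistic_fit(X, y, r1, experts[1])
    # r1 holds the responsibilities from the last E-step.
    return gate, experts, r1

# Synthetic data from two hidden subpopulations with opposite covariate effects.
rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.random(n) < sigmoid(2.0 * X[:, 1])
prob = np.where(z, sigmoid(X @ np.array([1.0, -2.0])), sigmoid(X @ np.array([-1.0, 2.0])))
y = (rng.random(n) < prob).astype(int)

gate, experts, r1 = em_mixture_logistic(X, y)
print("gate:", gate.round(2), "experts:", [b.round(2) for b in experts])
print("log-likelihood:", round(mixture_loglik(X, y, gate, experts), 1))
```

    The separability is what makes a distributed version plausible: within one M-step the gate and component fits do not depend on each other, so they could in principle be computed in parallel.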

    Statistical Regression Methods for GPGPU Design Space Exploration

    General Purpose Graphics Processing Units (GPGPUs) have improved the performance and power efficiency of today's heterogeneous systems, ushering in a new era of innovation in high-performance scientific computing. These systems can offer very high performance for massively parallel applications; however, their resources may be wasted due to inefficient tuning strategies. Previous application tuning studies predominantly employ low-level, architecture-specific tuning, which can make the performance modeling task difficult and less generic. In this research, we explore the GPGPU design space, focusing on the memory hierarchy, for application tuning using a regression-based performance prediction framework, and we rank the design space based on runtime performance. The regression-based framework models GPGPU device computations using algorithm characteristics, such as the number of floating-point operations and the total number of bytes, together with hardware parameters pertaining to the GPGPU memory hierarchy, as predictor variables. The regression models for the computation component are developed from several instrumented executions of the algorithms covering a range of FLOPS-to-byte requirements. We validate our model with a Synchronous Iterative Algorithm (SIA) set that includes Spiking Neural Networks (SNNs) and Anisotropic Diffusion Filtering (ADF) for massive images. The highly parallel nature of these algorithms, together with their wide range of communication-to-computation complexities, makes them good candidates for this study. A hierarchy of implementations for the SNNs and ADF is constructed and ranked using the regression-based framework. We further illustrate the Synchronous Iterative GPGPU Execution (SIGE) model on the GPGPU-augmented Palmetto Cluster. The performance prediction framework identifies the appropriate design-space implementation for 4 of the 5 case studies used in this research. The ultimate goal of this research is to establish the efficacy of the regression-based framework in accurately predicting application kernel runtime, allowing developers to correctly rank their design space prior to large-scale implementation.
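    A minimal sketch of the general idea of regression-based runtime prediction and design-space ranking, assuming a plain linear model fitted by ordinary least squares. The predictors (a GFLOP count, GB moved, and a 0/1 flag standing in for a memory-hierarchy choice), the synthetic "instrumented runs", the made-up coefficients and the three candidate implementations are all hypothetical; the study's actual predictors, model form and measurements are not given in the abstract.

```python
import numpy as np

def design_matrix(gflops, gbytes, shared_mem):
    """Columns: intercept, GFLOP count, GB moved, 0/1 memory-hierarchy flag
    (all hypothetical predictors chosen for illustration)."""
    return np.column_stack([np.ones_like(gflops), gflops, gbytes, shared_mem])

def fit_runtime_model(gflops, gbytes, shared_mem, runtimes):
    """Ordinary least-squares coefficients for runtime ~ predictors."""
    A = design_matrix(gflops, gbytes, shared_mem)
    coef, *_ = np.linalg.lstsq(A, runtimes, rcond=None)
    return coef

def rank_design_space(coef, gflops, gbytes, shared_mem):
    """Indices of candidate implementations, fastest predicted runtime first."""
    predicted = design_matrix(gflops, gbytes, shared_mem) @ coef
    return np.argsort(predicted), predicted

# Synthetic "instrumented runs" spanning a range of FLOP-to-byte ratios.
rng = np.random.default_rng(1)
n_runs = 40
gflops = rng.uniform(1.0, 100.0, n_runs)
gbytes = rng.uniform(1.0, 50.0, n_runs)
shared = rng.integers(0, 2, n_runs).astype(float)
true_coef = np.array([0.5, 0.02, 0.08, -0.3])  # made-up, for the demo only
runtimes = design_matrix(gflops, gbytes, shared) @ true_coef + rng.normal(0.0, 0.05, n_runs)
coef = fit_runtime_model(gflops, gbytes, shared, runtimes)

# Three hypothetical implementations of one kernel, ranked before being built at scale.
order, predicted = rank_design_space(
    coef,
    gflops=np.array([50.0, 50.0, 60.0]),
    gbytes=np.array([40.0, 20.0, 10.0]),
    shared_mem=np.array([0.0, 1.0, 1.0]),
)
print("ranking (fastest first):", order, "predicted runtime:", np.round(predicted, 2))
```

    Only the ordering of the predictions is used for the decision, so a model that is biased in absolute runtime can still rank implementations correctly as long as it captures their relative costs.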