
    Scalable learning for geostatistics and speaker recognition

    With improved data acquisition methods, the amount of data being collected has increased severalfold. One of the objectives in data collection is to learn useful underlying patterns. In order to work with data at this scale, methods not only need to be effective on the underlying data, but also have to be scalable to handle larger data collections. This thesis focuses on developing scalable and effective methods targeted at different domains, geostatistics and speaker recognition in particular. Initially we focus on kernel-based learning methods and develop a GPU-based parallel framework for this class of problems. An improved numerical algorithm that utilizes the GPU parallelization to further enhance the computational performance of kernel regression is proposed. These methods are then demonstrated on problems arising in geostatistics and speaker recognition. In geostatistics, data is often collected at scattered locations, and factors such as instrument malfunction lead to missing observations. Applications often require the ability to interpolate this scattered spatiotemporal data onto a regular grid continuously over time. This problem can be formulated as a regression problem, and one of the most popular geostatistical interpolation techniques, kriging, is analogous to a standard kernel method: Gaussian process regression. Kriging is computationally expensive and needs major modifications and accelerations in order to be used practically. The GPU framework developed for kernel methods is extended to kriging, and the GPU's texture memory is further exploited for enhanced computational performance. Speaker recognition deals with the task of verifying a person's identity based on samples of his/her speech - "utterances". This thesis focuses on the text-independent setting, and three new recognition frameworks were developed for this problem. We propose a kernelized Rényi-distance-based similarity score for speaker recognition. While its performance is promising, it does not generalize well with limited training data and therefore does not compare well to state-of-the-art recognition systems. These systems compensate for the variability in the speech data due to the message, channel variability, noise and reverberation. State-of-the-art systems model each speaker as a mixture of Gaussians (GMM) and compensate for the variability (termed "nuisance"). We propose a novel discriminative framework using a latent variable technique, partial least squares (PLS), for improved recognition. The kernelized version of this algorithm is used to achieve a state-of-the-art speaker ID system that shows results competitive with the best systems reported in NIST's 2010 Speaker Recognition Evaluation.
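    The abstract's analogy between kriging and Gaussian process regression reduces, for a zero-mean field, to solving one kernel system for the prediction weights. Below is a minimal sketch of that computation, assuming a squared-exponential kernel and the Eigen linear-algebra library; the kernel choice, length scale and nugget value are illustrative assumptions, not the configuration used in the thesis, and the GPU/texture-memory acceleration is not shown.

        // Minimal Gaussian process regression (simple kriging) sketch.
        // Assumes the Eigen library; kernel and hyperparameters are illustrative.
        #include <Eigen/Dense>
        #include <cmath>
        #include <iostream>
        #include <vector>

        // Squared-exponential covariance between two 2-D sample locations.
        double kernel(const Eigen::Vector2d& a, const Eigen::Vector2d& b,
                      double lengthScale = 1.0) {
          return std::exp(-(a - b).squaredNorm() / (2.0 * lengthScale * lengthScale));
        }

        // Predict the field value at `query` from scattered observations (xs, ys).
        double predict(const std::vector<Eigen::Vector2d>& xs,
                       const Eigen::VectorXd& ys,
                       const Eigen::Vector2d& query,
                       double nugget = 1e-6) {
          const int n = static_cast<int>(xs.size());
          Eigen::MatrixXd K(n, n);   // covariance among observations
          Eigen::VectorXd k(n);      // covariance between query and observations
          for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) K(i, j) = kernel(xs[i], xs[j]);
            K(i, i) += nugget;       // nugget / numerical jitter on the diagonal
            k(i) = kernel(query, xs[i]);
          }
          // Kriging weights w = K^{-1} y via Cholesky; the prediction is k^T w.
          Eigen::VectorXd w = K.llt().solve(ys);
          return k.dot(w);
        }

        int main() {
          std::vector<Eigen::Vector2d> xs = {{0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}};
          Eigen::VectorXd ys(3);
          ys << 1.0, 2.0, 0.5;
          std::cout << predict(xs, ys, {0.5, 0.5}) << "\n";
          return 0;
        }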

    Large scale geostatistics with locally varying anisotropy

    Classical geostatistical methods are based on the hypothesis of stationarity, which makes it possible to apply repeated sampling at different locations of the spatial domain in order to obtain enough information to infer cumulative distributions. In the case of non-stationarity, anisotropy is observed in the underlying physical phenomena. This feature manifests itself as preferential directions of continuity in the phenomena, i.e. properties are more continuous in one orientation than in another. In the case of local anisotropy, each location of the domain under study presents different preferential directions of continuity. The locally varying anisotropy (LVA) approach in geostatistics allows a field of local anisotropy parameters, defined for each point of the domain, to be incorporated. With this additional input, more realistic spatial simulations can be generated, incorporating geological features such as folds, veins and faults into the computational model. Since the seminal article published by Boisvert and Deutsch (2011), to the best of the author's knowledge, no further analysis or public code improvements have been developed. This is in part because acceleration and parallelization techniques must be applied to the inner kernels of the baseline LVA codes. Long execution times are needed to generate small-scale domain simulations, making large-scale domain simulations a prohibitive task. The contributions of this thesis are the acceleration and parallelization of classical and LVA-based geostatistical simulation methods, particularly sequential simulation, which is one of the most common and computationally intensive methods in the field. This was recently remarked upon by some of the main authors in the field, Gómez-Hernández and Srivastava (2021), which underlines the relevance of this work today. Two main parallel algorithms and an optimized version of a kd-tree search implementation are presented, all of them applied to both classical and LVA-based sequential simulation implementations. The first parallel algorithm concerns the parallel simulation of different domain points, after rearranging the order of simulation while preserving the exact results of a single-thread execution (a simplified sketch of this scheme follows the abstract). The second parallel algorithm concerns the parallel search of neighbouring points in the domain, which is used to build the data dependencies for the parallel simulation of points. The optimized kd-tree search was used in each test case to reduce the computational complexity of the neighbour-search tasks. Its modified implementation reduces the number of branching instructions and introduces specialized code sections to accelerate the execution. The main focus is on multi-core architectures using OpenMP and optimization techniques applied to Fortran and C++ codes. Additionally, acceleration and parallelization techniques were also applied to auxiliary applications, such as shortest-path and variogram calculation on hybrid CPU/GPU architectures using Fortran, C++ and CUDA codes. In the latter application, an analytical and heuristic model was developed to estimate the optimal workload distribution between CPU and GPU in the hybrid context. The overall result of this work is a set of applications that will allow researchers and practitioners to dramatically accelerate the execution of their experiments and simulations, with sgsim, sisim, sgs-lva and sisim-lva being the accelerated codes presented.
Final speedup results of 11x and 50x are obtained for non-LVA codes using 16 threads, and 56x and 1822x are obtained for LVA codes using 20 threads. These tools can be combined with other geostatistical tools, in order to improve the existing landscape of open source codes that can be used in practical scenarios.
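    To illustrate the first parallel algorithm summarized above, the sketch below processes the simulation path level by level: all conditioning neighbours of a point belong to earlier levels, so each level can be simulated in parallel while reproducing the single-thread results exactly. This is a simplified OpenMP sketch; the Point layout, level construction and simulateOne kernel are placeholder assumptions, not the actual sgsim/sgs-lva code.

        // Level-scheduled sequential simulation sketch (OpenMP).
        // Points within one level have no data dependencies on each other,
        // so they can be simulated in parallel and still match the serial result.
        #include <cstdio>
        #include <vector>

        struct Point { double x, y, value; bool simulated; };

        // Placeholder for the kriging system solved at one point from its
        // already-simulated neighbours (indices into `pts`).
        double simulateOne(const std::vector<Point>& pts,
                           const std::vector<int>& neighbours) {
          double sum = 0.0;
          for (int n : neighbours) sum += pts[n].value;
          return neighbours.empty() ? 0.0 : sum / neighbours.size();
        }

        void simulateByLevels(std::vector<Point>& pts,
                              const std::vector<std::vector<int>>& levels,
                              const std::vector<std::vector<int>>& neighboursOf) {
          for (const auto& level : levels) {       // levels run strictly in order
            #pragma omp parallel for schedule(dynamic)
            for (long i = 0; i < static_cast<long>(level.size()); ++i) {
              const int p = level[i];
              // All neighbours of p were simulated in earlier levels, so the
              // value is identical to the one a single-thread run would produce.
              pts[p].value = simulateOne(pts, neighboursOf[p]);
              pts[p].simulated = true;
            }
          }
        }

        int main() {
          std::vector<Point> pts = {{0, 0, 1.0, true}, {1, 0, 2.0, true}, {0.5, 0.5, 0.0, false}};
          std::vector<std::vector<int>> levels = {{2}};            // only point 2 remains
          std::vector<std::vector<int>> neighboursOf = {{}, {}, {0, 1}};
          simulateByLevels(pts, levels, neighboursOf);
          std::printf("simulated value at point 2: %f\n", pts[2].value);
          return 0;
        }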

    Automatic Performance Optimization on Heterogeneous Computer Systems using Manycore Coprocessors

    Emerging computer architectures and advanced computing technologies, such as Intel’s Many Integrated Core (MIC) architecture and graphics processing units (GPUs), provide a promising way to exploit parallelism for high performance, scalability and low power consumption. As a result, accelerators have become a crucial part of developing supercomputers. Accelerators are usually equipped with different types of cores and memory, which confronts application developers with challenging performance goals. The added complexity has led to the development of task-based runtime systems, which allow complex computations to be expressed as task graphs and rely on scheduling algorithms to perform load balancing across all resources of the platform. Developing good scheduling algorithms, even on a single node, and analyzing them can thus have a very high impact on the performance of current HPC systems. Load-balancing strategies at different levels will be critical to use the heterogeneous hardware effectively and to reduce the impact of communication on energy and performance. Implementing efficient load-balancing algorithms able to manage heterogeneous hardware can be a challenging task, especially when combined with a parallel programming model for distributed-memory architectures. In this paper, we present several novel runtime approaches to determine the optimal data and task partition on heterogeneous platforms, targeting Intel Xeon Phi accelerated heterogeneous systems.
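    In its simplest static form, the data- and task-partitioning problem described above amounts to splitting the iteration space in proportion to the measured throughput of each device. The sketch below shows such a ratio-based split between a host and one coprocessor; it is a generic illustration under that assumption, not the runtime approach proposed in the paper.

        // Static work split between a host CPU and a coprocessor based on
        // measured relative throughput (items per second). Illustrative only.
        #include <cstdio>
        #include <utility>

        // Returns the iteration counts assigned to the host and to the device.
        std::pair<long, long> splitWork(long totalIters,
                                        double hostThroughput,
                                        double deviceThroughput) {
          const double total = hostThroughput + deviceThroughput;
          const long hostIters =
              static_cast<long>(totalIters * (hostThroughput / total));
          return {hostIters, totalIters - hostIters};
        }

        int main() {
          // e.g. the coprocessor was measured to be about 3x faster on this kernel
          auto [host, device] = splitWork(1000000, 1.0, 3.0);
          std::printf("host: %ld iterations, device: %ld iterations\n", host, device);
          return 0;
        }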

    Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing


    Beyond 16GB: Out-of-Core Stencil Computations

    Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately, such architectures come with a limited amount of fast memory, which limits the size of the problems that can be efficiently solved. In this paper, we address this challenge by applying the well-known cache-blocking tiling technique to large-scale stencil codes implemented using the OPS domain-specific language, such as CloverLeaf 2D, CloverLeaf 3D, and OpenSBLI. We introduce a number of techniques and optimisations to help manage data resident in fast memory and minimise data movement. Evaluating our work on Intel's Knights Landing platform as well as NVIDIA P100 GPUs, we demonstrate that it is possible to solve problems 3 times larger than the on-chip memory size with at most a 15% loss in efficiency.
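    The cache-blocking tiling mentioned above restructures a stencil sweep so that each block of the grid is reused from fast memory before the sweep moves on. Below is a minimal 2-D Jacobi-style sketch with spatial tiling only; the tile sizes and 5-point stencil are illustrative choices, and real OPS-generated code additionally tiles across time steps and manages halo data.

        // Spatially tiled (cache-blocked) 5-point Jacobi sweep. Illustrative only:
        // real OPS-generated code also tiles across time steps and exchanges halos.
        #include <algorithm>
        #include <vector>

        void stencilTiled(const std::vector<double>& in, std::vector<double>& out,
                          int nx, int ny, int tileX = 64, int tileY = 64) {
          #pragma omp parallel for collapse(2) schedule(static)
          for (int jj = 1; jj < ny - 1; jj += tileY) {
            for (int ii = 1; ii < nx - 1; ii += tileX) {
              const int jEnd = std::min(jj + tileY, ny - 1);
              const int iEnd = std::min(ii + tileX, nx - 1);
              // Update one tile at a time so its data stays resident in fast memory.
              for (int j = jj; j < jEnd; ++j)
                for (int i = ii; i < iEnd; ++i)
                  out[j * nx + i] = 0.25 * (in[j * nx + i - 1] + in[j * nx + i + 1] +
                                            in[(j - 1) * nx + i] + in[(j + 1) * nx + i]);
            }
          }
        }

        int main() {
          const int nx = 256, ny = 256;
          std::vector<double> a(nx * ny, 1.0), b(nx * ny, 0.0);
          stencilTiled(a, b, nx, ny);
          return 0;
        }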

    Task-based Runtime Optimizations Towards High Performance Computing Applications

    The last decades have witnessed a rapid improvement of computational capabilities in high-performance computing (HPC) platforms thanks to hardware technology scaling. HPC architectures benefit from mainstream advances in hardware with many-core systems, deep hierarchical memory subsystems, non-uniform memory access, and an ever-increasing gap between computational power and memory bandwidth. This has necessitated continuous adaptations across the software stack to maintain high hardware utilization. In this HPC landscape of potentially million-way parallelism, task-based programming models associated with dynamic runtime systems are becoming more popular, as they foster developers’ productivity at extreme scale by abstracting the underlying hardware complexity. In this context, this dissertation highlights how a software bundle powered by a task-based programming model can address the heterogeneous workloads engendered by HPC applications, here data redistribution, geostatistical modeling and 3D unstructured mesh deformation. Data redistribution aims to reshuffle data to optimize some objective for an algorithm; the objective can be multi-dimensional, such as improving computational load balance or decreasing communication volume or cost, with the ultimate goal of increasing efficiency and therefore reducing the time-to-solution of the algorithm. Geostatistical modeling, one of the prime motivating applications for exascale computing, is a technique for predicting desired quantities from geographically distributed data, based on statistical models and optimization of parameters. Meshing the deformable contour of moving 3D bodies is an expensive operation that can cause huge computational challenges in fluid-structure interaction (FSI) applications. Therefore, in this dissertation, Redistribute-PaRSEC, ExaGeoStat-PaRSEC and HiCMA-PaRSEC are proposed to efficiently tackle these HPC applications at extreme scale, and they are evaluated on multiple HPC clusters, including AMD-based, Intel-based and Arm-based CPU systems and an IBM-based multi-GPU system. This multidisciplinary work emphasizes the need for runtime systems to go beyond their primary responsibility of task scheduling on massively parallel hardware systems in order to serve next-generation scientific applications.
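    As a rough analogue of the task-graph execution model described above (not the PaRSEC API itself), the sketch below expresses a small dependent computation as OpenMP tasks with data dependences and lets the runtime schedule them; the task granularity and toy kernels are illustrative assumptions.

        // Tiny task-graph example using OpenMP task dependences: two independent
        // producer tasks feed a consumer and the runtime schedules them. This only
        // illustrates the task-based model; PaRSEC and similar runtimes expose
        // richer, distributed-memory task graphs.
        #include <cstdio>

        int main() {
          double a = 0.0, b = 0.0, c = 0.0;
          #pragma omp parallel
          #pragma omp single
          {
            #pragma omp task depend(out: a)
            a = 2.0;                        // producer task 1
            #pragma omp task depend(out: b)
            b = 3.0;                        // producer task 2 (may run concurrently)
            #pragma omp task depend(in: a, b) depend(out: c)
            c = a * b;                      // consumer runs after both producers
            #pragma omp taskwait
          }
          std::printf("c = %f\n", c);
          return 0;
        }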