10 research outputs found

    Study of Parallel Programming Models on Computer Clusters with Accelerators

    Get PDF
    To reach exascale computing capability, accelerators have become a crucial part of supercomputer design. This work examines the potential of two of the latest acceleration technologies: the Intel Many Integrated Core (MIC) architecture and Graphics Processing Units (GPUs). The thesis applies three benchmarks under three configurations, MPI+CPU, MPI+GPU, and MPI+MIC; the benchmarks cover an intensely communicating application, a loosely communicating application, and an embarrassingly parallel application. The thesis also carries out a detailed study of the scalability and performance of MIC processors under two programming models, the offload model and the native model, on the Beacon computer cluster. Across the benchmarks, the results demonstrate different performance and scalability for GPU and MIC. (1) For the embarrassingly parallel case, the GPU-based parallel implementation on the Keeneland computer cluster outperforms the other accelerators, but the MIC-based implementation shows better scalability than the GPU implementation; the native and offload models perform almost identically on MIC. (2) For the loosely communicating case, GPU and MIC performance are very close, and the MIC-based parallel implementation still demonstrates strong scalability when using 120 MIC processors. (3) For the intensely communicating case, the MPI implementations on CPUs and GPUs both scale strongly, and GPUs consistently outperform the other accelerators, whereas the MIC-based implementation does not scale well. The relative performance of the two MIC models also differs from the embarrassingly parallel case: the native model consistently outperforms the offload model by roughly 10 times, and allocating more MIC processors yields little gain, because the increased communication cost offsets the performance gain from the reduced workload on each MIC core. This work also tests performance and scalability by varying the number of threads on each MIC card from 10 to 60 for the intensely communicating case, which reveals different capabilities of the MIC-based offload model. Scalability holds as the thread count increases from 10 to 30, computation time decreases at a smaller rate from 30 to 50 threads, and at 60 threads computation time increases, because the communication overhead then offsets the performance gain on a single MIC card.
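    As a hedged illustration of the MPI+GPU configuration studied above (not the thesis's benchmark code): each MPI rank binds to one GPU on its node, runs a placeholder kernel on its share of the data, and combines partial results with a reduction, as in the embarrassingly parallel case. The kernel body, problem size, and launch parameters are assumptions.

    // Minimal MPI+GPU sketch: one GPU per MPI rank, placeholder per-element
    // work on the device, then a global MPI reduction of the partial results.
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void partialSum(const float* x, float* out, int n) {
        // Each thread accumulates a strided slice of this rank's data;
        // atomicAdd combines the per-thread results on the device.
        float acc = 0.0f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            acc += x[i] * x[i];                // placeholder per-element work
        atomicAdd(out, acc);
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int devices = 0;
        cudaGetDeviceCount(&devices);
        if (devices > 0) cudaSetDevice(rank % devices);  // one GPU per rank

        const int n = 1 << 20;                  // this rank's share of the data
        float *d_x, *d_out;
        cudaMalloc(&d_x, n * sizeof(float));
        cudaMalloc(&d_out, sizeof(float));
        cudaMemset(d_x, 0, n * sizeof(float));  // stand-in for real input data
        cudaMemset(d_out, 0, sizeof(float));

        partialSum<<<256, 256>>>(d_x, d_out, n);

        float local = 0.0f, global = 0.0f;
        cudaMemcpy(&local, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        MPI_Reduce(&local, &global, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("global result: %f (ranks: %d)\n", global, size);

        cudaFree(d_x); cudaFree(d_out);
        MPI_Finalize();
        return 0;
    }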

    Spectral-spatial classification of n-dimensional images in real-time based on segmentation and mathematical morphology on GPUs

    Get PDF
    The objective of this thesis is to develop efficient schemes for spectral-spatial n-dimensional image classification. By efficient schemes, we mean schemes that produce good classification results in terms of accuracy and that can be executed in real time on low-cost computing infrastructures, such as the Graphics Processing Units (GPUs) shipped in personal computers. The n-dimensional images include two- and three-dimensional images, such as those from the medical domain, as well as images with tens to hundreds of dimensions, such as the multi- and hyperspectral images acquired in remote sensing. In image analysis, classification is a regularly used method for information retrieval in areas such as medical diagnosis, surveillance, manufacturing and remote sensing, among others. In addition, as hyperspectral images have become widely available in recent years owing to the reduction in the size and cost of the sensors, the number of lab-scale applications, such as food quality control, art forgery detection, disease diagnosis and forensics, has also increased. Although there are many spectral-spatial classification schemes, most are computationally inefficient in terms of execution time, and the need for efficient computation on low-cost computing infrastructures is increasing in line with the incorporation of technology into everyday applications. In this thesis we have proposed two spectral-spatial classification schemes: one based on segmentation and the other based on wavelets and mathematical morphology. These schemes were designed to produce good classification results, and they outperform other schemes in the literature based on segmentation and mathematical morphology in terms of accuracy. Additionally, it was necessary to develop techniques and strategies for efficient GPU computing, for example a block-asynchronous strategy, resulting in efficient GPU implementations of the aforementioned spectral-spatial classification schemes. The optimal GPU parameters were analyzed, and different data partitionings and thread block arrangements were studied to exploit the GPU resources. The results show that the GPU is an adequate computing platform for on-board processing of hyperspectral information.
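    One of the data-partitioning and thread-block arrangements mentioned above can be sketched as follows (an illustration under stated assumptions, not the thesis's implementation): a 2D grid of 16x16 blocks tiles the image and each thread processes the full spectral vector of one pixel, assuming band-interleaved-by-pixel (BIP) storage and a placeholder per-pixel feature.

    // Sketch of one thread-block arrangement: a 2D grid tiles the image and
    // each thread handles one pixel's spectral vector. BIP layout is assumed.
    #include <cuda_runtime.h>

    __global__ void perPixelFeature(const float* img, float* out,
                                    int width, int height, int bands) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        const float* pix = img + (size_t)(y * width + x) * bands;  // BIP
        float norm = 0.0f;
        for (int b = 0; b < bands; ++b)   // placeholder spectral feature
            norm += pix[b] * pix[b];
        out[y * width + x] = sqrtf(norm);
    }

    void launch(const float* d_img, float* d_out, int w, int h, int bands) {
        // 16x16 blocks are a common starting point; the thesis studies how
        // such choices should be tuned to exploit the GPU resources.
        dim3 block(16, 16);
        dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
        perPixelFeature<<<grid, block>>>(d_img, d_out, w, h, bands);
    }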

    ELASTIC CLOUD COMPUTING ARCHITECTURE AND SYSTEM FOR HETEROGENEOUS SPATIOTEMPORAL COMPUTING

    Get PDF

    Efficient multitemporal change detection techniques for hyperspectral images on GPU

    Get PDF
    Hyperspectral images contain hundreds of reflectance values for each pixel. Detecting regions of change in multiple hyperspectral images of the same scene taken at different times is of widespread interest for a large number of applications; in remote sensing, a very common application is land-cover analysis. The high dimensionality of hyperspectral images makes the development of computationally efficient processing schemes critical. This thesis focuses on the development of object-level change detection approaches, based on supervised direct multidate classification, for hyperspectral datasets. The proposed approaches improve the accuracy of current state-of-the-art algorithms, and their projection onto Graphics Processing Units (GPUs) allows their execution in real-time scenarios.
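    As a simpler, hedged stand-in for the object-level approach described above (the thesis's method is not reducible to this), a pixel-level GPU baseline can compute the spectral angle between the two acquisition dates and threshold it into a change mask; the BIP layout and the threshold are assumptions.

    // Illustrative pixel-level change-detection baseline, not the proposed
    // object-level scheme: spectral angle between two dates, thresholded.
    #include <cuda_runtime.h>

    __global__ void spectralAngleChange(const float* t1, const float* t2,
                                        unsigned char* mask, int pixels,
                                        int bands, float threshold) {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= pixels) return;

        const float* a = t1 + (size_t)p * bands;
        const float* b = t2 + (size_t)p * bands;
        float dot = 0.0f, na = 0.0f, nb = 0.0f;
        for (int i = 0; i < bands; ++i) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        // Angle between the two spectra; a large angle suggests change.
        float c = dot * rsqrtf(na * nb + 1e-12f);
        float angle = acosf(fminf(fmaxf(c, -1.0f), 1.0f));
        mask[p] = angle > threshold ? 1 : 0;
    }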

    A Novel Methodology for Calculating Large Numbers of Symmetrical Matrices on a Graphics Processing Unit: Towards Efficient, Real-Time Hyperspectral Image Processing

    Get PDF
    Hyperspectral imagery (HSI) is often processed to identify targets of interest. Many of the quantitative analysis techniques developed for this purpose mathematically manipulate the data to derive information about the target of interest based on local spectral covariance matrices. The calculation of a local spectral covariance matrix for every pixel in a given hyperspectral data scene is so computationally intensive that real-time processing with these algorithms is not feasible with today’s general purpose processing solutions. Specialized solutions are cost prohibitive, inflexible, inaccessible, or not feasible for on-board applications. Advances in graphics processing unit (GPU) capabilities and programmability offer an opportunity for general purpose computing with access to hundreds of processing cores in a system that is affordable and accessible; the GPU also offers flexibility, accessibility and feasibility that other specialized solutions do not. The architecture of the NVIDIA GPU used in this research is significantly different from the architecture of other parallel computing solutions, and with such a substantial change in architecture, the paradigm for programming graphics hardware differs significantly from traditional serial and parallel software development paradigms. In this research a methodology for mapping an HSI target detection algorithm to the NVIDIA GPU hardware and Compute Unified Device Architecture (CUDA) Application Programming Interface (API) is developed. The RX algorithm is chosen as a representative stochastic HSI algorithm that requires the calculation of a spectral covariance matrix. The developed methodology is designed to calculate a local covariance matrix for every pixel in the input HSI data scene. A characterization of the limitations imposed by the chosen GPU is given, and a path forward toward optimization of a GPU-based method for real-time HSI data processing is defined.
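    A naive starting point for such a mapping (not the optimized methodology developed in this research) assigns one thread per pixel to accumulate the local spectral covariance over a square window, assuming BIP storage and a window radius of at least 1; shared-memory tiling and the other optimizations the research pursues are deliberately omitted.

    // Naive per-pixel local covariance: one thread computes the B x B sample
    // covariance of its pixel's square neighborhood. Illustrative only.
    #include <cuda_runtime.h>

    __global__ void localCovariance(const float* img, float* cov, int width,
                                    int height, int bands, int radius) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int x0 = max(x - radius, 0), x1 = min(x + radius, width - 1);
        int y0 = max(y - radius, 0), y1 = min(y + radius, height - 1);
        int n = (x1 - x0 + 1) * (y1 - y0 + 1);
        float denom = (n > 1) ? (float)(n - 1) : 1.0f;

        float* C = cov + (size_t)(y * width + x) * bands * bands;
        for (int i = 0; i < bands; ++i) {
            for (int j = i; j < bands; ++j) {
                float si = 0.f, sj = 0.f, sij = 0.f;
                for (int v = y0; v <= y1; ++v)
                    for (int u = x0; u <= x1; ++u) {
                        const float* p = img + (size_t)(v * width + u) * bands;
                        si += p[i]; sj += p[j]; sij += p[i] * p[j];
                    }
                // Sample covariance of bands i and j over the window.
                float c = (sij - si * sj / n) / denom;
                C[i * bands + j] = c;
                C[j * bands + i] = c;      // covariance matrix is symmetric
            }
        }
    }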

    Efficient development of fuzzy classification algorithms in Big Data environments

    Get PDF
    We are witnessing a period of transition in which "data" are the main protagonists. In what has become known as the Big Data era, an enormous amount of information is generated every day. Decision-making based on these data, their structuring and organization, and their correct integration and analysis are key factors for many strategic sectors of society. When handling large amounts of data, the storage and analysis techniques associated with Big Data provide great help. Prominent among these techniques are machine learning algorithms, which are essential for predictive analysis over large amounts of data. Within machine learning, fuzzy classification algorithms are frequently used to solve a wide variety of problems, mainly those related to the control of complex industrial processes, decision systems in general, and data resolution and compression. Classification systems are also widespread in everyday technology, for example in digital cameras, air conditioning systems, etc. The successful use of machine learning techniques is limited by the constraints of current computational resources, especially when working with large datasets and real-time requirements. In this context, these algorithms need to be redesigned, and even rethought, in order to take full advantage of the massively parallel architectures that currently offer the highest performance. This doctoral thesis is set in this context, computationally analyzing the current landscape of classification algorithms and proposing parallel classification algorithms that deliver adequate solutions within a reduced time frame. Specifically, an in-depth study of well-known machine learning techniques was carried out through a practical application case: predicting the ozone level in different areas of the Región de Murcia. The analysis was based on pollution parameters collected daily during 2013 and 2014. The study revealed that the technique yielding the best results was Random Forest, and the processed data led to a regionalization into two large zones. The focus then shifted to fuzzy classification algorithms. Here, a modification of the Fuzzy C-Means (FCM) algorithm, mFCM, was used as a discretization technique to convert continuous input data into discrete data. This process is particularly important because certain algorithms require discrete values to work, and even techniques that do handle continuous data obtain better results with discrete data. The technique was validated by applying it to Anderson's well-known Iris dataset, where it was statistically compared with K-Means (KM) and provided better results. Once the study of fuzzy classification algorithms was completed, it became clear that these techniques are sensitive to the amount of data, with computational time growing accordingly, so efficient implementation of these algorithms is a critical factor for their applicability to Big Data.
    Therefore, the parallelization of a fuzzy classification algorithm is proposed so that the application becomes faster as the degree of parallelism of the system increases. To this end, the Parallel Fuzzy Minimals (PFM) fuzzy classification algorithm was proposed and compared with the FCM and Fuzzy Minimals (FM) algorithms on different datasets. In terms of quality, the classifications produced by the three algorithms were similar; in terms of scalability, however, the parallelized PFM algorithm achieved linear speedup with respect to the number of processors used. Having identified the need to develop these techniques in massively parallel environments, a high-performance hardware and software infrastructure is proposed for processing, in real time, data obtained from several vehicles concerning variables related to pollution and traffic problems. The results showed adequate system performance when working with large amounts of data, and the executions were satisfactory in terms of scalability. Major challenges lie ahead in identifying other Big Data applications and in using these techniques for prediction in areas as relevant as pollution, traffic and smart cities.
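    For reference, the data-parallel core that point-wise parallelizations such as PFM distribute across processors can be sketched with the textbook Fuzzy C-Means membership update (this is standard FCM, not the PFM or mFCM algorithms themselves; dimensions and the fuzzifier m are illustrative).

    // Textbook FCM membership update, one thread per data point:
    // u[p][c] = 1 / sum_k (d_pc / d_pk)^(2/(m-1)), with d the Euclidean
    // distance from point p to each cluster center.
    #include <cuda_runtime.h>

    __global__ void fcmMembership(const float* pts, const float* centers,
                                  float* u, int nPts, int nClusters,
                                  int dims, float m) {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= nPts) return;

        const float exponent = 2.0f / (m - 1.0f);
        for (int c = 0; c < nClusters; ++c) {
            // Squared distance from point p to center c (epsilon avoids /0).
            float dc = 1e-12f;
            for (int d = 0; d < dims; ++d) {
                float diff = pts[p * dims + d] - centers[c * dims + d];
                dc += diff * diff;
            }
            // On squared distances the ratio is raised to exponent/2,
            // which equals (d_pc/d_pk)^(2/(m-1)) on plain distances.
            float denom = 0.0f;
            for (int k = 0; k < nClusters; ++k) {
                float dk = 1e-12f;
                for (int d = 0; d < dims; ++d) {
                    float diff = pts[p * dims + d] - centers[k * dims + d];
                    dk += diff * diff;
                }
                denom += powf(dc / dk, exponent * 0.5f);
            }
            u[p * nClusters + c] = 1.0f / denom;
        }
    }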

    XSEDE: eXtreme Science and Engineering Discovery Environment Third Quarter 2012 Report

    Get PDF
    The Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is an integrated cyberinfrastructure ecosystem with singular interfaces for allocations, support, and other key services that researchers can use to interactively share computing resources, data, and expertise. This is a report of project activities and highlights from the third quarter of 2012. National Science Foundation, OCI-105357

    An efficient parallel ISODATA algorithm based on Kepler GPUs

    No full text