18 research outputs found

    How general-purpose can a GPU be?

    Get PDF
    The use of graphics processing units (GPUs) in general-purpose computation (GPGPU) is a growing field. GPU instruction sets, while implementing a graphics pipeline, draw on a range of single-instruction multiple-datastream (SIMD) architectures characteristic of the heyday of supercomputers. Yet only one of these SIMD instruction sets proved applicable to a wide enough range of problems to survive the era when the full range of supercomputer design variants was being explored: vector instructions. Supercomputers covered a range of exotic designs, such as hypercubes and the Connection Machine (Fox, 1989). The latter is likely the source of the snide comment by Cray: it had thousands of relatively low-speed CPUs (Tucker & Robertson, 1988). Since Cray won, why are we not basing our ideas on his designs (Cray Inc., 2004), rather than those of the losers? The Top 500 supercomputer list is dominated by general-purpose CPUs, and nothing like the Connection Machine that headed the list in 1993 still exists.

    Preventing premature convergence and proving the optimality in evolutionary algorithms

    Get PDF
    http://ea2013.inria.fr//proceedings.pdf
    Evolutionary Algorithms (EAs) usually carry out an efficient exploration of the search space, but often get trapped in local minima and cannot prove the optimality of the solution. Interval-based techniques, on the other hand, yield a numerical proof of optimality of the solution. However, they may fail to converge within a reasonable time due to their inability to quickly compute a good approximation of the global minimum and their exponential complexity. The contribution of this paper is a hybrid algorithm called Charibde in which a particular EA, Differential Evolution, cooperates with a Branch and Bound algorithm endowed with interval propagation techniques. It prevents premature convergence toward local optima and outperforms both deterministic and stochastic existing approaches. We demonstrate its efficiency on a benchmark of highly multimodal problems, for which we provide previously unknown global minima and certification of optimality.
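    The cooperation described above pairs a stochastic global search with a deterministic certifier. As a rough illustration of the stochastic half only, here is a minimal sketch of canonical Differential Evolution (DE/rand/1/bin) on the Rastrigin benchmark; this is not the Charibde algorithm itself, and all parameter values and names are assumptions for illustration.

```python
import math
import random

def rastrigin(x):
    # Highly multimodal benchmark; global minimum 0 at the origin.
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def differential_evolution(f, dim, bounds, pop_size=30, F=0.7, CR=0.9, gens=200, seed=1):
    # Canonical DE/rand/1/bin: mutate with a scaled difference vector,
    # cross over with the parent, keep the better of parent and trial.
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantee at least one mutated gene
            trial = [min(max(pop[a][k] + F * (pop[b][k] - pop[c][k]), lo), hi)
                     if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

    Unlike Charibde, a plain DE run like this offers no proof of optimality: it can only report the best value found, which is exactly the gap interval Branch and Bound is meant to close.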

    Scalable parallel evolutionary optimisation based on high performance computing

    Get PDF
    Evolutionary algorithms (EAs) have been successfully applied to solve various challenging optimisation problems. Due to their stochastic nature, EAs typically require considerable time to find desirable solutions, especially for increasingly complex and large-scale problems. As a result, many works have studied implementing EAs on parallel computing facilities to accelerate these time-consuming processes. Recently, the rapid development of modern parallel computing facilities such as high performance computing (HPC) brings not only unprecedented computational capabilities but also challenges in designing parallel algorithms. This thesis focuses on designing scalable parallel evolutionary optimisation (SPEO) frameworks which run efficiently on HPC. Motivated by the interesting phenomenon that many EAs begin to employ increasingly large population sizes, this thesis first studies the effect of a large population size through comprehensive experiments. Numerical results indicate that a large population benefits the solving of complex problems but requires a large number of maximal fitness evaluations (FEs). However, since sequential EAs usually require considerable computing time to achieve extensive FEs, we propose a scalable parallel evolutionary optimisation framework that can efficiently deploy parallel EAs over many CPU cores on CPU-only HPC. On the other hand, since EAs using a large number of FEs can produce massive amounts of useful information in the course of evolution, we design a surrogate-based approach to learn from this historical information and better solve complex problems. This approach is then implemented in parallel, based on the proposed scalable framework, to achieve remarkable speedups. Since demanding great computing power on CPU-only HPC is usually very expensive, we design a framework based on GPU-enabled HPC to improve the cost-effectiveness of parallel EAs.
The proposed framework can efficiently accelerate parallel EAs using many GPUs and achieves superior cost-effectiveness. However, since it is very challenging to implement parallel EAs correctly on the GPU, we propose a set of guidelines to verify the correctness of GPU-based EAs. To examine these guidelines, they are employed to verify a GPU-based brain storm optimisation that is also proposed in this thesis. In conclusion, a comprehensive experimental study is first conducted to investigate the impacts of a large population. After that, a SPEO framework based on CPU-only HPC is proposed and employed to accelerate a time-consuming implementation of an EA. Finally, the correctness verification of implementing EAs on a single GPU is discussed, and the SPEO framework is then extended to be deployed on GPU-enabled HPC.
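    The part of an EA that parallelises most naturally is the fitness evaluation of the population. As a single-node, CPU-only sketch of that master-worker step (function names and the sphere objective are assumptions; the thesis's framework targets HPC clusters, where the same pattern is typically realised with MPI rather than local processes):

```python
from multiprocessing import Pool

def fitness(individual):
    # Stand-in objective (sphere function); a real deployment would plug in
    # the expensive, problem-specific evaluation here.
    return sum(x * x for x in individual)

def evaluate_population(population, workers=4):
    # Master-worker step: the master scatters individuals to worker
    # processes and gathers their fitness values in order.
    with Pool(processes=workers) as pool:
        return pool.map(fitness, population)
```

    Because each evaluation is independent, the speedup of this step scales with the number of workers until communication overhead dominates, which is why large populations and many FEs make parallelisation worthwhile.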

    Algorithms and architectures for MCMC acceleration in FPGAs

    Get PDF
    Markov Chain Monte Carlo (MCMC) is a family of stochastic algorithms which are used to draw random samples from arbitrary probability distributions. This task is necessary to solve a variety of problems in Bayesian modelling, e.g. prediction and model comparison, making MCMC a fundamental tool in modern statistics. Nevertheless, due to the increasing complexity of Bayesian models, the explosion in the amount of data they need to handle and the computational intensity of many MCMC algorithms, performing MCMC-based inference is often impractical in real applications. This thesis tackles this computational problem by proposing Field Programmable Gate Array (FPGA) architectures for accelerating MCMC and by designing novel MCMC algorithms and optimization methodologies which are tailored for FPGA implementation. The contributions of this work include: 1) An FPGA architecture for the Population-based MCMC algorithm, along with two modified versions of the algorithm which use custom arithmetic precision in large parts of the implementation without introducing error in the output. Mapping the two modified versions to an FPGA allows for more parallel modules to be instantiated in the same chip area. 2) An FPGA architecture for the Particle MCMC algorithm, along with a novel algorithm which combines Particle MCMC and Population-based MCMC to tackle multi-modal distributions. A proposed FPGA architecture for the new algorithm achieves higher datapath utilization than the Particle MCMC architecture. 3) A generic method to optimize the arithmetic precision of any MCMC algorithm that is implemented on FPGAs. The method selects the minimum precision among a given set of precisions, while guaranteeing a user-defined bound on the output error. 
By applying the above techniques to large-scale Bayesian problems, it is shown that significant speedups (one or two orders of magnitude) are possible compared to state-of-the-art MCMC algorithms implemented on CPUs and GPUs, opening the way for handling complex statistical analyses in the era of ubiquitous, ever-increasing data.
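    The MCMC family referred to above can be illustrated by its simplest member, random-walk Metropolis; this is a sketch only (the thesis accelerates far more elaborate population-based and particle variants), and the function names are assumptions.

```python
import math
import random

def metropolis_hastings(log_target, x0, steps, step_size=1.0, seed=0):
    # Random-walk Metropolis: perturb the current state with Gaussian noise
    # and accept the proposal with probability min(1, pi(x') / pi(x)),
    # computed here in log space for numerical stability.
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        if math.log(max(rng.random(), 1e-300)) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples
```

    Sampling a standard normal target (log-density -x^2/2 up to a constant) should approximately reproduce its mean and variance; each step needs only the target density up to a normalising constant, which is what makes the method so broadly applicable in Bayesian inference.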

    Programming issues for video analysis on Graphics Processing Units

    Get PDF
    Video processing is the part of signal processing in which the input and/or output signals are video sequences. It covers a wide variety of applications that are, in general, computationally intensive due to their algorithmic complexity. Moreover, many of these applications demand real-time operation. Meeting these requirements makes the use of hardware accelerators, such as Graphics Processing Units (GPUs), necessary. General-purpose computing on GPUs represents a successful trend in high-performance computing, ever since the launch of the NVIDIA CUDA architecture and programming model. This doctoral thesis deals with the efficient parallelisation of video processing applications on GPUs. This goal is addressed from two angles: on the one hand, programming the GPU appropriately for video applications; on the other hand, considering the GPU as part of a heterogeneous system. Since video sequences are composed of frames, which are regular data structures, many components of video applications are inherently parallelisable. However, other components are irregular in the sense that they perform workload-dependent computations, suffer write contention, or contain inherently sequential or load-imbalanced parts. This thesis proposes strategies to deal with these aspects through several case studies. It also describes an optimised approach to histogram computation based on a memory performance model. Video sequences are continuous streams that must be transferred from the host (CPU) to the device (GPU), and the results from the device back to the host.
This doctoral thesis proposes the use of CUDA streams to implement the stream processing paradigm on the GPU, in order to orchestrate the concurrent execution of data transfers and computation. It also proposes performance models that enable optimal execution.
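    One irregular component mentioned above, histogram computation, suffers write contention when many threads update shared bins. A common GPU remedy is privatisation: each thread block fills its own sub-histogram and the copies are merged afterwards. The idea can be sketched sequentially in plain Python (names and the worker count are assumptions, purely illustrative; on a GPU each "worker" would be a thread block with a shared-memory sub-histogram):

```python
def histogram_privatized(pixels, bins, workers=4):
    # Split the input into per-worker chunks; each worker increments only
    # its private sub-histogram, so no two workers write the same counter.
    chunk = (len(pixels) + workers - 1) // workers
    subs = []
    for w in range(workers):
        h = [0] * bins
        for p in pixels[w * chunk:(w + 1) * chunk]:
            h[p] += 1
        subs.append(h)
    # Reduction step: merge the private copies into the final histogram.
    return [sum(s[b] for s in subs) for b in range(bins)]
```

    The merge costs workers x bins additions, which is usually negligible next to the contention it removes from the per-pixel updates.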

    Trading the stock market : hybrid financial analyses and evolutionary computation

    Get PDF
    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 02-07-2014. This thesis concerns the implementation of a complex and pioneering automated trading system which uses three critical analyses to determine time decisions and portfolios for investments. To this end, this work delves into automated trading systems and studies time series of historical prices of companies listed on stock markets. Time series are studied using a novel methodology based on clustering by software compressors. This new approach allows a theoretical study of price formation which shows results of divergence between market prices and prices modelled by random walks, thus supporting the implementation of predictive models based on the analysis of historical patterns. Furthermore, this methodology also provides us the tools to study the behaviour of time series of historical prices from different industrial sectors, seeking patterns among companies in the same industry. Results show clusters of companies pointing out market trends among companies developing similar activities, suggesting that including a macroeconomic analysis can benefit investment decisions. Having tested the feasibility of prediction systems based on analyses of time series of historical prices, and having confirmed the existence of macroeconomic trends in the industries, we propose the implementation of a hybrid automated trading system through several stages which iteratively describe and test the components of the final system. In the early stages, we implement an automated trading system based on technical and fundamental analysis of companies, which presents high returns and reduces losses. The implementation uses a methodology guided by a modified version of a genetic algorithm which presents novel genetic operators that avoid premature convergence and improve final results. Using the same automated trading system, we propose novel optimisation techniques addressing one of the characteristic problems of these systems: the execution time. We present the parallelisation of the system using two parallel computing techniques, first using distributed computation and, second, implementing a version for graphics processors. Both architectures achieve high speed-ups, reaching 50x and 256x respectively; thus, they provide the speed-ups required by systems analysing huge amounts of financial data.
Subsequent stages present a change of methodology, from genetic algorithms to grammatical evolution, which allows us to compare the two evolutionary strategies and to implement more advanced features such as more complex rules or the self-generation of new technical indicators. In this context, we describe several automated trading system versions guided by different fitness functions, including an innovative multi-objective version, which we test with recent financial data, analysing the advantages of each fitness function. Finally, we describe and test the methodology of an automated trading system based on a double layer of grammatical evolution, combining technical, fundamental and macroeconomic analysis in a hybrid top-down analysis. The results show average returns of 30% with a low number of losing operations.
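    As a toy illustration of the kind of technical-analysis rule such an evolutionary system evolves and scores (the crossover rule, the fitness definition, and all names here are assumptions for illustration, not the thesis's actual indicators), a moving-average crossover strategy and a simple return-based fitness might be sketched as:

```python
def sma(prices, n):
    # Simple moving average; None until n observations are available.
    return [None if i + 1 < n else sum(prices[i + 1 - n:i + 1]) / n
            for i in range(len(prices))]

def crossover_return(prices, fast=2, slow=3):
    # Toy rule: hold the stock over bar i whenever the fast SMA was above
    # the slow SMA at bar i-1 (no lookahead). Fitness is the cumulative
    # return of the resulting trades, which an EA would seek to maximise
    # by tuning the rule and its window lengths.
    f, s = sma(prices, fast), sma(prices, slow)
    ret = 1.0
    for i in range(1, len(prices)):
        if f[i - 1] is not None and s[i - 1] is not None and f[i - 1] > s[i - 1]:
            ret *= prices[i] / prices[i - 1]
    return ret - 1.0
```

    In a grammatical-evolution setting, expressions like this rule are not hand-written but derived from a grammar, which is what makes features such as self-generated indicators possible.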

    Energy-Performance Optimization for the Cloud

    Get PDF

    High-Performance Modelling and Simulation for Big Data Applications

    Get PDF
    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to afford a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Ramon Llull's Ars Magna

    Get PDF