
    Generic Techniques in General Purpose GPU Programming with Applications to Ant Colony and Image Processing Algorithms

    In 2006 NVIDIA introduced a new unified GPU architecture facilitating general-purpose computation on the GPU. The following year NVIDIA introduced CUDA, a parallel programming architecture for developing general-purpose applications for direct execution on the new unified GPU. CUDA exposes the GPU's massively parallel architecture so that parallel code can be written to execute much faster than its sequential counterpart. Although CUDA abstracts the underlying architecture, fully utilising and scheduling the GPU is non-trivial and has given rise to a new and active area of research. Due to the inherent complexities of GPU development, in this thesis we explore and find efficient parallel mappings of existing and new parallel algorithms on the GPU using NVIDIA CUDA. We place particular emphasis on metaheuristics, image processing and the design of reusable techniques and mappings that can be applied to other problems and domains. We begin by focusing on Ant Colony Optimisation (ACO), a nature-inspired heuristic approach for solving optimisation problems. We present a versatile, improved data-parallel approach for solving the Travelling Salesman Problem using ACO, resulting in significant speedups. Extending our initial work, we show how existing mappings of ACO on the GPU are unable to compete with their sequential counterparts when common CPU optimisation strategies are employed, and we detail three distinct candidate-set parallelisation strategies for execution on the GPU. Extending our data-parallel approach further, we present the first GPU implementation of an ACO-based edge detection algorithm, reducing the execution time and improving the viability of ACO-based edge detection. We finish by presenting a new colour edge detection technique using the volume of a pixel in the HSI colour space, along with a parallel GPU implementation that is able to withstand greater levels of noise than existing algorithms.
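
    A central idea in the thesis's data-parallel mapping is that every ant constructs its tour simultaneously, one city-selection step at a time. The sketch below is a rough CPU-side illustration only: the thesis targets NVIDIA CUDA, whereas here a vectorised NumPy "ant" dimension stands in for the per-ant GPU threads, and the function name `construct_tours` and the parameters `alpha` and `beta` are illustrative assumptions rather than the thesis's actual interface.

```python
import numpy as np

def construct_tours(pheromone, distance, n_ants, alpha=1.0, beta=2.0, rng=None):
    """Data-parallel ACO tour construction: all ants pick their next city in
    the same vectorised step, mirroring a one-thread-per-ant GPU mapping."""
    rng = np.random.default_rng() if rng is None else rng
    n = pheromone.shape[0]
    heuristic = 1.0 / (distance + np.eye(n))           # eye() avoids division by zero on the diagonal
    desirability = (pheromone ** alpha) * (heuristic ** beta)

    tours = np.empty((n_ants, n), dtype=int)
    tours[:, 0] = rng.integers(0, n, size=n_ants)      # random start city for each ant
    visited = np.zeros((n_ants, n), dtype=bool)
    visited[np.arange(n_ants), tours[:, 0]] = True

    for step in range(1, n):
        current = tours[:, step - 1]
        probs = desirability[current] * ~visited        # mask out cities already visited
        probs /= probs.sum(axis=1, keepdims=True)
        cum = np.cumsum(probs, axis=1)                  # roulette-wheel selection, one draw per ant
        draws = rng.random((n_ants, 1))
        chosen = (cum < draws).sum(axis=1)
        tours[:, step] = chosen
        visited[np.arange(n_ants), chosen] = True
    return tours
```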

    Enhanced Deep Network Designs Using Mitochondrial DNA Based Genetic Algorithm And Importance Sampling

    Machine learning (ML) is playing an increasingly important role in our lives. It has already made a huge impact in areas such as cancer diagnosis, precision medicine, self-driving cars, natural disaster prediction, and speech recognition. The painstakingly handcrafted feature extractors used in traditional learning, classification and pattern recognition systems are not scalable for large datasets or adaptable to different classes of problems or domains. Machine learning's resurgence, in the form of Deep Learning (DL) over the last decade after multiple AI (artificial intelligence) winters and hype cycles, is a result of the convergence of advances in training algorithms, the availability of massive data (big data) and innovation in compute resources (GPUs and the cloud). If we want to solve more complex problems with machine learning, we need to optimize all three of these areas: algorithms, dataset and compute. Our dissertation research presents an original application of the nature-inspired idea of mitochondrial DNA (mtDNA) to improve deep learning network design. Additional fine-tuning is provided by a Monte Carlo-based method called importance sampling (IS). The primary performance indicators for machine learning are model accuracy, loss and training time. The goal of our dissertation is to provide a framework that addresses all of these areas by optimizing network designs (in the form of hyperparameter optimization) and the dataset using an enhanced Genetic Algorithm (GA) and importance sampling. Algorithms are by far the most important aspect of machine learning. We demonstrate the application of mitochondrial DNA to complement the standard genetic algorithm for architecture optimization of a deep Convolutional Neural Network (CNN). We use importance sampling to reduce dataset variance and to sample more often from the instances that add greater value from the training-outcome perspective. Finally, we leverage the massively parallel and distributed processing of GPUs in the cloud to speed up training. Thus, our multi-approach method for enhancing deep learning combines architecture optimization, dataset optimization and the power of the cloud to drive better model accuracy and reduce training time.
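
    For context on what "architecture optimization with a genetic algorithm" means in practice, the sketch below shows a plain generational GA searching over CNN hyperparameters. It is a generic illustration under assumed names (`SPACE`, `fitness`, `evolve`) and a hypothetical search space; the dissertation's mtDNA extension and the importance-sampling step are not reproduced here.

```python
import random

# Hypothetical search space; the dissertation's actual hyperparameters may differ.
SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "num_filters":   [16, 32, 64, 128],
    "kernel_size":   [3, 5, 7],
    "dropout":       [0.0, 0.25, 0.5],
}

def random_individual():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(ind, rate=0.2):
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in ind.items()}

def evolve(fitness, pop_size=20, generations=10, elite=2):
    """Plain generational GA over CNN hyperparameters. `fitness` is assumed to
    train and evaluate a model, returning e.g. validation accuracy."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - elite)]
        population = ranked[:elite] + children            # keep the elite unchanged
    return max(population, key=fitness)
```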

    A Framework for Scheduling Problems

    Scheduling problems form an important subclass of combinatorial optimisation problems with many applications in manufacturing and logistics. These problems are predominantly NP-complete (in their decision form) and NP-hard (in their optimisation form), hence the main course of research in solving them concentrates on the design of efficient heuristic algorithms. Two main categories of such algorithms exist: deterministic algorithms and evolutionary metaheuristics. The deterministic algorithms comprise local improvement techniques, such as the k-opt algorithm, which try to improve an existing feasible solution, and constructive heuristics, such as NEH, which build a solution from scratch, adding one job at a time. Evolutionary metaheuristics have prospered in the past decades owing to their efficiency and flexibility. Drawing inspiration from the theory of natural evolution or swarm behavioural patterns, the most popular of these algorithms in practice include, for instance, Genetic Algorithms, Differential Evolution and Particle Swarm Optimisation. However, even though these heuristics in most cases provide close-to-optimal solutions in reasonable execution time, that time is still impractically long for many applications, so much effort has been dedicated to accelerating them. Since hardware development has turned away from increasing clock speeds towards parallel processing units, having reached technological limits of power consumption and heat dissipation, this effort goes into parallelising existing algorithms so that the computing power of multi-core and many-core platforms can be exploited. This is the goal of the first part of the thesis: accelerating two of the deterministic algorithms, NEH and 2-opt, with interesting results. A different approach is taken in the second part, whose core premise is to explore the influence of stochasticity on the performance of an evolutionary algorithm, selecting the relatively recent and promising Discrete Artificial Bee Colony (DABC) algorithm. The pseudo-random number generator is replaced with different types of dissipative chaotic maps, some of which improve the algorithm significantly. It has been shown that population-based evolutionary algorithms often form complex networks, viewed in terms of the information exchange between individual solutions during the development of the population. The final part of this thesis puts this observation into practice by embedding a self-adaptive mechanism based on complex network analysis into the ABC algorithm, an evolutionary algorithm for continuous optimisation problems that is also the basis of the aforementioned DABC algorithm, and proving the effectiveness of some of the developed versions on standard continuous optimisation test functions; the possibility of extending this modification to combinatorial optimisation problems in the future is discussed in the conclusion.
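
    To make concrete what the first part of the thesis accelerates, here is a minimal sequential sketch of the NEH constructive heuristic for the permutation flow shop, assuming processing times are given as a matrix `p[job, machine]`; the thesis's parallel versions and the 2-opt acceleration are not reproduced here.

```python
import numpy as np

def makespan(sequence, p):
    """Completion time of the last job on the last machine for a permutation
    flow shop with processing-time matrix p[job, machine]."""
    completion = np.zeros(p.shape[1])
    for job in sequence:
        for machine in range(p.shape[1]):
            prev = completion[machine - 1] if machine > 0 else 0.0
            completion[machine] = max(completion[machine], prev) + p[job, machine]
    return completion[-1]

def neh(p):
    """NEH: order jobs by decreasing total processing time, then insert each job
    at the position that minimises the makespan of the partial sequence."""
    order = np.argsort(-p.sum(axis=1))
    sequence = [order[0]]
    for job in order[1:]:
        candidates = [sequence[:i] + [job] + sequence[i:] for i in range(len(sequence) + 1)]
        sequence = min(candidates, key=lambda s: makespan(s, p))
    return sequence
```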

    Data-driven prognostics and logistics optimisation: A deep learning journey


    Advanced analytics through FPGA based query processing and deep reinforcement learning

    Today, vast streams of structured and unstructured data are incorporated in databases, and analytical processes are applied to discover patterns, correlations, trends and other useful relationships that inform a broad range of decision-making processes. The amount of generated data has grown enormously over the years, and conventional database processing methods from previous generations are no longer sufficient to provide satisfactory analytics performance and prediction accuracy. Thus, new methods are needed in a wide array of fields, from computer architectures, storage systems and network design to statistics and physics. This thesis proposes two methods to address the current challenges and meet the future demands of advanced analytics. First, we present AxleDB, a Field Programmable Gate Array (FPGA) based query processing system which constitutes the front end of an advanced analytics system. AxleDB melds highly efficient accelerators with memory and storage, and provides a unified programmable environment. AxleDB is capable of offloading complex Structured Query Language (SQL) queries from the host CPU. Experiments on a set of TPC-H queries have shown that AxleDB can run full queries between 1.8x and 34.2x faster, and 2.8x to 62.1x more energy efficiently, than MonetDB and PostgreSQL on a single workstation node. Second, we introduce TauRieL, a novel deep reinforcement learning (DRL) based method for combinatorial problems. The idea behind combining DRL and combinatorial problems is to apply the prediction capabilities of deep reinforcement learning and to use the universality of combinatorial optimization problems to explore general-purpose predictive methods. TauRieL utilizes an actor-critic inspired DRL architecture built on ordinary feedforward nets, and it performs online training, which unifies training and state-space exploration. The experiments show that TauRieL can generate solutions two orders of magnitude faster than, and within 3% of the accuracy of, the state-of-the-art DRL method on the Traveling Salesman Problem while searching for the shortest tour. We also show that TauRieL can be adapted to the Knapsack combinatorial problem: with a minimal problem-specific modification, TauRieL can outperform a Knapsack-specific greedy heuristic.
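
    The abstract does not say which greedy baseline TauRieL is compared against on the Knapsack problem; purely as an assumed illustration, a standard value-density greedy heuristic of the kind typically used as such a baseline is sketched below.

```python
def greedy_knapsack(values, weights, capacity):
    """Value-density greedy heuristic for the 0/1 knapsack problem:
    take items in decreasing value/weight order while they still fit."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    chosen, total_value, remaining = [], 0, capacity
    for i in order:
        if weights[i] <= remaining:
            chosen.append(i)
            remaining -= weights[i]
            total_value += values[i]
    return chosen, total_value

# Toy usage: four items with capacity 10 -> picks items 0 and 2 for a value of 14
print(greedy_knapsack([10, 7, 4, 6], [5, 4, 2, 6], capacity=10))
```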

    Review of Deep Learning Algorithms and Architectures

    Deep learning (DL) is playing an increasingly important role in our lives. It has already made a huge impact in areas such as cancer diagnosis, precision medicine, self-driving cars, predictive forecasting, and speech recognition. The painstakingly handcrafted feature extractors used in traditional learning, classification, and pattern recognition systems are not scalable for large-sized data sets. In many cases, depending on the problem complexity, DL can also overcome the limitations of earlier shallow networks that prevented efficient training and the abstraction of hierarchical representations of multi-dimensional training data. A deep neural network (DNN) uses multiple (deep) layers of units with highly optimized algorithms and architectures. This paper reviews several optimization methods that improve training accuracy and reduce training time. We delve into the math behind the training algorithms used in recent deep networks, and we describe current shortcomings, enhancements, and implementations. The review also covers different types of deep architectures, such as deep convolutional networks, deep residual networks, recurrent neural networks, reinforcement learning, variational autoencoders, and others. https://doi.org/10.1109/ACCESS.2019.291220
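
    As a small illustration of the kind of training-algorithm optimization such a review covers, the sketch below implements a single SGD-with-momentum update. It is a generic textbook method written for this summary, not code from the paper, and the parameter names are assumptions.

```python
import numpy as np

def sgd_momentum(params, grads, velocity, lr=0.01, beta=0.9):
    """One SGD-with-momentum step: velocity keeps an exponentially weighted
    average of past gradients, which smooths the parameter updates."""
    for key in params:
        velocity[key] = beta * velocity[key] - lr * grads[key]
        params[key] += velocity[key]
    return params, velocity

# Toy usage: a single 2x2 weight matrix and its gradient
params   = {"W": np.zeros((2, 2))}
grads    = {"W": np.ones((2, 2))}
velocity = {"W": np.zeros((2, 2))}
params, velocity = sgd_momentum(params, grads, velocity)
```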

    Ant Colony Optimization

    Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.