39 research outputs found
Bio-Inspired Optimization of Ultra-Wideband Patch Antennas Using Graphics Processing Unit Acceleration
Ultra-wideband (UWB) wireless systems have recently gained considerable attention as effective communications platforms with the properties of low power and high data rates. Applications of UWB such as wireless USB put size constraints on the antenna, however, which can be very difficult to meet using typical narrowband antenna designs. The aim of this thesis is to show how bio-inspired evolutionary optimization algorithms, in particular the genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO), can produce novel UWB planar patch antenna designs that meet a size constraint of a 10 mm × 10 mm patch. Each potential antenna design is evaluated with the finite difference time domain (FDTD) technique, which is accurate but time-consuming. Another aspect of this thesis is the modification of FDTD to run on a graphics processing unit (GPU) to obtain nearly a 20× speedup. With the combination of GA, PSO, BBO and GPU-accelerated FDTD, three novel antenna designs are produced that meet the size and bandwidth requirements applicable to UWB wireless USB systems.
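As a rough illustration of the PSO loop mentioned in this abstract, the following is a minimal sketch and not the thesis's implementation. The cost function `placeholder_cost` is a hypothetical stand-in for an expensive FDTD evaluation, and the parameter values (`w`, `c1`, `c2`, swarm size) are common textbook choices assumed here for illustration only.

```python
import random


def pso(fitness, dim, bounds, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [fitness(p) for p in pos]       # personal best costs
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def placeholder_cost(x):
    # Hypothetical stand-in for an FDTD-based antenna score (lower is better).
    return sum((xi - 0.5) ** 2 for xi in x)


best, best_val = pso(placeholder_cost, dim=4, bounds=(0.0, 1.0))
```

In a real antenna study each call to the fitness function would run a full-wave simulation, which is why the abstract pairs the optimizer with GPU-accelerated FDTD.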
Ant Colony Optimization
Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced a tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics are presented
Benchmarking optimization algorithms for auto-tuning GPU kernels
Recent years have witnessed phenomenal growth in the application and
capabilities of Graphical Processing Units (GPUs) due to their high parallel
computation power at relatively low cost. However, writing a computationally
efficient GPU program (kernel) is challenging, and generally only certain
specific kernel configurations lead to significant increases in performance.
Auto-tuning is the process of automatically optimizing software for
highly-efficient execution on a target hardware platform. Auto-tuning is
particularly useful for GPU programming, as a single kernel requires re-tuning
after code changes, for different input data, and for different architectures.
However, the discrete and non-convex nature of the search space creates a
challenging optimization problem. In this work, we investigate which algorithm
produces the fastest kernels if the time-budget for the tuning task is varied.
We conduct a survey by performing experiments on 26 different kernel spaces,
from 9 different GPUs, for 16 different evolutionary black-box optimization
algorithms. We then analyze these results and introduce a novel metric based on
the PageRank centrality concept as a tool for gaining insight into the
difficulty of the optimization problem. We demonstrate that our metric
correlates strongly with observed tuning performance. Comment: in IEEE Transactions on Evolutionary Computation, 202
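To make the tuning problem concrete, here is a minimal sketch of budgeted black-box search over a discrete kernel configuration space. It uses plain random search as a baseline, not any of the algorithms benchmarked in the paper. The parameter names (`block_size_x`, `block_size_y`, `tile`) and the function `synthetic_runtime_ms` are hypothetical: the runtime model is invented for illustration and does not measure a real GPU kernel.

```python
import itertools
import random

# Hypothetical tunable parameters of a GPU kernel.
space = {
    "block_size_x": [16, 32, 64, 128],
    "block_size_y": [1, 2, 4, 8],
    "tile": [1, 2, 4],
}


def synthetic_runtime_ms(cfg):
    """Made-up runtime model; invalid configurations cost infinity."""
    threads = cfg["block_size_x"] * cfg["block_size_y"]
    if threads > 512:          # assumed hardware limit for this sketch
        return float("inf")
    penalty = 3.0 if cfg["block_size_x"] == 16 else 0.0
    return 10.0 + abs(threads - 256) / 32.0 + (cfg["tile"] - 2) ** 2 + penalty


def random_search(space, evaluate, budget, seed=0):
    """Evaluate up to `budget` distinct configurations and return the fastest."""
    rng = random.Random(seed)
    keys = list(space)
    configs = [dict(zip(keys, values))
               for values in itertools.product(*(space[k] for k in keys))]
    sample = rng.sample(configs, min(budget, len(configs)))
    return min(sample, key=evaluate)


best_small = random_search(space, synthetic_runtime_ms, budget=8)
best_full = random_search(space, synthetic_runtime_ms, budget=48)
```

With a budget covering all 48 configurations the search is exhaustive and returns a global optimum; with a smaller budget the result depends on which configurations happen to be sampled, which is the trade-off the abstract studies.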
Enhancing numerical modelling efficiency for electromagnetic simulation of physical layer components.
The purpose of this thesis is to present solutions to overcome several key difficulties that limit the application of numerical modelling in communication cable design and analysis. In particular, specific limiting factors are that simulations are time consuming, and the process of comparison requires skill and is poorly defined and understood. When much of the process of design consists of optimisation of performance within a well defined domain, the use of artificial intelligence techniques may reduce or remove the need for human interaction in the design process. The automation of human processes allows round-the-clock operation at a faster throughput. Achieving a speedup would permit greater exploration of the possible designs, improving understanding of the domain.
This thesis presents work that relates to three facets of the efficiency of numerical modelling: minimizing simulation execution time, controlling optimization processes and quantifying comparisons of results. These topics are of interest because simulation times for most problems of interest run into tens of hours. The design process for most systems being modelled may be considered an optimisation process in so far as the design is improved based upon a comparison of the test results with a specification. Development of software to automate this process permits the improvements to continue outside working hours, and produces decisions unaffected by the psychological state of a human operator. Improved performance of simulation tools would facilitate exploration of more variations on a design, which would improve understanding of the problem domain, promoting a virtuous circle of design.
The minimization of execution time was achieved through the development of a Parallel TLM Solver which did not use specialized hardware or a dedicated network. Its design was novel because it was intended to operate on a network of heterogeneous machines in a manner which was fault tolerant, and included a means to reduce vulnerability of simulated data without encryption. Optimisation processes were controlled by genetic algorithms and particle swarm optimisation, which were novel applications in communication cable design. The work extended the range of cable parameters, reducing conductor diameters for twisted pair cables, and reducing optical coverage of screens for a given shielding effectiveness. Work on the comparison of results introduced "Colour maps" as a way of displaying three scalar variables over a two-dimensional surface, and comparisons were quantified by extending 1D Feature Selective Validation (FSV) to two dimensions, using an ellipse-shaped filter, in such a way that it could be extended to higher dimensions. In so doing, some problems with FSV were detected, and suggestions for overcoming these are presented, such as the special case of zero-valued DC signals. A re-description of Feature Selective Validation, using Jacobians and tensors, is proposed in order to facilitate its implementation in higher dimensional spaces.
Point spread function estimation of solar surface images with a cooperative particle swarm optimization on GPUs
Advisor: Prof. Dr. Daniel Weingaertner. Dissertation (master's) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 21/02/3013. Bibliography: leaves 81-86.
Abstract: We present a method for estimating the point spread function (PSF) of solar surface images acquired from ground telescopes and degraded by the atmosphere. The estimation is done by retrieving the wavefront phase using a set of short exposures, the speckle reconstruction of the observed object and a PSF model parametrized by Zernike polynomials. Estimates of the wavefront phase and the PSF are computed by minimizing an error function with a cooperative particle swarm optimization (CPSO) method, implemented in OpenCL to take advantage of highly parallel graphics processing units (GPUs). A calibration method is presented to adjust the algorithm parameters for low-cost results, providing solid estimates for both low-frequency and high-frequency images. Results show that the method converges quickly and is robust to noise degradation. Experiments run on an NVidia Tesla C2050 computed 100 PSFs with 50 Zernike polynomials in approximately 36 minutes. Increasing the number of Zernike coefficients tenfold, from 50 to 500, increased the execution time by only 17%, showing that the proposed algorithm is only slightly affected by the number of Zernike coefficients used.
Energy-aware scheduling in distributed computing systems
Distributed computing systems, such as data centers, are key for supporting modern computing demands. However, the energy consumption of data centers has become a major concern over the last decade. Their worldwide energy consumption in 2012 was estimated to be around 270 TWh, and grim forecasts predict it will quadruple by 2030. Maximizing energy efficiency while also maximizing computing efficiency is a major challenge for modern data centers. This work addresses this challenge by scheduling the operation of modern data centers, considering a multi-objective approach for simultaneously optimizing both efficiency objectives. Multiple data center scenarios are studied, from scheduling a single data center to scheduling a federation of several geographically distributed data centers. Mathematical models are formulated for each scenario, considering the modeling of their most relevant components such as computing resources, computing workload, cooling system, networking, and green energy generators, among others. A set of accurate heuristic and metaheuristic algorithms is designed for addressing the scheduling problem. These scheduling algorithms are comprehensively studied and compared with each other, using statistical tools to evaluate their efficacy when addressing realistic workloads and scenarios. Experimental results show the designed scheduling algorithms are able to significantly increase the energy efficiency of data centers when compared to traditional scheduling methods, while providing a diverse set of trade-off solutions regarding the computing efficiency of the data center. These results confirm the effectiveness of the proposed algorithmic approaches for data center infrastructures.
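The "diverse set of trade-off solutions" mentioned in this abstract corresponds to a Pareto front. A minimal sketch of extracting one follows, assuming both objectives are minimized; the candidate schedules and their (energy, makespan) values are invented for illustration and do not come from the thesis.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical schedules scored as (energy in kWh, makespan in minutes).
candidates = [
    (120.0, 40.0),
    (100.0, 55.0),
    (130.0, 38.0),
    (125.0, 45.0),   # dominated by (120.0, 40.0)
    (100.0, 60.0),   # dominated by (100.0, 55.0)
]

front = pareto_front(candidates)
```

Each schedule remaining in `front` represents a different compromise between energy use and completion time; choosing among them is left to the data center operator.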
Neuromorphic Learning Systems for Supervised and Unsupervised Applications
The advancements in high performance computing (HPC) have enabled the large-scale implementation of neuromorphic learning models and pushed the research on computational intelligence into a new era. Those bio-inspired models are constructed on top of unified building blocks, i.e. neurons, and have revealed potentials for learning of complex information. Two major challenges remain in neuromorphic computing. Firstly, sophisticated structuring methods are needed to determine the connectivity of the neurons in order to model various problems accurately. Secondly, the models need to adapt to non-traditional architectures for improved computation speed and energy efficiency. In this thesis, we address these two problems and apply our techniques to different cognitive applications.
This thesis first presents the self-structured confabulation network for anomaly detection. Among the machine learning applications, unsupervised detection of the anomalous streams is especially challenging because it requires both detection accuracy and real-time performance. Designing a computing framework that harnesses the growing computing power of the multicore systems while maintaining high sensitivity and specificity to the anomalies is an urgent research need. We present AnRAD (Anomaly Recognition And Detection), a bio-inspired detection framework that performs probabilistic inferences. We leverage the mutual information between the features and develop a self-structuring procedure that learns a succinct confabulation network from the unlabeled data. This network is capable of fast incremental learning, which continuously refines the knowledge base from the data streams. Compared to several existing anomaly detection methods, the proposed approach provides competitive detection accuracy as well as the insight to reason the decision making. Furthermore, we exploit the massive parallel structure of the AnRAD framework. Our implementation of the recall algorithms on the graphic processing unit (GPU) and the Xeon Phi co-processor both obtain substantial speedups over the sequential implementation on general-purpose microprocessor (GPP). The implementation enables real-time service to concurrent data streams with diversified contexts, and can be applied to large problems with multiple local patterns. Experimental results demonstrate high computing performance and memory efficiency. For vehicle abnormal behavior detection, the framework is able to monitor up to 16000 vehicles and their interactions in real-time with a single commodity co-processor, and uses less than 0.2ms for each testing subject.
While adapting our streaming anomaly detection model to mobile devices or unmanned systems, the key challenge is to deliver the required performance under a stringent power constraint. To address the tension between performance and power consumption, brain-inspired hardware, such as the IBM Neurosynaptic System, has been developed to enable low-power implementation of neural models. As a follow-up to the AnRAD framework, we propose to port the detection network to the TrueNorth architecture. Implementing inference-based anomaly detection on a neurosynaptic processor is not straightforward due to hardware limitations. A design flow and the supporting component library are developed to flexibly map the learned detection networks to the neurosynaptic cores. Instead of the popular rate code, burst code is adopted in the design, which represents a numerical value using the phase of a burst of spike trains. This not only reduces the hardware complexity, but also increases the result's accuracy. A Corelet library, NeoInfer-TN, is implemented for basic operations in burst code, and two-phase pipelines are constructed based on the library components. The design can be configured for different tradeoffs between detection accuracy, hardware resource consumption, throughput and energy. We evaluate the system using network intrusion detection data streams. The results show a higher detection rate than some conventional approaches and real-time performance, with only 50 mW power consumption. Overall, it achieves 10^8 operations per Joule.
In addition to the modeling and implementation of unsupervised anomaly detection, we also investigate a supervised learning model based on neural networks and deep fragment embedding and apply it to text-image retrieval. The study aims at bridging the gap between image and natural language, and improves the bidirectional retrieval performance across the modalities. Unlike existing works that target a single sentence densely describing the image objects, we elevate the topic to associating deep image representations with noisy texts that are only loosely correlated. Based on text-image fragment embedding, our model employs a sequential configuration that connects two embedding stages together. The first stage learns the relevancy of the text fragments, and the second stage uses the filtered output from the first one to improve the matching results. The model also integrates multiple convolutional neural networks (CNN) to construct the image fragments, in which rich context information such as human faces can be extracted to increase the alignment accuracy. The proposed method is evaluated with both a synthetic dataset and a real-world dataset collected from a picture news website. The results show up to 50% ranking performance improvement over the comparison models.
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters