
    On the Average Complexity of Moore's State Minimization Algorithm

    We prove that, for any finite alphabet and for the uniform distribution over deterministic and accessible automata with n states, the average complexity of Moore's state minimization algorithm is in O(n log n). Moreover, this bound is tight in the case of unary automata.
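    For readers unfamiliar with Moore's algorithm, the following minimal Python sketch shows the classical partition-refinement scheme the result refers to; the DFA encoding (a transition table `delta`, a set `accepting`) is purely illustrative and not taken from the paper.

```python
# Minimal sketch of Moore's state minimization by iterative partition
# refinement. Hypothetical DFA encoding: states 0..n-1, delta[s][a] -> state.
def moore_minimize(n, alphabet, delta, accepting):
    # Round 0: split states by acceptance.
    cls = [1 if s in accepting else 0 for s in range(n)]
    while True:
        # Signature of a state: its class plus the classes of its successors.
        sig = {s: (cls[s],) + tuple(cls[delta[s][a]] for a in alphabet)
               for s in range(n)}
        # Renumber distinct signatures to obtain the refined classes.
        ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        new_cls = [ids[sig[s]] for s in range(n)]
        if new_cls == cls:   # fixed point: the partition is stable
            return cls       # cls[s] = index of s's equivalence class
        cls = new_cls
```

    Each refinement round costs O(|alphabet| * n) state visits; the paper's result can be read as saying that, on average over accessible DFAs, only O(log n) rounds are needed.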

    General Purpose Computation on Graphics Processing Units Using OpenCL

    Computational Science has emerged as a third pillar of science alongside theory and experiment, and the parallelism required for scientific computing is provided by shared- and distributed-memory architectures such as supercomputers, grid- and cluster-based systems, and multi-core and multiprocessor systems. In recent years, the use of GPUs (Graphics Processing Units) for general-purpose computing, commonly known as GPGPU, has made them an exciting addition to high-performance computing (HPC) systems with respect to the price/performance ratio. Current GPUs consist of several hundred computing cores arranged in streaming multiprocessors, so the degree of parallelism is promising. Moreover, the development of new and easy-to-use interfacing tools and programming languages such as OpenCL and CUDA has made GPUs suitable for computationally demanding applications such as micromagnetic simulations. In micromagnetic simulations, the study of magnetic behavior at very small time and space scales demands a huge computation time; the calculation of the magnetostatic field, performed with O(N log N) complexity using the FFT algorithm for discrete convolution, is the main contribution to the total simulation time and is repeated at every time step. Observing magnetization behavior at sub-nanosecond time scales is crucial to a number of areas such as magnetic sensors, non-volatile storage devices, and magnetic nanowires. Since micromagnetic codes are in general well suited to parallel programming, as they can be easily divided into independent parts that run in parallel, the current trend is to shift their computationally intensive parts to GPUs.

    My PhD work focuses on the development of a highly parallel magnetostatic field solver for micromagnetic simulators on GPUs. I use OpenCL for the GPU implementation, since it is an open, cross-platform standard for parallel programming of heterogeneous systems. The magnetostatic field calculation is dominated by multidimensional FFT (Fast Fourier Transform) computations. I have therefore developed a specialized OpenCL-based 3D-FFT library for the magnetostatic field calculation, which makes it possible to fully exploit the zero-padded input data, without transposition, as well as the symmetries inherent in the field calculation. It also provides a common interface for different vendors' GPUs. To fully utilize the GPU's parallel architecture, the code must handle many hardware-specific technicalities such as coalesced memory access, data transfer overhead between GPU and CPU, GPU global memory utilization, arithmetic computation, and batch execution. In a second step, to further increase the level of parallelism and performance, I have developed a parallel magnetostatic field solver on multiple GPUs. Utilizing multiple GPUs avoids many of the limitations of a single GPU (e.g., on-chip memory resources) by exploiting the combined resources of multiple on-board GPUs. The GPU implementation has shown an impressive speedup over an equivalent OpenMP-based parallel implementation on CPU, which means that micromagnetic simulations requiring weeks of computation on a CPU can now be performed in hours or even minutes on GPUs.

    In parallel, I also worked on ordered queue management on GPUs. Ordered queue management is used in many applications, including real-time systems, operating systems, and discrete event simulations. In most cases, the efficiency of an application depends on the sorting algorithm used for its priority queues. Lately, the use of graphics cards for general-purpose computing has prompted a revisiting of sorting algorithms. In this work I present an analysis of different sorting algorithms with respect to sorting time, sorting rate, and speedup on different GPU and CPU architectures, and provide a new sorting technique on GPU.
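    As a rough illustration of the padded-FFT convolution at the core of such a solver, here is a hedged CPU-side NumPy sketch; the kernel `K` and magnetization grid `M` are placeholders, and the actual OpenCL implementation additionally exploits kernel symmetries and avoids transpositions, which this sketch does not.

```python
# Toy model of magnetostatic-field evaluation as a zero-padded linear
# convolution computed in O(N log N) via the FFT. Illustrative only.
import numpy as np

def magnetostatic_field(M, K):
    """Linear (non-circular) convolution of grid M with kernel K."""
    shape = [m + k - 1 for m, k in zip(M.shape, K.shape)]  # padded size
    Mf = np.fft.rfftn(M, shape)         # forward FFTs on zero-padded data
    Kf = np.fft.rfftn(K, shape)
    H = np.fft.irfftn(Mf * Kf, shape)   # pointwise product, inverse FFT
    return H[tuple(slice(0, m) for m in M.shape)]  # crop to original grid

# Usage on a toy 3-D grid (placeholder data, not a physical demag tensor):
M = np.random.rand(16, 16, 16)
K = np.random.rand(16, 16, 16)
H = magnetostatic_field(M, K)
```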

    Neural Network Exploration Using Optimal Experiment Design

    We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. We demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely. We conclude that, while not a panacea, OED-based query/action selection has much to offer, especially in domains where its high computational costs can be tolerated.
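    The flavor of variance-driven query selection that OED suggests can be sketched for a linear-in-features model (an assumption made here for brevity; the paper works with neural network learners, where the same quantities come from a local linearization):

```python
# Hedged sketch of OED-style active query selection: query the input where
# the model's prediction is least certain. Linear model assumed for clarity.
import numpy as np

def select_query(X_seen, candidates, lam=1e-3):
    """Return the candidate input with maximal predictive variance."""
    d = X_seen.shape[1]
    A = X_seen.T @ X_seen + lam * np.eye(d)   # regularized information matrix
    A_inv = np.linalg.inv(A)
    # Predictive variance of candidate x is proportional to x^T A^{-1} x.
    var = np.einsum('ij,jk,ik->i', candidates, A_inv, candidates)
    return candidates[np.argmax(var)]          # most informative next query
```

    Querying where predictive variance is largest is what lets the learner cover its domain "efficiently and completely" rather than sampling at random.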

    Fully Distributed And Mixed Symmetric Diagonal Dominant Solvers For Large Scale Optimization

    Over the past twenty years, we have witnessed an unprecedented growth in data, inaugurating the so-called Big Data era. Throughout these years, the exponential growth in the power of computer chips forecast by Moore's Law has allowed us to keep pace with this growing volume of data. However, due to physical limitations on the size of transistors, we have already reached the computational limits of traditional microprocessor architectures. Therefore, we either need conceptually new computers or distributed models of computation that allow processors to solve Big Data problems in a collaborative manner. The purpose of this thesis is to show that decentralized optimization is capable of addressing our growing computational demands by exploiting the power of coordinated data processing. In particular, we propose an exact distributed Newton method for two important challenges in large-scale optimization: network flow and empirical risk minimization. The key observation behind our method is the symmetric diagonally dominant structure of the Hessian of the dual functions corresponding to the aforementioned problems. Consequently, one can calculate the Newton direction by solving symmetric diagonally dominant (SDD) systems in a decentralized fashion. We first propose a fully distributed SDD solver based on a recursive approximation of SDD matrix inverses by a collection of specifically structured distributed matrices. To improve the precision of the algorithm, we then apply Richardson preconditioners, arriving at an efficient algorithm capable of approximating the solution of an SDD system to any arbitrary precision. Our second fully distributed SDD solver significantly improves the computational performance of the first algorithm by utilizing Chebyshev polynomials for the approximation of the SDD matrix inverse; the particular choice of Chebyshev polynomials is motivated by their extremal properties and their recursive relation. We then explore mixed strategies for solving SDD systems by slightly relaxing the decentralization requirements. Roughly speaking, by allowing one computer to aggregate some particular information from all others, one can gain quite surprising computational benefits. The key idea is to construct a spectral sparsifier of the underlying graph of computers using local communication between them. Finally, we apply these solvers to calculate the Newton direction for the dual functions of network flow and empirical risk minimization. On the theoretical side, we establish a quadratic convergence rate for our algorithms, surpassing all existing techniques. On the empirical side, we verify their superior performance in a set of extensive numerical simulations.
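    As a single-machine illustration of the Chebyshev idea, the sketch below runs the classical Chebyshev iteration for a linear system given spectral bounds; it assumes a positive definite matrix with known eigenvalue bounds `lam_min`/`lam_max`, and the thesis's distributed, communication-aware solvers are considerably more involved.

```python
# Hedged sketch: Chebyshev iteration for A x = b. The recursive three-term
# Chebyshev relation gives fast error decay without inner products,
# which is what makes the family attractive in distributed settings.
import numpy as np

def chebyshev_solve(A, b, lam_min, lam_max, iters=100):
    theta = (lam_max + lam_min) / 2.0   # center of the spectrum
    delta = (lam_max - lam_min) / 2.0   # half-width of the spectrum
    sigma = theta / delta
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                        # initial residual
    rho = 1.0 / sigma
    d = r / theta                        # first correction step
    for _ in range(iters):
        x = x + d
        r = r - A @ d                    # incremental residual update
        rho_next = 1.0 / (2.0 * sigma - rho)
        d = rho_next * rho * d + (2.0 * rho_next / delta) * r
        rho = rho_next
    return x
```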

    Memristors -- from In-memory computing, Deep Learning Acceleration, Spiking Neural Networks, to the Future of Neuromorphic and Bio-inspired Computing

    Machine learning, particularly in the form of deep learning, has driven most of the recent fundamental developments in artificial intelligence. Deep learning is based on computational models that are, to a certain extent, bio-inspired, as they rely on networks of connected simple computing units operating in parallel. Deep learning has been successfully applied in areas such as object/pattern recognition, speech and natural language processing, self-driving vehicles, intelligent self-diagnostics tools, autonomous robots, knowledgeable personal assistants, and monitoring. These successes have been supported mostly by three factors: the availability of vast amounts of data, continuous growth in computing power, and algorithmic innovations. The approaching demise of Moore's law, and the consequently modest improvements in computing power expected from scaling, raise the question of whether this progress will be slowed or halted due to hardware limitations. This paper reviews the case for a novel beyond-CMOS hardware technology, memristors, as a potential solution for the implementation of power-efficient in-memory computing, deep learning accelerators, and spiking neural networks. Central themes are the reliance on non-von-Neumann computing architectures and the need to develop tailored learning and inference algorithms. To argue that lessons from biology can provide directions for further progress in artificial intelligence, we briefly discuss an example based on reservoir computing. We conclude the review by speculating on the big-picture view of future neuromorphic and brain-inspired computing systems.
    Keywords: memristor, neuromorphic, AI, deep learning, spiking neural networks, in-memory computing
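    To make the in-memory computing idea concrete, here is an idealized toy model (not from the paper) of a memristor crossbar performing an analog vector-matrix multiply: with weights stored as conductances, Ohm's and Kirchhoff's laws compute the product in a single step, with no data movement between memory and processor.

```python
# Idealized memristor crossbar: conductance matrix G (siemens) stores the
# weights; applying row voltages V (volts) yields column currents
# I_j = sum_i V_i * G_ij. Real devices add noise, drift, and nonlinearity.
import numpy as np

def crossbar_vmm(G, V):
    """Ideal crossbar vector-matrix multiply via Ohm's + Kirchhoff's laws."""
    return V @ G

G = np.random.uniform(1e-6, 1e-3, size=(4, 3))  # placeholder conductances
V = np.array([0.2, 0.1, 0.0, 0.3])              # placeholder input voltages
print(crossbar_vmm(G, V))                        # column output currents
```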

    Simulated Annealing with min-cut and greedy perturbations

    Custom integrated circuit design requires an ever-increasing number of elements to be placed on a physical die. The search for an optimal placement is NP-hard, so heuristics are required to achieve satisfactory results under time constraints. Simulated Annealing is an algorithm that uses randomly generated perturbations to adjust a single solution. The effect of a generated perturbation is examined by a cost function that evaluates the solution: if the perturbation decreases the cost, it is accepted; if it increases the cost, it is accepted probabilistically. This approach allows the algorithm to escape local minima and find satisfactory solutions. One problem faced by Simulated Annealing is that it can take a very large number of iterations to reach a desired result. Greedy perturbations use knowledge of the system to generate solutions that may be satisfactory after fewer iterations than non-greedy ones; however, previous work has indicated that the exclusive use of greedy perturbations tends to trap the solution in local minima. Min-cut is a procedure in which a graph is split into two pieces with the least possible interconnection between them. Applying it to a placement problem helps to recognize components that belong to the same functional unit and thus improves the results of Simulated Annealing. The feasibility of this approach has been assessed. Hardware, through parallelization, can be used to increase the performance of algorithms by decreasing runtime. The possibility of increased performance motivated an exploration of modeling greedy perturbations in hardware. The use of greedy perturbations while avoiding local minima was also explored.
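    The acceptance rule described above fits in a short, generic Python sketch; the `cost` and `perturb` callables stand in for the placement-specific operators (random, greedy, or min-cut-informed) studied in this work:

```python
# Generic Simulated Annealing loop. Downhill moves are always accepted;
# uphill moves are accepted with probability exp(-delta/T), which shrinks
# as the temperature T cools, letting the search escape local minima early
# and settle later.
import math
import random

def anneal(state, cost, perturb, T=1.0, cooling=0.995, steps=100_000):
    best = cur = state
    for _ in range(steps):
        cand = perturb(cur)
        delta = cost(cand) - cost(cur)
        if delta <= 0 or random.random() < math.exp(-delta / T):
            cur = cand
            if cost(cur) < cost(best):
                best = cur
        T *= cooling   # geometric cooling schedule
    return best
```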

    On the benefits of resource disaggregation for virtual data centre provisioning in optical data centres

    Virtual Data Centre (VDC) allocation requires the provisioning of both computing and network resources. Their joint provisioning allows for an optimal utilization of the physical Data Centre (DC) infrastructure resources. However, traditional DCs can suffer from computing resource underutilization due to the rigid capacity configurations of the server units, resulting in high computing resource fragmentation across the DC servers. To overcome these limitations, the disaggregated DC paradigm has recently been introduced. Thanks to resource disaggregation, it is possible to allocate the exact amount of resources needed to provision a VDC instance. In this paper, we focus on the static planning of a shared, optically interconnected, disaggregated DC infrastructure to support a known set of VDC instances deployed on top of it. To this end, we provide optimal and sub-optimal techniques to determine the capacity (in terms of both computing and network resources) required to support the expected set of VDC demands. Next, we quantitatively evaluate the benefits yielded by the disaggregated DC paradigm against traditional DC architectures, considering various VDC profiles and Data Centre Network (DCN) topologies.
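    The fragmentation argument can be illustrated with a deliberately simplified first-fit toy model (not the paper's planning formulation): rigid servers strand leftover capacity that a disaggregated pool of the same aggregate size can still use.

```python
# Toy comparison: placing VDC compute demands on rigid servers vs. drawing
# from a disaggregated resource pool of equal total capacity.
def first_fit_servers(demands, server_cap, n_servers):
    free = [server_cap] * n_servers
    placed = 0
    for d in demands:
        for i, f in enumerate(free):
            if f >= d:          # first server with enough leftover capacity
                free[i] -= d
                placed += 1
                break
    return placed

def disaggregated_pool(demands, total_cap):
    placed = 0
    for d in demands:
        if total_cap >= d:      # any remainder in the pool is usable
            total_cap -= d
            placed += 1
    return placed

demands = [6, 6, 6, 6]                    # hypothetical VDC compute demands
print(first_fit_servers(demands, 8, 3))   # 3 placed: 2 units stranded/server
print(disaggregated_pool(demands, 24))    # 4 placed: same total capacity
```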