Provable advantages of kernel-based quantum learners and quantum preprocessing based on Grover's algorithm
There is an ongoing effort to find quantum speedups for learning problems.
Recently, [Y. Liu et al., Nat. Phys., 1013–1017 (2021)] have
proven an exponential speedup for quantum support vector machines by leveraging
the speedup of Shor's algorithm. We expand upon this result and identify a
speedup utilizing Grover's algorithm in the kernel of a support vector machine.
To show the practicality of the kernel structure, we apply it to a problem
related to pattern matching, providing a practical yet provable advantage.
Moreover, we show that combining quantum computation in a preprocessing step
with classical methods for classification further improves classifier
performance.
Comment: 14 pages, 5 figures
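Grover's algorithm, which the kernel above builds on, finds a marked item among N candidates with about (π/4)√N oracle queries, whereas a classical scan needs on the order of N. Below is a minimal classical state-vector simulation of that amplitude-amplification loop in NumPy. It is illustrative only. It does not reproduce the paper's kernel or pattern-matching construction, and the function name `grover_search` is hypothetical.

```python
import numpy as np

def grover_search(n_qubits: int, marked: int) -> np.ndarray:
    """Simulate Grover search on a state vector; return measurement probabilities."""
    n = 2 ** n_qubits
    # Uniform superposition, as produced by a Hadamard on every qubit.
    state = np.full(n, 1 / np.sqrt(n))
    # Near-optimal number of Grover iterations.
    iterations = int(np.floor(np.pi / 4 * np.sqrt(n)))
    for _ in range(iterations):
        state[marked] *= -1                 # oracle: phase-flip the marked item
        state = 2 * state.mean() - state    # diffusion: reflect amplitudes about their mean
    return np.abs(state) ** 2

probs = grover_search(6, marked=42)
print(int(np.argmax(probs)), float(probs[42]))
```

With 64 items, six iterations concentrate nearly all probability on the marked index. This is the quadratic query advantage the kernel is designed to exploit.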
Connectionist-Symbolic Machine Intelligence using Cellular Automata based Reservoir-Hyperdimensional Computing
We introduce a novel reservoir computing framework that is capable of
both connectionist machine intelligence and symbolic computation. A cellular
automaton serves as the dynamical-system reservoir. Input is randomly
projected onto the initial conditions of automaton cells and nonlinear
computation is performed on the input via application of a rule in the
automaton for a period of time. The evolution of the automaton creates a
space-time volume of the automaton state space, and it is used as the
reservoir. The proposed framework is capable of long short-term memory and it
requires orders of magnitude less computation compared to Echo State Networks.
We prove that the cellular automaton reservoir holds a distributed representation
of attribute statistics, which provides more effective computation than a local
representation. It is possible to estimate the kernel for linear cellular
automata via metric learning, which enables much more efficient distance
computation in the support vector machine framework. Also, binary reservoir feature
vectors can be combined using Boolean operations as in hyperdimensional
computing, paving a direct way for concept building and symbolic processing.
Comment: Corrected typos. Responded to some comments on Section 8. Added appendix
for details. Recurrent architecture emphasized.
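The pipeline described above has four steps: random projection of the input onto the initial cells, evolution under a CA rule, the space-time volume used as the feature vector, and Boolean combination of feature vectors. Below is a small NumPy sketch of those steps. Several choices are assumptions for illustration and are not taken from the paper: elementary rule 90 (a linear rule), periodic boundaries, a single fixed random projection, and the helper names `make_projection` and `ca_features`.

```python
import numpy as np

def make_projection(n_inputs: int, n_cells: int, seed: int = 0) -> np.ndarray:
    """Fixed random mapping of input bits to automaton cells (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n_cells)[:n_inputs]

def rule90_step(cells: np.ndarray) -> np.ndarray:
    """One step of elementary rule 90, a linear rule: each cell becomes the XOR
    of its two neighbours (periodic boundary assumed)."""
    return np.roll(cells, 1) ^ np.roll(cells, -1)

def ca_features(x: np.ndarray, positions: np.ndarray, n_cells: int, steps: int) -> np.ndarray:
    """Project binary input x onto the initial cells, evolve the automaton, and
    flatten the resulting space-time volume into one binary feature vector."""
    cells = np.zeros(n_cells, dtype=np.uint8)
    cells[positions] = x
    volume = [cells]
    for _ in range(steps):
        cells = rule90_step(cells)
        volume.append(cells)
    return np.concatenate(volume)

# Usage: two 8-bit inputs, a 32-cell automaton evolved for 4 steps.
pos = make_projection(8, 32)
a = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
b = np.array([0, 1, 1, 0, 0, 1, 0, 0], dtype=np.uint8)
fa, fb = ca_features(a, pos, 32, 4), ca_features(b, pos, 32, 4)

# Hyperdimensional-style binding with XOR; XOR-ing with fb again unbinds it.
bound = fa ^ fb
print(fa.shape, np.array_equal(bound ^ fb, fa))
```

Because rule 90 is linear over GF(2), the reservoir of `a ^ b` equals `fa ^ fb`. That linearity is what makes a closed-form kernel for linear automata plausible.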
Efficient Learning in Heterogeneous Internet of Things Ecosystems
The Internet of Things (IoT) is a growing network of heterogeneous devices, combining various sensing and computing nodes at different scales, which creates a large volume of data. Many IoT applications use machine learning (ML) algorithms to analyze the data. The high computational complexity of ML workloads poses significant challenges to IoT computing platforms, which tend to be less powerful, resource-constrained devices. Transmitting such large volumes of data to the cloud also raises issues of scalability, security, and privacy. In this dissertation, we propose efficient solutions to perform ML tasks while decreasing power consumption and improving performance. We first leverage the heterogeneous and interconnected nature of IoT systems, where IoT applications run on many different architectures (e.g., an x86 server or an ARM-based edge device) while communicating with each other. We present a cross-platform power and performance prediction technique for intelligent task allocation. The proposed technique estimates time-variant energy consumption with only 7% error across completely different architectures, enabling intelligent task allocation that reduces energy consumption by 16.5% for state-of-the-art ML workloads. We next show how to further advance the learning procedures toward real-time and online processing by distributing such learning tasks across the hierarchy of IoT devices. Our solution leverages brain-inspired high-dimensional (HD) computing to derive a new class of learning algorithms that can easily run on IoT devices while providing accuracy comparable to the state of the art. We show that the HD-based learning algorithms can cover various real-world problems, from conventional classification to cognitive tasks beyond classical ML such as DNA pattern matching.
We demonstrate that HD-based learning can enable secure, collaborative learning by efficiently distributing a large volume of learning tasks across heterogeneous computing nodes. We have implemented the proposed learning solution on various platforms, where it offers superior computing efficiency. For example, our solution achieves 486× and 7× performance improvements for the training and inference phases, respectively, on a low-power ARM processor, as compared to state-of-the-art deep learning
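As a rough picture of how HD-based learning works, below is a minimal classic HD classifier. Each feature's random "ID" hypervector is bound to a quantized level hypervector. The results are bundled by summation into a sample hypervector, then into per-class prototypes, and inference picks the prototype with the highest cosine similarity. This sketch is not the dissertation's algorithms. The class name `HDClassifier`, the dimensionality of 10,000, the 16 quantization levels, single-pass training, and inputs scaled to [0, 1] are all assumptions.

```python
import numpy as np

class HDClassifier:
    """Minimal hyperdimensional classifier (hypothetical sketch): record-based
    encoding, class prototypes by bundling, cosine-similarity inference."""

    def __init__(self, n_features: int, n_levels: int = 16, dim: int = 10_000, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One random bipolar "ID" hypervector per feature position.
        self.ids = rng.choice([-1, 1], size=(n_features, dim))
        # Correlated level hypervectors: neighbouring levels share most components,
        # while the lowest and highest levels differ in about half of them.
        base = rng.choice([-1, 1], size=dim)
        order = rng.permutation(dim)
        step = dim // (2 * (n_levels - 1))
        self.levels = np.tile(base, (n_levels, 1))
        for k in range(n_levels):
            self.levels[k, order[: k * step]] *= -1
        self.n_levels = n_levels

    def encode(self, x: np.ndarray) -> np.ndarray:
        """Bind each feature's ID with its quantized level, then bundle (sum)."""
        q = np.clip(np.rint(x * (self.n_levels - 1)).astype(int), 0, self.n_levels - 1)
        return (self.ids * self.levels[q]).sum(axis=0)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "HDClassifier":
        """Single-pass training: bundle all encoded samples of each class."""
        self.classes = np.unique(y)
        self.prototypes = np.stack(
            [sum(self.encode(x) for x in X[y == c]) for c in self.classes]
        )
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Return the class whose prototype has the highest cosine similarity."""
        H = np.stack([self.encode(x) for x in X])
        sims = (H @ self.prototypes.T) / (
            np.linalg.norm(H, axis=1, keepdims=True) * np.linalg.norm(self.prototypes, axis=1)
        )
        return self.classes[np.argmax(sims, axis=1)]

# Usage on synthetic two-class data with 8 features in [0, 1].
rng = np.random.default_rng(1)
X = np.clip(np.vstack([rng.normal(0.2, 0.1, (20, 8)), rng.normal(0.8, 0.1, (20, 8))]), 0, 1)
y = np.array([0] * 20 + [1] * 20)
clf = HDClassifier(n_features=8).fit(X, y)
print((clf.predict(X) == y).mean())
```

Training and inference use only element-wise multiplies, additions, and dot products. That simplicity is what makes such models attractive on low-power IoT processors.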
CDCL(Crypto) and Machine Learning based SAT Solvers for Cryptanalysis
Over the last two decades, we have seen a dramatic improvement in the efficiency of conflict-driven clause-learning Boolean satisfiability (CDCL SAT) solvers on industrial problems from a variety of applications such as verification, testing, security, and AI. The availability of such powerful general-purpose search tools as the SAT solver has led many researchers to propose SAT-based methods for cryptanalysis, including techniques for finding collisions in hash functions and breaking symmetric encryption schemes.
A feature of all previously proposed SAT-based cryptanalysis work is that it is black-box, in the sense that the cryptanalysis problem is encoded as a SAT instance and a CDCL SAT solver is then invoked to solve that instance. A weakness of this approach is that the generated encoding may be too large for any modern solver to solve efficiently. Perhaps a more important weakness is that the solver is in no way specialized or tuned to the given instance. Finally, very little work has been done to leverage parallelism in the context of SAT-based cryptanalysis.
To address these issues, we developed a set of methods that improve on state-of-the-art SAT-based cryptanalysis on three fronts. First, we describe an approach called CDCL(Crypto) (inspired by the CDCL() paradigm) to tailor the internal subroutines of the CDCL SAT solver with domain-specific knowledge about cryptographic primitives. Specifically, we extend the propagation and conflict analysis subroutines of CDCL solvers with specialized code that has knowledge about the cryptographic primitive being analyzed by the solver. We demonstrate the power of this framework on two cryptanalysis tasks: algebraic fault attacks and differential cryptanalysis of the SHA-1 and SHA-256 cryptographic hash functions. Second, we propose a machine-learning-based parallel SAT solver that performs well on cryptographic problems relative to many state-of-the-art parallel SAT solvers. Finally, we use a formulation of SAT as Bayesian moment matching to address the heuristic initialization problem in SAT solvers.
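The CDCL(Crypto) idea of extending the propagation subroutine with primitive-specific code can be pictured as Boolean constraint propagation with pluggable "theory" propagators. Below is a minimal sketch under clear simplifications. Clauses are propagated by naive scanning, with no watched literals. No reasons are recorded for conflict analysis. The example domain-specific propagator enforces an XOR constraint, chosen as a hypothetical stand-in for the kind of structure found in hash-function encodings. All function names are illustrative and are not the thesis's implementation.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# DIMACS-style literals: 3 means x3 is true, -3 means x3 is false.
Clause = List[int]
Assignment = Dict[int, bool]
# A domain-specific propagator returns literals it can imply from the current
# assignment, or None if it detects a conflict.
Propagator = Callable[[Assignment], Optional[List[int]]]


def lit_value(lit: int, assignment: Assignment) -> Optional[bool]:
    """Truth value of a literal under a partial assignment (None if unassigned)."""
    value = assignment.get(abs(lit))
    if value is None:
        return None
    return value if lit > 0 else not value


def propagate(clauses: Sequence[Clause], assignment: Assignment,
              propagators: Sequence[Propagator] = ()) -> Tuple[Assignment, bool]:
    """Run clause unit propagation plus the extra propagators to a fixpoint.
    Returns (extended assignment, conflict_found)."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        implied: List[int] = []
        for clause in clauses:
            values = [lit_value(lit, assignment) for lit in clause]
            if any(v is True for v in values):
                continue  # clause already satisfied
            unassigned = [lit for lit, v in zip(clause, values) if v is None]
            if not unassigned:
                return assignment, True  # every literal false: conflict
            if len(unassigned) == 1:
                implied.append(unassigned[0])  # unit clause forces its last literal
        for prop in propagators:
            result = prop(assignment)
            if result is None:
                return assignment, True  # domain-specific conflict
            implied.extend(result)
        for lit in implied:
            current = lit_value(lit, assignment)
            if current is False:
                return assignment, True  # two implications disagree
            if current is None:
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment, False


def xor_propagator(variables: List[int], parity: int) -> Propagator:
    """Hypothetical propagator for x_v1 XOR ... XOR x_vk == parity."""
    def prop(assignment: Assignment) -> Optional[List[int]]:
        unassigned = [v for v in variables if v not in assignment]
        acc = parity
        for v in variables:
            if v in assignment:
                acc ^= assignment[v]
        # acc is now the value the XOR of the unassigned variables must take.
        if not unassigned:
            return None if acc else []
        if len(unassigned) == 1:
            v = unassigned[0]
            return [v if acc else -v]
        return []
    return prop
```

For example, with the clause `[1, 2]`, the constraint x2 XOR x3 = 1, and x1 set to false, propagation forces x2 true and then x3 false. Plain CNF would need several clauses per XOR to achieve the same inference.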
Building Blocks for Mapping Services
Mapping services are ubiquitous on the Internet and enjoy a considerable user base. It is often overlooked, however, that providing such a service at global scale to millions of users has been the preserve of an oligopoly: only a select few service providers are able to do so. Unfortunately, the literature on these solutions is scarce. This thesis adds a number of building blocks to the literature that explain how to design and implement a number of features
An Artificial Immune System Strategy for Robust Chemical Spectra Classification via Distributed Heterogeneous Sensors
The timely detection and classification of chemical and biological agents in a wartime environment is a critical component of force protection in hostile areas. Moreover, the possibility of toxic agent use in heavily populated civilian areas has risen dramatically in recent months. This thesis proposes a strategy for identifying such agents via distributed sensors in an Artificial Immune System (AIS) network. The system may be used to complement electronic nose (E-nose) research being conducted in part by the Air Force Research Laboratory Sensors Directorate. In addition, the proposed strategy may facilitate fulfillment of a recent mandate by the President of the United States to the Office of Homeland Defense for the provision of a system that protects civilian populations from chemical and biological agents. The proposed system is composed of networked sensors and nodes communicating via wireless or wired connections. Measurements are continually taken via dispersed, redundant, and heterogeneous sensors strategically placed in high-threat areas. These sensors continually measure and classify air or liquid samples, alerting personnel when toxic agents are detected. Detection is based on the Biological Immune System (BIS) model of antigens and antibodies, and alerts are generated when a measured sample is determined to be a valid toxic agent (antigen). Agent signatures (antibodies) are continually distributed throughout the system to adapt to changes in the environment or to new antigens. Antibody features are determined via data mining techniques in order to improve system performance and classification capabilities. Genetic algorithms (GAs) are a critical part of the process, namely in antibody generation and feature subset selection calculations. Demonstrated results validate the utility of the proposed distributed AIS model for robust chemical spectra recognition.
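The antigen/antibody matching step can be sketched as affinity-based detection. Each antibody is an agent signature restricted to a selected feature subset, and a sample "binds" an antibody when it lies within that antibody's match radius. The `Antibody` fields, Euclidean distance as the affinity measure, and the fixed radius are hypothetical choices for illustration. The thesis's GA-based antibody generation, feature selection, and network distribution are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np

@dataclass
class Antibody:
    """Hypothetical agent signature on a selected subset of spectral features."""
    agent: str
    features: np.ndarray   # indices of the selected spectral features
    prototype: np.ndarray  # expected values on those features
    radius: float          # max Euclidean distance that counts as a match

def classify(sample: np.ndarray, antibodies: List[Antibody]) -> Optional[str]:
    """Return the best-matching toxic agent (antigen), or None if no antibody binds."""
    best, best_dist = None, np.inf
    for ab in antibodies:
        dist = np.linalg.norm(sample[ab.features] - ab.prototype)
        if dist <= ab.radius and dist < best_dist:
            best, best_dist = ab.agent, dist
    return best

antibodies = [Antibody("agent_A", np.array([0, 2]), np.array([1.0, 0.5]), 0.2)]
print(classify(np.array([1.05, 9.0, 0.45]), antibodies))  # agent_A
print(classify(np.zeros(3), antibodies))                  # None
```

Because each antibody looks only at its own feature subset, irrelevant spectral channels (the 9.0 above) do not affect the match. This is the motivation for feature subset selection.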
What broke where for distributed and parallel applications — a whodunit story
Detection, diagnosis, and mitigation of performance problems in today's large-scale distributed and parallel systems is a difficult task. These large distributed and parallel systems are composed of various complex software and hardware components. When the system experiences a performance or correctness problem, developers struggle to understand its root cause and fix it in a timely manner. In my thesis, I address these three components of performance problems in computer systems. First, we focus on diagnosing performance problems in large-scale parallel applications running on supercomputers. We developed techniques to localize the performance problem for root-cause analysis. Parallel applications, most of which are complex scientific simulations running on supercomputers, can create up to millions of parallel tasks that run on different machines and communicate using the message-passing paradigm. We developed a highly scalable and accurate automated debugging tool called PRODOMETER, which uses sophisticated algorithms to, first, create a logical progress dependency graph of the tasks that highlights how the problem spread through the system and manifested as a system-wide performance issue; second, use this graph to identify the task where the problem originated; and finally, pinpoint the code region corresponding to the origin of the bug. Second, we developed a tool chain that detects performance anomalies using machine-learning techniques and achieves a very low false-positive rate. Our input-aware performance anomaly detection system consists of a scalable data collection framework that collects performance-related metrics from code regions at different granularities, an offline model-creation and prediction-error characterization technique, and a threshold-based anomaly-detection engine for production runs.
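PRODOMETER's second step, locating the task where a problem originated, can be pictured on an already-built progress dependency graph. If each task records which tasks it is blocked on, then tasks that others wait on but that wait on nobody are the least-progressed tasks and the natural root-cause candidates. Below is a minimal sketch under that simplified model. Building the graph, which is the hard part, is assumed already done. Cycles, which would need strongly-connected-component handling, are not treated. The function name is hypothetical.

```python
from typing import Dict, Set

def least_progressed_tasks(waits_on: Dict[int, Set[int]]) -> Set[int]:
    """waits_on[t] = tasks that task t is blocked on (edges of a hypothetical,
    already-built progress dependency graph). Return tasks that others wait on
    but that are not waiting on anyone: candidate origins of the problem."""
    blockers = set().union(*waits_on.values())
    return {t for t in blockers if not waits_on.get(t)}

# Usage: rank 3 is stuck in computation; ranks 2 and 4 wait on it,
# and the remaining ranks wait on 2 or 4.
graph = {0: {2}, 1: {2}, 2: {3}, 4: {3}, 5: {4}, 6: {4}, 7: {4}, 3: set()}
print(least_progressed_tasks(graph))  # {3}
```

Only rank 3 is reported, even though every other rank also appears stalled. This is how a dependency graph separates the origin of a hang from its symptoms.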
The anomaly detection system requires few training runs and can handle unknown inputs and parameter combinations by dynamically calibrating the anomaly detection threshold according to the characteristics of the input data and of the models' prediction error. Third, we developed a performance-problem mitigation scheme for erasure-coded distributed storage systems. Repairing failed blocks in an erasure-coded distributed storage system takes a very long time in network-constrained data centers. This is because, during repair, a large amount of data from multiple nodes is gathered into a single node, where a mathematical operation is performed to reconstruct the missing part. This process severely congests the links toward the destination that will host the newly recreated data. We proposed a novel distributed repair technique, called Partial-Parallel-Repair (PPR), that performs this reconstruction in parallel on multiple nodes, eliminating network bottlenecks and, as a result, greatly speeding up the repair process. Fourth, we study how, for a class of applications, performance can be improved (or performance problems mitigated) by selectively approximating some of the computations. In many applications, the main computation happens inside a loop that can be logically divided into a few temporal segments, which we call phases. We found that while approximating the initial phases might severely degrade the quality of the results, approximating the computation in later phases has very little impact on the final result quality. Based on this observation, we developed an optimization framework that, for a given quality-loss budget, finds the best approximation settings for each phase of the execution.
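Parallel reconstruction works because erasure-code repair is a linear combination of the surviving chunks. Partial results can therefore be combined pairwise on several nodes, so no single link has to carry all of them. Below is a minimal sketch of that idea using a single XOR parity code (RAID-5-like) as a simplifying assumption. Reed–Solomon codes would first scale each chunk by a GF(2^8) coefficient. Network transfers are modelled only as a count of combining rounds, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Tuple

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks (addition in GF(2))."""
    return bytes(x ^ y for x, y in zip(a, b))

def centralized_repair(chunks: List[bytes]) -> bytes:
    """Conventional repair: the destination receives every surviving chunk."""
    return reduce(xor_bytes, chunks)

def partial_parallel_repair(chunks: List[bytes]) -> Tuple[bytes, int]:
    """PPR-style repair: in each round, pairs of nodes combine partial results,
    so the work finishes in ceil(log2(k)) rounds. Returns (result, rounds)."""
    partials = list(chunks)
    rounds = 0
    while len(partials) > 1:
        nxt = [xor_bytes(partials[i], partials[i + 1])
               for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            nxt.append(partials[-1])  # the odd node forwards its partial next round
        partials = nxt
        rounds += 1
    return partials[0], rounds

# Usage: three data chunks plus one parity chunk; the node holding d[1] fails.
d = [b"abcd", b"efgh", b"ijkl"]
parity = reduce(xor_bytes, d)
rebuilt, rounds = partial_parallel_repair([d[0], d[2], parity])
print(rebuilt, rounds)  # b'efgh' 2
```

Both repair strategies produce the same bytes. The difference is traffic shape: the centralized version funnels k chunks into one node, whereas the tree-shaped version spreads the combining work across the surviving nodes.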
Enabling high-performance, mixed-signal approximate computing
For decades, the semiconductor industry enjoyed exponential improvements in microprocessor power and performance with the device scaling of successive technology generations. Scaling limitations at sub-micron technologies, however, have ended these historical performance improvements within a limited power budget. While device scaling provides more transistors per chip for the same chip area, a growing percentage of the chip will have to be powered off at any given time due to power constraints. As such, the architecture community has focused on energy-efficient designs and is looking to specialized hardware to provide gains in performance. A focus on energy efficiency, along with increasingly unreliable transistors due to device scaling, has led to research in approximate computing, where accuracy is traded for energy efficiency when precise computation is not required. There is a growing body of approximation-tolerant applications that, for example, compute on noisy or incomplete data, such as real-world sensor inputs, or make approximations to reduce the computational load of analyzing cumbersome data sets. These approximation-tolerant applications span domains such as machine learning, image processing, robotics, and financial analysis. Since the advent of the modern processor, computing models have largely presumed accuracy. A willingness to relax accuracy requirements with the goal of gaining energy efficiency, however, warrants a re-investigation of the potential of analog computing. Analog hardware offers the opportunity for fast, low-power computation, but it presents challenges in the form of accuracy. While analog compute blocks have been applied to fixed-function problems, general-purpose computing has relied on digital hardware implementations that provide generality and programmability.
The work presented in this thesis aims to answer the following questions: Can analog circuits be successfully integrated into general-purpose computing to provide performance and energy savings? And what is required to address the historical analog challenges of inaccuracy, programmability, and lack of generality to enable such an approach? This thesis investigates a neural approach as a means to address these historical analog challenges and to enable the use of analog circuits in general-purpose, high-performance computing. The first piece of this work investigates the use of analog circuits at the microarchitecture level in the form of an analog neural branch predictor. The task of branch prediction can tolerate imprecision, because roll-back mechanisms correct branch mispredictions and application-level accuracy remains unaffected. We show that analog circuits enable the implementation of a highly accurate neural prediction algorithm that is infeasible to implement in the digital domain. The second piece of this work presents a neural accelerator that targets approximation-tolerant code. Analog neural acceleration provides an application speedup of 3.3x and energy savings of 12.1x, with a quality loss of less than 10% for all but one approximation-tolerant benchmark. These results show that, using a neural approach, analog circuits can be applied to provide performance and energy efficiency in high-performance, general-purpose computing.
Computer Science
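Neural branch predictors compute a dot product between a weight vector and the global branch history, and that dot product is what an analog circuit can evaluate quickly and cheaply. Below is a sketch of the classic digital perceptron predictor of Jiménez and Lin, with its training threshold θ = ⌊1.93h + 14⌋. It is a point of reference only, not the thesis's analog design. The table size, history length, and class name are assumptions, and the weight saturation that real hardware applies is omitted.

```python
import numpy as np

class PerceptronPredictor:
    """Digital perceptron branch predictor in the style of Jimenez and Lin."""

    def __init__(self, n_entries: int = 256, history_len: int = 16):
        # One weight row per table entry; column 0 is the bias weight.
        self.weights = np.zeros((n_entries, history_len + 1), dtype=np.int32)
        # Global history: +1 for taken, -1 for not taken (newest at index 0).
        self.history = np.ones(history_len, dtype=np.int32)
        self.theta = int(1.93 * history_len + 14)  # training threshold
        self.n_entries = n_entries

    def _output(self, pc: int) -> int:
        w = self.weights[pc % self.n_entries]
        return int(w[0] + w[1:] @ self.history)  # the dot product an analog circuit would compute

    def predict(self, pc: int) -> bool:
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output magnitude is below theta.
        if (y >= 0) != taken or abs(y) <= self.theta:
            row = self.weights[pc % self.n_entries]  # a view, so updates are in place
            row[0] += t
            row[1:] += t * self.history
        self.history = np.roll(self.history, 1)
        self.history[0] = t

# Usage: a branch that alternates taken / not taken.
bp = PerceptronPredictor()
correct, taken = 0, True
for i in range(200):
    if i >= 100:
        correct += bp.predict(0x40) == taken
    bp.update(0x40, taken)
    taken = not taken
print(correct, "of 100 correct after warm-up")
```

The per-prediction cost is dominated by the h-element dot product. Longer histories and finer weights are therefore costly digitally but cheap to sum as analog currents, which is the opportunity the thesis exploits.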
Acceleration for the many, not the few
Although specialized hardware promises orders-of-magnitude performance gains, its
uptake has been limited by how challenging it is to program. Hardware accelerators
present challenges programmers are not used to, exposing details of the hardware that
are often hidden and requiring new programming styles to use them effectively.
Existing programming models often involve learning complex and hardware-specific
APIs, using Domain-Specific Languages (DSLs), or programming in customized assembly languages. These programming models for hardware accelerators present a
significant challenge to uptake: a steep, unforgiving, and untransferable learning curve.
However, programming hardware accelerators with traditional programming models
presents a challenge of its own: mapping code not written with hardware accelerators in mind onto
accelerators with restricted behaviour.
This thesis presents these challenges in the context of the acceleration equation, and
it presents solutions to them in three different contexts: for regular expression accelerators,
for API-programmable accelerators (with Fourier Transforms as a key case study), and
for heterogeneous coarse-grained reconfigurable arrays (CGRAs). This thesis shows
that automatically morphing software written in the traditional manner to fit hardware
accelerators is possible with no programmer effort and that huge potential speedups are
available.
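One plausible ingredient of "morphing" ordinary code onto an API-programmable accelerator is a behavioural check that a user's routine can be replaced by the accelerator call. Below is a minimal sketch that checks this by randomized input/output testing, using the Fourier Transform case. Here `np.fft.fft` merely stands in for an accelerator API, and the names `user_dft`, `behaves_like`, and `maybe_offload` are hypothetical. Randomized testing gives evidence of equivalence, not proof, and this is not claimed to be the thesis's method.

```python
import numpy as np

def user_dft(x: np.ndarray) -> np.ndarray:
    """Naive O(n^2) DFT, written the way a programmer might without an accelerator in mind."""
    n = len(x)
    out = []
    for k in range(n):
        s = 0j
        for t in range(n):
            s += x[t] * np.exp(-2j * np.pi * k * t / n)
        out.append(s)
    return np.array(out)

def behaves_like(user_fn, accel_fn, trials: int = 20, n: int = 64,
                 seed: int = 0, tol: float = 1e-6) -> bool:
    """Randomized input/output check that accel_fn can replace user_fn."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
        if not np.allclose(user_fn(x), accel_fn(x), atol=tol):
            return False
    return True

def maybe_offload(user_fn, accel_fn):
    """Bind the accelerator call if it passes the check, otherwise keep the original."""
    return accel_fn if behaves_like(user_fn, accel_fn) else user_fn

fft = maybe_offload(user_dft, np.fft.fft)       # accepted: same sign convention
inverse = maybe_offload(user_dft, np.fft.ifft)  # rejected: different transform
print(fft is np.fft.fft, inverse is user_dft)
```

The check must respect the exact conventions of the user's code: here, the sign of the exponent and the absence of normalization. In practice, that is where adapting user code to an accelerator API gets difficult.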