136 research outputs found

    A Fully-Pipelined Hardware Design for Gaussian Mixture Models

    No full text
    Gaussian Mixture Models (GMMs) are widely used in many applications, such as data mining, signal processing and computer vision, for probability density modeling and soft clustering. However, the parameters of a GMM need to be estimated from data by, for example, the Expectation-Maximization algorithm for Gaussian Mixture Models (EM-GMM), which is computationally demanding. This paper presents a novel design for the EM-GMM algorithm targeting reconfigurable platforms, with five main contributions. First, a pipeline-friendly EM-GMM with diagonal covariance matrices that can easily be mapped to hardware architectures. Second, a function evaluation unit for the Gaussian probability density based on fixed-point arithmetic. Third, an extension of our approach that supports a wide range of dimensions and/or components by fitting multiple pieces of smaller dimensions onto an FPGA chip. Fourth, a cost and performance model that estimates logic resources. Fifth, a dataflow design targeting the Maxeler MPCX2000 with a Stratix-5SGSD8 FPGA that runs over 200 times faster than a 6-core Xeon E5645 processor, and over 39 times faster than a Pascal TITAN-X GPU. Our design provides a practical solution for training and exploring better parameters for GMMs with hundreds of millions of high-dimensional input instances, in low-latency and high-performance applications.
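    For context on the algorithm being accelerated, below is a minimal software sketch of EM for a GMM with diagonal covariances, written in plain NumPy. It only illustrates the underlying computation; it is not the paper's fixed-point, pipelined hardware formulation, and all names are illustrative.

```python
# Minimal sketch of EM for a GMM with diagonal covariances (software reference,
# not the paper's fixed-point hardware design). Names are illustrative.
import numpy as np

def em_gmm_diag(X, K, n_iter=50, eps=1e-6):
    N, D = X.shape
    rng = np.random.default_rng(0)
    mu = X[rng.choice(N, K, replace=False)]       # component means
    var = np.full((K, D), X.var(axis=0) + eps)    # diagonal covariances
    pi = np.full(K, 1.0 / K)                      # mixing weights
    for _ in range(n_iter):
        # E-step: log N(x | mu_k, diag(var_k)) for every (sample, component) pair
        diff = X[:, None, :] - mu[None, :, :]
        log_p = -0.5 * (np.sum(diff**2 / var, axis=2)
                        + np.sum(np.log(2 * np.pi * var), axis=1))
        log_r = np.log(pi) + log_p
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)          # responsibilities
        # M-step: re-estimate weights, means and diagonal variances
        Nk = r.sum(axis=0) + eps
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        var = (r.T @ X**2) / Nk[:, None] - mu**2 + eps
    return pi, mu, var
```

    Restricting the covariances to diagonal form is what keeps the per-sample work to independent per-dimension multiply-accumulates, which is why this variant maps naturally onto a deep hardware pipeline.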

    Noisy mixture models for nanopore genomics

    Get PDF
    We describe a new scenario that combines nanoscale semiconductor materials and statistical algorithms to achieve high-SNR current signals for robust DNA sequence base calling. In our setting, altered DNA molecules are threaded through nanopores in electrically active two-dimensional membranes, such as graphene and molybdenum disulphide, and are sensed through changes in the electronic currents flowing through the membrane. Unfortunately, solid-state nanopores have so far been unsuccessful in DNA base identification because the conformational stochastic fluctuations of DNA in the electrolytic solution inside the pore introduce significant noise into the measured signal. Hence, we propose an integrated effort that combines electronic simulation based on device physics with statistical learning algorithms to perform clustering and inference on the solid-state nanopore data. In particular, we develop Gaussian Mixture Models (GMMs) that take into account the characteristics of the system to cluster the electrical current data and estimate the probability of the DNA position inside the nanopore. The validity of the learning algorithms for the noisy GMM has been demonstrated for uniform and Gaussian noise models with synthetic data sets. We also demonstrate the implementation of a pipelined version of the GMM training algorithm, which can be used to realize near-sensor computing and inference systems. Finally, we propose one possible solution to the theoretical resolution limit of nanopore DNA sequencing.
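    One common way to make mixture-model clustering of current traces robust to outliers is to add a uniform background component alongside the Gaussian components. The sketch below illustrates that idea on synthetic data; it is an assumption for illustration, not the authors' exact noise model, and all parameter values are hypothetical.

```python
# Illustrative sketch (not the authors' exact model): a 1-D Gaussian mixture with
# an extra uniform "noise" component, so outlier current samples are absorbed by
# the noise component instead of distorting the signal clusters.
import numpy as np
from scipy.stats import norm

def e_step_noisy_gmm(x, pi, mu, sigma, pi_noise, lo, hi):
    """Responsibilities for K Gaussian components plus one uniform noise component."""
    K = len(pi)
    dens = np.empty((len(x), K + 1))
    for k in range(K):
        dens[:, k] = pi[k] * norm.pdf(x, mu[k], sigma[k])
    dens[:, K] = pi_noise / (hi - lo)       # uniform density over the current range
    return dens / dens.sum(axis=1, keepdims=True)

# Synthetic nanopore-like current samples: two signal levels plus uniform outliers
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.8, 0.05, 500),
                    rng.normal(1.2, 0.05, 500),
                    rng.uniform(0.0, 2.0, 100)])
r = e_step_noisy_gmm(x, pi=[0.45, 0.45], mu=[0.8, 1.2],
                     sigma=[0.05, 0.05], pi_noise=0.1, lo=0.0, hi=2.0)
```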

    Distributed GraphLab: A Framework for Machine Learning in the Cloud

    Full text link
    While high-level data-parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction, which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the shared-memory setting. In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. We develop graph-based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency. We also introduce fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can be easily implemented by exploiting the GraphLab abstraction itself. Finally, we evaluate our distributed implementation of the GraphLab abstraction on a large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains over Hadoop-based implementations.
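    To make the abstraction concrete, here is a minimal Python sketch of a GraphLab-style update function: a vertex program reads its local scope (its own data plus neighbouring data) and reschedules neighbours only when its value changes, which is the dynamic-scheduling behaviour the abstraction enables. The real framework is a distributed C++ system with consistency models and fault tolerance; this toy sequential runner and its names are only for illustration.

```python
# Toy sketch of the GraphLab-style update-function abstraction (illustrative only).
from collections import deque

def pagerank_update(scope, schedule, damping=0.85, tol=1e-3):
    v = scope["vertex"]
    total = sum(n["rank"] / max(n["out_degree"], 1) for n in scope["in_neighbors"])
    new_rank = (1 - damping) + damping * total
    if abs(new_rank - v["rank"]) > tol:      # dynamic computation: reschedule only on change
        for nid in scope["out_ids"]:
            schedule(nid)
    v["rank"] = new_rank

def run(graph, update):
    """Sequential scheduler standing in for GraphLab's consistent parallel engine."""
    queue = deque(graph.keys())
    while queue:
        vid = queue.popleft()
        scope = {
            "vertex": graph[vid],
            "in_neighbors": [graph[i] for i in graph[vid]["in"]],
            "out_ids": graph[vid]["out"],
        }
        update(scope, queue.append)
    return graph

# 3-vertex cycle example: 0 -> 1 -> 2 -> 0
g = {i: {"rank": 1.0, "in": [(i - 1) % 3], "out": [(i + 1) % 3], "out_degree": 1}
     for i in range(3)}
run(g, pagerank_update)
```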

    New benchmarking methodology and programming model for big data processing

    Get PDF
    Big data processing is becoming a reality in numerous real-world applications. With the emergence of new data-intensive technologies and increasing amounts of data, new computing concepts are needed. The integration of big-data-producing technologies, such as wireless sensor networks, the Internet of Things, and cloud computing, into cyber-physical systems is reducing the time available to find appropriate solutions. This paper presents one possible solution for the coming exascale big data processing: the data flow computing concept. The performance of data flow systems that process big data should not be measured with the measures defined for the prevailing control flow systems. A new benchmarking methodology is proposed, which integrates the performance issues of speed, area, and power needed to execute a task. Computer rankings would look different if the new benchmarking methodology were used: data flow systems would outperform control flow systems. This statement is backed by recent results from implementations of specialized algorithms and applications on data flow systems, which show considerable factors of speedup, space savings, and power reduction compared with implementations of the same algorithms on control flow computers. In our view, the next step in data flow computing development should be a move from specialized to more general algorithms and applications.
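    The abstract does not give the combined metric itself, but one hypothetical way to fold speed, area, and power into a single figure of merit is useful work per watt per unit of silicon area, reported relative to a control flow baseline. The sketch below is purely illustrative; the formula and all numbers are made-up placeholders, not values from the paper.

```python
# Hypothetical figure of merit combining speed, area, and power (illustrative only;
# not the paper's methodology). All input numbers are made-up placeholders.

def figure_of_merit(ops_per_s, power_w, area_mm2):
    """Operations per second, per watt, per mm^2 of silicon."""
    return ops_per_s / (power_w * area_mm2)

cpu = figure_of_merit(ops_per_s=2.0e11, power_w=95.0, area_mm2=240.0)   # placeholder control flow baseline
dfe = figure_of_merit(ops_per_s=8.0e11, power_w=30.0, area_mm2=600.0)   # placeholder data flow engine
print(f"data flow / control flow figure-of-merit ratio: {dfe / cpu:.1f}x")
```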

    Design of large polyphase filters in the Quadratic Residue Number System

    Full text link

    Temperature aware power optimization for multicore floating-point units

    Full text link