38,565 research outputs found

    Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

    Get PDF
    The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism

    Asynchronous iterative computations with Web information retrieval structures: The PageRank case

    Get PDF
    There are several ideas being used today for Web information retrieval, and specifically in Web search engines. The PageRank algorithm is one of those that introduce a content-neutral ranking function over Web pages. This ranking is applied to the set of pages returned by the Google search engine in response to posting a search query. PageRank is based in part on two simple common sense concepts: (i)A page is important if many important pages include links to it. (ii)A page containing many links has reduced impact on the importance of the pages it links to. In this paper we focus on asynchronous iterative schemes to compute PageRank over large sets of Web pages. The elimination of the synchronizing phases is expected to be advantageous on heterogeneous platforms. The motivation for a possible move to such large scale distributed platforms lies in the size of matrices representing Web structure. In orders of magnitude: 101010^{10} pages with 101110^{11} nonzero elements and 101210^{12} bytes just to store a small percentage of the Web (the already crawled); distributed memory machines are necessary for such computations. The present research is part of our general objective, to explore the potential of asynchronous computational models as an underlying framework for very large scale computations over the Grid. The area of ``internet algorithmics'' appears to offer many occasions for computations of unprecedent dimensionality that would be good candidates for this framework.Comment: 8 pages to appear at ParCo2005 Conference Proceeding

    Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization

    Full text link
    The separability assumption (Donoho & Stodden, 2003; Arora et al., 2012) turns non-negative matrix factorization (NMF) into a tractable problem. Recently, a new class of provably-correct NMF algorithms have emerged under this assumption. In this paper, we reformulate the separable NMF problem as that of finding the extreme rays of the conical hull of a finite set of vectors. From this geometric perspective, we derive new separable NMF algorithms that are highly scalable and empirically noise robust, and have several other favorable properties in relation to existing methods. A parallel implementation of our algorithm demonstrates high scalability on shared- and distributed-memory machines.Comment: 15 pages, 6 figure

    Classification and Geometry of General Perceptual Manifolds

    Get PDF
    Perceptual manifolds arise when a neural population responds to an ensemble of sensory signals associated with different physical features (e.g., orientation, pose, scale, location, and intensity) of the same perceptual object. Object recognition and discrimination requires classifying the manifolds in a manner that is insensitive to variability within a manifold. How neuronal systems give rise to invariant object classification and recognition is a fundamental problem in brain theory as well as in machine learning. Here we study the ability of a readout network to classify objects from their perceptual manifold representations. We develop a statistical mechanical theory for the linear classification of manifolds with arbitrary geometry revealing a remarkable relation to the mathematics of conic decomposition. Novel geometrical measures of manifold radius and manifold dimension are introduced which can explain the classification capacity for manifolds of various geometries. The general theory is demonstrated on a number of representative manifolds, including L2 ellipsoids prototypical of strictly convex manifolds, L1 balls representing polytopes consisting of finite sample points, and orientation manifolds which arise from neurons tuned to respond to a continuous angle variable, such as object orientation. The effects of label sparsity on the classification capacity of manifolds are elucidated, revealing a scaling relation between label sparsity and manifold radius. Theoretical predictions are corroborated by numerical simulations using recently developed algorithms to compute maximum margin solutions for manifold dichotomies. Our theory and its extensions provide a powerful and rich framework for applying statistical mechanics of linear classification to data arising from neuronal responses to object stimuli, as well as to artificial deep networks trained for object recognition tasks.Comment: 24 pages, 12 figures, Supplementary Material
    • …
    corecore