    The architecture of DDM1: a recursively structured data driven machine

    An architecture for a highly modular, recursively structured class of machines is presented. DDM1 is an instance of such a machine structure, and is capable of executing machine language programs which are data driven (data flow) nets. These nets may represent arbitrary amounts of concurrency as well as arbitrary amounts of pipelining. DDM1 is a fully distributed multi-processing system composed of completely asynchronous modules. The architecture allows for limitless physical extensibility without necessitating special programming or special hardware to support individual machines of widely varying sizes. DDM1 is capable of automatically and dynamically allocating concurrent tasks to the available physical resources. The essential characteristics of the highly parallel, pipelined machine language are also described, along with its method of execution on DDM1.
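
    The sketch below is a minimal illustration of the data-driven firing rule the abstract describes: an operator node fires as soon as tokens are present on all of its inputs, so independent nodes can run concurrently and successive waves of tokens pipeline through the net. It is a toy interpreter written for this summary, not DDM1's machine language or hardware.

```python
# Toy interpreter for a data-driven net: a node fires when every input port
# holds a token; results are forwarded as tokens to successor ports.
from collections import defaultdict

class Node:
    def __init__(self, name, op, arity):
        self.name, self.op, self.arity = name, op, arity
        self.inputs = defaultdict(list)       # port -> FIFO of waiting tokens
        self.successors = []                  # list of (target node, target port)

    def ready(self):
        return all(self.inputs[p] for p in range(self.arity))

    def fire(self):
        args = [self.inputs[p].pop(0) for p in range(self.arity)]
        result = self.op(*args)
        for target, port in self.successors:
            target.inputs[port].append(result)

def run(nodes):
    # Fire any ready node until nothing is ready; the firing order is
    # irrelevant, which is what lets a machine distribute ready nodes
    # across independent modules.
    fired = True
    while fired:
        fired = False
        for node in nodes:
            while node.ready():
                node.fire()
                fired = True

# Example net computing (a + b) * (a - b); two token waves show pipelining.
add = Node("add", lambda x, y: x + y, 2)
sub = Node("sub", lambda x, y: x - y, 2)
out = Node("out", lambda x, y: print("result:", x * y), 2)
add.successors.append((out, 0))
sub.successors.append((out, 1))
for a, b in [(3, 1), (10, 4)]:
    add.inputs[0].append(a); add.inputs[1].append(b)
    sub.inputs[0].append(a); sub.inputs[1].append(b)
run([add, sub, out])                          # prints 8, then 84
```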

    From Hydras to TACOs

    Stanford University Library has a robust digital library system called the Stanford Digital Repository. This repository holds a little under 500 TB of materials in preservation and online for researchers, capture of scholarly output, and digitized cultural heritage materials. These materials are managed across 90+ codebases serving a variety of functions, from self-deposit web applications, to a nearly 10-year-old parallel processing framework, to a digital repository assets publication mechanism leading into our Blacklight, Spotlight, and Geoblacklight applications – among other services and needs. At the core of this system is a Fedora 3 store. With Fedora 3 now at end-of-life, and our system suffering from limited to no horizontal scalability options, we’re revisiting our system and architecture. We are rewriting it from the ground up, with the goal of data-forward, distributed microservices and some event-driven processing components. TACO, our new core management API, is the heart of this new architecture, and is currently being developed as a prototype. This talk will walk through the process of analysing our current system via a dataflow analysis; designing a new architecture for our digital library with a wide-ranging set of requirements and users; prototyping a core component of our new architecture to be horizontally scalable as well as data & specification driven; then planning how to create ‘seams’ in our current system to migrate towards our new system in an evolutionary fashion instead of a turn-key migration.
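
    As a purely hypothetical sketch of the data & specification driven, event-driven pattern described above, the snippet below shows a management API that validates a deposit against a declarative specification, persists it, and emits an event for downstream processors to consume. All names (SPECS, deposit_resource, EVENT_BUS) are invented for illustration and are not TACO's actual interface.

```python
# Hypothetical illustration of a specification-driven management API with
# event-driven downstream processing; names and schemas are invented for this
# sketch and do not reflect TACO's real interface.
import json, queue, uuid

SPECS = {  # declarative resource specifications the API is driven by
    "digital-object": {"required": ["label", "files"]},
}
STORE = {}                  # stand-in for a persistence layer
EVENT_BUS = queue.Queue()   # stand-in for a message broker

def deposit_resource(resource_type: str, payload: dict) -> str:
    spec = SPECS[resource_type]
    missing = [f for f in spec["required"] if f not in payload]
    if missing:
        raise ValueError(f"payload missing required fields: {missing}")
    resource_id = str(uuid.uuid4())
    STORE[resource_id] = payload
    # Emit an event instead of calling downstream services directly, so
    # processors (indexing, preservation, publication) can scale independently.
    EVENT_BUS.put(json.dumps({"type": "resource.deposited", "id": resource_id}))
    return resource_id

rid = deposit_resource("digital-object", {"label": "Map of Utah", "files": ["map.tiff"]})
print("deposited", rid, "event:", EVENT_BUS.get_nowait())
```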

    ADMM-SLPNet: A Model-Driven Deep Learning Framework for Symbol-Level Precoding

    Constructive interference (CI)-based symbol-level precoding (SLP) is an emerging downlink transmission technique for multi-antenna communications systems, and its low-complexity implementations are of practical importance. In this paper, we propose an interpretable model-driven deep learning framework to accelerate the processing of SLP. Specifically, the network topology is carefully designed by unrolling a parallelizable algorithm based on the proximal Jacobian alternating direction method of multipliers (PJ-ADMM), attaining a parallel and distributed architecture. Moreover, the parameters of the iterative PJ-ADMM algorithm are untied to parameterize the network. By incorporating the problem-domain knowledge into the loss function, an unsupervised learning strategy is further proposed to discriminatively train the learnable parameters using unlabeled training data. Simulation results demonstrate significant efficiency improvement of the proposed ADMM-SLPNet over benchmark schemes.
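
    The following is a generic sketch of the unrolling idea: each network layer implements one iteration of an optimization update, the iteration's parameters are untied so every layer learns its own values, and the loss is built from the problem objective itself rather than from labeled data. It uses a simple proximal-gradient step as a stand-in and assumes PyTorch; it does not reproduce the paper's PJ-ADMM update or the SLP problem formulation.

```python
# Generic sketch of algorithm unrolling with untied per-layer parameters.
# The update is a simple proximal-gradient step standing in for the paper's
# PJ-ADMM iteration, which it does not reproduce.
import torch
import torch.nn as nn

class UnrolledLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(0.1))   # untied per-layer step size

    def forward(self, x, H, y):
        # Gradient step on ||Hx - y||^2 followed by a projection ("proximal") step.
        grad = H.t() @ (H @ x - y)
        x = x - self.step * grad
        return torch.clamp(x, -1.0, 1.0)              # toy constraint set

class UnrolledNet(nn.Module):
    def __init__(self, num_layers=10):
        super().__init__()
        self.layers = nn.ModuleList(UnrolledLayer() for _ in range(num_layers))

    def forward(self, H, y):
        x = torch.zeros(H.shape[1], 1)
        for layer in self.layers:
            x = layer(x, H, y)
        return x

# Unsupervised training: the loss encodes the problem objective itself, so no
# labeled "ground truth" solution is needed.
H, y = torch.randn(4, 8), torch.randn(4, 1)
net = UnrolledNet()
loss = torch.norm(H @ net(H, y) - y) ** 2
loss.backward()   # gradients flow back to every layer's untied step size
```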

    The "MIND" Scalable PIM Architecture

    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real-time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.
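
    The toy model below illustrates the message-driven, split-transaction execution style mentioned above: work travels as small parcels to the memory node that owns the data, the method runs where the data lives, and replies return as further parcels instead of the requester blocking. It is conceptual only and does not model MIND's hardware or its parcel format.

```python
# Conceptual model of message-driven, split-transaction execution: nodes never
# touch each other's memory directly, they only exchange parcels, and results
# arrive as reply parcels rather than through a blocking call.
from collections import deque

NUM_NODES = 4
memory = [{} for _ in range(NUM_NODES)]           # per-node local memory
mailbox = [deque() for _ in range(NUM_NODES)]     # per-node parcel queues

def home_node(key):
    return hash(key) % NUM_NODES                  # data placement by hashing

def send(node, parcel):
    mailbox[node].append(parcel)

def run():
    # Each pass, every node drains its own mailbox independently.
    while any(mailbox):
        for node in range(NUM_NODES):
            while mailbox[node]:
                op, key, arg, reply_to = mailbox[node].popleft()
                if op == "add":                   # method executed at the data
                    memory[node][key] = memory[node].get(key, 0) + arg
                    send(reply_to, ("ack", key, memory[node][key], node))
                elif op == "ack":                 # reply parcel arrives later
                    print(f"node {node} notified: {key} is now {arg}")

# A requester on node 0 issues remote updates and continues without blocking.
for key, value in [("x", 5), ("y", 7), ("x", 3)]:
    send(home_node(key), ("add", key, value, 0))
run()
```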

    A Taxonomy of Workflow Management Systems for Grid Computing

    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art Grid workflow systems, but also identifies the areas that need further research.
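
    A minimal sketch of the workflow-execution idea underlying the systems surveyed: a workflow is a DAG of tasks, and a task becomes eligible for dispatch to a Grid resource once all of its parent tasks have completed. The dependency-driven loop below is generic and does not correspond to any particular system in the taxonomy.

```python
# A workflow expressed as a DAG of tasks, executed by a plain dependency-driven
# dispatch loop: a task is dispatched once all of its parents have finished.
from collections import deque

workflow = {                    # task -> list of tasks it depends on
    "stage_data": [],
    "simulate_a": ["stage_data"],
    "simulate_b": ["stage_data"],
    "analyze":    ["simulate_a", "simulate_b"],
    "publish":    ["analyze"],
}

def execute(workflow, run_task):
    remaining = {task: set(deps) for task, deps in workflow.items()}
    ready = deque(task for task, deps in remaining.items() if not deps)
    while ready:
        task = ready.popleft()
        run_task(task)                         # hand the task to a resource
        for other, deps in remaining.items():
            if task in deps:
                deps.remove(task)
                if not deps:                   # all parents finished
                    ready.append(other)

execute(workflow, lambda task: print("dispatching", task))
```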

    Programming MPSoC platforms: Road works ahead

    This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture. Even more important from a SW developer's viewpoint, at the same time the classical sequential von Neumann programming model needs to be overcome. Efficient utilization of the MPSoC HW resources demands radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful deployment of heterogeneous embedded MPSoCs. On the other hand, at least for the coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code, which demands a more pragmatic, gradual tool and code replacement strategy.
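
    To make the programming-model shift concrete, the sketch below writes the same computation first in the classical sequential style and then as explicitly parallel work over independent blocks, which is the kind of structure MPSoC tools need in order to map code onto cores. Python's standard library is used purely for illustration; it is not one of the embedded tool flows discussed in the session.

```python
# The same computation in sequential style and in an explicitly parallel style
# that exposes independent blocks of work to multiple cores.
from concurrent.futures import ProcessPoolExecutor

def filter_block(block):
    # Stand-in for a per-block signal-processing kernel.
    return [3 * x + 1 for x in block]

blocks = [list(range(i, i + 4)) for i in range(0, 16, 4)]

# Classical sequential style: one implicit thread of control.
sequential = [filter_block(b) for b in blocks]

# Explicitly parallel style: independent blocks are declared as parallel work.
if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(filter_block, blocks))
    assert parallel == sequential
```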

    ACOTES project: Advanced compiler technologies for embedded streaming

    Streaming applications are built of data-driven, computational components consuming and producing unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is a challenging task: the computation must be carefully partitioned and mapped to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of suggested solutions, whose goal is to improve the programmer’s productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another more recent approach is that developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach for streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on the quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaborative work of industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of the Advanced Compiler Technologies that we developed to support Embedded Streaming.
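
    The sketch below illustrates the streaming programming model the project targets: the application is expressed as tasks that consume and produce streams through bounded channels, and mapping those tasks onto parallel resources (done here trivially with threads) is what compiler support such as ACOTES automates. It is an illustration of the model only, not the ACOTES toolchain or its language annotations.

```python
# A stream program as a pipeline of tasks connected by bounded channels; the
# trivial thread-per-task mapping stands in for what a streaming compiler and
# runtime would decide automatically.
import threading, queue

DONE = object()                       # end-of-stream marker

def producer(out_ch):
    for sample in range(8):
        out_ch.put(sample)
    out_ch.put(DONE)

def smoothing_filter(in_ch, out_ch):
    prev = 0
    while (x := in_ch.get()) is not DONE:
        out_ch.put(x + prev)          # stand-in for a real streaming kernel
        prev = x
    out_ch.put(DONE)

def consumer(in_ch):
    while (x := in_ch.get()) is not DONE:
        print("output:", x)

a, b = queue.Queue(maxsize=4), queue.Queue(maxsize=4)   # bounded stream channels
tasks = [threading.Thread(target=producer, args=(a,)),
         threading.Thread(target=smoothing_filter, args=(a, b)),
         threading.Thread(target=consumer, args=(b,))]
for t in tasks: t.start()
for t in tasks: t.join()
```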