5,272 research outputs found

    Investigating Single Precision Floating General Matrix Multiply in Heterogeneous Hardware

    Get PDF
    The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet, the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its availability using the Intel High Level Synthesis compiler allows users to architect new designs for reconfigurable hardware using C/C++. Using the HARPv2 as a vehicle for exploration, we investigate the utility of several of the most notable matrix multiplication optimizations to better understand the performance portability of OpenCL and the implications for such optimizations on this and future heterogeneous architectures. Our results give targeted insights into the applicability of best practices that were for existing architectures when used on emerging heterogeneous systems

    Insights on the spatial configuration of collective spaces within forming dynamics: the relation between infrastructure and urban transformation in Plaça de les Glòries Catalanes

    Get PDF
    The research seeks to produce insights on the spatial configuration of collective spaces where large scale infrastructure propels urban transformation. Focusing on the meaning, character and programmatic qualities of urban spaces in transformation as outcomes of fluctuating processes, it deals with complex spatial forming dynamics of urban streetscapes: the non-traditional conjugations of spaces, boundaries and territories. These spaces foster unexpected notions of proximity, territoriality, permeability and critical boundaries, investigated by means of specific parameters manifesting and interacting in time. This can help upgrade the design of architecture and urban projects to innovative techno-cultural practices and improve their integration in the urban fabric; urgent matter within the hyper-complex conditions of contemporary urban realities. The case of Plaça de les Glòries Catalanes in Barcelona, where a car-oriented open-space based on a variety of spatial manifestations turns into a formalized urban centrality, is used to unveil the complex convergence of streetscapes and urban infrastructures in contemporary urban transformations

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Full text link
    Big data research has attracted great attention in science, technology, industry and society. It is developing with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.Comment: 59 page

    Scalably Verifiable Cache Coherence

    Get PDF
    <p>The correctness of a cache coherence protocol is crucial to the system since a subtle bug in the protocol may lead to disastrous consequences. However, the verification of a cache coherence protocol is never an easy task due to the complexity of the protocol. Moreover, as more and more cores are compressed into a single chip, there is an urge for the cache coherence protocol to have higher performance, lower power consumption, and less storage overhead. People perform various optimizations to meet these goals, which unfortunately, further exacerbate the verification problem. The current situation is that there are no efficient and universal methods for verifying a realistic cache coherence protocol for a many-core system. </p><p>We, as architects, believe that we can alleviate the verification problem by changing the traditional design paradigm. We suggest taking verifiability as a first-class design constraint, just as we do with other traditional metrics, such as performance, power consumption, and area overhead. To do this, we need to incorporate verification effort in the early design stage of a cache coherence protocol and make wise design decisions regarding the verifiability. Such a protocol will be amenable to verification and easier to be verified in a later stage. Specifically, we propose two methods in this thesis for designing scalably verifiable cache coherence protocols. </p><p>The first method is Fractal Coherence, targeting verifiable hierarchical protocols. Fractal Coherence leverages the fractal idea to design a cache coherence protocol. The self-similarity of the fractal enables the inductive verification of the protocol. Such a verification process is independent of the number of nodes and thus is scalable. We also design example protocols to show that Fractal Coherence protocols can attain comparable performance compared to a traditional snooping or directory protocol. </p><p>As a system scales hierarchically, Fractal Coherence can perfectly solve the verification problem of the implemented cache coherence protocol. However, Fractal Coherence cannot help if the system scales horizontally. Therefore, we propose the second method, PVCoherence, targeting verifiable flat protocols. PVCoherence is based on parametric verification, a widely used method for verifying the coherence of a flat protocol with infinite number of nodes. PVCoherence captures the fundamental requirements and limitations of parametric verification and proposes a set of guidelines for designing cache coherence protocols that are compatible with parametric verification. As long as designers follow these guidelines, their protocols can be easily verified. </p><p>We further show that Fractal Coherence and PVCoherence can also facilitate the verification of memory consistency, another extremely challenging problem. One piece of previous work proves that the verification of memory consistency can be decomposed into three steps. The most complex and non-scalable step is the verification of the cache coherence protocol. If we design the protocol following the design methodology of Fractal Coherence or PVCoherence, we can easily verify the cache coherence protocol and overcome the biggest obstacle in the verification of memory consistency. </p><p>As system expands and cache coherence protocols get more complex, the verification problem of the protocol becomes more prominent. We believe it is time to reconsider the traditional design flow in which verification is totally separated from the design stage. We show that by incorporating the verifiability in the early design stage and designing protocols to be scalably verifiable in the first place, we can greatly reduce the burden of verification. Meanwhile, we perform various experiments and show that we do not lose benefits in performance as well as in other metrics when we obtain the correctness guarantee.</p>Dissertatio

    Hierarchical Transactions for Hardware/Software Cosynthesis

    Get PDF
    Modern heterogeneous devices provide of a variety of computationally diverse components holding tremendous performance and power capability. Hardware-software cosynthesis offers system-level synthesis and optimization opportunities to realize the potential of these evolving architectures. Efficiently coordinating high-throughput data to make use of available computational resources requires a myriad of distributed local memories, caching structures, and data motion resources. In fact, storage, caching, and data transfer components comprise the majority of silicon real estate. Conventional automated approaches, unfortunately, do not effectively represent applications in a way that captures data motion and state management which dictate dominant system costs. Consequently, existing cosynthesis methods suffer from poor utility of computational resources. Automated cosynthesis tailored towards memory-centric optimizations can address the challenge, adapting partitioning, scheduling, mapping, and binding techniques to maximize overall system utility.This research presents a novel hierarchical transaction model that formalizes state and control management through an abstract data/control encapsulation semantic. It is designed from the ground-up to enable efficient synthesis across heterogeneous system components, with an emphasis on memory capacity constraints. It intrinsically encourages a high degree of concurrency and latency tolerance, and provides verification tools to ensure correctness. A unique data/execution hierarchical encapsulation framework guarantees scalable analysis, supporting a novel concept of state and control mobility. A front-end language allows concise expression of designer intent, and is structured with synthesis in mind. Designers express families of valid executions in a minimal format through high-level dependencies, type systems, and computational relationships, allowing synthesis tools to manage lower-level details. This dissertation introduces and exercises the model, discussing language construction, demonstrating control and data-dominated applications, and presenting a synthesis path that exhibits near-linear scalability with problem size

    Seventh Biennial Report : June 2003 - March 2005

    No full text

    Table Driven Adaptive Effectively Heterogeneous Multi-core Architectures

    Get PDF
    Exploiting flexibilities and scope of multi-core architectures for performance enhancement is one of the highly used approaches used by many researchers. However, with increasing dynamic nature of the workloads of everyday computing, even general multi-core architectures seem to just touch an upper limit on the deliverable performance. This has paved way for meticulous consideration of heterogeneous multi-core architectures. Such architectures can be further enhanced, by making the heterogeneity of the cores dynamic in nature. s work proposes techniques, which change configurations of these cores dynamically with workload. Thus, depending on requirements and pre-programmed preferences, each core can arrange itself to be power optimized or speed optimized. In addition, the project has been designed using RTL (Verilog) to provide completely realistic grip on the silicon investment. The project can be simulated using SPEC2000 and SPEC2006 benchmarks, and is completely synthesizable using IBM_LPE library for 65nm (IBM65LPE).School of Electrical & Computer Engineerin
    • …
    corecore