Fifty Years of ISCA: A data-driven retrospective on key trends
Computer Architecture, broadly, involves optimizing hardware and software for
current and future processing systems. Although there are several other top
venues to publish Computer Architecture research, including ASPLOS, HPCA, and
MICRO, ISCA (the International Symposium on Computer Architecture) is one of
the oldest, longest running, and most prestigious venues for publishing
Computer Architecture research. Since 1973, except for 1975, ISCA has been
organized annually. Accordingly, this year will be the 50th year of ISCA. Thus,
we set out to analyze the past 50 years of ISCA to understand who and what has
been driving and innovating computing systems thus far. Our analysis identifies
several interesting trends that reflect how ISCA, and Computer Architecture in
general, has grown and evolved in the past 50 years, including minicomputers,
general-purpose uniprocessor CPUs, multiprocessor and multi-core CPUs,
general-purpose GPUs, and accelerators.
Comment: 17 pages, 11 figures
2018 International Symposium on Computer Architecture influential paper award
Every year, the International Symposium on Computer Architecture (ISCA) recognizes the most influential paper published at the conference 15 years earlier, based on its impact on research, development, products, or ideas. The award is sponsored by the IEEE Computer Society Technical Committee on Computer Architecture (IEEE-CS TCCA) and the ACM Special Interest Group on Computer Architecture (ACM SIGARCH). For the 2018 edition, the candidates were the papers published in the ISCA 2003 proceedings. The selection process was chaired by Antonio González.
Candidate papers for the award were selected by the current year’s ISCA Program Committee. The final award selection was made by the Award Chair (Antonio González), the IEEE-CS TCCA Chair (Lieven Eeckhout), and the ACM SIGARCH Chair (Sarita Adve). The award includes an honorarium for the authors and a certificate. The 2018 award was presented to “Temperature-Aware Microarchitecture” by Kevin Skadron, Mircea R. Stan, Wei Huang, Sivakumar Velusamy, Karthik Sankaranarayanan, and David Tarjan.
Peer Reviewed. Postprint (author's final draft)
Interconnection Networks for Scalable Quantum Computers
We show that the problem of communication in a quantum computer reduces to
constructing reliable quantum channels by distributing high-fidelity EPR pairs.
We develop analytical models of the latency, bandwidth, error rate and resource
utilization of such channels, and show that hundreds of qubits must be distributed
to accommodate a single data communication. Next, we show that a grid of
teleportation nodes forms a good substrate on which to distribute EPR pairs. We
also explore the control requirements for such a network. Finally, we propose a
specific routing architecture and simulate the communication patterns of the
Quantum Fourier Transform to demonstrate the impact of resource contention.
Comment: To appear in International Symposium on Computer Architecture 2006 (ISCA 2006)
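To make the role of EPR pairs in the abstract above concrete, here is a minimal sketch of how one shared EPR pair can carry one data qubit. The abstract does not spell out the protocol, so this assumes standard single-qubit teleportation; the function name `teleport` and the qubit layout are illustrative only. Rather than sampling a random measurement, it enumerates all four outcomes and returns the receiver's state after the usual X and Z corrections.

```python
import math

def teleport(psi):
    """Teleport the single-qubit state psi = (a, b) over one EPR pair.

    Returns a dict mapping each measurement outcome (m0, m1) to the
    receiver's state after corrections. Assumes textbook teleportation;
    this is an illustration, not the paper's network model.
    """
    s = 1 / math.sqrt(2)
    # amp[(q0, q1, q2)]: q0 is the data qubit, (q1, q2) is the shared
    # EPR pair (|00> + |11>) / sqrt(2); q2 sits at the receiver.
    amp = {(q0, q1, q2): psi[q0] * (s if q1 == q2 else 0.0)
           for q0 in (0, 1) for q1 in (0, 1) for q2 in (0, 1)}
    # CNOT with control q0 and target q1.
    amp = {(q0, q1 ^ q0, q2): v for (q0, q1, q2), v in amp.items()}
    # Hadamard on q0.
    had = {}
    for q1 in (0, 1):
        for q2 in (0, 1):
            x0, x1 = amp[(0, q1, q2)], amp[(1, q1, q2)]
            had[(0, q1, q2)] = s * (x0 + x1)
            had[(1, q1, q2)] = s * (x0 - x1)
    results = {}
    for m0 in (0, 1):
        for m1 in (0, 1):
            # Project onto the outcome (m0, m1) and renormalize.
            c0, c1 = had[(m0, m1, 0)], had[(m0, m1, 1)]
            norm = math.sqrt(abs(c0) ** 2 + abs(c1) ** 2)
            c0, c1 = c0 / norm, c1 / norm
            if m1:          # X correction
                c0, c1 = c1, c0
            if m0:          # Z correction
                c1 = -c1
            results[(m0, m1)] = (c0, c1)
    return results

outcomes = teleport((0.6, 0.8j))
```

Each teleported qubit consumes one EPR pair in this sketch. Any further overhead, such as the extra pairs needed to reach a target fidelity, is outside what the sketch models.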
PS-Cache: an energy-efficient cache design for chip multiprocessors
The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-014-1288-5
Power consumption has become a major design concern in current high-performance chip multiprocessors, and the problem worsens as core counts grow. A significant fraction of the total power budget is often consumed by on-chip caches, so considerable research has focused on reducing energy consumption in these structures. To enhance performance, on-chip caches are being deployed with high associativity. Consequently, concurrently accessing all the ways in a cache set is costly in terms of energy. This paper presents the PS-Cache architecture, an energy-efficient cache design that reduces the number of accessed ways without hurting performance. The PS-Cache takes advantage of the private-shared knowledge of the referenced block to reduce energy by accessing only those ways holding the kind of block being looked up. Experimental results show that, on average, the PS-Cache architecture can reduce the dynamic energy consumption of L1 and L2 caches by 22% and 40%, respectively.
This work has been jointly supported by the MINECO and European Commission (FEDER funds) under the project TIN2012-38341-C04-01 and the Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia under the project Jóvenes Líderes en Investigación 18956/JLI/13.
Valls, JJ.; Ros Bardisa, A.; Sahuquillo Borrás, J.; Gómez Requena, ME. (2015). PS-Cache: an energy-efficient cache design for chip multiprocessors. Journal of Supercomputing. 71(1):67-86. https://doi.org/10.1007/s11227-014-1288-5
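The way-filtering idea in the PS-Cache abstract can be sketched as follows. This is a toy model under stated assumptions, not the authors' design: the per-way `shared` bit, the `is_shared` argument (standing in for whatever mechanism classifies the referenced block), and the function names are all hypothetical. The number of ways probed is used as a rough proxy for dynamic energy.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Way:
    tag: int
    shared: bool  # hypothetical per-way private/shared bit

def conventional_lookup(cache_set: List[Way], tag: int) -> Tuple[bool, int]:
    """Probe every way of the set, as a baseline set-associative cache does."""
    hit = any(way.tag == tag for way in cache_set)
    return hit, len(cache_set)

def ps_lookup(cache_set: List[Way], tag: int, is_shared: bool) -> Tuple[bool, int]:
    """Probe only ways whose private/shared bit matches the request.

    Returns (hit, ways_probed). All matching ways are counted even after
    a hit, since the selected ways would be read in parallel in hardware.
    """
    candidates = [way for way in cache_set if way.shared == is_shared]
    hit = any(way.tag == tag for way in candidates)
    return hit, len(candidates)

# An 8-way set holding 5 private blocks (tags 0-4) and 3 shared blocks (tags 5-7).
example_set = [Way(tag=t, shared=(t >= 5)) for t in range(8)]
private_result = ps_lookup(example_set, tag=2, is_shared=False)
baseline_result = conventional_lookup(example_set, tag=2)
```

In this example the private lookup probes 5 ways where the baseline probes all 8. The actual savings reported in the abstract depend on the real mix of private and shared blocks and on circuit-level details this sketch ignores.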
FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture
Neural Network (NN) accelerators with emerging ReRAM (resistive random access
memory) technologies have been investigated as one of the promising solutions
to address the memory wall challenge, due to the unique capability of
processing-in-memory within ReRAM-crossbar-based processing elements
(PEs). However, the high efficiency and high density advantages of ReRAM have
not been fully utilized due to the huge communication demands among PEs and the
overhead of peripheral circuits.
In this paper, we propose a full system stack solution, composed of a
reconfigurable architecture design, Field Programmable Synapse Array (FPSA) and
its software system including neural synthesizer, temporal-to-spatial mapper,
and placement & routing. We rely heavily on the software system to make the
hardware design compact and efficient. To satisfy the high-performance
communication demand, we optimize it with a reconfigurable routing architecture
and the placement & routing tool. To improve the computational density, we
greatly simplify the PE circuit with the spiking schema and then adopt the neural
synthesizer to enable the high-density computation resources to support
different kinds of NN operations. In addition, we provide spiking memory blocks
(SMBs) and configurable logic blocks (CLBs) in hardware and leverage the
temporal-to-spatial mapper to utilize them to balance the storage and
computation requirements of NN. Owing to the end-to-end software system, we can
efficiently deploy existing deep neural networks to FPSA. Evaluations show
that, compared to PRIME, a state-of-the-art ReRAM-based NN accelerator,
the computational density of FPSA improves by 31x; for representative NNs, its
inference performance can achieve up to 1000x speedup.
Comment: Accepted by ASPLOS 201
Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing
Our ISCA 2015 paper provides a new programmable processing-in-memory (PIM)
architecture and system design that can accelerate key data-intensive
applications, with a focus on graph processing workloads. Our major idea was to
completely rethink the system, including the programming model, data
partitioning mechanisms, system support, instruction set architecture, along
with near-memory execution units and their communication architecture, such
that an important workload can be accelerated at a maximum level using a
distributed system of well-connected near-memory accelerators. We built our
accelerator system, Tesseract, using 3D-stacked memories with logic layers,
where each logic layer contains general-purpose processing cores and cores
communicate with each other using a message-passing programming model. Cores
could be specialized for graph processing (or any other application to be
accelerated).
To our knowledge, our paper was the first to completely design a near-memory
accelerator system from scratch such that it is both generally programmable and
specifically customizable to accelerate important applications, with a case
study on major graph processing workloads. Ensuing work in academia and
industry showed that similar approaches to system design can greatly benefit
both graph processing workloads and other applications, such as machine
learning, for which ideas from Tesseract seem to have been influential.
This short retrospective provides a brief analysis of our ISCA 2015 paper and
its impact. We briefly describe the major ideas and contributions of the work,
discuss later works that built on it or were influenced by it, and make some
educated guesses on what the future may bring on PIM and accelerator systems.
Comment: Selected to the 50th Anniversary of ISCA (ACM/IEEE International Symposium on Computer Architecture), Commemorative Issue, 202
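As a rough illustration of the message-passing style the retrospective describes, the sketch below partitions graph vertices across a few simulated cores and lets each core update only the vertices it owns. Everything here is an assumption made for illustration: the modulo partitioning in `owner`, the inbox queues, and the two-phase structure are not taken from the paper, and the real system runs on near-memory hardware rather than in a Python loop.

```python
from collections import defaultdict, deque

def owner(vertex, num_cores):
    """Hypothetical partitioning: vertex v lives on core v mod num_cores."""
    return vertex % num_cores

def propagate(edges, value, num_cores):
    """One PageRank-style propagation step using per-core message inboxes."""
    inbox = [deque() for _ in range(num_cores)]
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1
    # Phase 1: each core walks the edges whose source vertex it owns and
    # sends the contribution to the core that owns the destination.
    for core in range(num_cores):
        for src, dst in edges:
            if owner(src, num_cores) == core:
                message = (dst, value[src] / out_degree[src])
                inbox[owner(dst, num_cores)].append(message)
    # Phase 2 (after a barrier): each core drains its own inbox, so every
    # vertex is written only by the core that owns it.
    next_value = {v: 0.0 for v in value}
    for core in range(num_cores):
        while inbox[core]:
            dst, contribution = inbox[core].popleft()
            next_value[dst] += contribution
    return next_value

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
result = propagate(edges, {0: 1.0, 1: 1.0, 2: 1.0}, num_cores=2)
```

Sending the update to the data, instead of fetching remote data to the computation, is the design choice this sketch is meant to highlight; it keeps each core's memory accesses local to its own partition.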