11 research outputs found

Domain-Specific Symbolic Compilation

Author: Chandra Kartik
Phothilimthana Phitchaya Mangpo
Yazdani Nathaniel
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 2nd Summit on Advances in Programming Languages (SNAPL 2017)
Publication date: 01/01/2017

A symbolic compiler translates a program to symbolic constraints, automatically reducing model checking and synthesis to constraint solving. We show that new applications of constraint solving require domain-specific encodings that yield the required orders of magnitude improvements in solver efficiency. Unfortunately, these encodings cannot be obtained with today's symbolic compilation. We introduce symbolic languages that encapsulate domain-specific encodings under abstractions that behave as their non-symbolic counterparts: client code using the abstractions can be tested and debugged on concrete inputs. When client code is symbolically compiled, the resulting constraints use domain-specific encodings. We demonstrate the idea on the first fully symbolic checker of type systems; a program partitioner; and a parallelizer of tree computations. In each of these case studies, symbolic languages improved on classical symbolic compilers by orders of magnitude.

Dagstuhl Research Online Publication Server

Learning Large Graph Property Prediction via Graph Segment Training

Author: Abu-El-Haija Sami
Cao Kaidi
Leskovec Jure
Mendis Charith
Perozzi Bryan
Phothilimthana Phitchaya Mangpo
Zelle Dustin
Zhou Yanqi
Publication date: 28/05/2023

Learning to predict properties of large graphs is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that utilizes a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint. GST first divides a large graph into segments and then backpropagates through only a few segments sampled per training iteration. We refine the GST paradigm by introducing a historical embedding table to efficiently obtain embeddings for segments not sampled for backpropagation. To mitigate the staleness of historical embeddings, we design two novel techniques. First, we finetune the prediction head to fix the input distribution shift. Second, we introduce Stale Embedding Dropout to drop some stale embeddings during training to reduce bias. We evaluate our complete method GST-EFD (with all the techniques together) on two large graph property prediction benchmarks: MalNet and TpuGraphs. Our experiments show that GST-EFD is both memory-efficient and fast, while offering a slight boost on test accuracy over a typical full graph training regime.

arXiv.org e-Print Archive

Portable performance on heterogeneous architectures

Author: Fursin Grigori
Jason Ansel
Jonathan Ragan-Kelley
Katherine Yelick Im
Phitchaya Mangpo Phothilimthana
Saman Amarasinghe
Volkov V.
Vuduc Richard
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/03/2013

Trends in both consumer and high performance computing are bringing not only more cores, but also increased heterogeneity among the computational resources within a single machine. In many machines, one of the greatest computational resources is now their graphics coprocessors (GPUs), not just their primary CPUs. But GPU programming and memory models differ dramatically from conventional CPUs, and the relative performance characteristics of the different processors vary widely between machines. Different processors within a system often perform best with different algorithms and memory usage patterns, and achieving the best overall performance may require mapping portions of programs across all types of resources in the machine. To address the problem of efficiently programming machines with increasingly heterogeneous computational resources, we propose a programming model in which the best mapping of programs to processors and memories is determined empirically. Programs define choices in how their individual algorithms may work, and the compiler generates further choices in how they can map to CPU and GPU processors and memory systems. These choices are given to an empirical autotuning framework that allows the space of possible implementations to be searched at installation time. The rich choice space allows the autotuner to construct poly-algorithms that combine many different algorithmic techniques, using both the CPU and the GPU, to obtain better performance than any one technique alone. Experimental results show that algorithmic changes, and the varied use of both CPUs and GPUs, are necessary to obtain up to a 16.5x speedup over using a single program configuration for all architectures.

Funding: United States. Dept. of Energy (Award DE-SC0005288); United States. Defense Advanced Research Projects Agency (Award HR0011-10-9-0009); National Science Foundation (U.S.) (Award CCF-0632997)

DSpace@MIT

Crossref

Neural Architecture Search using Property Guided Synthesis

Author: Jin Charles
Phothilimthana Phitchaya Mangpo
Roy Sudip
Publication venue: Association for Computing Machinery (ACM)
Publication date: 03/11/2022

DSpace@MIT

Scaling up Superoptimization

Author: Aditya Thakur
Dinakar Dhurjati
Phitchaya Mangpo Phothilimthana
Rastislav Bodik
Publication venue: Association for Computing Machinery (ACM)

Crossref

Communication-Minimizing 2D Convolution in GPU Registers

Author: David Sheffield
Forrest N. I
Kurt Keutzer
Michael Anderson
Phitchaya Mangpo Phothilimthana
Publication date: 23/07/2013

2D image convolution is ubiquitous in image processing and computer vision problems such as feature extraction. Exploiting parallelism is a common strategy for accelerating convolution. Parallel processors keep getting faster, but algorithms such as image convolution remain memory bounded on parallel processors such as GPUs. Therefore, reducing memory communication is fundamental to accelerating image convolution. To reduce memory communication, we reorganize the convolution algorithm to prefetch image regions to registers, and we do more work per thread with fewer threads. To enable portability to future architectures, we implement a convolution autotuner that sweeps the design space of memory layouts and loop unrolling configurations. We focus on convolution with small filters (2x2–7x7), but our techniques can be extended to larger filter sizes. Depending on filter size, our speedups on two NVIDIA architectures range from 1.2x to 4.5x over state-of-the-art GPU libraries. Index Terms: convolution, parallel, GPU, autotuning

CiteSeerX

Crossref

Paving the Way for NFV Acceleration

Author: Agarwal S.
Ajayan A. C.
Anthony A.
Barbette T.
Belay Adam
Cao L.
Chowdhury S. R.
Fan Bin
Fangming Liu
Fei Xincai
Ferkouss O. E.
Garzarella S.
Go Younghwan
Hai Jin
Hongxin Hu
Jamshed Muhammad Asim
Jang Keon
Jeong EunYoung
Jin Xin
Jose Lavanya
Kablan Murad
Kalia Anuj
Katsikas Georgios P.
Khalid Junaid
Kourtis M.
Le Yanfang
Li P.
Meng Z.
Nam J.
Neugebauer Rolf
Pall Mike
Panda Aurojit
Paolino M.
Pfaff Ben
Phothilimthana Phitchaya Mangpo
Qixia Zhang
Ram Kaushik Kumar
Rizzo Luigi
Rosa R. V.
Sabin G.
Sekar Vyas
Sultana Nik
Vasiliadis Giorgos
Woo Shinae
Xincai Fei
Xu Cong
Zhang Kai
Zhou Dong
Publication venue: Association for Computing Machinery (ACM)

Crossref