1,917 research outputs found

Redesigning OP2 Compiler to Use HPX Runtime Asynchronous Techniques

Author: Kaiser Hartmut
Khatami Zahra
Ramanujam J.
Publication date: 27/03/2017

Parallelism in an application can be maximized by minimizing the overheads caused by load imbalance and the waiting time caused by memory latency. Compiler optimization is one of the most effective ways to tackle this problem: the compiler can detect data dependencies in an application and analyze specific sections of code for parallelization potential. However, these techniques are usually applied at compile time, so they rely on static analysis, which is insufficient for achieving maximum parallelism and the desired application scalability. One way to address this challenge is to use runtime methods, delaying a certain amount of code analysis until runtime. In this research, we improve the performance of parallel applications generated by the OP2 compiler by leveraging HPX, a C++ runtime system, to provide runtime optimizations. These optimizations include asynchronous tasking, loop interleaving, dynamic chunk sizing, and data prefetching. The results were evaluated using an Airfoil application, which showed a 40-50% improvement in parallel performance.

Comment: 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017)
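As a rough illustration of two of the runtime optimizations named in the abstract (asynchronous tasking and dynamic chunk sizing), the sketch below splits a loop into chunks whose size is chosen at runtime from a short timing probe, then runs each chunk as a future. It is not HPX or OP2 code: `pick_chunk_size`, `parallel_map_chunked`, and the `target_seconds` and `probe` parameters are hypothetical names, and Python threads are used only to show the shape of the idea, not to reproduce the reported speedups.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def pick_chunk_size(body, items, target_seconds=0.001, probe=16):
    """Choose a chunk size at runtime by timing a few iterations.

    Assumption: `body` has no side effects, because the probed items
    are evaluated again later as part of the real loop.
    """
    probe_items = items[:probe]
    start = time.perf_counter()
    for x in probe_items:
        body(x)
    elapsed = time.perf_counter() - start
    per_item = elapsed / max(len(probe_items), 1)
    if per_item <= 0:
        # Timer resolution too coarse: fall back to a single chunk.
        return max(len(items), 1)
    # Aim for roughly `target_seconds` of work per task.
    return max(1, int(target_seconds / per_item))


def run_chunk(body, part):
    """Apply the loop body to one contiguous chunk of the iteration space."""
    return [body(x) for x in part]


def parallel_map_chunked(body, items, workers=4):
    """Run the loop as asynchronous tasks, one future per chunk."""
    chunk = pick_chunk_size(body, items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_chunk, body, items[i:i + chunk])
            for i in range(0, len(items), chunk)
        ]
        results = []
        # Collect in submission order so the output order matches the input.
        for future in futures:
            results.extend(future.result())
    return results


squares = parallel_map_chunked(lambda x: x * x, list(range(100)))
```

The chunk size varies from run to run because it depends on measured time, but the result does not: chunks are gathered in submission order, so the output matches a sequential loop.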

arXiv.org e-Print Archive

Crossref

Fast Differentially Private Matrix Factorization

Author: Ahn S.
Chen T.
Ding N.
Hartstein A.
Keshavan R.
Kyrola A.
Marsaglia G.
Meka R.
Mir D. J.
Neal R. M.
Niu F.
Sato I.
Srebro N.
Wang Y.-X.
Welling M.
Xin Y.
Zhao H.
Publication date: 07/05/2015

Differentially private collaborative filtering is a challenging task, both in terms of accuracy and speed. We present a simple algorithm that is provably differentially private while offering good performance, using a novel connection between differential privacy and Bayesian posterior sampling via Stochastic Gradient Langevin Dynamics. Due to its simplicity, the algorithm lends itself to efficient implementation. By careful systems design and by exploiting the power-law behavior of the data to maximize CPU cache bandwidth, we are able to generate 1024-dimensional models at a rate of 8.5 million recommendations per second on a single PC.
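For readers unfamiliar with Stochastic Gradient Langevin Dynamics, the sketch below shows the standard update on a toy problem: sampling the posterior over the mean of a unit-variance Gaussian. It is not the paper's matrix factorization algorithm and makes no attempt at privacy calibration; the function name, the toy model, and all parameter values (`step`, `batch`, `prior_var`, and so on) are assumptions chosen for illustration.

```python
import math
import random


def sgld_gaussian_mean(data, step=1e-4, batch=50, iters=2000,
                       burn_in=500, prior_var=100.0, seed=0):
    """Draw approximate posterior samples of a Gaussian mean with SGLD.

    Toy model (an assumption, not from the paper):
        theta ~ N(0, prior_var),  x_i ~ N(theta, 1).
    """
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for t in range(iters):
        minibatch = rng.sample(data, batch)
        # Gradient of the log prior.
        grad_prior = -theta / prior_var
        # Minibatch gradient of the log likelihood, rescaled to the full data.
        grad_lik = (n_total / batch) * sum(x - theta for x in minibatch)
        # Langevin step: half-step along the gradient plus N(0, step) noise.
        noise = rng.gauss(0.0, math.sqrt(step))
        theta += 0.5 * step * (grad_prior + grad_lik) + noise
        if t >= burn_in:
            samples.append(theta)
    return samples


data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
samples = sgld_gaussian_mean(data)
posterior_mean_estimate = sum(samples) / len(samples)
```

With a weak prior and 1000 observations, the average of the retained samples should sit close to the empirical mean of `data`. The injected Gaussian noise is what turns stochastic gradient ascent into a posterior sampler.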

arXiv.org e-Print Archive

Crossref

Leveraging Program Analysis to Reduce User-Perceived Latency in Mobile Applications

Author: Mickens James W
Netravali Ravi
Ossa B De La
PRESTO
Ravindranath Lenin
Wang Xiao Sophia
Wu C.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/10/2018

Reducing network latency in mobile applications is an effective way of improving the mobile user experience and has tangible economic benefits. This paper presents PALOMA, a novel client-centric technique for reducing network latency by prefetching HTTP requests in Android apps. Our work leverages string analysis and callback control-flow analysis to automatically instrument apps using PALOMA's rigorous formulation of scenarios that address "what" and "when" to prefetch. PALOMA has been shown to yield significant runtime savings (several hundred milliseconds per prefetchable HTTP request), both when applied to a reusable evaluation benchmark we have developed and to real applications.

Comment: ICSE 201
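The general idea of prefetching a request as soon as its URL is known, and serving the stored response when the app later asks for it, can be sketched as follows. This is a conceptual illustration only, not PALOMA's Android instrumentation or its static analysis: the `Prefetcher` class, its methods, and `fake_fetch` are hypothetical, and the fetch function is a stand-in that performs no real network access.

```python
from concurrent.futures import ThreadPoolExecutor


class Prefetcher:
    """Issue requests early in the background and serve them on demand.

    Assumption: prefetch() and get() are called from a single thread.
    """

    def __init__(self, fetch, max_workers=4):
        self._fetch = fetch              # callable: url -> response
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}               # url -> Future holding the response

    def prefetch(self, url):
        """Call when the URL becomes known (the "when" to prefetch)."""
        if url not in self._pending:
            self._pending[url] = self._pool.submit(self._fetch, url)

    def get(self, url):
        """Call where the app would normally send the request."""
        future = self._pending.pop(url, None)
        if future is None:
            # Nothing was prefetched: fall back to a normal fetch.
            return self._fetch(url)
        return future.result()

    def close(self):
        self._pool.shutdown(wait=True)


calls = []


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP request."""
    calls.append(url)
    return f"response for {url}"


prefetcher = Prefetcher(fake_fetch)
prefetcher.prefetch("https://example.com/item/42")   # URL known early
body = prefetcher.get("https://example.com/item/42")  # served from the prefetch
prefetcher.close()
```

In this sketch the request is issued once, at the prefetch point, so the later `get` call only waits for whatever part of the round trip has not already completed.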

arXiv.org e-Print Archive

Crossref