33 research outputs found

    A dataflow IR for memory efficient RIPL compilation to FPGAs

    Field programmable gate arrays (FPGAs) are fundamentally different from fixed processor architectures because their memory hierarchies can be tailored to the needs of an algorithm. FPGA compilers for high-level languages are therefore not hindered by fixed memory hierarchies; the constraint when compiling to FPGAs is the availability of resources. In this paper we describe how the dataflow intermediate representation (IR) of our declarative FPGA image processing DSL, RIPL (Rathlin Image Processing Language), enables us to constrain memory use. We use five benchmarks to demonstrate that memory use with RIPL is comparable to the Vivado HLS OpenCV library, without the need for language pragmas to guide hardware synthesis. The benchmarks also show that RIPL is more expressive than the Darkroom FPGA image processing language.

    Topology-Aware Parallelism for NUMA Copying Collectors

    NUMA-aware parallel algorithms in runtime systems attempt to improve locality by allocating memory from local NUMA nodes. Researchers have suggested that the garbage collector should profile memory access patterns or use object locality heuristics to determine the target NUMA node before moving an object. However, these solutions are costly when applied to every live object in the reference graph. Our earlier research suggests that connected objects represented by rooted sub-graphs provide abundant locality and are appropriate for NUMA architectures. In this paper, we utilize the intrinsic locality of rooted sub-graphs to improve parallel copying collector performance. Our new topology-aware parallel copying collector preserves rooted sub-graph integrity by moving the connected objects as a unit to the target NUMA node. In addition, it distributes and assigns copying tasks to appropriate (i.e. NUMA-node-local) GC threads. For load balancing, our solution enforces locality on the work-stealing mechanism by stealing from local NUMA nodes only. We evaluated our approach on SPECjbb2013, DaCapo 9.12 and Neo4j. Results show an improvement in GC performance by up to 2.5x speedup and 37% better application performance.

    PHARMA 4.0–IMPACT OF THE INTERNET OF THINGS ON HEALTH CARE

    The IoT is currently booming in the world of health care in particular. Industry has risen from generation 1.0 to 4.0 during the Internet of Things period. Under the traditional health care system, the patient has needed to visit clinics or hospitals even for small complications, adding to the patient's medical costs along with time and energy. Another significant factor is emergencies: under the older system of health care, elderly patients were unable to demand urgent assistance. The situation has changed with the use of the cyber-physical world; we are heading into the fourth phase of the health care industry, the smart health care network. This paper offers an insight into different facets of how health care actors such as doctors, hospitals, and of course patients are powered by the Internet of Things, and how it can track and ensure fast, high-quality, and efficient care in less time and in a smart way. A patient can be tracked using a collection of different wearable sensor nodes for real-time monitoring and examination of specific patient criteria. One of the most promising directions is the development of medical technology within patients' own homes, enabling older or physically weak people to stay at home as long as possible while being medically cared for and monitored. We searched literature and guidelines in PubMed, Web of Science, Google Scholar, Scopus, CNKI, and Embase databases up to 2019. The following search terms, alone or combined with the Boolean operators "AND" or "OR", were used: "Nanoparticles", "Anticancer treatment", "Bioflavonoids", "Plant origin drugs", "Nano formulations", "Cancer", and "Novel drug delivery systems". We focused on full-text articles, but abstracts were considered if relevant.

    Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

    No full text
    Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load balancing at the expense of locality and ignore NUMA node and manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking about NUMA system and manycore processor architecture details by delegating data distribution to the runtime system, and it uses task data-dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor, and find that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies while providing an architecture-oblivious approach for programmers.

    Improving Perfect Parallelism

    No full text

    Characterizing task-based OpenMP programs.

    No full text
    Programmers struggle to understand the performance of task-based OpenMP programs because profiling tools only report thread-based performance. Performance tuning also requires task-based performance information in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing exposed task parallelism and per-task instruction profiles of benchmarks in the widely used Barcelona OpenMP Tasks Suite. Using our method, programmers can tune performance faster and understand performance tradeoffs more effectively than with existing tools.

    anamud/mir-dev: MIR v1.0.0

    No full text
    This is the first release of MIR, hence v1.0.0. MIR is a task-based runtime system library specialized for high-performance execution and detailed yet cost-effective profiling of OpenMP programs. Fork the latest development version and submit issues at https://github.com/anamud/mir-dev. Thank you for your interest in MIR.

    Data sizes to study BOTS input sensitivity.

    No full text
    * UTS is a synthetic stress benchmark whose default inputs produce an extraordinary number of tasks (approx. 1.5–4 billion), which cannot be profiled using our system. We have chosen input sets for UTS which produce approx. 100–300 thousand tasks while maintaining stress.

    Diagnosing performance problems using thread-based performance metrics in BOTS Sort and Strassen.

    No full text
    Sort input: array size = 64M elements, quicksort cutoff = {4096 (default), 262144}, sequential merge sort cutoff same as quicksort cutoff, insertion sort cutoff = 128. Strassen input: dimension = 4096, cutoff = 128 (default). Blocked matrix multiplication (blk-matmul) input: dimension = 4096, block size = 128. Executed on all cores of a 48-core AMD Opteron 6172 machine running at highest frequency with frequency scaling turned off. (a) Speedup. (b) Visualization of state traces from 6 of 48 threads executing Sort with default cutoffs. White bars indicate task creation; black bars, task execution; gray bars, task synchronization. The six threads are bound to cores on different dies.