13 research outputs found

Scope is all you need: Transforming LLMs for HPC Code

Author: Ahmed Nesreen
Hasabnis Niranjan
Kadosh Tal
Krien Neva
Mattson Timothy
Oren Gal
Pinter Yuval
Schneider Nadav
Tamir Guy
Vo Vy A.
Wasay Abdul
Willke Ted
Publication date: 29/09/2023

With easier access to powerful compute resources, there is a growing trend in the field of AI for software development to develop larger and larger language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks from the high-performance computing (HPC) domain are huge in size (e.g., billions of parameters) and demand expensive compute resources for training. We found this design choice confusing - why do we need large LLMs trained on natural languages and programming languages unrelated to HPC for HPC-specific tasks? In this line of work, we aim to question design choices made by existing LLMs by developing smaller LLMs for specific domains - we call them domain-specific LLMs. Specifically, we start off with HPC as a domain and propose a novel tokenizer named Tokompiler, designed specifically for preprocessing code in HPC and compilation-centric tasks. Tokompiler leverages knowledge of language primitives to generate language-oriented tokens, providing a context-aware understanding of code structure while avoiding human semantics attributed to code structures completely. We applied Tokompiler to pre-train two state-of-the-art models, SPT-Code and Polycoder, for a Fortran code corpus mined from GitHub. We evaluate the performance of these models against the conventional LLMs. Results demonstrate that Tokompiler significantly enhances code completion accuracy and semantic understanding compared to traditional tokenizers in normalized-perplexity tests, down to ~1 perplexity score. This research opens avenues for further advancements in domain-specific LLMs, catering to the unique demands of HPC and compilation tasks

arXiv.org e-Print Archive

Devil is Virtual: Reversing Virtual Inheritance in C++ Binaries

Complexities that arise from the implementation of object-oriented concepts in C++, such as virtual dispatch and dynamic type casting, have attracted the attention of attackers and defenders alike. Binary-level defenses are dependent on full and precise recovery of the class inheritance tree of a given program. While current solutions focus on recovering single and multiple inheritances from the binary, they are oblivious to virtual inheritance. Conventional wisdom among binary-level defenses is that virtual inheritance is uncommon and/or support for single and multiple inheritances provides implicit support for virtual inheritance. In this paper, we show neither to be true. Specifically, (1) We present an efficient technique to detect virtual inheritance in C++ binaries and show through a study that virtual inheritance can be found in a non-negligible number (more than 10% on Linux and 12.5% on Windows) of real-world C++ programs including MySQL and libstdc++. (2) We show that failure to handle virtual inheritance introduces both false positives and false negatives in the hierarchy tree. These false positives and negatives either introduce attack surface when the recovered hierarchy is used to enforce CFI policies, or make the hierarchy difficult to understand when it is needed for program understanding (e.g., during decompilation). (3) We present a solution to recover virtual inheritance from COTS binaries. We recover a maximum of 95% and 95.5% (GCC -O0) and a minimum of 77.5% and 73.8% (Clang -O2) of virtual and intermediate bases respectively in the virtual inheritance tree. Comment: Accepted at CCS20. This is a technical report version

arXiv.org e-Print Archive

Crossref

Entrainer-Based Reactive Distillation for Esterification of Glycerol with Acetic Acid

Author: HASABNIS A
MAHAJANI S
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2010

The applicability of reactive distillation for esterification of glycerol with acetic acid in the presence of Amberlyst-15 as catalyst and ethylene dichloride as an entrainer is evaluated through experiments and simulation. The reaction is studied in both semibatch and continuous reactive distillation systems. The effect of different parameters such as entrainer amount, catalyst loading, and reboiler duty is studied. The results indicate that entrainer-based semibatch reactive distillation can enhance the selectivity toward triacetin to about 100%, which is much greater than that offered by any conventional reactor with stoichiometric mole ratio of reactants. Simulations for both semibatch and continuous reactive distillation are performed, and results agree reasonably well with those obtained by experiments. The best possible design and operating parameters are obtained through detailed simulation using an experimentally validated model. A column configuration is recommended for a continuous process

Dspace at IIT Bombay

Transacetalization of Glycerol with Methylal by Reactive Distillation

Author: HASABNIS A
MAHAJANI S
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2012

The applicability of reactive distillation (RD) for the transacetalization of glycerol with methylal in the presence of Amberlyst-15 is studied by experiments and simulation. On the basis of the batch kinetic runs a pseudohomogeneous kinetic model is proposed. The experiments are performed on a continuous reactive distillation column and are compared with the predictions of the equilibrium stage model. Various feasible configurations of reactive distillation are identified and the experimentally validated simulator is used to investigate the effect of different design and operating parameters such as number of rectifying stages, stripping stages, feed mole ratio, reboiler duty, etc. on the performance in each case. The RD process alternatives and the conventional process of reaction followed by distillation are compared

Dspace at IIT Bombay

Acetalization of Glycerol with Formaldehyde by Reactive Distillation

Author: HASABNIS A
MAHAJANI S
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2014

The feasibility of reactive distillation (RD) for the reversible acetalization of glycerol with formaldehyde is evaluated through experiments and simulations. Simultaneous removal of acetal and water from the reactive zone of the RD column helps shift the reaction in the forward direction and achieve close to quantitative conversion levels. The results of laboratory-scale RD experiments performed in this study are compared with the ones predicted by simulation using the kinetics developed in the present work. Since commercial formaldehyde is available in the form of its aqueous solution, a large amount of water has to be removed to achieve substantial conversion. An experimentally validated simulator is thus used to design an appropriate RD configuration that offers minimum energy consumption. Toluene is used as an entrainer to remove water from the RD column. The process is compared with the reported indirect route of transacetalization of glycerol with methylal

Dspace at IIT Bombay

Quantitative Detection of PEGylated Biomacromolecules in Biological Fluids by NMR

Author: Advait Hasabnis (2556205)
Peter M. Macdonald (1499731)
R. Scott Prosser (1749892)
Rohan D. A. Alvares (2556208)

The accumulation, biodistribution, and clearance profiles of therapeutic agents are key factors relevant to their efficacy. Determining these properties constitutes an ongoing experimental challenge. Many such therapeutics, including small molecules, peptides, proteins, tissue scaffolds, and drug delivery vehicles, are conjugated to poly(ethylene glycol) (PEG) as this improves their bioavailability and in vivo stability. We demonstrate here that 1H NMR spectroscopy can be used to quantify PEGylated species in complex biological fluids directly, rapidly, and with minimal sample preparation. PEG bears a large number of spectroscopically equivalent protons exhibiting a narrow NMR line width while resonating at a 1H NMR frequency distinct from most other biochemical signals. We demonstrate that PEG provides a robust signal allowing detection of concentrations as low as 10 μg/mL in blood. This PEG detection limit is lowered by another order of magnitude when background proton signals are minimized using 13C-enriched PEG in combination with a double quantum filter to remove 1H signals from non-13C-labeled species. Quantitative detection of PEG via these methods is shown in pig blood and goat serum as examples of complex biological fluids. More practically, we quantify the blood clearance of 13C-PEG and PEGylated-BSA (bovine serum albumin) following their intravenous injection in live rats. Given the relative insensitivity of line width to PEG size, we anticipate that the biodistribution and clearance profiles of virtually any PEGylated biomacromolecule from biological fluid samples can be routinely measured by 1H NMR without any filtering or treatment steps

FigShare

Acetalization of Glycerol with Formaldehyde by Reactive Distillation

Author: Agirre I.
Albert M.
Amit Hasabnis
Brandani V.
Carolina X.
Chopade S. P.
Crotti C.
Deutsch J.
Drunsel J.
Hasabnis A. C.
Kolah A. K.
Maurer G.
Perez-Pariente J.
Ruiz V. R.
Sanjay Mahajani
Silva P. H. R.
Zhou C. H.
Publication venue: 'American Chemical Society (ACS)'

Crossref

Phenotyping slow leaf rusting components and validation of adult plant resistance genes in exotic wheat germplasm

Author: A Riaz
Ashutosh K
D. Koujalagi
DA Johnson
E Duveiller
ES Lagudah
HW Ohm
I. K. Kalappanavar
JA Kolmer
JE Plank Van der
JG Ellis
JJ Doyle
K Suenaga
LM Joshi
M Sivasamy
MK Das
PK Singh
PL Dyck
R Khanna
RA McIntosh
RD Wilcoxson
RE Peterson
RP Singh
Rudranaik R.V
S Nagarajan
S. A. Desai
S. S. Biradar
Sathisha T.N
SC Bhardwaj
SC Drijepondt
SE German
SK Nayar
SN Hasabnis
SS Sokhi
UK Bansal
YA Wamishe
Yashavanthakumar K. J
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Entrainer-Enhanced Reactive Distillation for the Production of Butyl Acetate

Author: Bessling B.
Chiang S. F.
Chien I.-L.
De Jong M. C.
Dimian A. C.
Douglas J. M.
Elliott T. R.
Gangadwala J.
Gunhyung Kim
Hanika J.
Hasabnis A. C.
Hu S.
Janowsky R.
Kienle A.
Kim B.
Lin Y. D.
Minjeong Cho
Myungwan Han
Sanghwan Jo
Singh A.
Singh Ajay
Singh B. P.
Steinigeweg S.
Suman T.
Tang Y. T.
Venimadhavan G.
Wang S. J.
Zhang B. J.
Zhicai Y.
Publication venue: 'American Chemical Society (ACS)'

Crossref

Identification and characterization of pleiotropic and co-located resistance loci to leaf rust and stripe rust in bread wheat cultivar Sujata

Author: AM Wan
BE Huang
Bhoja R. Basnet
BR Basnet
C Feuillet
Caixia Lan
CG Chu
CIMMYT
CR Wellings
CX Lan
DJ Somers
DL Fu
DR Knott
EN Yang
Evans S. Lagudah
F Dedryver
F Lin
GM Rosewarne
HH Flor
HM William
HS Bariana
HX Xu
JD Faris
Julio Huerta-Espino
JW Ooijen Van
K Suenaga
M Khan
M Lillemo
ME Bjarko
MG Francki
MK Das
MM Messmer
PL Dyck
R Johnson
Ravi P. Singh
RE Voorrips
RF Peterson
RP Singh
RS Ren
SA Herrera-Foessel
SG Krattinger
SN Hasabnis
Sybil A. Herrera-Foessel
T Schnurbusch
XM Chen
Y Ren
Yelun Zhang
ZF Li
Publication venue: 'Springer Science and Business Media LLC'

Crossref