Search CORE

12 research outputs found

Towards Safe, Secure, and Usable LLMs4Code

Author: Al-Kaswan A.
Publication venue: IEEE
Publication date: 01/01/2024

Large Language Models (LLMs) are gaining popularity in the field of Natural Language Processing (NLP) due to their remarkable accuracy in various NLP tasks. LLMs designed for coding are trained on massive datasets, which enables them to learn the structure and syntax of programming languages. These datasets are scraped from the web, and LLMs memorise information in these datasets. LLMs for code are also growing in size, making them more challenging to execute and making users increasingly reliant on external infrastructure. We aim to explore the challenges faced by LLMs for code and propose techniques to measure and prevent memorisation. Additionally, we suggest methods to compress models and run them locally on consumer hardware.

TU Delft Repository
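
The abstract does not say how memorisation is measured. One common way to operationalise it is an exact-match extraction check: prompt the model with a prefix taken from its training data and test whether greedy decoding reproduces the true suffix verbatim. The Python sketch below assumes that approach; `generate` is a hypothetical stand-in for a code model's greedy decoder, the toy corpus is invented for illustration, and lengths are counted in characters rather than tokens for simplicity.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical interface (assumption): generate(prompt, n) returns the model's
# greedy continuation of `prompt`, up to n characters long.
Generator = Callable[[str, int], str]


def is_memorised(generate: Generator, prefix: str, suffix: str) -> bool:
    """True if the continuation of `prefix` starts with `suffix` exactly."""
    continuation = generate(prefix, len(suffix))
    return continuation[: len(suffix)] == suffix


def memorisation_rate(generate: Generator,
                      samples: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (prefix, suffix) training samples reproduced verbatim."""
    samples = list(samples)
    if not samples:
        return 0.0
    hits = sum(is_memorised(generate, p, s) for p, s in samples)
    return hits / len(samples)


# Toy "model" that has memorised a tiny corpus (illustration only).
CORPUS = "def add(a, b):\n    return a + b\n"


def toy_generate(prompt: str, n: int) -> str:
    i = CORPUS.find(prompt)
    if i == -1:
        return "?" * n  # unseen prompt: filler that will not match these suffixes
    start = i + len(prompt)
    return CORPUS[start:start + n]


rate = memorisation_rate(toy_generate, [
    ("def add(a, b):", "\n    return a + b"),  # present in CORPUS
    ("def mul(a, b):", "\n    return a * b"),  # not present
])
print(rate)
```

A real measurement would swap `toy_generate` for a model's decoding call and work at the token level, but the pass/fail criterion stays the same.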

Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge [PRESENTATION]

Author: Al-Kaswan A.
Izadi M.
van Deursen A.
Publication date: 01/01/2023

Previous work has shown that Large Language Models are susceptible to so-called data extraction attacks, which allow an attacker to extract a sample that was contained in the training data and therefore have massive privacy implications. Constructing data extraction attacks is challenging: current attacks are quite inefficient, and there exists a significant gap between the extraction capabilities of untargeted attacks and memorization. Thus, targeted attacks are proposed, which identify whether a given sample from the training data is extractable from a model. In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge. We apply a two-step approach. In the first step, we maximise the recall of the model and are able to extract the suffix for 69% of the samples. In the second step, we use a classifier-based Membership Inference Attack on the generations. Our AutoSklearn classifier achieves a precision of 0.841. The full approach reaches a score of 0.405 recall at a 10% false positive rate, which is an improvement of 34% over the baseline of 0.301.

TU Delft Repository
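
The reported score, recall at a 10% false positive rate, can be computed from a membership-inference classifier's confidence scores by sweeping a decision threshold and keeping the best recall whose false positive rate stays within the budget. The Python sketch below is a generic implementation of that metric, not the challenge's official scoring script; the example scores and labels are made up.

```python
from typing import Sequence


def recall_at_fpr(scores: Sequence[float], labels: Sequence[int],
                  max_fpr: float = 0.10) -> float:
    """Highest recall (true positive rate) reachable with FPR <= max_fpr.

    scores: higher means "more likely a correct extraction / training member".
    labels: 1 for positives (correct extractions), 0 for negatives.
    A sample is predicted positive when its score is >= the threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative samples")
    best = 0.0  # an infinitely high threshold gives recall 0 and FPR 0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best


# Made-up classifier confidences for six generated suffixes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(recall_at_fpr(scores, labels, max_fpr=0.10))
```

Trying every distinct score as a threshold is quadratic in the number of samples, which keeps the sketch short; sorting once and accumulating counts would make it linear after the sort.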

STACC: Code Comment Classification using SentenceTransformers

Author: Al-Kaswan A.
Izadi M.
van Deursen A.
Publication venue: IEEE
Publication date: 01/01/2023

Code comments are a key resource for information about software artefacts. Depending on the use case, only some types of comments are useful. Thus, automatic approaches to classify these comments have been proposed. In this work, we address this need by proposing STACC, a set of SentenceTransformers-based binary classifiers. These lightweight classifiers are trained and tested on the NLBSE Code Comment Classification tool competition dataset, and surpass the baseline by a significant margin, achieving an average F1 score of 0.74 against the baseline of 0.31, which is an improvement of 139%. A replication package, as well as the models themselves, is publicly available.

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care. Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

TU Delft Repository
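
For reference, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The Python sketch below computes it for each binary classifier and averages the results; treating "average F1" as the unweighted mean over per-category classifiers is an assumption, and the category names and predictions are hypothetical. It also reproduces the relative-improvement arithmetic quoted in the abstract.

```python
def f1_binary(y_true, y_pred):
    """F1 for one binary classifier; labels are 1 (in category) or 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(results):
    """Unweighted mean F1 over {category: (y_true, y_pred)}."""
    scores = [f1_binary(t, p) for t, p in results.values()]
    return sum(scores) / len(scores)


def relative_improvement(new, baseline):
    """(new - baseline) / baseline, e.g. 0.5 means +50%."""
    return (new - baseline) / baseline


# Hypothetical predictions, one binary classifier per comment category.
results = {
    "summary": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "usage": ([0, 1, 1, 0], [1, 1, 0, 0]),
}
print(average_f1(results))
print(f"{relative_improvement(0.74, 0.31):.0%}")
```

The second print formats (0.74 - 0.31) / 0.31 ≈ 1.387 as a percentage, matching the 139% in the abstract.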

Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries

Author: Ahmed Toufique
Al-Kaswan A.
Ceballos Cristina
Devanbu Premkumar
Izadi M.
Sawant Anand Ashok
van Deursen A.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2023

Reverse engineering binaries is required to understand and analyse programs for which the source code is unavailable. Decompilers can transform the largely unreadable binaries into a more readable, source-code-like representation. However, reverse engineering is time-consuming, and much of that time is taken up by labelling functions with semantic information. While the automated summarisation of decompiled code can help reverse engineers understand and analyse binaries, current work mainly focuses on summarising source code, and no suitable dataset exists for this task. In this work, we extend large pre-trained language models of source code to summarise decompiled binary functions. Furthermore, we investigate the impact of input and data properties on the performance of such models. Our approach consists of two main components: the data and the model. We first build CAPYBARA, a dataset of 214K decompiled function-documentation pairs across various compiler optimisations. We extend CAPYBARA further by generating synthetic datasets and deduplicating the data. Next, we fine-tune the CodeT5 base model with CAPYBARA to create BinT5. BinT5 achieves state-of-the-art BLEU-4 scores of 60.83, 58.82, and 44.21 for summarising source, decompiled, and synthetically stripped decompiled code, respectively. This indicates that these models can be extended to decompiled binaries successfully. Finally, we found that the performance of BinT5 is not heavily dependent on the dataset size and compiler optimisation level. We recommend that future research further investigate transferring knowledge when working with less expressive input formats such as stripped binaries.

TU Delft Repository
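
The abstract mentions deduplicating CAPYBARA but not the criterion used. A minimal Python sketch, assuming exact duplicates are detected on the decompiled code after collapsing whitespace, keeps the first documentation string seen for each function; the sample functions and docstrings are invented for illustration.

```python
import re
from typing import Iterable, List, Tuple


def normalise(code: str) -> str:
    """Collapse runs of whitespace so formatting-only differences vanish."""
    return re.sub(r"\s+", " ", code).strip()


def deduplicate(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first (code, documentation) pair for each normalised function.

    Assumption: two pairs are duplicates when their code matches after
    whitespace normalisation; documentation is not compared.
    """
    seen = set()
    unique = []
    for code, doc in pairs:
        key = normalise(code)
        if key not in seen:
            seen.add(key)
            unique.append((code, doc))
    return unique


pairs = [
    ("int f(void) {\n  return 1;\n}", "doc A"),
    ("int f(void) { return 1; }", "doc B"),  # same function, reformatted
    ("int g(void) { return 2; }", "doc C"),
]
unique = deduplicate(pairs)
print(len(unique))
```

Exact matching is the simplest criterion; near-duplicate detection (for example, token-level similarity) would remove more redundancy at the cost of extra computation.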

DataFlex: Educational game about data centers for children

Author: Al-Kaswan A.
d' Abreu de Paulo G.
El Attar B.
Kronstadt L.J.
Wiemers G.
Publication date: 02/07/2020

Women are largely underrepresented in IT, and girls’ interest in STEM and IT fields tends to drop throughout secondary education. Educational games are a great tool for changing the perception of certain topics, as well as the behavior of the players. Thus, this report describes the development of a game to make the field of IT more appealing to girls between the ages of 10 and 14. After collecting requirements with the client and conducting a literature study, a design is proposed. The final product is a two-player 2D role-playing game with puzzle elements, specifically designed to be played in a classroom environment. The game takes place in a data center and shows the players the societal importance of data centers as well as the diversity of the work in data centers. The gameplay consists of exploring a data center, talking with both male and female employees in various roles, helping them with their work through minigames, and solving a mystery. The game was designed specifically to cater to girls and to break stereotypes regarding women in IT.

TU Delft Repository

A comprehensive and systematized review of energy-efficient routing protocols in wireless sensor networks

Author: Al-Ariki HDE
Al-Kiyumi RM
Deepa O
Dehghani S
Demirkol I
Ding X-X
Dutt S
Gherbi C
Guleria K
Haseeb K
Iabbassen D
Jadidoleslamy H.
Jayashree A
Kapil Gupta
Kaswan A
Khabiri M
Krishnamoorthy A
Kumar R
Laouid A
Li C
Liu T
Mann PS
Marappan P
Maratha P.
Md Zin S
Mohamed RE
Ogundoyin SO.
Orojloo H
Priti Maratha
Radi M
Rahat AAM
Rashid B
Riaz S
Rong F
Sajwan M
Sarkar A
Senthil T
Sha K
Shamila Ebenezer A
Sharma A
Stavrou E
Sun X
Tabibi S
Tanessakulwattana S
Tarique M
Yadav S
Yan H
Publication venue: Informa UK Limited

Crossref

Particle swarm optimization-based energy efficient clustering protocol in wireless sensor network

Author: A Datta
A Kaswan
A Tripathi
AS Rostami
AS Toor
BA Attea
BM Sahoo
C Gherbi
CF García-hernández
D Mehta
DR Edla
F Fanian
I Dietrich
J Wang
J Yick
JN Al-Karaki
K Akkaya
KA Darabkh
KS Arikumar
M Azharuddin
M Farsi
P Rawat
PCS Rao
R Priyadarshi
RSY Elhabyan
S Arjunan
SK Singh
T Preethiya
T Shankar
V Anand
ZM Zahedi
Publication venue: Springer Science and Business Media LLC

Crossref

Analysis on the predictive value of different variables in pulp stones appearance frequency and its pulpal response to cold stimuli

Author: A Al-Hadi Hamasha
A Arys
A Gulsahi
A Tamse
AAM Moura
AC Edds
Camilo Abalos-Labruzzi
CY Hsieh
E Tarim Ertas
Elena Guerrero-Belizón
Francisco Javier López-Frías
FS Sayegh
G Bevelander
G Hillmann
GJ Siskos
H Çolak
Javier Gil-Flores
JM Van DenBerghe
JR Sundell
L Moss-Salentijn
M Ninomiya
M Syrynska
M Turkal
Manuela Herrera-Martínez
MP Mohan
O Kansu
R Jannati
R Näsström
S Kannan
S Kaswan
S Ranjitkar
S Ravanshad
S Sener
SR Patil
Victoria Bonilla-Represa
VS Baghdaly
Y Inagaki
Y Sisman
Publication venue: Springer Science and Business Media LLC

Crossref

An efficient and green synthesis of novel highly functionalized nitrogen-fused pyrido[2′,3′:3,4]pyrazolo[1,5-a]pyrimidine derivatives using recyclable choline hydroxide

Author: A Kamal
A Reynolds
A Valentina
A Zhu
AG Al-Sehemi
AK Sanap
AM Jadhav
AZ Sayed
B Tanwar
BL Gadilohar
C Karthikeyan
DN Kommi
E Sanna
EL Crossley
G Auzzi
H Hu
J Castillo-Sánchez
J-C Castillo
K Manabe
K Porèba
K Senga
KS Vadagaonkar
KW Weitzel
LJ Phillipson
LK Sharma
ME Fraley
ML James
MP Martin
MR Shaaban
MS Mohamed
MSA El-Gaby
MV Reddy
N Azizi
NR Kumar
OM Ahmed
P Kaswan
P Saikia
PC Tsai
R Ding
RMN Kalla
S Selleri
SK Krishnammagari
Suresh Kumar Krishnammagari
T Elsaman
T Novinson
VV Shinde
W Liu
W Lu
WM Al-Adiwish
Yeon Tae Jeong
Z Jia
Publication venue: Springer Science and Business Media LLC

Crossref