LLM for SoC Security: A Paradigm Shift
As the ubiquity and complexity of system-on-chip (SoC) designs increase
across electronic devices, the task of incorporating security into an SoC
design flow poses significant challenges. Existing security solutions are
inadequate to provide effective verification of modern SoC designs due to their
limitations in scalability, comprehensiveness, and adaptability. On the other
hand, Large Language Models (LLMs) are celebrated for their remarkable success
in natural language understanding, advanced reasoning, and program synthesis
tasks. Recognizing an opportunity, our research delves into leveraging the
emergent capabilities of Generative Pre-trained Transformers (GPTs) to address
the existing gaps in SoC security, aiming for a more efficient, scalable, and
adaptable methodology. By integrating LLMs into the SoC security verification
paradigm, we open a new frontier of possibilities and challenges to ensure the
security of increasingly complex SoCs. This paper offers an in-depth analysis
of existing works, showcases practical case studies, demonstrates comprehensive
experiments, and provides useful guidelines. We also present the
achievements, prospects, and challenges of employing LLMs in different SoC
security verification tasks.
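As a concrete illustration of the kind of verification task the paper studies, one can assemble a prompt that asks an LLM to draft a SystemVerilog security assertion from a natural-language property. The sketch below only builds the prompt string; the property text and signal names are invented for illustration and are not from the paper.

```python
# Hypothetical sketch: constructing a prompt that asks an LLM to draft a
# SystemVerilog assertion from a natural-language security property.
# The property text and signal names below are invented for illustration.

def build_assertion_prompt(property_text: str, signals: list[str]) -> str:
    """Assemble a prompt requesting a SystemVerilog assertion (SVA)."""
    signal_list = ", ".join(signals)
    return (
        "You are a hardware security verification assistant.\n"
        f"Signals available: {signal_list}.\n"
        "Write a SystemVerilog assertion (SVA) for the property below.\n"
        f"Property: {property_text}\n"
    )

prompt = build_assertion_prompt(
    "The debug unlock signal must never assert while the device "
    "is in secure mode.",
    ["dbg_unlock", "secure_mode", "clk"],
)
print(prompt)
```

The prompt would then be sent to whichever LLM backs the verification flow; the response still requires review by a verification engineer before use.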
Unlocking Hardware Security Assurance: The Potential of LLMs
System-on-Chips (SoCs) form the crux of modern computing systems. SoCs enable
high-level integration through the utilization of multiple Intellectual
Property (IP) cores. However, the integration of multiple IP cores also
presents unique challenges owing to their inherent vulnerabilities, thereby
compromising the security of the entire system. Hence, it is imperative to
perform hardware security validation to address these concerns. The efficiency
of this validation procedure is contingent on the quality of the SoC security
properties provided. However, generating security properties with traditional
approaches often requires expert intervention and is limited to a few IPs,
thereby resulting in a time-consuming and non-robust process. To address this
issue, we, for the first time, propose a novel and automated Natural Language
Processing (NLP)-based Security Property Generator (NSPG). Specifically, our
approach utilizes hardware documentation in order to propose the first hardware
security-specific language model, HS-BERT, for extracting security properties
dedicated to hardware design. To evaluate our proposed technique, we trained
the HS-BERT model using sentences from RISC-V, OpenRISC, MIPS, OpenSPARC, and
OpenTitan SoC documentation. When assessed on five untrained OpenTitan
hardware IP documents, NSPG was able to extract 326 security properties from
1723 sentences. This, in turn, aided in identifying eight security bugs in the
OpenTitan SoC design presented in the hardware hacking competition, Hack@DAC
2022.
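The pipeline's core step, classifying documentation sentences as security-relevant, can be sketched with a stand-in classifier. The paper fine-tunes a language model (HS-BERT) for this step; here a toy keyword heuristic plays that role so the shape of the pipeline is visible. The keywords and sentences are illustrative placeholders, not from the paper's dataset.

```python
# Stand-in sketch for NSPG's sentence-classification step. A real deployment
# would call the fine-tuned HS-BERT model; the keyword heuristic below is a
# placeholder so the surrounding extraction loop is runnable.

SECURITY_CUES = ("must", "shall", "only", "locked", "access", "privileged")

def looks_like_security_property(sentence: str) -> bool:
    """Crude stand-in for a trained classifier over documentation sentences."""
    s = sentence.lower()
    return any(cue in s for cue in SECURITY_CUES)

doc_sentences = [
    "The register file is 32 bits wide.",
    "Software must not read the key register after the lock bit is set.",
]
properties = [s for s in doc_sentences if looks_like_security_property(s)]
print(properties)
```

Extracted sentences would then be turned into checkable assertions for the hardware security validation flow the abstract describes.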
Survey of Approaches and Techniques for Security Verification of Computer Systems
This paper surveys the landscape of security verification approaches and techniques for computer systems at various levels: from a software-application level all the way to the physical hardware level. Different existing projects are compared, based on the tools used and security aspects being examined. Since many systems require both hardware and software components to work together to provide the system's promised security protections, it is not sufficient to verify just the software levels or just the hardware levels in a mutually exclusive fashion. This survey especially highlights system levels that are verified by the different existing projects and presents to the readers the state of the art in hardware and software system security verification. Few approaches come close to providing full-system verification, and there is still much room for improvement.
Learned interpreters : structural and learned systematicity in neural networks for program execution
General-purpose deep neural network architectures have made startling advances in machine learning for code, advancing code completion, enabling natural language programming, detecting and repairing bugs, and even solving competitive programming problems at a human level of performance. Nevertheless, these methods struggle to understand the execution behavior of code, even when it is code they write themselves. To this end, we explore interpreter-inspired neural network architectures, introducing a novel architecture family called instruction pointer attention graph neural networks (IPA-GNN). We apply this family of approaches to several tasks that require reasoning about the execution behavior of programs: learning to execute full and partial programs, code coverage prediction for hardware verification, and predicting runtime errors in competition programs.
Through this series of works we make several contributions and encounter multiple surprising and promising results. We introduce a Python library for constructing graph representations of programs for use in machine learning research, which serves as a bedrock for the research in this thesis and in the broader research community. We also introduce rich large-scale datasets of programs annotated with program behavior, such as outputs and errors raised during execution, to facilitate research in this domain. We find that IPA-GNN methods exhibit improved strong generalization over general-purpose methods, performing well when trained to execute only short programs and tested on significantly longer programs. In fact, we find that IPA-GNN methods outperform generic methods on each of the behavior modeling tasks we consider across both hardware and software domains. We even find that interpreter-inspired methods that model exception handling explicitly have a desirable interpretability property, enabling the prediction of error locations even when trained only on error presence and kind. In total, interpreter-inspired architectures like the IPA-GNN represent a promising path forward for imbuing neural networks with novel capabilities for learning to reason about program executions.
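The central mechanism behind IPA-GNN, a "soft" instruction pointer, can be sketched in a few lines: instead of one concrete program counter, the model keeps a probability distribution over statements and redistributes that mass along control-flow edges at each step. In the sketch below the control-flow graph and branch probability are fixed toy values; in the actual architecture branch decisions are produced by learned networks over per-statement hidden states.

```python
# Minimal sketch of the soft instruction pointer behind IPA-GNN. A toy
# control-flow graph is used: statement 0 falls through to 1, statement 1 is
# a loop that exits to 2 with probability p_exit_loop, and 2 is an absorbing
# exit node. The branch probability is a constant here; in the real model it
# is computed by a learned network.

import numpy as np

def step(ip: np.ndarray, p_exit_loop: float) -> np.ndarray:
    """Propagate instruction-pointer probability mass one step."""
    nxt = np.zeros_like(ip)
    nxt[1] += ip[0]                      # statement 0 falls through to 1
    nxt[2] += ip[1] * p_exit_loop        # branch: leave the loop
    nxt[1] += ip[1] * (1 - p_exit_loop)  # branch: stay in the loop
    nxt[2] += ip[2]                      # exit node absorbs its mass
    return nxt

ip = np.array([1.0, 0.0, 0.0])           # all mass starts at statement 0
for _ in range(3):
    ip = step(ip, p_exit_loop=0.5)
print(ip)                                 # mass still sums to 1
```

Because the distribution stays normalized, the model can be unrolled for a fixed number of steps and trained end to end on execution-behavior labels such as outputs or raised errors.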
Semantic Fuzzing with Zest
Programs expecting structured inputs often consist of both a syntactic
analysis stage, which parses raw input, and a semantic analysis stage, which
conducts checks on the parsed input and executes the core logic of the program.
Generator-based testing tools in the lineage of QuickCheck are a promising way
to generate random syntactically valid test inputs for these programs. We
present Zest, a technique which automatically guides QuickCheck-like
random-input generators to better explore the semantic analysis stage of test
programs. Zest converts random-input generators into deterministic parametric
generators. We present the key insight that mutations in the untyped parameter
domain map to structural mutations in the input domain. Zest leverages program
feedback in the form of code coverage and input validity to perform
feedback-directed parameter search. We evaluate Zest against AFL and QuickCheck
on five Java programs: Maven, Ant, BCEL, Closure, and Rhino. Zest covers
1.03x-2.81x as many branches within the benchmarks' semantic analysis stages as
baseline techniques. Further, we find 10 new bugs in the semantic analysis
stages of these benchmarks. Zest is the most effective technique in finding
these bugs reliably and quickly, requiring at most 10 minutes on average to
find each bug. Comment: To appear in Proceedings of the 28th ACM SIGSOFT
International Symposium on Software Testing and Analysis (ISSTA'19).
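Zest's key insight, that a QuickCheck-style generator becomes deterministic once all its "random" choices are drawn from a fixed parameter sequence, so that mutating that untyped sequence induces structural mutations in the generated input, can be sketched as follows. The tiny arithmetic-expression grammar is an invented example, not one of the paper's Java benchmarks.

```python
# Sketch of Zest's parametric-generator idea: the generator's randomness is
# replaced by a deterministic stream of bytes (the "untyped parameters").
# Flipping one byte in the stream produces a structural mutation in the
# output while keeping it syntactically valid. The expression grammar is an
# illustrative toy, not from the paper.

class ParamSource:
    """Deterministic replacement for a random number generator."""
    def __init__(self, params: bytes):
        self.params, self.i = params, 0
    def choose(self, n: int) -> int:
        b = self.params[self.i % len(self.params)]
        self.i += 1
        return b % n

def gen_expr(src: ParamSource, depth: int = 0) -> str:
    """Generate a syntactically valid arithmetic expression."""
    if depth >= 2 or src.choose(2) == 0:
        return str(src.choose(10))              # leaf: a single digit
    op = "+-*"[src.choose(3)]
    return f"({gen_expr(src, depth + 1)}{op}{gen_expr(src, depth + 1)})"

base = gen_expr(ParamSource(bytes([1, 2, 3, 4, 5, 6])))
mutated = gen_expr(ParamSource(bytes([1, 2, 9, 4, 5, 6])))  # flip one byte
print(base, mutated)  # both remain syntactically valid expressions
```

Zest then searches this parameter space guided by code-coverage and input-validity feedback, which is what steers generation toward the semantic analysis stage rather than the parser.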