Many bioinformatics programming tasks can be automated with ChatGPT
Computer programming is a fundamental tool for life scientists, allowing them
to carry out many essential research tasks. However, despite a variety of
educational efforts, learning to write code can be a challenging endeavor for
both researchers and students in life science disciplines. Recent advances in
artificial intelligence have made it possible to translate human-language
prompts to functional code, raising questions about whether these technologies
can aid (or replace) life scientists' efforts to write code. Using 184
programming exercises from an introductory-bioinformatics course, we evaluated
the extent to which one such model -- OpenAI's ChatGPT -- can successfully
complete basic- to moderate-level programming tasks. On its first attempt,
ChatGPT solved 139 (75.5%) of the exercises. For the remaining exercises, we
provided natural-language feedback to the model, prompting it to try different
approaches. Within 7 or fewer attempts, ChatGPT solved 179 (97.3%) of the
exercises. These findings have important implications for life-sciences
research and education. For many programming tasks, researchers no longer need
to write code from scratch. Instead, machine-learning models may produce usable
solutions. Instructors may need to adapt their pedagogical approaches and
assessment techniques to account for these new capabilities that are available
to the general public.
Comment: 13 pages, 4 figures, to be submitted for publication
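The evaluation protocol the abstract describes (submit an exercise, test the model's code, feed back the error, and retry up to seven times) can be sketched as a short loop. This is an illustrative outline only, not the authors' actual harness; query_model and run_tests are hypothetical placeholders for a model API call and a unit-test runner.

    # Hypothetical sketch of the evaluation loop described in the abstract:
    # submit an exercise prompt, test the generated code, and give the model
    # natural-language feedback until the tests pass or attempts run out.
    from typing import Callable, Tuple

    def evaluate_exercise(
        prompt: str,
        query_model: Callable[[str], str],             # placeholder: returns candidate code
        run_tests: Callable[[str], Tuple[bool, str]],  # placeholder: (passed, error message)
        max_attempts: int = 7,                         # the abstract reports solutions within 7 attempts
    ) -> Tuple[bool, int]:
        """Return (solved, attempts_used) for one programming exercise."""
        feedback = ""
        for attempt in range(1, max_attempts + 1):
            candidate = query_model(prompt + feedback)
            passed, error = run_tests(candidate)
            if passed:
                return True, attempt
            # Natural-language feedback for the next attempt, as in the study's protocol.
            feedback = f"\nYour previous solution failed with: {error}. Please try a different approach."
        return False, max_attempts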
Compiler and Runtime for Memory Management on Software Managed Manycore Processors
We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet it is believed that coherent cache architectures will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have a scalable memory design in which each core has direct access only to its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. The lack of automatic memory management in hardware makes such architectures extremely power-efficient, but it also makes them difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after use. Doing this, however, adds considerable complexity to the programmer's job: programmers must now worry about data management on top of the functional correctness of the program, which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We first developed Complete Circular Stack Management, which manages stack frames between the local memory and the main memory and also addresses the stack-pointer problem. Although it works, we found that the management could be further optimized in most cases, so we developed Smart Stack Data Management (SSDM). In this work, we formulate the stack data management problem and propose a greedy algorithm to solve it. We then propose a general cost-estimation algorithm, on which the CMSM heuristic for the code-mapping problem is built. Finally, heap data is dynamic in nature and therefore hard to manage. We provide two schemes to manage an unlimited amount of heap data within a constant-sized region of the local memory. In addition to these separate schemes for different kinds of data, we also provide a memory-partitioning methodology.
Dissertation/Thesis. Ph.D. Computer Science 201
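To make the scratchpad idea concrete, the following toy Python model shows the flavor of circular stack management: frames stay in the small local memory while they fit, and the oldest frames are written back to main memory (standing in for DMA transfers) when space runs out. It is a simplification for illustration, not the dissertation's compiler and runtime, and the sizes and names are invented.

    # A toy model (not the dissertation's implementation) of circular stack
    # management on a scratchpad: frames are kept locally while they fit, and the
    # oldest frames are "DMA'd" out to main memory when space runs low.
    from collections import deque

    class ScratchpadStack:
        def __init__(self, capacity_bytes: int):
            self.capacity = capacity_bytes
            self.used = 0
            self.local = deque()    # frames resident in the scratchpad (oldest on the left)
            self.main_memory = []   # frames evicted to main memory

        def push(self, name: str, size: int) -> None:
            while self.used + size > self.capacity and self.local:
                old_name, old_size = self.local.popleft()
                self.main_memory.append((old_name, old_size))  # models a DMA write-back
                self.used -= old_size
            self.local.append((name, size))
            self.used += size

        def pop(self) -> str:
            name, size = self.local.pop()
            self.used -= size
            if not self.local and self.main_memory:
                back_name, back_size = self.main_memory.pop()  # models a DMA fetch of the caller's frame
                self.local.append((back_name, back_size))
                self.used += back_size
            return name

    # Example: a 1 KB scratchpad forces the frames of a deep call chain out to main memory.
    sp = ScratchpadStack(capacity_bytes=1024)
    for i in range(8):
        sp.push(f"frame_{i}", size=256)
    print(len(sp.local), "frames local,", len(sp.main_memory), "frames in main memory")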
Purposes, concepts, misfits, and a redesign of git
Git is a widely used version control system that is powerful but complicated. Its complexity may not be an inevitable consequence of its power but rather evidence of flaws in its design. To explore this hypothesis, we analyzed the design of Git using a theory that identifies concepts, purposes, and misfits. Some well-known difficulties with Git are described, and explained as misfits in which underlying concepts fail to meet their intended purpose. Based on this analysis, we designed a reworking of Git (called Gitless) that attempts to
remedy these flaws. To correlate misfits with issues reported by users, we
conducted a study of Stack Overflow questions. To determine whether users experienced fewer complications using Gitless in place of Git, we also conducted a small user study. Results suggest our approach can be effective in identifying, analyzing, and fixing design problems.
SUTD-MIT International Design Centre (IDC)
ClickINC: In-network Computing as a Service in Heterogeneous Programmable Data-center Networks
In-Network Computing (INC) has found many applications for performance boosts
or cost reduction. However, given heterogeneous devices, diverse applications,
and multi-path network topologies, it is cumbersome and error-prone for
application developers to effectively utilize the available network resources
and gain predictable benefits without impeding normal network functions.
Previous work is oriented to network operators more than application
developers. We develop ClickINC to streamline the INC programming and
deployment using a unified and automated workflow. ClickINC provides INC
developers with a modular programming abstraction, without requiring them to
reason about device states or the network topology. We describe the ClickINC framework,
model, language, workflow, and corresponding algorithms. Experiments on both an
emulator and a prototype system demonstrate its feasibility and benefits
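As a rough illustration of the kind of modular abstraction described (hypothetical names, not ClickINC's actual API), the sketch below lets a developer declare INC modules with resource demands while a separate placement step maps them onto heterogeneous devices, so the developer never manipulates device state directly.

    # A hypothetical illustration (not ClickINC's actual API) of a modular INC
    # abstraction: developers compose modules, and a placement step maps them
    # onto heterogeneous programmable devices.
    from dataclasses import dataclass

    @dataclass
    class Module:
        name: str
        memory_kb: int          # resource demand declared by the developer

    @dataclass
    class Device:
        name: str
        kind: str               # e.g. "programmable switch" or "SmartNIC"
        free_memory_kb: int

    def place(pipeline, devices):
        """Greedy first-fit placement; the developer never touches device state."""
        placement = {}
        for module in pipeline:
            for dev in devices:
                if dev.free_memory_kb >= module.memory_kb:
                    dev.free_memory_kb -= module.memory_kb
                    placement[module.name] = dev.name
                    break
            else:
                raise RuntimeError(f"no device can host {module.name}")
        return placement

    pipeline = [Module("aggregate", 64), Module("cache", 512), Module("filter", 32)]
    devices = [Device("sw1", "programmable switch", 256), Device("nic1", "SmartNIC", 1024)]
    print(place(pipeline, devices))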
Protecting Systems From Exploits Using Language-Theoretic Security
Any computer program processing input from the user or network must validate the input. Input-handling vulnerabilities occur in programs when the software component responsible for filtering malicious input, the parser, does not perform validation adequately. Consequently, parsers are among the most targeted components, since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar. We then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers represent the data format, programmers often rely on parser generators or parser combinator tools to build the parsers. This thesis propels several sub-fields of LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, this thesis comprises five parts that tackle various tenets of LangSec.

First, I categorize various input-handling vulnerabilities and exploits using two frameworks. The first is the mismorphisms framework, which helps us reason about the root causes leading to various vulnerabilities. The second is a categorization framework built from various LangSec anti-patterns, such as parser differentials and insufficient input validation. Finally, we built a catalog of more than 30 popular vulnerabilities to demonstrate the categorization frameworks.

Second, I built parsers for various Internet of Things and power grid network protocols and the iccMAX file format using parser combinator libraries. The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool. The parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations.

Third, I present SPARTA, a novel tool I built that generates Rust code to type check Portable Document Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and various open-source PDF tools. In addition to our checker, we also built a practical tool, PDFFixer, to dynamically patch type errors in PDF files.

Fourth, I present ParseSmith, a tool to build verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for their correctness. I built a verified parsing tool in Dafny that builds on ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle various constructs commonly seen in network formats. I prove that our parsers run in linear time and always terminate for well-formed grammars.

Finally, I provide the earliest systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. Next, I conducted an expert-elicitation qualitative study to derive the metrics I use to compare the DDLs. I also systematically compare these DDLs based on the sample data descriptions available with them, checking for correctness and resilience
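The core LangSec discipline, recognizing the whole input against a formal grammar before the rest of the program acts on it, can be illustrated with a few lines of parser-combinator code. This is a minimal sketch in Python for a toy dotted-quad input language, not the thesis's parsers or tools.

    # A minimal parser-combinator sketch (illustrative, not the thesis's tooling)
    # of the LangSec discipline: recognize the full input against a formal
    # grammar before any other code acts on it.
    def char(pred):
        def parse(s, i):
            return (i + 1, s[i]) if i < len(s) and pred(s[i]) else None
        return parse

    def many1(p):
        def parse(s, i):
            out = []
            r = p(s, i)
            while r is not None:
                i, v = r
                out.append(v)
                r = p(s, i)
            return (i, out) if out else None
        return parse

    def seq(*parsers):
        def parse(s, i):
            vals = []
            for p in parsers:
                r = p(s, i)
                if r is None:
                    return None
                i, v = r
                vals.append(v)
            return (i, vals)
        return parse

    digit = char(str.isdigit)
    dot = char(lambda c: c == ".")
    # Grammar for a dotted quad such as "192.168.0.1" (a toy input language).
    quad = seq(many1(digit), dot, many1(digit), dot, many1(digit), dot, many1(digit))

    def recognize(s: str) -> bool:
        r = quad(s, 0)
        return r is not None and r[0] == len(s)   # the whole input must be consumed

    assert recognize("192.168.0.1")
    assert not recognize("192.168.0.1; rm -rf /")  # rejected before the program acts on it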
Understanding and Supporting Debugging Workflows in Multiverse Analysis
Multiverse analysis, a paradigm for statistical analysis that considers all
combinations of reasonable analysis choices in parallel, promises to improve
transparency and reproducibility. Although recent tools help analysts specify
multiverse analyses, they remain difficult to use in practice. In this work, we
conduct a formative study with four multiverse researchers, which identifies
debugging as a key barrier. We find debugging is challenging because of the
latency between running analyses and detecting bugs, and the scale of metadata
that must be processed to diagnose a bug. To address these challenges, we
prototype a command-line interface tool, Multiverse Debugger, which helps
diagnose bugs in the multiverse and propagate fixes. In a second, focused study
(n=13), we use Multiverse Debugger as a probe to develop a model of debugging
workflows and identify challenges, including the difficulty in understanding
the composition of a multiverse. We conclude with design implications for
future multiverse analysis authoring systems
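The scale problem the study identifies is easy to see in a small sketch: a handful of reasonable analysis choices already multiplies into many universes, each of which can fail in its own way. The choice names and the injected failure below are hypothetical and stand in for a real multiverse specification.

    # A small sketch (hypothetical choice names) of why multiverse debugging is hard:
    # a few reasonable analysis choices multiply into many universes, each producing
    # metadata that must be sifted to localize a bug.
    from itertools import product

    choices = {
        "outlier_rule": ["none", "iqr", "z>3"],
        "transform":    ["raw", "log"],
        "model":        ["ols", "robust", "mixed"],
    }

    def run_analysis(spec):
        # Placeholder for a real statistical analysis; one combination "fails" here
        # to stand in for a bug that only surfaces in part of the multiverse.
        if spec["transform"] == "log" and spec["model"] == "mixed":
            raise ValueError("singular fit")
        return {"estimate": 0.0}

    universes = [dict(zip(choices, combo)) for combo in product(*choices.values())]
    print(len(universes), "universes")   # 3 * 2 * 3 = 18

    failures = []
    for spec in universes:
        try:
            run_analysis(spec)
        except Exception as err:
            failures.append((spec, str(err)))

    # A tool like Multiverse Debugger would help summarize which choices co-occur in failures.
    print(len(failures), "failing universes, e.g.", failures[0])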
Software reverse engineering education
Software Reverse Engineering (SRE) is the practice of analyzing a software system, either in whole or in part, to extract design and implementation information. A typical SRE scenario would involve a software module that has worked for years and carries several rules of a business in its lines of code. Unfortunately, the source code of the application has been lost; what remains is “native” or “binary” code. Reverse engineering skills are also used to detect and neutralize viruses and malware as well as to protect intellectual property. It became frighteningly apparent during the Y2K crisis that reverse engineering skills were not commonly held amongst programmers. Since that time, much research has been undertaken to formalize the types of activities that fall into the category of reverse engineering so that these skills can be taught to computer programmers and testers. To help address the lack of software reverse engineering education, several peer-reviewed articles on software reverse engineering, re-engineering, reuse, maintenance, evolution, and security were gathered with the objective of developing relevant, practical exercises for instructional purposes. The research revealed that SRE is fairly well described and most of the related activities fall into one of two
Quantifying and Predicting the Influence of Execution Platform on Software Component Performance
The performance of software components depends on several factors, including the execution platform on which the software components run. To simplify cross-platform performance prediction in relocation and sizing scenarios, this thesis introduces a novel approach that separates the application performance profile from the platform performance profile. The approach is evaluated using transparent instrumentation of Java applications and automated benchmarks for Java Virtual Machines
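The separation of profiles can be pictured as a simple linear model: an application profile counts how often each primitive operation occurs, a platform profile gives the cost of each operation on a particular machine or JVM, and a prediction combines the two. The sketch below is a deliberately simplified illustration with invented operation names and numbers, not the thesis's actual model.

    # Simplified sketch of separating an application profile (operation counts)
    # from a platform profile (per-operation costs) for cross-platform prediction.
    # All names and numbers are illustrative only.
    app_profile = {            # measured once, platform-independent
        "int_arith": 5_000_000,
        "method_call": 800_000,
        "array_read": 2_000_000,
    }

    platform_profiles = {      # measured per platform with microbenchmarks
        "laptop_jvm": {"int_arith": 1e-9,   "method_call": 5e-9, "array_read": 2e-9},
        "server_jvm": {"int_arith": 0.6e-9, "method_call": 3e-9, "array_read": 1.2e-9},
    }

    def predict_runtime(app, platform):
        """Predicted time = sum over operations of count * per-operation cost."""
        return sum(count * platform[op] for op, count in app.items())

    for name, plat in platform_profiles.items():
        print(f"{name}: {predict_runtime(app_profile, plat) * 1e3:.2f} ms")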