An Empirical Analysis of Vulnerabilities in Python Packages for Web Applications
This paper examines software vulnerabilities in common Python packages used
particularly for web development. The empirical dataset is based on the PyPI
package repository and the so-called Safety DB used to track vulnerabilities in
selected packages within the repository. The methodological approach builds on
a release-based time series analysis of the conditional probabilities for the
releases of the packages to be vulnerable. According to the results, many of
the Python vulnerabilities observed seem to be only modestly severe; input
validation and cross-site scripting have been the most typical vulnerabilities.
In terms of the time series analysis based on the release histories, only the
recent past is observed to be relevant for statistical predictions; the
classical Markov property holds.
Comment: Forthcoming in: Proceedings of the 9th International Workshop on
Empirical Software Engineering in Practice (IWESEP 2018), Nara, IEE
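The release-based analysis described in this abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes each release is labelled 1 (affected by a known vulnerability) or 0 (not affected), and estimates first-order conditional probabilities P(next state | current state) from consecutive releases. The function name and the sample history are hypothetical.

```python
from collections import Counter


def transition_probabilities(states):
    """Estimate P(next | current) from a sequence of 0/1 release states."""
    # Count each (current, next) pair of consecutive releases.
    pair_counts = Counter(zip(states, states[1:]))
    # Count how often each state appears as the "current" release.
    from_counts = Counter(states[:-1])
    return {
        (cur, nxt): pair_counts[(cur, nxt)] / from_counts[cur]
        for cur in sorted(from_counts)
        for nxt in (0, 1)
    }


# Hypothetical release history: 1 = release affected by a known vulnerability.
history = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
probs = transition_probabilities(history)

for (cur, nxt), p in sorted(probs.items()):
    print(f"P(next={nxt} | current={cur}) = {p:.2f}")
```

Under a first-order Markov assumption, only the state of the most recent release enters the prediction, which matches the abstract's remark that only the recent past is relevant.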
A Closer Look at the Security Risks in the Rust Ecosystem
Rust is an emerging programming language designed for the development of
systems software. To facilitate the reuse of Rust code, crates.io, as a central
package registry of the Rust ecosystem, hosts thousands of third-party Rust
packages. The openness of crates.io enables the growth of the Rust ecosystem
but comes with security risks, as reflected in severe security advisories. Although Rust
guarantees a software program to be safe via programming language features and
strict compile-time checking, the unsafe keyword in Rust allows developers to
bypass compiler safety checks for certain regions of code. Prior studies
empirically investigate the memory safety and concurrency bugs in the Rust
ecosystem, as well as the usage of unsafe keywords in practice. Nonetheless,
the literature lacks a systematic investigation of the security risks in the
Rust ecosystem.
In this paper, we perform a comprehensive investigation into the security
risks present in the Rust ecosystem, asking ``what are the characteristics of
the vulnerabilities, what are the characteristics of the vulnerable packages,
and how are the vulnerabilities fixed in practice?''. To facilitate the study,
we first compile a dataset of 433 vulnerabilities, 300 vulnerable code
repositories, and 218 vulnerability fix commits in the Rust ecosystem, spanning
over 7 years. With the dataset, we characterize the types, life spans, and
evolution of the disclosed vulnerabilities. We then characterize the
popularity, categorization, and vulnerability density of the vulnerable Rust
packages, as well as their versions and code regions affected by the disclosed
vulnerabilities. Finally, we characterize the complexity of vulnerability fixes
and localities of corresponding code changes, and inspect how practitioners fix
vulnerabilities in Rust packages with various localities.
Comment: preprint of accepted TOSEM pape
Exploring Security Commits in Python
Python has become the most popular programming language as it is friendly to
work with for beginners. However, a recent study has found that most security
issues in Python have not been indexed by CVE and may only be fixed by 'silent'
security commits, which pose a threat to software security and hinder the
security fixes to downstream software. It is critical to identify the hidden
security commits; however, the existing datasets and methods are insufficient
for security commit detection in Python, due to the limited data variety,
non-comprehensive code semantics, and uninterpretable learned features. In this
paper, we construct the first security commit dataset in Python, namely
PySecDB, which consists of three subsets including a base dataset, a pilot
dataset, and an augmented dataset. The base dataset contains the security
commits associated with CVE records provided by MITRE. To increase the variety
of security commits, we build the pilot dataset from GitHub by filtering
keywords within the commit messages. Since not all commits provide commit
messages, we further construct the augmented dataset by understanding the
semantics of code changes. To build the augmented dataset, we propose a new
graph representation named CommitCPG and a multi-attributed graph learning
model named SCOPY to identify the security commit candidates through both
sequential and structural code semantics. The evaluation shows our proposed
algorithms can improve the data collection efficiency by up to 40 percentage
points. After manual verification by three security experts, PySecDB consists
of 1,258 security commits and 2,791 non-security commits. Furthermore, we
conduct an extensive case study on PySecDB and discover four common security
fix patterns that cover over 85% of security commits in Python, providing
insight into secure software maintenance, vulnerability detection, and
automated program repair.
Comment: Accepted to 2023 IEEE International Conference on Software
Maintenance and Evolution (ICSME
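The keyword-filtering step used for the pilot dataset can be sketched roughly as follows. This is an illustration only: the keyword list, the commit records, and the function name are assumptions, since the abstract does not list the actual keywords or data format.

```python
import re

# Hypothetical keyword stems; the paper's actual keyword list is not given here.
SECURITY_KEYWORDS = ("vulnerab", "security", "xss", "injection", "overflow")
PATTERN = re.compile(
    "|".join(re.escape(k) for k in SECURITY_KEYWORDS), re.IGNORECASE
)


def is_candidate(message):
    """Return True if a non-empty commit message mentions a security keyword."""
    return bool(message) and PATTERN.search(message) is not None


# Made-up commit records for illustration.
commits = [
    {"sha": "a1", "message": "Fix XSS in template rendering"},
    {"sha": "b2", "message": "Update README"},
    {"sha": "c3", "message": ""},
    {"sha": "d4", "message": "Patch SQL injection in query builder"},
]

candidates = [c["sha"] for c in commits if is_candidate(c["message"])]
print(candidates)
```

The commit with an empty message is never selected, which illustrates the gap the abstract gives as motivation for the augmented dataset built from code-change semantics.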
An Analysis of Voter Fraud and Proposed Methodology for Recording Numerical Anomalies in Polling Data
Debate surrounding the security and accessibility of voting procedures in the United States has caused the matter of electoral integrity to become a point of contention. Confusion surrounding the Federal government's response to allegations of fraud has decreased the population's trust in the government and increased voter apathy. The context for an empirical methodology is established with a history of e-voting, voter security, and voter-fraud sensationalism. This research methodology is then described and implemented with a software application that analyzes the proximity of voters to polling locations to study the way people vote and how fraud may be committed at polling stations. The development cycle, design, and use cases of this software are described, and a variety of improvements are suggested as motivation for future application
Malicious Source Code Detection Using Transformer
Reusing open source code is considered common practice in modern software
development. However, reusing others' code gives bad actors access to a wide
developer community, and hence to the products that rely on it. Such attacks are
categorized as supply chain attacks. Recent years have seen a growing number of
supply chain attacks that leverage open source during software development,
relying on the download and installation procedures, whether automatic or manual.
Over the years, many approaches have been invented for detecting vulnerable
packages. However, it is uncommon to detect malicious code within packages.
Those detection approaches can be broadly categorized as analyses that use
(dynamic) and do not use (static) code execution. Here, we introduce the Malicious
Source code Detection using Transformers (MSDT) algorithm. MSDT is a novel
static analysis based on a deep learning method that detects real-world code
injection cases in source code packages. In this study, we used MSDT and a
dataset with over 600,000 different functions to embed various functions and
applied a clustering algorithm to the resulting vectors, detecting the
malicious functions by detecting the outliers. We evaluated MSDT's performance
by conducting extensive experiments and demonstrated that our algorithm is
capable of detecting functions that were injected with malicious code with
precision@k values of up to 0.909
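.

The outlier-ranking and precision@k evaluation mentioned in this abstract can be illustrated with a small sketch. It is not the MSDT implementation: in place of transformer embeddings and a clustering algorithm, it uses made-up 2-D vectors and ranks functions by Euclidean distance from the centroid. All names and data are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def rank_by_outlier_score(vectors):
    """Return indices sorted from most to least distant from the centroid."""
    c = centroid(vectors)
    scores = [math.dist(v, c) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranked, malicious, k):
    """Fraction of the top-k ranked items that are truly malicious."""
    return sum(1 for i in ranked[:k] if i in malicious) / k


# Made-up embeddings: index 4 stands in for an injected (malicious) function.
embeddings = [
    [0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
    [0.1, 0.1], [5.0, 5.0], [0.05, 0.05],
]
malicious = {4}

ranked = rank_by_outlier_score(embeddings)
p1 = precision_at_k(ranked, malicious, 1)
p2 = precision_at_k(ranked, malicious, 2)
print(ranked[0], p1, p2)
```

Precision@k rewards placing the truly malicious functions at the top of the outlier ranking, so it suits a setting where an analyst can inspect only the first k flagged functions.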