Reduce API Debugging Overhead via Knowledge Prepositioning
OpenAPI describes a model in which producers offer Application Programming
Interfaces (APIs) to help end-users access their data, resources, and services.
An API generally has many parameters that must be supplied, and it is
challenging for users to understand and fill in these parameters correctly.
This paper develops an API workbench that helps users learn and debug APIs. On
top of this workbench, we propose several exploratory techniques to reduce the
overhead of learning and debugging APIs. First, we mine knowledge such as
parameter characteristics (e.g., enumerability) and constraints (e.g.,
maximum/minimum values) from massive API call logs to narrow the range of
candidate parameter values. Then, we propose a fine-grained approach to enrich the API
documentation by extracting dependency knowledge between APIs. Finally, we
present a learning-based prediction method to predict API execution results
before the API is called, significantly reducing users' debugging cycles.
Experiments on the online system show that our approach substantially improves
the user experience of debugging OpenAPIs.
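The log-mining step can be sketched as follows; the function name, log schema, and enumerability threshold are illustrative assumptions rather than the paper's actual implementation:

```python
from collections import defaultdict

def mine_param_knowledge(call_logs, enum_threshold=10):
    """Infer per-parameter knowledge (enumerability, numeric range)
    from a list of API call-log records."""
    values = defaultdict(set)
    for log in call_logs:
        if not log.get("success"):
            continue  # only successful calls reveal valid parameter values
        for name, value in log["params"].items():
            values[name].add(value)

    knowledge = {}
    for name, seen in values.items():
        info = {}
        if len(seen) <= enum_threshold:
            info["enum"] = sorted(seen, key=str)  # likely enumerable
        numeric = [v for v in seen if isinstance(v, (int, float))]
        if numeric and len(numeric) == len(seen):
            info["min"], info["max"] = min(numeric), max(numeric)
        knowledge[name] = info
    return knowledge
```

Knowledge of this shape is what lets a workbench pre-position candidate values before the user ever issues a call.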
Coverage-Based Debloating for Java Bytecode
Software bloat is code that is packaged in an application but is actually not
necessary to run the application. The presence of software bloat is an issue
for security, for performance, and for maintenance. In this paper, we introduce
a novel technique for debloating Java bytecode, which we call coverage-based
debloating. We leverage a combination of state-of-the-art Java bytecode
coverage tools to precisely capture what parts of a project and its
dependencies are used at runtime. Then, we automatically remove the parts that
are not covered to generate a debloated version of the compiled project. We
successfully generate debloated versions of 220 open-source Java libraries,
which are syntactically correct and preserve their original behavior according
to the workload. Our results indicate that 68.3% of the libraries' bytecode and
20.5% of their total dependencies can be removed through coverage-based
debloating. Meanwhile, we present the first experiment that assesses the
utility of debloated libraries with respect to client applications that reuse
them. We show that 80.9% of the clients with at least one test that uses the
library successfully compile and pass their test suite when the original
library is replaced by its debloated version.
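The core debloating decision (keep what coverage observed, remove the rest) can be sketched in a minimal Python outline; the names and class-level granularity are assumptions, since the paper operates on Java bytecode with dedicated coverage tools:

```python
def debloat_report(all_classes, covered_classes):
    """Partition a project's classes into those covered at runtime
    (kept) and those never executed under the workload (removable),
    and report the resulting bloat ratio."""
    covered = set(covered_classes) & set(all_classes)
    removable = set(all_classes) - covered
    ratio = len(removable) / len(all_classes) if all_classes else 0.0
    return {"keep": sorted(covered),
            "remove": sorted(removable),
            "bloat_ratio": ratio}
```

The real technique must additionally preserve syntactic correctness of the remaining bytecode, which a set difference alone does not capture.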
Active Learning of Discriminative Subgraph Patterns for API Misuse Detection
A common cause of bugs and vulnerabilities is the violation of usage
constraints associated with Application Programming Interfaces (APIs). API
misuses are common in software projects, and while there have been techniques
proposed to detect such misuses, studies have shown that they fail to reliably
detect misuses while reporting many false positives. One limitation of prior
work is the inability to reliably identify correct patterns of usage. Many
approaches mistake a usage pattern's frequency for correctness. Due to the
variety of alternative usage patterns that may be uncommon but correct, anomaly
detection-based techniques have limited success in identifying misuses. We
address these challenges and propose ALP (Actively Learned Patterns),
reformulating API misuse detection as a classification problem. After
representing programs as graphs, ALP mines discriminative subgraphs. While
still incorporating frequency information, through limited human supervision,
we reduce the reliance on the assumption relating frequency and correctness.
The principles of active learning are incorporated to shift human attention
away from the most frequent patterns. Instead, ALP samples informative and
representative examples while minimizing labeling effort. In our empirical
evaluation, ALP substantially outperforms prior approaches on both MUBench, an
API Misuse benchmark, and a new dataset that we constructed from real-world
software projects.
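ALP's sampling idea, preferring uncertain and less frequent patterns for labeling, might be sketched as follows (the scoring formula and field names are illustrative assumptions, not ALP's actual criterion):

```python
def select_for_labeling(patterns, budget=2):
    """patterns: list of dicts with 'id', 'freq', and a classifier
    probability 'p_misuse'. Rank by uncertainty (p near 0.5) plus a
    rarity bonus, so human attention shifts away from the most
    frequent patterns, and return the top `budget` for labeling."""
    def score(p):
        uncertainty = 1.0 - abs(p["p_misuse"] - 0.5) * 2  # 1.0 at p = 0.5
        rarity = 1.0 / (1.0 + p["freq"])                  # small for frequent
        return uncertainty + rarity
    return sorted(patterns, key=score, reverse=True)[:budget]
```

This reflects the reformulation as classification: labels from limited supervision train the discriminator instead of trusting frequency outright.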
An Empirical Exploration of Python Machine Learning API Usage
Machine learning is becoming an increasingly important part of many domains, both inside and outside of computer science. With this has come an increase in developers learning to write machine learning applications in languages like Python, using application programming interfaces (APIs) such as pandas and scikit-learn. However, given the complexity of these APIs, they can be challenging to learn, especially for new programmers. To create better tools for assisting developers with machine learning APIs, we need to understand how these APIs are currently used. In this thesis, we present a study of machine learning API usage in Python code in a corpus of machine learning projects hosted on Kaggle, a machine learning education and competition community site. We analyzed the most frequently used machine learning-related libraries and the sub-modules of those libraries. Next, we studied the usage of the different calls developers use to solve machine learning tasks. We also found information about which libraries are used in combination and discovered a number of cases where libraries were imported but never used. We end by discussing potential next steps for further research and development based on the results of our work.
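A corpus analysis of this kind can be approximated with Python's standard `ast` module; this sketch counts top-level imports and qualified call names in one source file (the thesis's actual pipeline is assumed to be more elaborate):

```python
import ast
from collections import Counter

def count_api_usage(source):
    """Count imported top-level libraries and attribute-style call
    names (e.g. pd.read_csv -> 'read_csv') in one Python file."""
    tree = ast.parse(source)
    imports, calls = Counter(), Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports[node.module.split(".")[0]] += 1
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            calls[node.func.attr] += 1
    return imports, calls
```

Comparing the `imports` counter against the names actually referenced is also how unused imports, as reported in the study, can be detected.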
API Knowledge Guided Test Generation for Machine Learning Libraries
This thesis proposes MUTester to generate test cases for APIs of machine learning libraries by leveraging the API constraints mined from the corresponding API documentation and the API usage patterns mined from code fragments in Stack Overflow (SO). First, we propose a set of 18 linguistic rules for mining API constraints from the API documents. Then, we use the frequent itemset mining technique to mine the API usage patterns from a large corpus of machine learning API related code fragments collected from SO. Finally, we use the above two types of API knowledge to guide the test generation of existing test generators, for machine learning libraries.
To evaluate the performance of MUTester, we first collected 2,889 APIs from five widely used machine learning libraries (i.e., Scikit-learn, Pandas, Numpy, Scipy, and PyTorch); then, for each API, we further extracted its API knowledge, i.e., API constraints and API usage patterns. Given an API, MUTester combines its API knowledge with existing test generators (e.g., the search-based test generator PyEvosuite and the random test generator PyRandoop) to generate test cases for the API. Results of our experiment show that MUTester can significantly improve the corresponding test generation methods, with code coverage improvements ranging from 18.0% to 41.9% on average. In addition, it reduces the number of invalid tests generated by the existing test generators by 21%.
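The frequent-itemset step over SO code fragments might look like this brute-force sketch (a real miner such as Apriori or FP-growth would prune the search; names and thresholds here are illustrative):

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(snippets, min_support=2, max_size=2):
    """snippets: list of sets of API calls co-occurring in one code
    fragment. Return every itemset (up to max_size calls) that
    appears together in at least min_support fragments."""
    counts = Counter()
    for apis in snippets:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(apis), k):
                counts[combo] += 1
    return {items: n for items, n in counts.items() if n >= min_support}
```

Patterns mined this way (e.g., "fit is usually followed by predict") can then steer a generator toward valid call sequences.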
Studying and Leveraging API Usage Patterns
Software projects make use of libraries extensively. Libraries have intended API surfaces—sets of exposed library interfaces that library developers expect clients to use. In practice, however, clients use only small fractions of the intended API surfaces of libraries, and sometimes use libraries in unexpected ways. Understanding clients' usage patterns of library APIs benefits both client and library developers—targeting issues such as version upgrades, breaking changes, and software bloat. We have implemented a tool to study both static and dynamic interactions between clients, the libraries they use, and those libraries' direct dependencies. We use this tool to carry out a detailed study of API usage patterns on 90 clients and 11 libraries. We present a classification framework for developers to classify API uses. We then describe two additional developer-focussed applications of the data that our tool produces: a secondary visualization tool, VizAPI, as well as the concept of library fission. Conceivably, VizAPI can allow client and library developers to answer the following queries about the interaction of their code and the libraries they depend on: Will my client code be affected by breaking changes in library APIs? Which APIs in my library's source code are commonly used by clients? The concept of library fission, by which we mean the splitting of libraries into sub-modules, is based on the usage patterns that we observe. This can potentially help library developers release backward-compatible versions of their libraries. It could also help client developers isolate breaking changes and reduce the likelihood of vulnerabilities and version conflicts that may be introduced through direct or transitive dependencies.
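The basic surface-usage measurement, i.e. what fraction of the intended API clients actually touch and which uses fall outside it, can be sketched as follows (a simplification of what the tool measures, with illustrative names):

```python
def surface_usage(intended_api, client_uses):
    """Return the fraction of the library's intended API surface
    that clients actually use, plus any uses that fall outside the
    intended surface (unexpected uses)."""
    intended = set(intended_api)
    used = set(client_uses)
    fraction = len(used & intended) / len(intended) if intended else 0.0
    unexpected = sorted(used - intended)
    return fraction, unexpected
```

Low fractions motivate library fission: sub-modules can be cut along the clusters of APIs that clients actually exercise.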
IoT integrated antenna rotator
This project deals with the design and fabrication of an IoT Integrated Antenna Rotator, upgrading an existing antenna rotator by integrating it with Internet of Things (IoT) technology. The objective of this thesis is to integrate Google Maps into a web server together with the antenna rotator system in order to achieve a speedy, high-accuracy, standalone antenna pointing system. This thesis describes the software development of the rotator system, which includes the specification, system and software design, implementation and unit testing, integration and system testing, and finally the operation. A NodeMCU ESP32 is implemented as the microcontroller in this SDP to control the system using its central processor. A web server is developed to act as a user interface for interacting with the antenna rotator system. A digital map is integrated with the system to attain IoT capability by using the Google Maps API. The IoT Integrated Antenna Rotator was tested after the integration and fabrication process. The antenna rotator is able to rotate to a desired location precisely through the web server by using the Google Maps API. The outcomes can result in significant cost and time savings, as well as increased product reliability and user confidence in using an IoT Integrated antenna pointing system.
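The pointing computation behind such a system typically reduces to the initial great-circle bearing from the rotator's coordinates to a target selected on the map; the following is the standard formula, not necessarily the project's exact code:

```python
import math

def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing in degrees (0 = north, clockwise)
    from the rotator's location to a target picked on the map."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0
```

The microcontroller would then drive the rotator's azimuth motor until it matches this bearing.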
Dependency Management 2.0 – A Semantic Web Enabled Approach
Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces are commonly used by software developers to reduce development cost and complexity by reusing code developed by third parties or published by the open source community. However, these application programming interfaces have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is a need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts.
The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration. The thesis introduces five major contributions: (1) We present a formal Software Build System Ontology – SBSON, which captures concepts and properties of software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services, forming the basis for a more flexible API dependency analysis compared to traditional proprietary analysis approaches. (2) We conduct a user survey involving 53 open source developers to gain insights into how actual developers manage API breaking changes. (3) We introduce a novel approach that integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impacts of breaking changes. (4) We introduce a Security Vulnerability Analysis Framework (SV-AF), which integrates build system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM), an integration of our build, source code, vulnerability, and license ontologies that supports a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems.
Several case studies are presented to illustrate the applicability and flexibility of our modelling approach, demonstrating that it can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis.
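The kind of cross-project inference these integrated ontologies enable, e.g. tracing vulnerabilities through transitive dependencies as in SV-AF, can be sketched without an RDF store as a plain graph traversal (in practice the framework would query the ontologies via Semantic Web reasoning):

```python
def transitively_vulnerable(depends, vulnerable):
    """depends: dict mapping an artifact to its set of direct
    dependencies. vulnerable: artifacts with known vulnerabilities.
    Return every artifact affected directly or transitively, the
    cross-boundary inference an ontology reasoner would perform."""
    affected = set()
    for project in depends:
        stack, seen = [project], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in vulnerable:
                affected.add(project)
                break
            stack.extend(depends.get(node, ()))
    return affected
```

A declarative SPARQL query over the build and vulnerability ontologies expresses the same reachability question without hand-written traversal code.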