
    Reduce API Debugging Overhead via Knowledge Prepositioning

    OpenAPI describes a model in which producers offer Application Programming Interfaces (APIs) that let end-users access their data, resources, and services. APIs typically have many parameters that must be filled in, and it is challenging for users to understand and supply these parameters correctly. This paper develops an API workbench to help users learn and debug APIs. Based on this workbench, we propose several exploratory techniques to reduce the overhead of learning and debugging APIs. We mine knowledge such as parameter characteristics (e.g., enumerability) and constraints (e.g., maximum/minimum values) from massive API call logs to narrow the range of parameter values. Then, we propose a fine-grained approach to enrich API documentation by extracting dependency knowledge between APIs. Finally, we present a learning-based method that predicts an API's execution result before the API is called, significantly reducing users' debugging cycles. Experiments on the online system show that this work's approach substantially improves the user experience of debugging OpenAPIs. (arXiv admin note: text overlap with arXiv:1509.01626 and arXiv:1502.01710 by other authors.)
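The constraint-mining step described above can be sketched as a small log analyzer. This is a minimal illustration, not the paper's implementation: the function name, the log format (one dict of parameter values per call), and the enumerability threshold are all assumptions made for the sketch.

```python
from collections import defaultdict

def mine_parameter_constraints(call_logs, enum_threshold=3):
    """Infer simple constraints per parameter from observed API call logs.

    call_logs: list of dicts mapping parameter name -> observed value.
    A parameter with at most `enum_threshold` distinct values is treated
    as enumerable; all-numeric parameters also get min/max bounds.
    """
    observed = defaultdict(set)
    for call in call_logs:
        for name, value in call.items():
            observed[name].add(value)

    constraints = {}
    for name, values in observed.items():
        info = {}
        if len(values) <= enum_threshold:
            info["enum"] = sorted(values, key=str)
        numeric = [v for v in values if isinstance(v, (int, float))]
        if numeric and len(numeric) == len(values):
            info["min"], info["max"] = min(numeric), max(numeric)
        constraints[name] = info
    return constraints

logs = [
    {"region": "us-east", "page_size": 10},
    {"region": "eu-west", "page_size": 50},
    {"region": "us-east", "page_size": 100},
]
print(mine_parameter_constraints(logs))
```

Narrowing the candidate value range this way is what lets a workbench pre-fill or validate parameters before the user ever submits a request.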

    Coverage-Based Debloating for Java Bytecode

    Software bloat is code that is packaged in an application but is not actually necessary to run it. The presence of software bloat is an issue for security, performance, and maintenance. In this paper, we introduce a novel technique for debloating Java bytecode, which we call coverage-based debloating. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture which parts of a project and its dependencies are used at runtime. Then, we automatically remove the parts that are not covered to generate a debloated version of the compiled project. We successfully generate debloated versions of 220 open-source Java libraries, which are syntactically correct and preserve their original behavior according to the workload. Our results indicate that 68.3% of the libraries' bytecode and 20.5% of their total dependencies can be removed through coverage-based debloating. In addition, we present the first experiment that assesses the utility of debloated libraries with respect to client applications that reuse them. We show that 80.9% of the clients with at least one test that uses the library successfully compile and pass their test suite when the original library is replaced by its debloated version.
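The filtering step at the core of coverage-based debloating can be illustrated with a toy sketch, assuming the coverage report has already been reduced to a set of fully qualified covered class names. The entry layout and names are invented for illustration; the paper operates on real JARs with bytecode coverage tools.

```python
def debloat(jar_entries, covered_classes):
    """Keep only the .class entries whose fully qualified name appears in
    the runtime coverage report; non-class resources are preserved."""
    kept = []
    for entry in jar_entries:
        if not entry.endswith(".class"):
            kept.append(entry)  # e.g. META-INF resources stay untouched
            continue
        class_name = entry[:-len(".class")].replace("/", ".")
        if class_name in covered_classes:
            kept.append(entry)
    return kept

entries = ["com/foo/Bar.class", "com/foo/Unused.class", "META-INF/MANIFEST.MF"]
covered = {"com.foo.Bar"}
print(debloat(entries, covered))
```

The key design risk the paper evaluates is visible even here: anything the workload never exercises is dropped, so the debloated artifact is only guaranteed to behave correctly for inputs resembling that workload.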

    Active Learning of Discriminative Subgraph Patterns for API Misuse Detection

    A common cause of bugs and vulnerabilities is the violation of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while techniques have been proposed to detect such misuses, studies have shown that they fail to detect misuses reliably while reporting many false positives. One limitation of prior work is the inability to reliably identify correct patterns of usage: many approaches conflate a usage pattern's frequency with correctness. Because alternative usage patterns may be uncommon yet correct, anomaly-detection-based techniques have had limited success in identifying misuses. We address these challenges and propose ALP (Actively Learned Patterns), reformulating API misuse detection as a classification problem. After representing programs as graphs, ALP mines discriminative subgraphs. While still incorporating frequency information, through limited human supervision we reduce the reliance on the assumption that frequency implies correctness. ALP incorporates the principles of active learning to shift human attention away from the most frequent patterns; instead, it samples informative and representative examples while minimizing labeling effort. In our empirical evaluation, ALP substantially outperforms prior approaches on both MUBench, an API misuse benchmark, and a new dataset that we constructed from real-world software projects.
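The active-learning idea of directing human attention to borderline patterns rather than frequent ones can be sketched with simple uncertainty sampling. The toy probability model below (frequency used directly as a probability) is exactly the assumption ALP relaxes; all names and numbers are illustrative.

```python
def select_queries(pool, predict_proba, budget=2):
    """Pick the patterns whose predicted probability of being a correct
    usage is closest to 0.5 (most uncertain), for human labeling."""
    ranked = sorted(pool, key=lambda p: abs(predict_proba(p) - 0.5))
    return ranked[:budget]

# Toy model: pattern frequency stands in for P(correct usage).
freqs = {"open-close": 0.95, "lock-unlock": 0.55, "init-free": 0.48, "rare-ok": 0.10}
queries = select_queries(list(freqs), freqs.get, budget=2)
print(queries)  # the borderline patterns, not the most frequent one
```

Note that the highly frequent "open-close" pattern is never sent to the human: the labeling budget is spent where the classifier is least certain, which is where a label changes the model most.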

    An Empirical Exploration of Python Machine Learning API Usage

    Machine learning is becoming an increasingly important part of many domains, both inside and outside of computer science. With this has come an increase in developers learning to write machine learning applications in languages like Python, using application programming interfaces (APIs) such as pandas and scikit-learn. However, given the complexity of these APIs, they can be challenging to learn, especially for new programmers. To create better tools for assisting developers with machine learning APIs, we need to understand how these APIs are currently used. In this thesis, we present a study of machine learning API usage in Python code in a corpus of machine learning projects hosted on Kaggle, a machine learning education and competition community site. We analyzed the most frequently used machine-learning-related libraries and their sub-modules. Next, we studied the different calls developers use to solve machine learning tasks. We also identified which libraries are used in combination and discovered a number of cases where libraries were imported but never used. We end by discussing potential next steps for further research and development based on our results.
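The kind of usage analysis the thesis performs can be approximated with Python's standard `ast` module: walk the syntax tree, count imported top-level libraries, and count attribute-style API calls. This is a minimal sketch of the general approach, not the thesis's tooling; the sample snippet being analyzed is invented.

```python
import ast
from collections import Counter

def count_api_usage(source):
    """Count imported top-level modules and attribute-style calls in source."""
    tree = ast.parse(source)
    imports, calls = Counter(), Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports[node.module.split(".")[0]] += 1
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            calls[node.func.attr] += 1
    return imports, calls

code = """
import pandas as pd
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("train.csv")
model = LogisticRegression().fit(df[["x"]], df["y"])
"""
imports, calls = count_api_usage(code)
print(imports, calls)
```

Aggregating these counters over a corpus is also enough to surface the "imported but never used" cases the study reports: any imported module that never appears as the base of a call or attribute access is a candidate.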

    API Knowledge Guided Test Generation for Machine Learning Libraries

    This thesis proposes MUTester to generate test cases for APIs of machine learning libraries by leveraging API constraints mined from the corresponding API documentation and API usage patterns mined from code fragments on Stack Overflow (SO). First, we propose a set of 18 linguistic rules for mining API constraints from the API documents. Then, we use the frequent itemset mining technique to mine API usage patterns from a large corpus of machine-learning-API-related code fragments collected from SO. Finally, we use these two types of API knowledge to guide the test generation of existing test generators for machine learning libraries. To evaluate the performance of MUTester, we first collected 2,889 APIs from five widely used machine learning libraries (i.e., Scikit-learn, Pandas, Numpy, Scipy, and PyTorch); for each API, we then extracted its API knowledge, i.e., API constraints and API usage patterns. Given an API, MUTester combines its API knowledge with existing test generators (e.g., the search-based test generator PyEvosuite and the random test generator PyRandoop) to generate test cases for the API. Results of our experiment show that MUTester can significantly improve the corresponding test generation methods: the improvement in code coverage ranges from 18.0% to 41.9% on average. In addition, MUTester reduces the invalid tests generated by the existing test generators by 21%.
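The frequent itemset mining step can be sketched naively: treat each SO code fragment as a set of API calls and keep every combination of calls that recurs often enough. This brute-force version is for illustration only (real miners such as Apriori or FP-growth prune the search space); the fragment contents are invented.

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support=2, max_size=2):
    """Naive frequent-itemset miner over sets of API calls per fragment."""
    counts = Counter()
    for calls in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(set(calls)), size):
                counts[combo] += 1
    return {items: n for items, n in counts.items() if n >= min_support}

fragments = [
    {"fit", "predict", "train_test_split"},
    {"fit", "predict"},
    {"fit", "score"},
]
patterns = frequent_itemsets(fragments)
print(patterns)
```

A pattern like `("fit", "predict")` recurring across fragments tells the test generator that a generated test calling `fit` should usually call `predict` afterwards, which is how usage patterns steer generation toward valid call sequences.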

    Studying and Leveraging API Usage Patterns

    Software projects make use of libraries extensively. Libraries have intended API surfaces: sets of exposed library interfaces that library developers expect clients to use. In practice, however, clients use only small fractions of the intended API surfaces of libraries, and sometimes use libraries in unexpected ways. Understanding how clients use library APIs benefits both client and library developers, informing issues such as version upgrades, breaking changes, and software bloat. We have implemented a tool to study both static and dynamic interactions between clients, the libraries they use, and those libraries' direct dependencies. We use this tool to carry out a detailed study of API usage patterns across 90 clients and 11 libraries. We present a classification framework for developers to classify API uses. We then describe two additional developer-focused applications of the data that our tool produces: a visualization tool, VizAPI, and the concept of library fission. VizAPI can allow client and library developers to answer questions about the interaction of their code and the libraries they depend on, such as: Will my client code be affected by breaking changes in library APIs? Which APIs in my library's source code are commonly used by clients? Library fission, by which we mean the splitting of libraries into sub-modules, is based on the usage patterns we observe. It can help library developers release backward-compatible versions of their libraries, and it could help client developers isolate breaking changes and reduce the likelihood of vulnerabilities and version conflicts introduced through direct or transitive dependencies.
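The central observation (clients touch a small fraction of the intended surface, and occasionally reach outside it) reduces to a simple set computation. The sketch below is illustrative; the API and call names are invented, and the real tool derives both sets from static and dynamic analysis rather than taking them as inputs.

```python
def surface_usage(intended_api, observed_calls):
    """Return the fraction of the intended API surface actually used,
    plus any observed calls outside that surface (unexpected uses)."""
    used = intended_api & observed_calls
    unexpected = observed_calls - intended_api
    fraction = len(used) / len(intended_api) if intended_api else 0.0
    return fraction, unexpected

api = {"connect", "query", "close", "tune", "stats"}
calls = {"connect", "query", "close", "_internal_reset"}
frac, odd = surface_usage(api, calls)
print(frac, odd)
```

The `used` set is also the raw input to library fission: members that cluster together across many clients are candidates for the same sub-module, while members nobody calls are candidates for removal.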

    IoT integrated antenna rotator

    This project deals with the design and fabrication of an IoT Integrated Antenna Rotator, upgrading an existing antenna rotator by integrating it with Internet of Things (IoT) technology. The objective of this thesis is to integrate Google Maps into a web server connected to the antenna rotator system, in order to achieve a fast and highly accurate standalone antenna pointing system. This thesis describes the software development of the rotator system, including specification, system and software design, implementation and unit testing, integration and system testing, and finally operation. A NodeMCU ESP32 is used as the microcontroller in this SDP, controlling the system with its central processor. A web server is developed to act as a user interface for interacting with the antenna rotator system, and a digital map is integrated using the Google Maps API. The IoT Integrated Antenna Rotator was tested after integration and fabrication: the antenna rotator is able to rotate precisely to a desired location selected through the web server using the Google Maps API. The outcomes can yield significant cost and time savings, as well as increased product reliability and user confidence in using an IoT integrated antenna pointing system.
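Pointing an antenna at a location picked on a map comes down to converting two latitude/longitude pairs into an azimuth for the rotator. The thesis does not give its firmware's computation; the sketch below uses the standard initial great-circle bearing formula as one plausible way to do it.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing, in degrees clockwise from true north,
    from the rotator's location (lat1, lon1) to the target (lat2, lon2)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

# A target due east along the equator should give a bearing of 90 degrees.
print(bearing_deg(0.0, 0.0, 0.0, 90.0))
```

In a deployment like this one, the web server would receive the target coordinates from the Google Maps click handler and send the resulting azimuth to the ESP32, which drives the rotator motor toward it.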

    Dependency Management 2.0 – A Semantic Web Enabled Approach

    Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces are commonly used by software developers to reduce development cost and complexity by reusing code developed by third parties or published by the open source community. However, these application programming interfaces have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is a need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration. The thesis introduces five major contributions: (1) We present a formal Software Build System Ontology – SBSON, which captures concepts and properties of software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services, forming the basis for a more flexible API dependency analysis than traditional proprietary analysis approaches. (2) We conducted a user survey involving 53 open source developers to gain insights into how developers actually manage API breaking changes.
    (3) We introduce a novel approach that integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impacts of breaking changes. (4) We introduce a Security Vulnerability Analysis Framework (SV-AF), which integrates build system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM), an integration of our build, source code, vulnerability, and license ontologies that supports a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. Several case studies illustrate the applicability and flexibility of our modelling approach, demonstrating that it can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis.
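The cross-boundary tracing that SV-AF performs can be sketched with dependency facts expressed as subject-predicate-object triples, the shape an ontology like SBSON formalizes (in practice as RDF queried with SPARQL). The predicate names, artifact names, and CVE identifier below are all invented for the sketch.

```python
# Build-dependency and vulnerability facts as (subject, predicate, object).
triples = {
    ("app", "dependsOn", "lib-a:1.2"),
    ("lib-a:1.2", "dependsOn", "lib-b:0.9"),
    ("lib-b:0.9", "hasVulnerability", "CVE-2021-0001"),
}

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

def transitively_vulnerable(project):
    """Follow dependsOn edges from the project and collect every reachable
    vulnerability -- the cross-project tracing described for SV-AF."""
    seen, stack, cves = set(), [project], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        cves |= {o for _, _, o in query(s=node, p="hasVulnerability")}
        stack += [o for _, _, o in query(s=node, p="dependsOn")]
    return cves

print(transitively_vulnerable("app"))
```

The point of the ontology-based representation is that the same triple store can absorb facts from Maven, version control, and vulnerability databases, so one query spans knowledge sources that proprietary dependency analyzers keep separate.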