603 research outputs found

    An Introduction to Software Ecosystems

    This chapter defines and presents different kinds of software ecosystems. The focus is on the development, tooling and analytics aspects of software ecosystems, i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are developing and maintaining. The technical and social dependencies between these developers and software components form a socio-technical dependency network, and the dynamics of this network change over time. We classify and provide several examples of such ecosystems. The chapter also introduces and clarifies the relevant terms needed to understand and analyse these ecosystems, as well as the techniques and research methods that can be used to analyse different aspects of these ecosystems. Comment: Preprint of chapter "An Introduction to Software Ecosystems" by Tom Mens and Coen De Roover, published in the book "Software Ecosystems: Tooling and Analytics" (eds. T. Mens, C. De Roover, A. Cleve), 2023, ISBN 978-3-031-36059-6, reproduced with permission of Springer. The final authenticated version of the book and this chapter is available online at: https://doi.org/10.1007/978-3-031-36060-

    Automatic Specialization of Third-Party Java Dependencies

    Modern software systems rely on a multitude of third-party dependencies. This large-scale code reuse reduces development costs and time, but it poses new challenges with respect to maintenance and security. Techniques such as tree shaking or shading can remove dependencies that are completely unused by a project, which partly addresses these challenges. Yet, the remaining dependencies are likely to be used only partially, leaving room for further reduction of third-party code. In this paper, we propose a novel technique to specialize dependencies of Java projects, based on their actual usage. For each dependency, we systematically identify the subset of its functionalities that is necessary to build the project, and remove the rest. Each specialized dependency is repackaged. Then, we generate specialized dependency trees where the original dependencies are replaced by the specialized versions and we rebuild the project. We implement our technique in a tool called DepTrim, which we evaluate with 30 notable open-source Java projects. DepTrim specializes a total of 343 (86.6%) dependencies across these projects, and successfully rebuilds each project with a specialized dependency tree. Moreover, through this specialization, DepTrim removes a total of 60,962 (47.0%) classes from the dependencies, reducing the ratio of dependency classes to project classes from 8.7x in the original projects to 4.4x after specialization. These results indicate the relevance of dependency specialization to significantly reduce the share of third-party code in Java projects. Comment: 17 pages, 2 figures, 4 tables, 1 algorithm, 2 code listings, 3 equation
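The idea of keeping only the part of each dependency that a project needs can be illustrated with a small sketch. The abstract does not describe how DepTrim identifies the necessary subset, so the version below assumes a plain reachability walk over a class reference graph; the function names `reachable_classes` and `specialize` and all class and library names are hypothetical.

```python
from collections import deque


def reachable_classes(references, roots):
    """Return every class reachable from `roots` by following references."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for target in references.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def specialize(dependencies, references, project_classes):
    """Keep, for each dependency, only the classes reachable from the project."""
    used = reachable_classes(references, project_classes)
    return {
        name: sorted(set(classes) & used)
        for name, classes in dependencies.items()
    }


# Toy input: every name here is hypothetical.
dependencies = {
    "lib-json": ["json.Parser", "json.Writer", "json.Schema"],
    "lib-log": ["log.Logger", "log.FileAppender"],
}
references = {
    "app.Main": ["json.Parser", "log.Logger"],
    "json.Parser": ["json.Schema"],
}

specialized = specialize(dependencies, references, ["app.Main"])
for name, classes in specialized.items():
    print(name, classes)
```

In this toy input, `json.Writer` and `log.FileAppender` are never referenced from the project, so they drop out of the specialized dependencies. A real tool would also have to handle reflection and other dynamic features that a static walk like this one misses.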

    Coverage-Based Debloating for Java Bytecode

    Software bloat is code that is packaged in an application but is actually not necessary to run the application. The presence of software bloat is an issue for security, for performance, and for maintenance. In this paper, we introduce a novel technique for debloating Java bytecode, which we call coverage-based debloating. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture what parts of a project and its dependencies are used at runtime. Then, we automatically remove the parts that are not covered to generate a debloated version of the compiled project. We successfully generate debloated versions of 220 open-source Java libraries, which are syntactically correct and preserve their original behavior according to the workload. Our results indicate that 68.3% of the libraries' bytecode and 20.5% of their total dependencies can be removed through coverage-based debloating. Meanwhile, we present the first experiment that assesses the utility of debloated libraries with respect to client applications that reuse them. We show that 80.9% of the clients with at least one test that uses the library successfully compile and pass their test suite when the original library is replaced by its debloated version.
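As a rough illustration of the coverage-based idea, the sketch below drops every class that a workload did not cover and reports the share of bytecode removed. It works at class granularity on made-up sizes, which is a simplifying assumption; the function `debloat` and the class names are hypothetical and are not taken from the paper's tooling.

```python
def debloat(class_sizes, covered):
    """Drop classes absent from the coverage set.

    class_sizes: mapping of class name to bytecode size (any unit).
    covered: set of class names executed by the workload.
    Returns the kept classes and the fraction of bytecode removed.
    """
    kept = {name: size for name, size in class_sizes.items() if name in covered}
    total = sum(class_sizes.values())
    removed = total - sum(kept.values())
    return kept, (removed / total if total else 0.0)


# Toy input: names and sizes are hypothetical.
class_sizes = {"a.Used": 300, "a.Helper": 100, "a.Legacy": 600}
covered = {"a.Used", "a.Helper"}

kept, removed_share = debloat(class_sizes, covered)
print(sorted(kept), f"{removed_share:.1%} removed")
```

The result is only as trustworthy as the workload: a class that no test exercises is removed even if some client needs it, which is why the abstract evaluates the debloated libraries against client test suites.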

    Jolie Microservices: An Experiment

    Microservices are increasingly present in the world of information technologies, as they provide a new way to build more scalable, agile, and flexible systems. Despite this, they bring with them the problem of communication complexity between microservices, which makes the system difficult to maintain and understand. Microservices-specific programming languages like Jolie come into play to try to solve this problem and simplify the construction of systems with microservices architectures. This work provides a broad view of the state of the art of the Jolie programming language. It first details why microservices-specific languages emerge and how the Jolie language is built to match microservices architectures through native features. To demonstrate the advantages of using this language compared to more mainstream approaches, an experiment is designed to develop a microservices system for an e-commerce application. The same system is built on two technological foundations: Jolie and Spring Boot. Spring Boot is considered the most used technology for developing microservices systems, which makes it an ideal candidate for comparison. The analysis and design of the experiment are worked out in full. The implementation of the solution is then detailed, covering system configurations, architectural choices, and how they are implemented. The components covered include an API gateway, message brokers, databases, microservices orchestration, and containerization for each microservice and the other components of the system. Finally, the solutions are compared and analyzed based on the Goals, Questions, Metrics (GQM) approach, with respect to quality attributes such as maintainability, scalability, performance, and testability. The analysis concludes that the solution built with Jolie is significantly superior to the Spring Boot solution in maintainability and slightly inferior in performance. The work ends with an indication of the achievements, difficulties, threats to validity, possible future work, and final observations.

    Analyzing 2.3 Million Maven Dependencies to Reveal an Essential Core in APIs

    This paper addresses the following question: does a small, essential, core set of API members emerge from the actual usage of the API by client applications? To investigate this question, we study the 99 most popular libraries available in Maven Central and the 865,560 client programs that declare dependencies towards them, summing up to 2.3M dependencies. Our key findings are as follows: 43.5% of the dependencies declared by the clients are not used in the bytecode; all APIs contain a large part of rarely used types and a few frequently used types, and the ratio varies according to the nature of the API, its size and its design; we can systematically extract a reuse-core from APIs that is sufficient to provide for most clients, and the median size of this subset is 17% of the API, which can serve 83% of the clients. This study is novel both in its scale and its findings about unused dependencies and the reuse-core of APIs. Our results provide concrete insights to improve Maven's build process with a mechanism to detect unused dependencies. They also support the need to reduce the size of APIs to facilitate API learning and maintenance. Comment: 15 pages, 13 figures, 3 tables, 2 listing
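One way to picture a reuse-core is as a small set of API types that fully covers the needs of a chosen share of clients. The sketch below is an assumption about how such a set could be built, not the paper's extraction procedure: it adds types in decreasing order of popularity until the target share of clients is fully served. The greedy order does not guarantee the smallest possible set. The functions `served_fraction` and `reuse_core` and all client and type names are hypothetical.

```python
from collections import Counter


def served_fraction(client_usage, core):
    """Fraction of clients whose used types all belong to `core`."""
    served = sum(1 for used in client_usage.values() if used <= core)
    return served / len(client_usage)


def reuse_core(client_usage, target):
    """Greedily add the most popular types until `target` of clients are served."""
    popularity = Counter(t for used in client_usage.values() for t in used)
    # Most popular first; ties broken by name so the result is deterministic.
    ranked = sorted(popularity, key=lambda t: (-popularity[t], t))
    core = set()
    for api_type in ranked:
        if served_fraction(client_usage, core) >= target:
            break
        core.add(api_type)
    return core


# Toy input: each client maps to the set of API types it uses.
client_usage = {
    "c1": {"List"},
    "c2": {"List", "Map"},
    "c3": {"List"},
    "c4": {"List", "Map", "Trie"},
}

core = reuse_core(client_usage, target=0.75)
print(sorted(core), served_fraction(client_usage, core))
```

In this toy input, two of the three types are enough to serve three of the four clients, which mirrors the shape of the finding that a small share of an API can serve most clients.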

    Simplified vector-thread architectures for flexible and efficient data-parallel accelerators

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 165-170). This thesis explores a new approach to building data-parallel accelerators that is based on simplifying the instruction set, microarchitecture, and programming methodology for a vector-thread architecture. The thesis begins by categorizing regular and irregular data-level parallelism (DLP), before presenting several architectural design patterns for data-parallel accelerators including the multiple-instruction multiple-data (MIMD) pattern, the vector single-instruction multiple-data (vector-SIMD) pattern, the single-instruction multiple-thread (SIMT) pattern, and the vector-thread (VT) pattern. Our recently proposed VT pattern includes many control threads that each manage their own array of microthreads. The control thread uses vector memory instructions to efficiently move data and vector fetch instructions to broadcast scalar instructions to all microthreads. These vector mechanisms are complemented by the ability for each microthread to direct its own control flow. In this thesis, I introduce various techniques for building simplified instances of the VT pattern. I propose unifying the VT control-thread and microthread scalar instruction sets to simplify the microarchitecture and programming methodology. I propose a new single-lane VT microarchitecture based on minimal changes to the vector-SIMD pattern. Single-lane cores are simpler to implement than multi-lane cores and can achieve similar energy efficiency. This new microarchitecture uses control processor embedding to mitigate the area overhead of single-lane cores, and uses vector fragments to more efficiently handle both regular and irregular DLP as compared to previous VT architectures. I also propose an explicitly data-parallel VT programming methodology that is based on a slightly modified scalar compiler. This methodology is easier to use than assembly programming, yet simpler to implement than an automatically vectorizing compiler. To evaluate these ideas, we have begun implementing the Maven data-parallel accelerator. This thesis compares a simplified Maven VT core to MIMD, vector-SIMD, and SIMT cores. We have implemented these cores with an ASIC methodology, and I use the resulting gate-level models to evaluate the area, performance, and energy of several compiled microbenchmarks. This work is the first detailed quantitative comparison of the VT pattern to other patterns. My results suggest that future data-parallel accelerators based on simplified VT architectures should be able to combine the energy efficiency of vector-SIMD accelerators with the flexibility of MIMD accelerators. by Christopher Francis Batten. Ph.D

    Tracing Community Genealogy: How New Communities Emerge from the Old

    The process by which new communities emerge is a central research issue in the social sciences. While a growing body of research analyzes the formation of a single community by examining social networks between individuals, we introduce a novel community-centered perspective. We highlight the fact that the context in which a new community emerges contains numerous existing communities. We reveal the emerging process of communities by tracing their early members' previous community memberships. Our testbed is Reddit, a website that consists of tens of thousands of user-created communities. We analyze a dataset that spans over a decade and includes the posting history of users on Reddit from its inception to April 2017. We first propose a computational framework for building genealogy graphs between communities. We present the first large-scale characterization of such genealogy graphs. Surprisingly, basic graph properties, such as the number of parents and max parent weight, converge quickly despite the fact that the number of communities increases rapidly over time. Furthermore, we investigate the connection between a community's origin and its future growth. Our results show that strong parent connections are associated with future community growth, confirming the importance of existing community structures in which a new community emerges. Finally, we turn to the individual level and examine the characteristics of early members. We find that a diverse portfolio across existing communities is the most important predictor for becoming an early member in a new community. Comment: 10 pages, 7 figures, to appear in Proceedings of ICWSM 2018, data and more at https://chenhaot.com/papers/community-genealogy.htm
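Tracing early members' previous memberships can be sketched as follows. The abstract does not give the exact definition of a parent weight, so this version assumes the weight of a parent community is the fraction of the new community's early members who were previously active there. The function `parent_weights` and all user and community names are hypothetical.

```python
from collections import Counter


def parent_weights(early_members, prior_memberships):
    """Weight each earlier community by the share of early members it supplied."""
    counts = Counter()
    for member in early_members:
        # Count each parent community at most once per member.
        for community in set(prior_memberships.get(member, ())):
            counts[community] += 1
    n = len(early_members)
    return {c: k / n for c, k in counts.items()} if n else {}


# Toy input: early members of one new community and where they posted before.
early_members = ["u1", "u2", "u3", "u4"]
prior_memberships = {
    "u1": ["alpha", "beta"],
    "u2": ["alpha"],
    "u3": ["gamma"],
    # u4 has no earlier activity
}

weights = parent_weights(early_members, prior_memberships)
num_parents = len(weights)
max_parent_weight = max(weights.values(), default=0.0)
print(weights, num_parents, max_parent_weight)
```

Repeating this for every new community yields weighted edges from parents to children, from which properties such as the number of parents and the max parent weight mentioned in the abstract can be read off.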