
    Identification of microservices from monolithic applications through topic modelling

    Microservices have emerged as one of the most popular architectural patterns in recent years, driven by the increased need to scale, grow, and add flexibility to software projects, alongside the growth of cloud computing and DevOps. Many software applications are undergoing migration from a monolithic architecture to a more modular, scalable, and flexible architecture of microservices. This process is slow and, depending on the project's complexity, may take months or even years to complete. This paper proposes a new approach to microservice identification that resorts to topic modelling in order to identify services according to domain terms. In combination with clustering techniques, this approach produces a set of candidate services based on the original software. The proposed methodology is implemented as an open-source tool for exploration of monolithic architectures and identification of microservices. A quantitative analysis using state-of-the-art metrics on independence of functionality and modularity of services was conducted on 200 open-source projects collected from GitHub. Cohesion at message level and cohesion at domain level both showed medians of roughly 0.6. Interfaces per service exhibited a median of 1.5 with a compact interquartile range. Structural and conceptual modularity revealed medians of 0.2 and 0.4, respectively. These first results are positive, with the overall metrics indicating beneficial identification of services. National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/202
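The pipeline the abstract describes — topic-model the domain terms of each monolith class, then cluster classes in topic space — can be sketched roughly as follows. This is a minimal illustration, not the paper's tool: the class names, term "documents", and the choice of LDA plus k-means are all assumptions for the example.

```python
# Minimal sketch of topic-model-based service identification.
# All data here is hypothetical; the paper's corpus construction is not reproduced.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

# Each "document" is the bag of domain terms extracted from one monolith class.
class_docs = {
    "OrderController":   "order checkout cart payment invoice",
    "PaymentGateway":    "payment invoice refund transaction",
    "UserService":       "user account login password profile",
    "ProfileRepository": "user profile avatar account settings",
}

names = list(class_docs)
X = CountVectorizer().fit_transform(class_docs[n] for n in names)

# Topic model: each class becomes a distribution over latent domain topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(X)

# Cluster classes in topic space; each cluster is a candidate microservice.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(topics)
services = {}
for name, label in zip(names, labels):
    services.setdefault(label, []).append(name)
print(services)
```

On a real monolith the documents would come from identifiers, comments, and persisted entities rather than hand-written strings, and the number of topics and clusters would be tuned rather than fixed.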

    Cooperative Based Software Clustering on Dependency Graphs

    The organization of software systems into subsystems is usually based on the constructs of packages or modules and has a major impact on the maintainability of the software. However, during software evolution the organization of the system is subject to continual modification, which can cause it to drift away from the original design, often with the effect of reducing its quality. A number of techniques for evaluating a system's maintainability, and for controlling the effort required to conduct maintenance activities, involve software clustering. Software clustering refers to partitioning the components of a software system into clusters in order to balance the interior connectivity within clusters against the exterior connectivity between them. It helps maintainers enhance the quality of software modularization and improve its maintainability. Research in this area has produced numerous algorithms with a variety of methodologies and parameters. This thesis presents a novel ensemble approach that synthesizes a new solution from the outcomes of multiple constituent clustering algorithms. The main principle behind this approach is derived from machine learning, as applied to document clustering, but it has been modified, both conceptually and empirically, for use in software clustering. The conceptual modifications include working with a variable number of clusters produced by the input algorithms and employing graph structures rather than feature vectors. The empirical modifications include experiments directed at the selection of the optimal cluster-merging criteria. Case studies based on open-source software systems show that establishing cooperation between leading state-of-the-art algorithms produces better clustering results than those achieved using any one of the algorithms alone.
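The core ensemble idea — synthesize one clustering from several constituent clusterings — is often realized by merging components that a majority of the inputs place together. The toy below sketches that consensus step under assumed data; the module names and partitions are hypothetical, and the thesis's actual merging criteria are more elaborate.

```python
# Toy consensus (ensemble) step: merge modules that a majority of the
# constituent clusterings place in the same cluster. Data is hypothetical.
from itertools import combinations

partitions = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 1, "c": 1, "d": 1},   # one dissenting clustering
]
modules = sorted(partitions[0])

# Union-find structure to merge majority-voted pairs.
parent = {m: m for m in modules}
def find(m):
    while parent[m] != m:
        parent[m] = parent[parent[m]]   # path halving
        m = parent[m]
    return m

for x, y in combinations(modules, 2):
    together = sum(p[x] == p[y] for p in partitions)
    if together * 2 > len(partitions):  # strict majority vote
        parent[find(x)] = find(y)

consensus = {}
for m in modules:
    consensus.setdefault(find(m), set()).add(m)
print(sorted(map(sorted, consensus.values())))  # → [['a', 'b'], ['c', 'd']]
```

Note that the constituent partitions need not agree on the number of clusters — the consensus emerges only from pairwise co-occurrence, which is what makes the approach workable with heterogeneous input algorithms.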

    Software module clustering: An in-depth literature analysis

    Software module clustering is an unsupervised learning method used to cluster software entities (e.g., classes, modules, or files) with similar features. The obtained clusters may be used to study, analyze, and understand the structure and behavior of the software entities. Achieving optimal results with software module clustering is challenging; accordingly, researchers have addressed many aspects of it in the past decade, and it is essential to present the research evidence that has been published in this area. In this study, 143 research papers that examine software module clustering, drawn from well-known literature databases, were reviewed to extract useful data. The obtained data were then used to answer several research questions regarding state-of-the-art clustering approaches, applications of clustering in software engineering, clustering processes, clustering algorithms, and evaluation methods. Several research gaps and challenges in software module clustering are discussed in this paper to provide a useful reference for researchers in this field.
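Among the evaluation methods such surveys cover, one widely used fitness measure is modularization quality (TurboMQ): each cluster is scored by its internal edges against the edges crossing its boundary, and the scores are summed. The sketch below computes it for a hypothetical dependency graph and clustering.

```python
# TurboMQ-style modularization quality: each cluster contributes
# 2*intra / (2*intra + inter); higher is better. Data is hypothetical.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

def turbo_mq(edges, clusters):
    intra = {}   # cluster -> count of edges internal to it
    inter = {}   # cluster -> count of edges crossing its boundary
    for u, v in edges:
        cu, cv = clusters[u], clusters[v]
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
        else:
            inter[cu] = inter.get(cu, 0) + 1
            inter[cv] = inter.get(cv, 0) + 1
    mq = 0.0
    for c in set(clusters.values()):
        mu, eps = intra.get(c, 0), inter.get(c, 0)
        if mu:
            mq += 2 * mu / (2 * mu + eps)
    return mq

print(turbo_mq(edges, clusters))  # each cluster scores 4/5, so MQ = 1.6
```

Measures of this kind serve both as evaluation metrics and as the objective function inside search-based clustering algorithms.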

    A CASE STUDY INVESTIGATING RULE BASED DESIGN IN AN INDUSTRIAL SETTING

    This thesis presents a case study on the implementation of a rule based design (RBD) process for an engineer-to-order (ETO) company. The time taken for programming and the challenges associated with this process are documented in order to understand the benefits and limitations of RBD. These times are obtained while developing RBD programs for grid assemblies of bottle packaging machines that are designed and manufactured by Hartness International (HI). In this project, commercially available computer-aided design (CAD) and RBD software are integrated to capture the design and manufacturing knowledge used to automate the grid design process of HI. The stages involved in RBD automation are identified as CAD modeling, knowledge acquisition, capturing parameters, RBD programming, debugging and testing, and production deployment. The stages and associated times in the RBD program development process are recorded for eighteen different grid products. Empirical models are developed to predict the development times of RBD programs, specifically enabling HI to estimate their return on investment. The models are demonstrated for an additional grid product, where the predicted time falls within 20% of the actual RBD programming time, building confidence in the accuracy of the models. Modeling guidelines for preparing CAD models are also presented to help in RBD program development. An important observation from this case study is that a majority of the time is spent capturing information about the product during the knowledge acquisition stage, where the programmer's development of an RBD program is dependent upon the designer's product knowledge. Finally, refining these models to include other factors, such as the time for building CAD models and the programmer's experience with the RBD software (learning curve), and extending these models to other product domains are identified as possible areas of future work.
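The kind of empirical model described — fit recorded development times against a product-complexity measure, then check a new prediction against the actual time — can be sketched as a simple least-squares regression. The complexity measure (a parameter count), the data points, and the 45-hour "actual" value below are all hypothetical; the thesis's models are not reproduced here.

```python
# Hedged sketch: ordinary least squares relating a hypothetical
# product-complexity measure (parameter count) to RBD programming hours.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx   # slope and intercept

# Hypothetical (parameters, hours) pairs for previously programmed grids.
params = [10, 14, 18, 25, 30]
hours  = [22, 30, 37, 52, 61]
m, b = fit_line(params, hours)

predicted = m * 20 + b            # estimate for a new grid with 20 parameters
actual = 45.0                     # hypothetical measured development time
within_20pct = abs(predicted - actual) / actual <= 0.20
print(round(predicted, 1), within_20pct)
```

The same validation logic — compare the model's estimate for a held-out product against its measured time — is what supports the thesis's return-on-investment argument.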

    SArF: Dependency-Based Software Clustering That Gathers Features

    Software clustering techniques, which divide software into multiple small units, play an important role in understanding software systems. This paper proposes SArF, a software clustering algorithm that gathers the features of the software into clusters based on static dependency information. SArF is characterized by gathering features and by full automation: much software clustering work requires human assistance, such as removing omnipresent modules, but SArF needs no such manual intervention. To realize these characteristics, we define a Dedication score, which expresses the likelihood that a dependency is relevant to a shared feature, and combine the dedication-weighted dependency graph with a modularity maximization method for clustering. A case study shows that features are indeed gathered, and an evaluation of SArF on a dataset of 304 versions of 35 software systems, collected from public repositories in order of usage, shows that SArF outperforms existing algorithms in clustering quality, stability, and execution time. Kobayashi, Kenichi; Matsuo, Akihiko; Matsushita, Makoto; et al. SArF: Dependency-Based Software Clustering That Gathers Features. IPSJ Journal 64, 831 (2023); http://doi.org/10.20729/00225492
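The objective SArF optimizes over the dedication-weighted graph is the standard Newman modularity Q: the fraction of edge weight inside communities minus the weight expected under a random null model. The sketch below evaluates Q for a hypothetical weighted dependency graph and partition; the edge weights stand in for dedication scores and are not computed by the paper's actual formula.

```python
# Newman modularity Q on a weighted, undirected dependency graph.
# Weights (stand-ins for dedication scores) and the partition are hypothetical.
edges = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.1,
         ("d", "e"): 0.7, ("e", "f"): 0.9}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

def modularity(edges, community):
    two_m = 2 * sum(edges.values())           # total weight, both directions
    degree = {}
    for (u, v), w in edges.items():
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
    q = 0.0
    # Observed fraction of edge weight falling inside communities.
    for (u, v), w in edges.items():
        if community[u] == community[v]:
            q += w / (two_m / 2)
    # Minus the expected in-community weight under the configuration null model.
    for u in degree:
        for v in degree:
            if community[u] == community[v]:
                q -= degree[u] * degree[v] / (two_m * two_m)
    return q

print(round(modularity(edges, community), 3))
```

A modularity maximization method searches over partitions to maximize this quantity; the low-weight `("c", "d")` dependency is what makes splitting the graph at that point score well.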

    Interactive Exploration of Multitask Dependency Networks

    Scientists increasingly depend on machine learning algorithms to discover patterns in complex data. Two examples addressed in this dissertation are identifying how information sharing among regions of the brain develops as a result of learning, and learning dependency networks of blood proteins associated with cancer. Dependency networks, or graphical models, are learned from the observed data in order to make comparisons between the sub-populations of the dataset. Rarely is there sufficient data to infer robust individual networks for each sub-population, so the multiple networks must be considered simultaneously, which explodes the hypothesis space of the learning problem. Exploring this complex solution space requires input from the domain scientist to refine the objective function. This dissertation introduces a framework that incorporates domain knowledge into transfer learning to facilitate the exploration of solutions. The framework is a generalization of existing algorithms for identifying multiple network structures. Solutions produced with human input narrow the variance of solutions down to those that answer questions of interest to domain scientists. Patterns, such as differences between networks, are learned with higher confidence using transfer learning than through the standard method of bootstrapping. Transfer learning may be the ideal method for making comparisons among dependency networks, whether looking for similarities or differences. Domain-knowledge input and visualization of solutions are combined in an interactive tool that enables domain scientists to explore the space of solutions efficiently.
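The basic comparison the dissertation builds on — estimate a dependency network per sub-population, then ask which edges differ — can be illustrated with a deliberately simple estimator. The sketch below uses thresholded pairwise correlations in place of the dissertation's transfer-learning approach, and the protein names and measurements are invented for the example.

```python
# Toy comparison of dependency networks across two sub-populations:
# threshold absolute pairwise correlations per group, then diff the edge sets.
# (This stands in for, and does not reproduce, the dissertation's estimator.)
import math

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

def network(data, threshold=0.75):
    names = sorted(data)
    return {(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(corr(data[a], data[b])) >= threshold}

# Hypothetical protein measurements for two sub-populations.
healthy = {"p1": [1, 2, 3, 4], "p2": [2, 4, 6, 8], "p3": [5, 1, 4, 2]}
cancer  = {"p1": [1, 2, 3, 4], "p2": [4, 3, 2, 1], "p3": [1, 3, 2, 4]}

g1, g2 = network(healthy), network(cancer)
print("differing edges:", sorted(g1 ^ g2))
```

With the tiny samples typical of such studies, independently estimated networks like `g1` and `g2` are unstable, which is precisely the motivation for estimating them jointly via transfer learning.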

    A Smart Products Lifecycle Management (sPLM) Framework - Modeling for Conceptualization, Interoperability, and Modularity

    Autonomy and intelligence have been built into many of today’s mechatronic products, taking advantage of low-cost sensors and advanced data analytics technologies. Design of product intelligence (enabled by analytics capabilities) is no longer a trivial or optional part of product development. The objective of this research is to address the challenges raised by the new data-driven design paradigm for smart products development, in which the product itself and its smartness must be carefully co-constructed. A smart product can be seen as specific compositions and configurations of its physical components, which form the body, and its analytics models, which implement the intelligence, evolving along its lifecycle stages. Based on this view, the contribution of this research is to expand the “Product Lifecycle Management (PLM)” concept, traditionally applied to physical products, to data-based products. As a result, a Smart Products Lifecycle Management (sPLM) framework is conceptualized based on a high-dimensional Smart Product Hypercube (sPH) representation and decomposition. First, the sPLM addresses interoperability issues by developing a Smart Component data model to uniformly represent and compose physical component models created by engineers and analytics models created by data scientists. Second, the sPLM implements an NPD3 process model that incorporates a formal data analytics process into the new product development (NPD) process model, in order to support the transdisciplinary information flows and team interactions between engineers and data scientists. Third, the sPLM addresses issues related to product definition, modular design, product configuration, and lifecycle management of analytics models by adapting the theoretical frameworks and methods for traditional product design and development. An sPLM proof-of-concept platform was implemented for validation of the concepts and methodologies developed throughout the research work.
The sPLM platform provides a shared data repository to manage the product-, process-, and configuration-related knowledge for smart products development. It also provides a collaborative environment to facilitate transdisciplinary collaboration between product engineers and data scientists.