5,777 research outputs found

    Towards Collaborative Scientific Workflow Management System

    Get PDF
    The big data explosion phenomenon has impacted several domains, starting from research areas to divergent of business models in recent years. As this intensive amount of data opens up the possibilities of several interesting knowledge discoveries, over the past few years divergent of research domains have undergone the shift of trend towards analyzing those massive amount data. Scientific Workflow Management System (SWfMS) has gained much popularity in recent years in accelerating those data-intensive analyses, visualization, and discoveries of important information. Data-intensive tasks are often significantly time-consuming and complex in nature and hence SWfMSs are designed to efficiently support the specification, modification, execution, failure handling, and monitoring of the tasks in a scientific workflow. As far as the complexity, dimension, and volume of data are concerned, their effective analysis or management often become challenging for an individual and requires collaboration of multiple scientists instead. Hence, the notion of 'Collaborative SWfMS' was coined - which gained significant interest among researchers in recent years as none of the existing SWfMSs directly support real-time collaboration among scientists. In terms of collaborative SWfMSs, consistency management in the face of conflicting concurrent operations of the collaborators is a major challenge for its highly interconnected document structure among the computational modules - where any minor change in a part of the workflow can highly impact the other part of the collaborative workflow for the datalink relation among them. In addition to the consistency management, studies show several other challenges that need to be addressed towards a successful design of collaborative SWfMSs, such as sub-workflow composition and execution by different sub-groups, relationship between scientific workflows and collaboration models, sub-workflow monitoring, seamless integration and access control of the workflow components among collaborators and so on. In this thesis, we propose a locking scheme to facilitate consistency management in collaborative SWfMSs. The proposed method works by locking workflow components at a granular attribute level in addition to supporting locks on a targeted part of the collaborative workflow. We conducted several experiments to analyze the performance of the proposed method in comparison to related existing methods. Our studies show that the proposed method can reduce the average waiting time of a collaborator by up to 36% while increasing the average workflow update rate by up to 15% in comparison to existing descendent modular level locking techniques for collaborative SWfMSs. We also propose a role-based access control technique for the management of collaborative SWfMSs. We leverage the Collaborative Interactive Application Methodology (CIAM) for the investigation of role-based access control in the context of collaborative SWfMSs. We present our proposed method with a use-case of Plant Phenotyping and Genotyping research domain. Recent study shows that the collaborative SWfMSs often different sets of opportunities and challenges. From our investigations on existing research works towards collaborative SWfMSs and findings of our prior two studies, we propose an architecture of collaborative SWfMSs. We propose - SciWorCS - a Collaborative Scientific Workflow Management System as a proof of concept of the proposed architecture; which is the first of its kind to the best of our knowledge. We present several real-world use-cases of scientific workflows using SciWorCS. Finally, we conduct several user studies using SciWorCS comprising different real-world scientific workflows (i.e., from myExperiment) to understand the user behavior and styles of work in the context of collaborative SWfMSs. In addition to evaluating SciWorCS, the user studies reveal several interesting facts which can significantly contribute in the research domain, as none of the existing methods considered such empirical studies, and rather relied only on computer generated simulated studies for evaluation

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Get PDF
    Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative impacts) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad-smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all these clones for refactoring or tracking. In these circumstances, it is essential to identify code clones that can be considered particularly important for refactoring and tracking. However, no existing study has investigated this matter. We conduct our research emphasizing this matter, and perform five studies on identifying important clones by analyzing clone evolution history. In our first study we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing evolutionary coupling of code clones we identify a particular clone change pattern, Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones. We rank SPCP clones considering their strength of evolutionary coupling. In our second study we further analyze evolutionary coupling of code clones with an aim to assist clone tracking. The purpose of clone tracking is to identify the co-change (i.e. changing together) candidates of code clones to ensure consistency of changes in the code-base. Our research in the second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling. In our third study we perform a deeper analysis on the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings we separate the SPCP clones into two disjoint subsets. While one subset contains the non-cross-boundary SPCP clones which can be considered important for refactoring, the other subset contains the cross-boundary SPCP clones which should be considered important for tracking. In our fourth study we analyze the bug-proneness of different types of SPCP clones in order to identify which type(s) of code clones have high tendencies of experiencing bug-fixes. Such clone-types can be given high priorities for management (refactoring or tracking). In our last study we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from our last study can help us prioritize clone-types for management on the basis of their tendencies of experiencing late propagations. We also find that late propagation can be considerably minimized by managing the SPCP clones. On the basis of our studies we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks these clones considering their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pin-pointing the important clones to be managed, and thus, considerably minimizing clone management effort

    Animating the evolution of software

    Get PDF
    The use and development of open source software has increased significantly in the last decade. The high frequency of changes and releases across a distributed environment requires good project management tools in order to control the process adequately. However, even with these tools in place, the nature of the development and the fact that developers will often work on many other projects simultaneously, means that the developers are unlikely to have a clear picture of the current state of the project at any time. Furthermore, the poor documentation associated with many projects has a detrimental effect when encouraging new developers to contribute to the software. A typical version control repository contains a mine of information that is not always obvious and not easy to comprehend in its raw form. However, presenting this historical data in a suitable format by using software visualisation techniques allows the evolution of the software over a number of releases to be shown. This allows the changes that have been made to the software to be identified clearly, thus ensuring that the effect of those changes will also be emphasised. This then enables both managers and developers to gain a more detailed view of the current state of the project. The visualisation of evolving software introduces a number of new issues. This thesis investigates some of these issues in detail, and recommends a number of solutions in order to alleviate the problems that may otherwise arise. The solutions are then demonstrated in the definition of two new visualisations. These use historical data contained within version control repositories to show the evolution of the software at a number of levels of granularity. Additionally, animation is used as an integral part of both visualisations - not only to show the evolution by representing the progression of time, but also to highlight the changes that have occurred. Previously, the use of animation within software visualisation has been primarily restricted to small-scale, hand generated visualisations. However, this thesis shows the viability of using animation within software visualisation with automated visualisations on a large scale. In addition, evaluation of the visualisations has shown that they are suitable for showing the changes that have occurred in the software over a period of time, and subsequently how the software has evolved. These visualisations are therefore suitable for use by developers and managers involved with open source software. In addition, they also provide a basis for future research in evolutionary visualisations, software evolution and open source development
    corecore