6,635 research outputs found
Improving Software Project Health Using Machine Learning
In recent years, systems that would previously live on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull-requests, issue trackingand version history, and its integration with other solutions such as Gerrit, or Travis, as well as theresponse from competitors, created development environments that favour agile methodologiesby increasingly automating non-coding tasks: automated build systems, automated issue triagingetc. In essence, source-code hosting platforms shifted to continuous integration/continuousdelivery (CI/CD) as a service. This facilitated a shift in development paradigms, adherents ofagile methodology can now adopt a CI/CD infrastructure more easily. This has also created large,publicly accessible sources of source-code together with related project artefacts: GHTorrent andsimilar datasets now offer programmatic access to the whole of GitHub. Project health encompasses traceability, documentation, adherence to coding conventions,tasks that reduce maintenance costs and increase accountability, but may not directly impactfeatures. Overfocus on health can slow velocity (new feature delivery) so the Agile Manifestosuggests developers should travel light — forgo tasks focused on a project health in favourof higher feature velocity. Obviously, injudiciously following this suggestion can undermine aproject’s chances for success. Simultaneously, this shift to CI/CD has allowed the proliferation of Natural Language orNatural Language and Formal Language textual artefacts that are programmatically accessible:GitHub and their competitors allow API access to their infrastructure to enable the creation ofCI/CD bots. This suggests that approaches from Natural Language Processing and MachineLearning are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks forthis new paradigm and its attendant infrastructure by bringing to the foreground the relevant NLPand ML techniques. Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving theissue-pull-request traceability, which can aid existing systems to automatically curate the issuebacklog as pull-requests are merged; (2) untangling commits in a version history, which canaid the beforementioned traceability task as well as improve the usability of determining a faultintroducing commit, or cherry-picking via tools such as git bisect; (3) mixed-text parsing, to allowbetter API mining and open new avenues for project-specific code-recommendation tools
Mining and Visualizing Research Networks using the Artefact-Actor-Network Approach
Reinhardt, W., Wilke, A., Moi, M., Drachsler, H., & Sloep, P. B. (2012). Mining and Visualizing Research Networks using the Artefact-Actor-Network Approach. In A. Abraham (Ed.), Computational Social Networks. Mining and Visualization (pp. 233-268). Springer. Also available at http://www.springer.com/computer/communication+networks/book/978-1-4471-4053-5Virtual communities are increasingly relying on technologies and tools of the so-called Web 2.0. In the context of scientific events and topical Research Networks, researchers use Social Media as one main communication channel. This raises the question, how to monitor and analyze such Research Networks. In this chapter we argue that Artefact-Actor-Networks (AANs) serve well for modeling, storing and mining the social interactions around digital learning resources originating from various learning services. In order to deepen the model of AANs and its application to Research Networks, a relevant theoretical background as well as clues for a prototypical reference implementation are provided. This is followed by the analysis of six Research Networks and a detailed inspection of the results. Moreover, selected networks are visualized. Research Networks of the same type show similar descriptive measures while different types are not directly comparable to each other. Further, our analysis shows that narrowness of a Research Network's subject area can be predicted using the connectedness of semantic similarity networks. Finally conclusions are drawn and implications for future research are discussed
Natural Language Processing in-and-for Design Research
We review the scholarly contributions that utilise Natural Language
Processing (NLP) methods to support the design process. Using a heuristic
approach, we collected 223 articles published in 32 journals and within the
period 1991-present. We present state-of-the-art NLP in-and-for design research
by reviewing these articles according to the type of natural language text
sources: internal reports, design concepts, discourse transcripts, technical
publications, consumer opinions, and others. Upon summarizing and identifying
the gaps in these contributions, we utilise an existing design innovation
framework to identify the applications that are currently being supported by
NLP. We then propose a few methodological and theoretical directions for future
NLP in-and-for design research
Automated recommendation, reuse, and generation of unit tests for software systems
This thesis presents a body of work relating to the automated discovery, reuse, and generation of unit tests for software systems with the goal of improving the efficiency of the software engineering process and the quality of the produced software.
We start with a novel approach to test-to-code traceability link establishment, called TCTracer, which utilises multilevel information and an ensemble of static and dynamic techniques to achieve state-of-the-art accuracy when establishing links between tests and tested functions and test classes and tested classes. This approach is utilised to provide test-to-code traceability links which facilitate multiple other parts of the work.
We then move on to test reuse where we first define an abstract framework, called Rashid, for using connections between artefacts to identify new artefacts for reuse and utilise this framework in Relatest, an approach for producing test recommendations for new functions. Relatest instantiates Rashid by using TCTracer to establish connections between tests and functions and code similarity measures to establish connections between similar functions. This information is used to create lists of recommendations for new functions.
We then present an investigation into the automated transplantation of tests which attempts to remove the manual effort required to transform Relatest recommendations and insert them into another project.
Finally, we move on to test generation where we utilise neural networks to generate unit test code by learning from existing function-to-test pairs. The first approach, TestNMT, investigates using recurrent neural networks to generate whole JUnit tests and the second approach, ReAssert, utilises a transformer-based architecture to generate JUnit asserts.
In total, this thesis addresses the problem by developing approaches for the discovery, reuse, and utilisation of existing functions and tests, including the establishment of relationships between these artefacts, developing mechanisms to aid automated test reuse and learning from existing tests to generate new tests
Requirements Analysis for an Open Research Knowledge Graph
Current science communication has a number of drawbacks and bottlenecks which
have been subject of discussion lately: Among others, the rising number of
published articles makes it nearly impossible to get an overview of the state
of the art in a certain field, or reproducibility is hampered by fixed-length,
document-based publications which normally cannot cover all details of a
research work. Recently, several initiatives have proposed knowledge graphs
(KGs) for organising scientific information as a solution to many of the
current issues. The focus of these proposals is, however, usually restricted to
very specific use cases. In this paper, we aim to transcend this limited
perspective by presenting a comprehensive analysis of requirements for an Open
Research Knowledge Graph (ORKG) by (a) collecting daily core tasks of a
scientist, (b) establishing their consequential requirements for a KG-based
system, (c) identifying overlaps and specificities, and their coverage in
current solutions. As a result, we map necessary and desirable requirements for
successful KG-based science communication, derive implications and outline
possible solutions.Comment: Accepted for publishing in 24th International Conference on Theory
and Practice of Digital Libraries, TPDL 202
Report on shape analysis and matching and on semantic matching
In GRAVITATE, two disparate specialities will come together in one working platform for the archaeologist: the fields of shape analysis, and of metadata search. These fields are relatively disjoint at the moment, and the research and development challenge of GRAVITATE is precisely to merge them for our chosen tasks. As shown in chapter 7 the small amount of literature that already attempts join 3D geometry and semantics is not related to the cultural heritage domain. Therefore, after the project is done, there should be a clear ‘before-GRAVITATE’ and ‘after-GRAVITATE’ split in how these two aspects of a cultural heritage artefact are treated.This state of the art report (SOTA) is ‘before-GRAVITATE’. Shape analysis and metadata description are described separately, as currently in the literature and we end the report with common recommendations in chapter 8 on possible or plausible cross-connections that suggest themselves. These considerations will be refined for the Roadmap for Research deliverable.Within the project, a jargon is developing in which ‘geometry’ stands for the physical properties of an artefact (not only its shape, but also its colour and material) and ‘metadata’ is used as a general shorthand for the semantic description of the provenance, location, ownership, classification, use etc. of the artefact. As we proceed in the project, we will find a need to refine those broad divisions, and find intermediate classes (such as a semantic description of certain colour patterns), but for now the terminology is convenient – not least because it highlights the interesting area where both aspects meet.On the ‘geometry’ side, the GRAVITATE partners are UVA, Technion, CNR/IMATI; on the metadata side, IT Innovation, British Museum and Cyprus Institute; the latter two of course also playing the role of internal users, and representatives of the Cultural Heritage (CH) data and target user’s group. CNR/IMATI’s experience in shape analysis and similarity will be an important bridge between the two worlds for geometry and metadata. The authorship and styles of this SOTA reflect these specialisms: the first part (chapters 3 and 4) purely by the geometry partners (mostly IMATI and UVA), the second part (chapters 5 and 6) by the metadata partners, especially IT Innovation while the joint overview on 3D geometry and semantics is mainly by IT Innovation and IMATI. The common section on Perspectives was written with the contribution of all
Ecosystem-inspired enterprise modelling framework for collaborative and networked manufacturing systems
Rapid changes in the open manufacturing environment are imminent due to the increase of customer demand, global competition, and digital fusion. This has exponentially increased both complexity and uncertainty in the manufacturing landscape, creating serious challenges for competitive enterprises. For enterprises to remain competitive, analysing manufacturing activities and designing systems to address emergent needs, in a timely and efficient manner, is understood to be crucial. However, existing analysis and design approaches adopt a narrow diagnostic focus on either managerial or engineering aspects and neglect to consider the holistic complex behaviour of enterprises in a collaborative manufacturing network (CMN). It has been suggested that reflecting upon ecosystem theory may bring a better understanding of how to analyse the CMN. The research presented in this paper draws on a theoretical discussion with aim to demonstrate a facilitating approach to those analysis and design tasks. This approach was later operationalised using enterprise modelling (EM) techniques in a novel, developed framework that enhanced systematic analysis, design, and business-IT alignment. It is expected that this research view is opening a new field of investigation
- …