Constraint Handling Rules with Binders, Patterns and Generic Quantification
Constraint Handling Rules provide descriptions for constraint solvers.
However, they fall short when those constraints specify some binding structure,
like higher-rank types in a constraint-based type inference algorithm. In this
paper, the term syntax of constraints is replaced by λ-tree syntax, in which binding is explicit, and a new generic quantifier is introduced, which is used to create new fresh constants.
Comment: Paper presented at the 33rd International Conference on Logic Programming (ICLP 2017), Melbourne, Australia, August 28 to September 1, 2017; 16 pages, LaTeX, no PDF figures
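To make the abstract's notions of explicit binders and a generic quantifier concrete, here is a minimal sketch in Python; the term classes (Var, Const, Bind, Nabla) and the open_nabla helper are illustrative inventions, not the paper's formalism or its actual CHR syntax.

```python
# Minimal illustrative sketch, not the paper's formalism: constraint terms
# with an explicit binder and a generic quantifier whose body is only ever
# opened with a brand-new (fresh) constant.
from dataclasses import dataclass
from itertools import count

_fresh = count()

@dataclass
class Var:
    name: str

@dataclass
class Const:
    name: str

@dataclass
class Bind:           # binder: the bound variable is explicit in the term
    var: Var
    body: object

@dataclass
class Nabla:          # hypothetical generic-quantifier node
    var: Var
    body: object

def substitute(term, var, replacement):
    """Substitute var by replacement, respecting shadowing binders."""
    if isinstance(term, Var):
        return replacement if term.name == var.name else term
    if isinstance(term, (Bind, Nabla)):
        if term.var.name == var.name:   # variable is shadowed: stop here
            return term
        return type(term)(term.var, substitute(term.body, var, replacement))
    return term

def open_nabla(q: Nabla):
    """Open a generic quantifier by introducing a fresh constant."""
    c = Const(f"c{next(_fresh)}")
    return substitute(q.body, q.var, c)

# Opening "nabla x. (bind y. x)" yields "bind y. c0"
term = Nabla(Var("x"), Bind(Var("y"), Var("x")))
print(open_nabla(term))   # Bind(var=Var(name='y'), body=Const(name='c0'))
```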
On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used for both deriving
the topics that run through the system as well as their clustering. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that inferred topics
through performing topic analysis on the contextual representations are more
meaningful compared to the plain representation of the documents. The proposed
approach of introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization, and
topic analysis.
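As a rough illustration of the first context model described above (abstracting identifiers to their types and clustering the resulting contextual vectors), the following hedged sketch uses scikit-learn; the file names, type strings, and cluster count are hypothetical examples, not data from the study.

```python
# Illustrative sketch only: identifiers are abstracted to their declared
# types, and the resulting "contextual documents" are vectorized, compared,
# and clustered into candidate modules.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# One entry per source file/class: identifiers replaced by their types
contexts = {
    "OrderService.java":    "Order List<Order> Customer BigDecimal",
    "OrderRepository.java": "Order List<Order> Connection ResultSet",
    "InvoicePrinter.java":  "Invoice PrintStream BigDecimal",
}

docs = list(contexts.values())
vectors = TfidfVectorizer(token_pattern=r"\S+").fit_transform(docs)

# The pairwise similarity matrix plays the role of a semantic kernel
kernel = cosine_similarity(vectors)

# Cluster the contextual vectors into candidate modules
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors.toarray())
for name, label in zip(contexts, labels):
    print(label, name)
```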
Applications of Multi-view Learning Approaches for Software Comprehension
Program comprehension concerns the ability of an individual to develop an
understanding of an existing software system in order to extend or transform it.
Software systems comprise data that are noisy and incomplete, which makes
program understanding even more difficult. A software system consists of
various views, including the module dependency graph, execution logs,
evolutionary information and the vocabulary used in the source code, that
collectively define the system. Each of these views contains unique
and complementary information, and together they can describe the
data more accurately. In this paper, we investigate various techniques for combining different
data. In this paper, we investigate various techniques for combining different
sources of information to improve the performance of a program comprehension
task. We employ state-of-the-art learning techniques to 1) find a suitable
similarity function for each view, and 2) compare different multi-view learning
techniques to decompose a software system into high-level units and give
component-level recommendations for refactoring of the system, as well as
cross-view source code search. The experiments conducted on 10 relatively large
Java software systems show that by fusing knowledge from different views, we
can guarantee a lower bound on the quality of the modularization and even
improve upon it. We proceed by integrating different sources of information to
give a set of high-level recommendations as to how to refactor the software
system. Furthermore, we demonstrate how learning a joint subspace allows for
performing cross-modal retrieval across views, yielding results that are more
aligned with what the user intends by the query. The multi-view approaches
outlined in this paper can be employed for addressing problems in software
engineering that can be encoded in terms of a learning problem, such as
software bug prediction and feature location.
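A minimal sketch of the kind of late fusion the abstract alludes to, assuming one precomputed affinity matrix per view; the view names, the random affinities, and the cluster count are placeholders, and this is not the paper's actual multi-view algorithm.

```python
# Hedged sketch with hypothetical data: one similarity matrix per view is
# fused by averaging, and the fused affinity drives a spectral decomposition
# of the system into high-level units.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n_entities = 12   # e.g. classes or packages

def random_affinity(n):
    a = rng.random((n, n))
    a = (a + a.T) / 2          # make it symmetric
    np.fill_diagonal(a, 1.0)
    return a

views = {
    "lexical":    random_affinity(n_entities),   # vocabulary similarity
    "structural": random_affinity(n_entities),   # dependency-graph similarity
    "dynamic":    random_affinity(n_entities),   # execution-log similarity
}

# Simple late fusion: uniform average of the per-view affinities
fused = sum(views.values()) / len(views)

labels = SpectralClustering(
    n_clusters=3, affinity="precomputed", random_state=0
).fit_predict(fused)
print(labels)
```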
An Evaluation Of Service Frameworks For The Management Of Service Ecosystems
A service ecosystem is a marketplace for trading services in which services are developed, published, sold and used. Service ecosystems have changed how services are delivered and consumed among the actors/parties who perform specific roles in the operation of the ecosystem. These actors, namely service providers, consumers, mediators and intermediaries, ensure the livelihood of the ecosystem. However, the role of the service infrastructure provider, one of the actors of the service ecosystem, has not yet been explored sufficiently. The service infrastructure provider supplies the service infrastructures/frameworks upon which the other actors of the ecosystem operate. In this paper, an evaluation framework for service frameworks is defined, based on the features that are required for a service ecosystem to thrive. The evaluation framework is used to evaluate three open-source service frameworks, and it facilitates the selection of a service framework from among the many that are available.
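A small hypothetical illustration of how a feature-based evaluation framework of this kind might be applied as a weighted rubric; the criteria, weights, framework names, and scores are invented and do not come from the paper.

```python
# Hypothetical weighted rubric for comparing candidate service frameworks.
criteria = {                      # feature -> weight
    "service publication": 3,
    "service discovery":   3,
    "monitoring":          2,
    "access control":      2,
}

# 0 = unsupported, 1 = partial, 2 = full support (example scores only)
frameworks = {
    "FrameworkA": {"service publication": 2, "service discovery": 2,
                   "monitoring": 1, "access control": 0},
    "FrameworkB": {"service publication": 1, "service discovery": 2,
                   "monitoring": 2, "access control": 2},
}

for name, scores in frameworks.items():
    total = sum(criteria[c] * scores[c] for c in criteria)
    print(f"{name}: {total}")
```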
Migrating a Large Scale Legacy Application to SOA: Challenges and Lessons Learned
This paper presents the findings of a case study of a large-scale legacy to service-oriented architecture migration process in the payments domain of a Dutch bank. The paper presents the business drivers that initiated the migration and describes a four-phase migration process. For each phase, the paper details the benefits of the techniques used, the best practices that contributed to the success, and the challenges that may be faced during migration. Based on these observations, the findings are discussed as lessons learned, including the implications of using reverse engineering techniques to facilitate the migration process, adopting a pragmatic migration realization approach, emphasizing the organizational and business perspectives, and harvesting knowledge of the system throughout the system's life cycle.
Plagiarism in Take-home Exams: Help-seeking, Collaboration, and Systematic Cheating
Due to the increased enrollments in Computer Science education programs, institutions have sought ways to automate and streamline parts of course assessment in order to be able to invest more time in guiding students' work. This article presents a study of plagiarism behavior in an introductory programming course, where a traditional pen-and-paper exam was replaced with multiple take-home exams. The students who took the take-home exam enabled a software plugin that recorded their programming process. During an analysis of the students' submissions, potential plagiarism cases were highlighted, and students were invited to interviews. The interviews with the candidates for plagiarism highlighted three types of plagiarism behaviors: help-seeking, collaboration, and systematic cheating. Analysis of programming process traces indicates that parts of such behavior are detectable directly from programming process data.
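As a hedged sketch of how programming process data could reveal such behavior, the snippet below flags unusually large jumps in code size between consecutive snapshots, a plausible paste signal; the Snapshot class, the threshold, and the trace are assumptions, not the plugin's actual data model or the study's detection method.

```python
# Sketch under stated assumptions: the recording plugin is assumed to produce
# timestamped source snapshots; a large jump in code size between consecutive
# snapshots is flagged as a possible paste of external code.
from dataclasses import dataclass

@dataclass
class Snapshot:
    timestamp: float      # seconds since the start of the exam
    source: str

def flag_large_insertions(snapshots, threshold=300):
    """Yield snapshot pairs where the code grew by more than `threshold` characters."""
    for prev, curr in zip(snapshots, snapshots[1:]):
        growth = len(curr.source) - len(prev.source)
        if growth > threshold:
            yield prev.timestamp, curr.timestamp, growth

# Hypothetical trace: a sudden 400-character insertion between two snapshots
trace = [
    Snapshot(0.0, "x" * 10),
    Snapshot(5.0, "x" * 15),
    Snapshot(5.5, "x" * 415),
]
for start, end, growth in flag_large_insertions(trace):
    print(f"possible paste of {growth} chars between t={start}s and t={end}s")
```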
Choosing Code Segments to Exclude from Code Similarity Detection
When student programs are compared for similarity as a step in the detection of academic misconduct, certain segments of code are always sure to be similar but are no cause for suspicion. Some of these segments are boilerplate code (e.g. public static void main(String[] args)) and some will be code that was provided to students as part of the assessment specification. This working group explores these and other types of code that are legitimately common in student assessments and can therefore be excluded from similarity checking. From their own institutions, working group members collected assessment submissions that together encompass a wide variety of assessment tasks in a wide variety of programming languages. The submissions were analysed to determine what sorts of code segments arose frequently in each assessment task. The group has found that common code can arise in programming assessment tasks when it is required for compilation purposes; when it reflects an intuitive way to undertake part or all of the task in question; when it can be legitimately copied from external sources; and when it has been suggested by people with whom many of the students have been in contact. A further finding is that the nature and size of the common code fragments vary with course level and with task complexity. An informal survey of programming educators confirms the group's findings and gives some reasons why various educators include code when setting programming assignments.
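A minimal sketch of excluding known-common segments before a similarity check, assuming a simple substring exclusion list and a token-level Jaccard measure; the helper names and the exclusion list are illustrative and are not the working group's tooling.

```python
# Illustrative sketch: known boilerplate and instructor-provided code is
# stripped from submissions before a token-based similarity comparison.
import re

EXCLUDED_SEGMENTS = [
    "public static void main(String[] args)",
    # code handed out with the assessment specification would be listed here
]

def strip_excluded(code: str) -> str:
    for segment in EXCLUDED_SEGMENTS:
        code = code.replace(segment, "")
    return code

def tokens(code: str):
    return re.findall(r"[A-Za-z_]\w*|\S", code)

def jaccard_similarity(a: str, b: str) -> float:
    ta, tb = set(tokens(strip_excluded(a))), set(tokens(strip_excluded(b)))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

s1 = "public static void main(String[] args) { int total = 0; }"
s2 = "public static void main(String[] args) { int sum = 0; }"
print(round(jaccard_similarity(s1, s2), 2))   # similarity of the non-boilerplate parts
```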
Clinical consequences of diagnostic variability in the histopathological evaluation of early rectal cancer
Introduction: In early rectal cancer, organ sparing treatment strategies such as local excision have gained popularity. The necessity of radical surgery is based on the histopathological evaluation of the local excision specimen. This study aimed to describe diagnostic variability between pathologists, and its impact on treatment allocation in patients with locally excised early rectal cancer. Materials and methods: Patients with locally excised pT1-2 rectal cancer were included in this prospective cohort study. Both quantitative measures and histopathological risk factors (i.e. poor differentiation, deep submucosal invasion, and lymphatic or venous invasion) were evaluated. Interobserver variability was reported by both percentages and Fleiss' kappa (κ) or intra-class correlation coefficients. Results: A total of 126 patients were included. Ninety-four percent of the original histopathological reports contained all required parameters. In 73 of the 126 (57.9%) patients, at least one discordant parameter was observed, which regarded histopathological risk factors for lymph node metastases in 36 patients (28.6%). Interobserver agreement among different variables varied between 74% and 95%, or κ 0.530–0.962. The assessment of lymphovascular invasion showed discordances in 26% (κ = 0.530, 95% CI 0.375–0.684) of the cases. In fourteen (11%) patients, discordances led to a change in treatment strategy. Conclusion: This study demonstrated that there is substantial interobserver variability between pathologists, especially in the assessment of lymphovascular invasion. Pathologists play a key role in treatment allocation after local excision of early rectal cancer; therefore, interobserver variability needs to be reduced to decrease the number of patients that are over- or undertreated.
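For readers unfamiliar with the agreement statistic reported here, the following sketch computes Fleiss' kappa with statsmodels on fabricated binary ratings; the ratings are invented and are not the study's data.

```python
# Hedged illustration: Fleiss' kappa for the agreement of several raters on a
# binary histopathological risk factor, computed with statsmodels.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = specimens, columns = raters; 0 = risk factor absent, 1 = present
ratings = np.array([
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
])

table, _ = aggregate_raters(ratings)   # specimens x categories count table
print(round(fleiss_kappa(table), 3))
```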