11 research outputs found

Identifying Output Interactions Among IS Projects - A Text Mining Approach

Author: Meier Christian
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/07/2013

AIS Electronic Library (AISeL)

Improving the build architecture of legacy C/C++ software systems

Author: Dayani-Fard Homayoun
Mylopoulos John
Periklis Andritsos
Yu Yijun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

The build architecture of legacy C/C++ software systems groups program files in directories to represent logical components. The interfaces of these components are loosely defined by a set of header files that are typically grouped in one common include directory. As legacy systems evolve, these interfaces decay, which contributes to an increase in build time and in the number of conflicts during parallel development. This paper presents an empirical study of the build architecture of large commercial software systems, introduces a restructuring approach based on Reflexion models and automatic clustering, and reports on a case study using the VIM open source editor.

Open Research Online (The Open University)

Using heuristics to estimate an appropriate number of latent topics in source code analysis

Author: Antoniol
Asuncion
Bartholomew
Blei
Bollen
Bradford
Comon
Cordy
David B. Skillicorn
Grant
Griffiths
Heinrich
Hindle
James R. Cordy
Kuhn
Linstead
Linstead
Lukins
Maletic
Maletic
Marcus
Maskeri
Oliveto
Roy
Roy
Schölkopf
Scott Grant
Steinwart
Thomas
Thomas
Wallach
Publication venue: 'Elsevier BV'

Crossref

A Survey of Recent Methods on Deriving Topics from Twitter: Algorithm to Evaluation

Author: Nepal Surya
Paris Cecile
R. Setiawan Aji Nugroho
Yang Jian
Zhao Weiliang

Unika Repository

Attaching Social Interactions Surrounding Software Changes to the Release History of an Evolving Software System

Author: Baysal Olga
Publication venue: 'University of Waterloo'
Publication date: 01/01/2006

Open source software is designed, developed and maintained by means of electronic media. These media include discussions on a variety of issues reflecting the evolution of a software system, such as reports on bugs and their fixes, new feature requests, design changes, refactoring tasks, test plans, etc. Often this valuable information is simply buried as plain text in the mailing list archives. We believe that email interactions collected prior to a product release are related to its source code modifications, or, if they do not immediately correlate to change events of the current release, they might affect changes happening in future revisions. In this work, we propose a method to reason about the nature of software changes by mining and correlating electronic mailing list archives. Our approach is based on the assumption that developers use meaningful names and their domain knowledge in defining source code identifiers, such as classes and methods. We employ natural language processing techniques to find similarity between source code change history and the history of public interactions surrounding these changes. Exact string matching is applied to find a set of common concepts between discussion vocabulary and changed code vocabulary. We apply our correlation method to two software systems, LSEdit and Apache Ant. The results of these exploratory case studies provide evidence of similarity between the content of free-form text emails among developers and the actual modifications in the code. We identify a set of correlation patterns between discussion and changed code vocabularies and discover that some releases referred to as minor should instead fall under the major category. These patterns can be used to estimate the type of a change and the time needed to implement it.
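
The exact string matching step described in the abstract above could look roughly like the following. This is a minimal sketch, not the thesis's implementation: the function names (`split_identifier`, `vocabulary`, `common_concepts`), the identifier-splitting rule, and the sample inputs are all hypothetical and chosen only for illustration.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms.

    The splitting rule is an assumption for illustration only.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def vocabulary(text):
    """Return the set of lowercase alphabetic words in free-form text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def common_concepts(email_text, changed_identifiers):
    """Intersect the discussion vocabulary with the changed-code vocabulary.

    Matching is exact: no stemming or synonym handling is applied.
    """
    code_terms = {
        term
        for identifier in changed_identifiers
        for term in split_identifier(identifier)
    }
    return vocabulary(email_text) & code_terms


if __name__ == "__main__":
    # Hypothetical email text and changed identifiers.
    email = "The build file parser fails on nested targets"
    changed = ["parseBuildFile", "TargetResolver"]
    print(sorted(common_concepts(email, changed)))
```

Because the match is exact, "parser" in the email does not match the code term "parse", and "targets" does not match "target"; only "build" and "file" are shared in this example.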

CiteSeerX

University of Waterloo's Institutional Repository

An approach to source-code plagiarism detection investigation using latent semantic analysis

Author: Cosma Georgina

This thesis looks at three aspects of source-code plagiarism. The first aspect of the thesis is concerned with creating a definition of source-code plagiarism; the second aspect is concerned with describing the findings gathered from investigating the Latent Semantic Analysis information retrieval algorithm for source-code similarity detection; and the final aspect of the thesis is concerned with the proposal and evaluation of a new algorithm that combines Latent Semantic Analysis with plagiarism detection tools. A recent review of the literature revealed that there is no commonly agreed definition of what constitutes source-code plagiarism in the context of student assignments. This thesis first analyses the findings from a survey carried out to gain insight into the perspectives of UK Higher Education academics who teach programming on computing courses. Based on the survey findings, a detailed definition of source-code plagiarism is proposed. Secondly, the thesis investigates the application of an information retrieval technique, Latent Semantic Analysis, to derive semantic information from source-code files. Various parameters drive the effectiveness of Latent Semantic Analysis. The performance of Latent Semantic Analysis using various parameter settings and its effectiveness in retrieving similar source-code files when optimising those parameters are evaluated. Finally, an algorithm for combining Latent Semantic Analysis with plagiarism detection tools is proposed and a tool is created and evaluated. The proposed tool, PlaGate, is a hybrid model that allows for the integration of Latent Semantic Analysis with plagiarism detection tools in order to enhance plagiarism detection. In addition, PlaGate has a facility for investigating the importance of source-code fragments with regard to their contribution towards proving plagiarism. PlaGate provides graphical output that indicates the clusters of suspicious files and source-code fragments.

Warwick Research Archives Portal Repository

Supporting Text Retrieval Query Formulation In Software Engineering

Author: Haiduc Sonia Cristina
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2013

The text found in software artifacts captures important information. Text Retrieval (TR) techniques have been successfully used to leverage this information. Despite their advantages, the success of TR techniques strongly depends on the textual queries given as input. When poorly chosen queries are used, developers can waste time investigating irrelevant results. The quality of a query indicates the relevance of the results returned by TR in response to the query and can give an indication if the results are worth investigating or a reformulation of the query should be sought instead. Knowing the quality of the query could lead to time saved when irrelevant results are returned. However, the only way to determine if a query led to the wanted artifacts is by manually inspecting the list of results. This dissertation introduces novel approaches to measure and predict the quality of queries automatically in the context of SE tasks, based on a set of statistical properties of the queries. The approaches are evaluated for the task of concept location in source code. The results reveal that the proposed approaches are able to accurately capture and predict the quality of queries for SE tasks supported by TR. When a query has low quality, the developer can reformulate it and improve it. However, this is just as hard as formulating the query in the first place. This dissertation presents two approaches for partial and complete automation of the query reformulation process. The semi-automatic approach relies on developer feedback about the relevance of TR results and uses this information to automatically reformulate the query. The automatic approach learns and applies the best reformulation approach for a query and relies on a set of training queries and their statistical properties to achieve this. 
Both approaches are evaluated for concept location and the results show that the techniques are able to improve the results of the original queries in the majority of cases. We expect that in the long run the proposed approaches will contribute directly to the reduction of developer effort and implicitly to the reduction of software evolution costs.
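
The abstract above does not say which statistical properties of a query are used. As one illustration of the general idea, the sketch below computes the average inverse document frequency of a query's terms, a common measure of how specific a query is to a corpus. The choice of this measure, the smoothing, the function names, and the toy corpus are all assumptions, not the dissertation's method.

```python
import math


def inverse_document_frequency(term, documents):
    """Smoothed IDF of a term over a corpus of term sets (assumed formula)."""
    containing = sum(1 for doc in documents if term in doc)
    return math.log((len(documents) + 1) / (containing + 1))


def average_idf(query_terms, documents):
    """Mean IDF of the query terms; higher suggests a more specific query."""
    if not query_terms:
        return 0.0
    total = sum(inverse_document_frequency(t, documents) for t in query_terms)
    return total / len(query_terms)


if __name__ == "__main__":
    # Hypothetical corpus: each document is the set of terms in one source file.
    corpus = [
        {"parse", "file", "error"},
        {"file", "open"},
        {"file", "render", "widget"},
    ]
    print(average_idf(["file"], corpus))    # term appears in every document
    print(average_idf(["widget"], corpus))  # term appears in one document
```

A query made only of terms that occur in every document scores lowest, which matches the intuition that such a query cannot discriminate between artifacts.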

Digital Commons@Wayne State University

An approach to source-code plagiarism detection investigation using latent semantic analysis

Author: Cosma Georgina
Publication date: 01/01/2008

OpenGrey Repository