The Python user interface of the elsA CFD software: a coupling framework for external steering layers
The Python-elsA user interface of the elsA CFD (Computational Fluid
Dynamics) software has been developed to allow users to specify simulations
with confidence, through a global context of description objects grouped inside
scripts. The software's main features are generated documentation, context
checking and completion, and helpful error management. Further developments
have used this foundation as a coupling framework, allowing (thanks to the
descriptive approach) the coupling of external algorithms with the CFD solver
in a simple and abstract way, leading to more success in complex simulations.
Along with the description of the technical part of the interface, we try to
gather the salient points pertaining to the psychological viewpoint of user
experience (UX). We point out the differences between user interfaces and pure
data management systems such as CGNS.
Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code
Statistical language modeling techniques have successfully been applied to
large source code corpora, yielding a variety of new software development
tools, such as tools for code suggestion, improving readability, and API
migration. A major issue with these techniques is that code introduces new
vocabulary at a far higher rate than natural language, as new identifier names
proliferate. Both large vocabularies and out-of-vocabulary issues severely
affect Neural Language Models (NLMs) of source code, degrading their
performance and rendering them unable to scale.
In this paper, we address this issue by: 1) studying how various modelling
choices impact the resulting vocabulary on a large-scale corpus of 13,362
projects; 2) presenting an open vocabulary source code NLM that can scale to
such a corpus, 100 times larger than in previous work; and 3) showing that such
models outperform the state of the art on three distinct code corpora (Java, C,
Python). To our knowledge, these are the largest NLMs for code that have been
reported.
All datasets, code, and trained models used in this work are publicly
available.
Comment: 13 pages; to appear in Proceedings of ICSE 202
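The out-of-vocabulary problem described in this abstract can be illustrated with a small sketch. It assumes identifiers are split into subtokens on underscores and camelCase boundaries, which is only one possible modelling choice and not necessarily the one the paper adopts. The helper names and the sample identifiers are hypothetical.

```python
import re

def split_identifier(name):
    """Split an identifier into lowercase subtokens.

    Assumed scheme (not taken from the paper): break on underscores
    and camelCase boundaries; digit runs become their own subtokens.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

# Hypothetical identifiers seen during training.
training = ["getUserName", "set_user_id", "userCache"]

# Closed vocabulary: every identifier is a single token.
closed_vocab = set(training)
# Open vocabulary: identifiers are represented by their subtokens.
open_vocab = {s for name in training for s in split_identifier(name)}

def covered_by_subtokens(name):
    """True if every subtoken of `name` was seen during training."""
    return all(s in open_vocab for s in split_identifier(name))

unseen = "getCacheId"
print(unseen in closed_vocab)        # the whole token was never seen
print(covered_by_subtokens(unseen))  # its subtokens were all seen
```

The new identifier is out of vocabulary as a whole token, yet a model working on subtokens can still represent it from pieces it already knows.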
On the Influence of Representation Type and Gender on Recognition Tasks of Program Comprehension
RÉSUMÉ
The goal of software maintenance is to improve existing software while preserving its integrity. Maintenance can account for up to 60% of a software system's budget. Improving software maintainability therefore benefits both software vendors and software users.
Software developers devote considerable effort to program comprehension, which is an essential step in software maintenance. We hypothesize that the gender of software developers and the type of representation can affect their effort and their effectiveness. These factors must be considered and carefully analyzed, since they may hide significant effects that can be identified by analyzing the comprehension process.
In this thesis, we draw on the use of eye-trackers to study the cognitive process involved in problem solving. We carried out a mapping study to evaluate all research work that uses eye-tracking in software engineering. The results motivated us to use the eye-tracker in a set of studies analyzing the effect of two important factors on program comprehension: representation type (textual or graphical) and developer gender. To understand how the different representation types and gender influence viewing strategies, we studied the differences in strategy between developers.
The results show that, compared with a graphical representation, a structured-text representation better supports developers' cognitive process when comprehending small programs. The textual representation thus requires less time and effort from participants. By contrast, the graphical representation is the one developers prefer. Our results show that the topological structure of the graphical representation helps developers memorize the location of elements and find relevant elements faster than with the textual representation.
In addition, the hierarchical structure of the graphical representation guides developers to follow a specific viewing strategy. We observed that women and men use different viewing strategies when reading code or memorizing identifier names. Women tend to inspect all options carefully in order to eliminate the wrong answer. Men, by contrast, tend to inspect some answers briefly. While women spend more time analyzing each type of entity one after another, men alternate their attention between different types of entity.
ABSTRACT
The purpose of software maintenance is to correct and enhance an existing software system while preserving its integrity. Software maintenance can cost more than 60% of the budget of a software system, so improving the maintainability of software is important for both the software industry and its customers. Program comprehension is the initial step of software maintenance and requires the largest share of maintenance time and effort. We conjecture that developers' gender and the type of representation that developers use to perform program comprehension affect their efficiency and effectiveness. These factors must be considered and carefully studied, because they may hide significant effects that can be found by analyzing the comprehension process.
In this dissertation, inspired by the literature on the usefulness of eye-trackers to study the cognitive process involved in problem-solving activities, we perform a mapping study and evaluate all research relevant to the use of eye-tracking techniques in software engineering. The results motivate us to perform a set of eye-tracking studies to analyze the impact of two important factors on program comprehension: representation type (textual vs. graphical) and developers' gender. Moreover, we investigate and compare the variability of viewing strategies among developers to understand how representation type and gender differences influence viewing strategies.
Overall, our results indicate that structured text provides more cognitive support than a graphical representation for developers performing program comprehension on small systems. Developers spend less time and effort working with textual representations. However, developers mostly preferred to use graphical representations, and our results confirm that the topological structure of graphical representations helps developers to memorize the location of the elements and to find the relevant ones faster than with a textual representation. Moreover, the hierarchical structure of the representation guides developers to follow specific viewing strategies while working with representations.
Regarding the impact of gender, our results show that male and female developers use different viewing strategies while reading source code or recalling the names of identifiers. Female developers seem to carefully weigh all options and rule out wrong answers, while male developers seem to quickly set their minds on some answers and move forward. Moreover, female developers spend more time on each source code entity and analyze it before going to the next one. In contrast, male developers use several attention-switching strategies between different source code entities.
Learning natural coding conventions
Coding conventions are ubiquitous in software engineering practice. Maintaining a uniform
coding style allows software development teams to communicate through code by
making the code clear and, thus, readable and maintainable, two important properties
of good code since developers spend the majority of their time maintaining software
systems. This dissertation introduces a set of probabilistic machine learning models
of source code that learn coding conventions directly from source code written in a
mostly conventional style. This alleviates the coding convention enforcement problem,
where conventions need to first be formulated clearly into unambiguous rules and then
be coded in order to be enforced; a tedious and costly process.
First, we introduce the problem of inferring a variable's name given its usage context
and address this problem by creating Naturalize, a machine learning framework
that learns to suggest conventional variable names. Two machine learning models, a
simple n-gram language model and a specialized neural log-bilinear context model are
trained to understand the role and function of each variable and suggest new stylistically
consistent variable names. The neural log-bilinear model can even suggest previously
unseen names by composing them from subtokens (i.e. sub-components of code identifiers).
The suggestions of the models achieve 90% accuracy when suggesting variable
names at the top 20% most confident locations, rendering the suggestion system usable
in practice.
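As a rough illustration of how an n-gram language model can rank candidate variable names, the sketch below counts bigrams and orders candidates by how often each followed a given context token. This is a minimal maximum-likelihood bigram model without smoothing, not the scoring used by Naturalize, which the abstract does not specify. The function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(token_sequences):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for tokens in token_sequences:
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def suggest(counts, prev_token, candidates):
    """Rank candidate names, most frequent after `prev_token` first.

    Ties keep the order of `candidates` because sorted() is stable.
    """
    seen = counts[prev_token]
    return sorted(candidates, key=lambda name: -seen[name])

# Hypothetical token sequences from a code base with a loop-index convention.
corpus = [
    ["for", "int", "i"],
    ["for", "int", "i"],
    ["for", "int", "j"],
    ["int", "count"],
]
counts = train_bigrams(corpus)
print(suggest(counts, "int", ["idx", "j", "i"]))
```

On this toy corpus the conventional name `i` ranks first after `int`, and the unseen name `idx` ranks last.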
We then turn our attention to the significantly harder method naming problem.
Learning to name methods, by looking only at the code tokens within their body, requires
a good understanding of the semantics of the code contained in a single method.
To achieve this, we introduce a novel neural convolutional attention network that learns
to generate the name of a method by sequentially predicting its subtokens. This is
achieved by focusing on different parts of the code and potentially directly using body
(sub)tokens even when they have never been seen before. This model achieves an F1
score of 51% on the top five suggestions when naming methods of real-world open-source
projects.
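One plausible way to score a predicted method name against the true one is an F1 measure over their subtokens, sketched below. The abstract does not give the exact definition used in the dissertation, so the set-based precision and recall here are an assumption, as are the helper names `subtokens` and `subtoken_f1`.

```python
import re

def subtokens(name):
    """Split an identifier into lowercase subtokens (assumed scheme:
    underscores and camelCase boundaries)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

def subtoken_f1(predicted, actual):
    """Set-based F1 between the subtokens of two method names."""
    pred, gold = set(subtokens(predicted)), set(subtokens(actual))
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)  # share of predicted subtokens that are correct
    recall = overlap / len(gold)     # share of true subtokens that were predicted
    return 2 * precision * recall / (precision + recall)

# Two of three subtokens match: precision = recall = 2/3, so F1 = 2/3.
print(subtoken_f1("getUserName", "getUserId"))
```

A subtoken-level measure gives partial credit to names that share most of their parts, which an exact-match accuracy would score as zero.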
Learning about naming code conventions uses the syntactic structure of the code
to infer names that implicitly relate to code semantics. However, syntactic similarities
and differences obscure code semantics. Therefore, to capture features of semantic
operations with machine learning, we need methods that learn semantic continuous
logical representations. To achieve this ambitious goal, we focus our investigation on
logic and algebraic symbolic expressions and design a neural equivalence network architecture
that learns semantic vector representations of expressions in a syntax-driven
way, while solely retaining semantics. We show that equivalence networks learn significantly
better semantic vector representations compared to other, existing, neural
network architectures.
Finally, we present an unsupervised machine learning model for mining syntactic
and semantic code idioms. Code idioms are conventional "mental chunks" of code that
serve a single semantic purpose and are commonly used by practitioners. To achieve
this, we employ Bayesian nonparametric inference on tree substitution grammars. We
present a wide range of evidence that the resulting syntactic idioms are meaningful,
demonstrating that they do indeed recur across software projects and that they occur
more frequently in illustrative code examples collected from a Q&A site. These syntactic
idioms can be used as a form of automatic documentation of coding practices
of a programming language or an API. We also mine semantic loop idioms, i.e. highly
abstracted but semantic-preserving idioms of loop operations. We show that semantic
idioms provide data-driven guidance during the creation of software engineering tools
by mining common semantic patterns, such as candidate refactoring locations. This
gives tool, API and language designers data-based evidence about general, domain-specific
and project-specific coding patterns; instead of relying solely on their intuition, they can
use semantic idioms to achieve greater coverage of their tool or new API or language
feature. We demonstrate this by creating a tool that suggests loop refactorings into
functional constructs in LINQ. Semantic loop idioms also provide data-driven evidence
for introducing new APIs or programming language features.
Preselection of Electronic Services by Given Business Service Based on Measuring Semantic Heterogeneity within the Application Area of Logistics
The service-orientation design paradigm distinguishes business services (BS) from electronic services (ES). BS encapsulate business concerns. ES encapsulate computing systems, information systems and software applications. In environments with a large number of BS and ES, deciding which ES provides the most suitable support for a given BS is not a trivial task. The objective of the thesis is to provide models, methods, and techniques for the preselection of ES for a given BS. Preselection reduces the large number of ES to a significantly smaller number with respect to a particular BS.