530 research outputs found

    An Empirical Study on the Classification of Python Language Features Using Eye-Tracking

    Python, currently one of the most popular programming languages, is an object-oriented language that also provides language feature support for other programming paradigms, such as functional and procedural. It is not currently understood how support for multiple paradigms affects the ability of developers to comprehend that code. Understanding the predominant paradigm in code, and how developers classify the predominant paradigm, can benefit future research in program comprehension, as the paradigm may factor into how people comprehend that code. Other researchers may want to look at how the paradigms in the code interact with various code smells. To investigate how developers classify the predominant paradigm in Python code, we performed an empirical study while utilizing an eye-tracker. The goal was to see if developers gaze at specific language features while classifying the predominant paradigm and debugging code samples. The study includes both qualitative and quantitative data from 29 Python developers, including their gaze fixations during the tasks. We observed that participants seem to confuse the functional and procedural paradigms, possibly due to confusing terminology used in Python, though they do gaze at specific language features. Overall, participants took more time classifying functional code. The predominant paradigm did not affect their ability to debug code, though they gave lower confidence ratings for functional code. Adviser: Robert Dye
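
    To make the distinction the participants classified concrete, the sketch below shows the same computation written in a procedural and a functional style in Python. It is an illustrative example only, not one of the study's actual code samples.

        from functools import reduce

        def sum_of_squares_procedural(numbers):
            # Procedural style: explicit loop and a mutable accumulator.
            total = 0
            for n in numbers:
                total += n * n
            return total

        def sum_of_squares_functional(numbers):
            # Functional style: map/reduce with lambdas and no mutation,
            # using the built-ins Python's terminology labels as functional.
            return reduce(lambda acc, sq: acc + sq, map(lambda n: n * n, numbers), 0)

        assert sum_of_squares_procedural([1, 2, 3]) == sum_of_squares_functional([1, 2, 3]) == 14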

    InfraPhenoGrid: A scientific workflow infrastructure for Plant Phenomics on the Grid

    Plant phenotyping consists of observing the physical and biochemical traits of plant genotypes in response to environmental conditions. The challenges, particularly in the context of climate change and food security, are numerous. High-throughput platforms have been introduced to observe the dynamic growth of a large number of plants under different environmental conditions. Instead of considering a few genotypes at a time (as is the case when phenomic traits are measured manually), such platforms make it possible to use completely new kinds of approaches. However, the data sets produced by such widely instrumented platforms are huge, constantly growing, and produced by increasingly complex experiments, reaching a point where distributed computation is mandatory to extract knowledge from the data. In this paper, we introduce InfraPhenoGrid, the infrastructure we designed and deployed to efficiently manage the data sets produced by the PhenoArch plant phenomics platform in the context of the French Phenome Project. Our solution consists of deploying scientific workflows on a grid and using middleware to pilot workflow executions. Our approach is user-friendly in the sense that, despite the intrinsic complexity of the infrastructure, running scientific workflows and understanding the results obtained (using provenance information) is kept as simple as possible for end users.
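
    As a rough illustration of the workflow-plus-provenance idea described above, the Python sketch below chains two toy analysis steps and records minimal provenance for each execution. The decorator, step names, and data are hypothetical and are not part of InfraPhenoGrid.

        import functools
        import time

        PROVENANCE = []  # stand-in for a real provenance store

        def step(func):
            # Record which step ran, with what inputs, and what it produced.
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.time()
                result = func(*args, **kwargs)
                PROVENANCE.append({
                    "step": func.__name__,
                    "inputs": args,
                    "output": result,
                    "seconds": round(time.time() - start, 4),
                })
                return result
            return wrapper

        @step
        def segment_plant(pixels):
            # Toy stand-in for an image-analysis step on phenotyping data.
            return [p for p in pixels if p > 0]

        @step
        def estimate_leaf_area(segments):
            return len(segments)

        print(estimate_leaf_area(segment_plant([0, 3, 5, 0, 2])))  # 3
        print(PROVENANCE)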

    Stella: A Python-based Domain-Specific Language for Simulations

    Stella is a domain-specific language that (1) has single-thread performance competitive with low-level languages, (2) supports object-oriented programming (OOP) to properly structure the code, and (3) is very easy to use. Instead of requiring prototyping in a high-level language and then rewriting in a lower-level one, Stella is embedded in Python, is transparently usable, retains some OOP features, compiles to machine code, and executes at a speed similar to C. Stella's source code is compatible with Python and allows easy integration of C libraries. Its features are focused on the needs of scientific simulations. Other projects to speed up Python focus on easy integration and smaller critical sections. In contrast, Stella supports translating larger programs in their entirety and does not allow interaction with the Python run-time, to ensure predictable performance. My experience developing Stella shows that, by carefully selecting language features, high run-time performance can be achieved in a high-level language that has, in practice, very few restrictions.
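
    The "transparently usable" property mentioned above can be pictured with a small hypothetical sketch: a decorated numeric kernel that stays valid Python whether or not a native-code backend is present. The decorator name and behaviour below are assumptions for illustration, not Stella's actual API.

        def compile_native(func):
            # Hypothetical placeholder: a real backend would translate the function
            # to machine code; this sketch returns it unchanged so it stays runnable.
            return func

        @compile_native
        def dot(xs, ys):
            # Numeric kernel written in the plain-Python subset such a DSL targets.
            total = 0.0
            for x, y in zip(xs, ys):
                total += x * y
            return total

        print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0 with or without compilation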

    Using multiclass classification algorithms to improve text categorization tool: NLoN

    Natural language processing (NLP) and machine learning techniques have been widely used in the mining software repositories (MSR) field in recent years. Separating natural language from source code is a pre-processing step needed in both NLP and MSR for better data quality. This paper presents the design and implementation of a multi-class classification approach based on the existing open-source R package Natural Language or Not (NLoN). The article also reviews the existing literature on MSR and NLP: the review classifies the information sources and approaches of MSR in detail and focuses on the text representation and classification tasks of NLP. In addition, the design and implementation of the original NLoN paper are briefly introduced. Since the research goal is technology-oriented, i.e., to improve the design and implementation of an existing tool, this work adopts the design science research methodology and describes how it was applied. The result is an open-source Python library, NLoN-PY, hosted on GitHub and also published to PyPI. Since NLoN achieved comparable performance on two-class classification with a Lasso regression model, this study evaluated additional multi-class classification algorithms, i.e., Naive Bayes, k-Nearest Neighbours, and Support Vector Machine. Using 10-fold cross-validation, the expanded classifier achieved an AUC of 0.901 on the 5-class classification task and 0.92 on the 2-class task. Although the design did not show a significant performance improvement over the original, the impact of unbalanced data distribution on performance was detected, and the category structure of the classification problem was refined in the process. These findings on multi-class classification design can provide a foundation and direction for future research.
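
    A minimal sketch of the evaluation setup described above (multi-class classifiers under 10-fold cross-validation scored with one-vs-rest AUC) is shown below using scikit-learn. The synthetic data is a placeholder; this is not the NLoN-PY implementation.

        from sklearn.datasets import make_classification
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import cross_val_predict
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC

        # Stand-in for the 5-class text-vs-code feature matrix and labels.
        X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                                   n_classes=5, random_state=0)

        for name, clf in [("Naive Bayes", GaussianNB()),
                          ("k-NN", KNeighborsClassifier()),
                          ("SVM", SVC(probability=True))]:
            # Out-of-fold class probabilities from 10-fold CV, then one-vs-rest AUC.
            proba = cross_val_predict(clf, X, y, cv=10, method="predict_proba")
            auc = roc_auc_score(y, proba, multi_class="ovr")
            print(f"{name}: AUC = {auc:.3f}")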

    The Chinese pine genome and methylome unveil key features of conifer evolution

    Conifers dominate the world's forest ecosystems and are the most widely planted tree species. Their giant and complex genomes present great challenges for assembling a complete reference genome for evolutionary and genomic studies. We present a 25.4-Gb chromosome-level assembly of Chinese pine (Pinus tabuliformis) and reveal that its genome size is mostly attributable to huge intergenic regions and long introns with high transposable element (TE) content. Large genes with long introns exhibited higher expression levels. Despite the lack of a recent whole-genome duplication, 91.2% of genes were duplicated through dispersed duplication, and the expanded gene families are mainly related to stress responses, which may underpin conifers' adaptation, particularly to cold and/or arid conditions. The reproductive regulation network is distinct from that of angiosperms. Slow removal of TEs with high-level methylation may have contributed to genomic expansion. This study provides insights into conifer evolution and resources for advancing research on conifer adaptation and development.

    Towards Automated Software Evolution of Data-Intensive Applications

    Recent years have witnessed an explosion of work on Big Data. Data-intensive applications analyze and produce large volumes of data, typically terabytes to petabytes in size. Many techniques for facilitating data processing are integrated into data-intensive applications. An API is a software interface that allows two applications to communicate with each other. Streaming APIs, which can support parallel processing, are widely used in today's object-oriented programming development. This dissertation proposes an approach that automatically suggests whether stream code should run in parallel or sequentially. However, using streams efficiently and properly requires many subtle considerations, so use and misuse patterns for stream code are also identified in this dissertation. Modern software, especially highly transactional software systems, generates vast amounts of logging information every day. This volume of information prevents developers from extracting useful information effectively. Log levels can be used to filter run-time information, and this dissertation proposes an automated evolution approach for alleviating logging information overload by rejuvenating log levels according to developers' interests. Machine learning (ML) systems are pervasive in today's software society. They are complex and can process large volumes of data. Due to their complexity, ML systems are prone to classic technical debt issues, but how ML systems evolve is still a puzzling problem. This dissertation introduces ML-specific refactorings and technical debt to address this problem.
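
    The level-based filtering that the log-level rejuvenation work builds on can be illustrated with Python's standard logging module; this is a generic sketch, not the dissertation's tooling.

        import logging

        logging.basicConfig(format="%(levelname)s %(message)s")
        log = logging.getLogger("txn")

        log.setLevel(logging.DEBUG)
        log.debug("cache miss for key=42")     # emitted
        log.info("transaction committed")      # emitted

        # "Rejuvenating" the level: raising it filters out lower-priority messages.
        log.setLevel(logging.WARNING)
        log.debug("cache miss for key=43")     # now suppressed
        log.warning("retrying failed commit")  # still emitted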

    Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams

    Streaming APIs are becoming more pervasive in mainstream object-oriented programming languages and platforms. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations when processing both finite (e.g., collections) and infinite data structures. However, using this API efficiently involves subtle considerations, such as determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel given possible lambda expression side effects. This paper presents an automated refactoring approach that addresses these considerations and transforms stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions and transformations for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel, and to unorder or de-parallelize already parallel streams. The approach was implemented as a plug-in to the popular Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 18 Java projects consisting of approximately 1.65M lines of code. We found that 116 of 419 candidate streams (27.68%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to its full potential.
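
    The safety condition highlighted above (no interfering side effects in the lambdas) can be pictured outside Java as well; the Python sketch below is a loose analogy using a process pool in place of parallel streams, not the paper's ordering and typestate analysis.

        from concurrent.futures import ProcessPoolExecutor

        def square(n):
            # Pure function: no shared state, so running it in parallel is safe.
            return n * n

        seen = []

        def square_and_record(n):
            # Impure: mutates module-level state, so parallel workers would lose
            # or reorder these appends; this one must stay sequential.
            seen.append(n)
            return n * n

        if __name__ == "__main__":
            data = list(range(8))
            with ProcessPoolExecutor() as pool:
                print(list(pool.map(square, data)))      # safe to parallelize
            print([square_and_record(n) for n in data])  # kept sequential
            print(seen)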