
    An Introduction to Programming for Bioscientists: A Python-based Primer

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. Comment: 65 pages total, including 45 pages of text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biology.
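As a taste of the primer's capstone exercise, the following is a minimal Python sketch (without the graphical user interface described above) of computing the Hamming distance between two equal-length DNA sequences; the function name and example sequences are illustrative, not taken from the primer itself.

def hamming_distance(seq1: str, seq2: str) -> int:
    """Return the number of positions at which two equal-length DNA sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("Hamming distance is only defined for sequences of equal length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b)

# Example with two made-up sequences: they differ at positions 2 and 5, so the distance is 2.
print(hamming_distance("GATTACA", "GACTATA"))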

    Personalized Approaches to Supporting the Learning Needs of Lifelong Professional Learners

    Advanced learning technology research has begun to take on a complex challenge: supporting lifelong learning. Professional learning is an essential subset of lifelong learning that is more tractable than the full lifelong learning challenge. Professionals do not always have access to professional teachers to provide input to the problems they encounter, so they rely on their peers in an online learning community (OLC) to help meet their learning needs. Supporting professional learners within an OLC is a difficult problem as the learning needs of each learner continuously evolve, often in different ways from other learners. Hence, there is a need to provide personalized support to learners adapted to their individual learning needs. This thesis explores personalized approaches for detecting the unperceived learning needs and meeting the expressed learning needs of learners in an OLC. The experimental test bed for this research is Stack Overflow (SO), an OLC used by software professionals. To date, seven experiments have been carried out mining SO peer-peer interaction data. Knowing that question-answerers play a major role in meeting the learning needs of the question-askers, the first experiment aimed to detect the learning needs of the answerers. Results from experiment 1 show that reputable answerers themselves demonstrate unperceived learning needs as revealed by a decline in quality answers in SO. Of course, a decline in quality answers could impact the help-seeking experience of question-askers; hence experiment 2 sought to understand the effects of the help-seeking experience of question-askers on their enthusiasm to continuously participate within the OLC. As expected, negative help-seeking experiences of question-askers had a large impact on their propensity to seek further help within the OLC. To improve the help-seeking experience of question-askers, it is important to proactively detect the learning needs of the question-answerers before they provide poor-quality answers. Thus, in experiment 3 the goal was to predict whether a question-answerer would give a poor answer to a question based on their past peer-peer interactions. Under various assumptions, accuracies ranging from 84.57% to 94.54% were achieved. Next, experiment 4 attempted to detect the unperceived learning needs of question-askers even before they are aware of such needs. Using information about a learner’s interactions over a 5-month period, a prediction was made as to what they would be asking about during the next month, achieving recall and precision values of 0.93 and 0.81. Knowing the learning needs of question-askers early creates an opportunity to predict prospective answerers who could provide timely and quality answers to their question. The goal of experiment 5 was thus to predict the actual answerers for questions based only on information known at the time the question was asked. The success rate was at best 63.15%, which would only be marginally useful to inform a real-life peer recommender system. Thus, experiment 6 explored new measures in predicting the answerers, boosting the success rate to 89.64%. Of course, a peer recommender system would be especially useful if it could provide prompt interventions, especially to get answers to questions that would otherwise not be answered quickly. To this end, experiment 7 attempted to predict the question-askers whose questions would be answered late or even remain unanswered, and a success rate of 68.4% was achieved.
Results from these experiments suggest that modelling the activities of learners in an OLC is key to providing support that meets their learning needs. Perhaps the most important lesson learned in this research is that lightweight approaches can be developed to help meet the evolving learning needs of professionals, even as knowledge changes within a profession. Metrics based on the experiments above are exactly such lightweight methodologies and could be the basis for useful tools to support professional learners.
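To make the flavor of these experiments concrete, here is a minimal, hypothetical sketch of the kind of classifier used in experiment 3: predicting whether an answerer will give a poor answer from features of their past peer-peer interactions. The feature set, the random placeholder data, and the choice of a random forest are assumptions for illustration, not the thesis's actual pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical per-answer features derived from an answerer's past interactions,
# e.g. [number of prior answers, mean answer score, days since last accepted answer].
rng = np.random.default_rng(0)
X = rng.random((1000, 3))
y = rng.integers(0, 2, size=1000)  # 1 = poor answer, 0 = acceptable answer (placeholder labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))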

    FDDetector: A Tool for Deduplicating Features in Software Product Lines

    Duplication is one of the model defects that affect software product lines during their evolution. Many approaches have been proposed to deal with duplication at the code level, while duplication in features has received little attention in the literature. With the aim of reducing maintenance cost and improving product quality at an early stage of a product line, we proposed in previous work a support tool based on a conceptual framework. The main objective of this tool, called FDDetector, is to detect and correct duplication in product line models. In this paper, we recall the motivation for creating a feature-deduplication solution and present the progress made in the design and implementation of FDDetector.
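As a rough illustration of what one step of name-based feature deduplication could look like, the sketch below flags pairs of features whose normalized names are highly similar; the normalization, similarity measure, and threshold are assumptions for illustration and do not represent FDDetector's actual detection algorithm.

from difflib import SequenceMatcher
from itertools import combinations

def normalize(name: str) -> str:
    # Lowercase and drop separators so that "User_Authentication" matches "UserAuthentication".
    return "".join(ch.lower() for ch in name if ch.isalnum())

def candidate_duplicates(features, threshold=0.9):
    # Return feature pairs whose normalized names are at least `threshold` similar.
    pairs = []
    for a, b in combinations(features, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs

print(candidate_duplicates(["UserAuthentication", "User_Authentication", "PaymentGateway"]))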

    Software development process mining: discovery, conformance checking and enhancement

    Context. Modern software projects require the proper allocation of human, technical and financial resources. Very often, project managers make decisions supported only by their personal experience, intuition or simply by mirroring activities performed by others in similar contexts. Most attempts to avoid such practices use models based on lines of code, cyclomatic complexity or effort estimators, commonly supported by software repositories that are known to contain several flaws. Objective. Demonstrate the usefulness of process data and mining methods to enhance software development practices, by assessing efficiency and unveiling previously unknown process insights, thus contributing to the creation of novel models within the software development analytics realm. Method. We mined the development process fragments of multiple developers in three different scenarios by collecting Integrated Development Environment (IDE) events during their development sessions. Furthermore, we used process and text mining to discover developers’ workflows and their fingerprints, respectively. Results. We discovered and modeled, with good quality, developers’ processes during programming sessions based on events extracted from their IDEs. We unveiled insights from coding practices in distinct refactoring tasks, built accurate software complexity forecast models based only on process metrics, and set up a method for coherently characterizing developers’ behaviors. The latter may ultimately lead to the creation of a catalog of software development process smells. Conclusions. Our approach is agnostic to programming languages, geographic location or development practices, making it suitable for challenging contexts such as modern global software development projects using either traditional IDEs or sophisticated low/no code platforms.
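As a minimal sketch of the process-discovery step, the snippet below counts directly-follows relations between consecutive IDE events within each development session; the event names and session traces are made up for illustration and do not reflect the actual event taxonomy used in this work.

from collections import Counter

# Hypothetical development sessions, each a sequence of IDE events.
sessions = [
    ["OpenFile", "EditCode", "RunTests", "EditCode", "Commit"],
    ["OpenFile", "EditCode", "Refactor", "RunTests", "Commit"],
]

# Count how often one event directly follows another (a directly-follows graph).
dfg = Counter()
for trace in sessions:
    for a, b in zip(trace, trace[1:]):
        dfg[(a, b)] += 1

for (a, b), count in sorted(dfg.items(), key=lambda kv: -kv[1]):
    print(f"{a} -> {b}: {count}")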

    Including Everyone, Everywhere: Understanding Opportunities and Challenges of Geographic Gender-Inclusion in OSS

    The gender gap is a significant concern facing the software industry as development becomes more geographically distributed. Widely shared reports indicate that gender differences may be specific to each region. However, how complete can these reports be when there is little to no research reflecting the Open Source Software (OSS) processes and communities in which software is now commonly developed? Our study presents a multi-region geographical analysis of gender inclusion on GitHub. This mixed-methods approach includes quantitatively investigating differences in gender inclusion in projects across geographic regions, and investigating these trends over time, using data from contributions to 21,456 project repositories. We also qualitatively examine the unique experiences of developers contributing to these projects through a survey strategically targeted at developers in various regions worldwide. Our findings indicate that gender diversity is low across all parts of the world, with no substantial difference across regions. However, there has been a statistically significant improvement in diversity worldwide since 2014, with certain regions, such as Africa, improving at a faster pace. We also find that most motivations and barriers to contribution (e.g., lack of resources to contribute and a poor working environment) were shared across regions; however, some insightful differences, such as views on how to make projects more inclusive, did arise. From these findings, we derive and present implications for tools that can foster inclusion in open source software communities and empower contributions from everyone, everywhere.
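For illustration only, one simple way to test whether the share of women contributors differs across regions is a chi-square test on a contingency table of contributor counts; the counts below are invented placeholders, not data from this study.

from scipy.stats import chi2_contingency

# Rows are hypothetical regions; columns are [women contributors, other contributors].
counts = [
    [120, 2380],   # placeholder counts for one region
    [450, 11550],  # placeholder counts for another region
    [300, 9700],   # placeholder counts for a third region
]
chi2, p_value, dof, _expected = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")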

    Representational Learning Approach for Predicting Developer Expertise Using Eye Movements

    This thesis analyzes an existing eye-tracking dataset collected while software developers were solving bug-fixing tasks in an open-source system. The analysis is performed using a representational learning approach, namely a Multi-Layer Perceptron (MLP). The novel aspect of the analysis is the introduction of a new feature engineering method based on the eye-tracking data, which is then used to predict developer expertise. The dataset used in this thesis is inherently more complex because it was collected in a very dynamic environment, i.e., the Eclipse IDE, using an eye-tracking plugin, iTrace. Previous work in this area only considered short code snippets that do not represent how developers usually program in a realistic setting. A comparative analysis between representational learning and non-representational learning (Support Vector Machine, Naive Bayes, Decision Tree, and Random Forest) is also presented. The results are obtained from an extensive set of experiments (with an 80/20 training and testing split), which show that representational learning (MLP) works well on our dataset, with an accuracy on average 30% higher across all tasks. Furthermore, a state-of-the-art feature engineering method is proposed to extract features from the eye-tracking data. The average accuracy across all tasks is 93.4%, with a recall of 78.8% and an F1 score of 81.6%. We discuss the implications of these results for the future of automated prediction of developer expertise. Adviser: Bonita Sharif
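To make the evaluation setup concrete, here is a minimal sketch of an MLP classifier trained on hypothetical eye-tracking features with an 80/20 train/test split; the feature set, placeholder data, and hyperparameters are assumptions for illustration and are not taken from the thesis.

import numpy as np
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical per-task features, e.g. [fixation count, mean fixation duration,
# number of regressions, time spent on identifiers]; label 1 = expert, 0 = novice.
rng = np.random.default_rng(42)
X = rng.random((500, 4))
y = rng.integers(0, 2, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)
mlp.fit(X_train, y_train)
pred = mlp.predict(X_test)
print("recall:", recall_score(y_test, pred), "F1:", f1_score(y_test, pred))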