An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.
Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biology
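The Hamming distance mentioned in the abstract counts the positions at which two equal-length sequences differ. A minimal sketch in Python follows; the function name, the case-insensitive comparison, and the example sequences are illustrative assumptions, not code taken from the primer.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    # The Hamming distance is only defined for sequences of equal length.
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance requires sequences of equal length")
    # Compare case-insensitively (an assumption made for this sketch).
    pairs = zip(seq_a.upper(), seq_b.upper())
    return sum(base_a != base_b for base_a, base_b in pairs)


if __name__ == "__main__":
    # Hypothetical example sequences; they differ at positions 3 and 6.
    print(hamming_distance("GATTACA", "GACTATA"))  # prints 2
```

Raising an error on unequal lengths keeps the sketch faithful to the definition; comparing sequences of different lengths would call for an alignment-based measure instead.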
Personalized Approaches to Supporting the Learning Needs of Lifelong Professional Learners
Advanced learning technology research has begun to take on a complex challenge: supporting
lifelong learning. Professional learning is an essential subset of lifelong learning that is more
tractable than the full lifelong learning challenge. Professionals do not always have access to
professional teachers to provide input to the problems they encounter, so they rely on their
peers in an online learning community (OLC) to help meet their learning needs. Supporting
professional learners within an OLC is a difficult problem as the learning needs of each
learner continuously evolve, often in different ways from other learners. Hence, there is a
need to provide personalized support to learners adapted to their individual learning needs.
This thesis explores personalized approaches for detecting the unperceived learning needs
and meeting the expressed learning needs of learners in an OLC. The experimental test bed
for this research is Stack Overflow (SO), an OLC used by software professionals. To date,
seven experiments have been carried out mining SO peer-peer interaction data. Knowing that
question-answerers play a huge role in meeting the learning needs of the question-askers, the
first experiment aimed to detect the learning needs of the answerers. Results from experiment
1 show that reputable answerers themselves demonstrate unperceived learning needs as
revealed by a decline in quality answers in SO. Of course, a decline in quality answers could
impact the help-seeking experience of question-askers; hence experiment 2 sought to
understand the effects of the help-seeking experience of question-askers on their enthusiasm
to continuously participate within the OLC. As expected, negative help-seeking experiences
of question-askers had a large impact on their propensity to seek further help within the OLC.
To improve the help-seeking experience of question-askers, it is important to proactively
detect the learning needs of the question-answerers before they provide poor quality answers.
Thus, in experiment 3 the goal was to predict whether a question-answerer would give a poor
answer to a question based on their past peer-peer interactions. Under various assumptions,
accuracies ranging from 84.57% to 94.54% were achieved. Next, experiment 4 attempted to
detect the unperceived learning needs of question-askers even before they are aware of such
needs. Using information about a learner’s interactions over a 5-month period, a prediction
was made as to what they would be asking about during the next month, achieving recall and
precision values of 0.93 and 0.81. Knowing the learning needs of question-askers early
creates an opportunity to predict prospective answerers who could provide timely and quality
answers to their question. The goal of experiment 5 was thus to predict the actual answerers
for questions based only on information known at the time the question was asked. The
success rate was at best 63.15%, which would only be marginally useful to inform a real-life
peer recommender system. Thus, experiment 6 explored new measures in predicting the
answerers, boosting the success rate to 89.64%. Of course, a peer recommender system
would be deemed to be especially useful if it can provide prompt interventions, especially to
get answers to questions that would otherwise not be answered quickly. To this end,
experiment 7 attempted to predict the question-askers whose questions would be answered
late or even remain unanswered, and a success rate of 68.4% was achieved.
Results from these experiments suggest that modelling the activities of learners in an OLC is
key to supporting them in meeting their learning needs. Perhaps the most important
lesson learned in this research is that lightweight approaches can be developed to help meet
the evolving learning needs of professionals, even as knowledge changes within a profession.
Metrics based on the experiments above are exactly such lightweight methodologies and
could be the basis for useful tools to support professional learners
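The recall and precision reported for experiment 4 can be read as the overlap between the topics predicted for a learner and the topics that learner actually asked about. A minimal sketch of that reading follows, assuming topics are represented as sets of tags; the function name and the example tags are hypothetical, and the abstract does not state how the thesis computed these values.

```python
def precision_recall(predicted: set, actual: set) -> tuple:
    """Return (precision, recall) for a predicted set against an actual set."""
    true_positives = len(predicted & actual)
    # Precision: share of predicted topics that were actually asked about.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of actually asked topics that were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical tags, invented for illustration only.
    predicted_topics = {"python", "pandas", "numpy", "regex"}
    asked_topics = {"python", "pandas", "numpy", "django", "flask"}
    print(precision_recall(predicted_topics, asked_topics))  # prints (0.75, 0.6)
```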
FDDetector: A Tool for Deduplicating Features in Software Product Lines
Duplication is one of the model defects that affect software product lines during their evolution. Many approaches have been proposed to deal with duplication at the code level, while duplication in features has received little attention in the literature. With the aim of reducing maintenance cost and improving product quality at an early stage of a product line, we proposed in previous work a support tool based on a conceptual framework. The main objective of this tool, called FDDetector, is to detect and correct duplication in product line models. In this paper, we recall the motivation behind creating a solution for feature deduplication and present the progress made in the design and implementation of FDDetector
Software development process mining: discovery, conformance checking and enhancement
Context. Modern software projects require the proper allocation of human, technical and
financial resources. Very often, project managers make decisions supported only by their personal
experience, intuition or simply by mirroring activities performed by others in similar
contexts. Most attempts to avoid such practices use models based on lines of code, cyclomatic
complexity or effort estimators, thus commonly supported by software repositories which are
known to contain several flaws.
Objective. Demonstrate the usefulness of process data and mining methods to enhance
software development practices, by assessing efficiency and unveiling unknown process insights,
thus contributing to the creation of novel models within the software development analytics
realm.
Method. We mined the development process fragments of multiple developers in three
different scenarios by collecting Integrated Development Environment (IDE) events during their
development sessions. Furthermore, we used process and text mining to discover developers’
workflows and their fingerprints, respectively.
Results. We discovered and modeled, with good quality, developers’ processes during programming
sessions based on events extracted from their IDEs. We unveiled insights from
coding practices in distinct refactoring tasks, built accurate software complexity forecast models
based only on process metrics, and set up a method for coherently characterizing developers’
behaviors. The latter may ultimately lead to the creation of a catalog of software development
process smells.
Conclusions. Our approach is agnostic to programming languages, geographic location or
development practices, making it suitable for challenging contexts such as in modern global
software development projects using either traditional IDEs or sophisticated low/no code platforms.
Context. Modern software projects require the correct allocation of human, technical and
financial resources. Project managers frequently make decisions supported only by their
own experience, intuition or simply by mirroring activities carried out by others in similar
contexts. Attempts to avoid such practices rely on models that use lines of code, cyclomatic
complexity or effort estimates, which are traditionally supported by software repositories
known to contain several limitations.
Objective. Demonstrate the usefulness of process data and the corresponding analysis methods in
improving software development practices, focusing on the analysis of efficiency and
revealing previously unknown aspects of the processes, thereby contributing to the creation of
new models in the context of advanced analytics for software development.
Method. We explored the process fragments of several programmers in three different
scenarios, collecting events during their development sessions in the IDE. Additionally,
we used process and text discovery and analysis methods to model the programmers'
workflow and their individual characteristics, respectively.
Results. We discovered and modeled, with good quality, the programmers' processes
during their work sessions, using events from their IDEs. We revealed previously unknown
facts about refactoring practices, built forecast models of cyclomatic complexity
using only process metrics, and created a method for coherently characterizing
programmers' behaviors. The latter may lead to the creation of a
catalog of good/bad practices in the software development process.
Conclusions. Our approach is agnostic in terms of programming languages,
geographic location or development practice, making it applicable in complex contexts
such as modern global development projects that use both traditional IDEs
and today's sophisticated "low/no code" platforms
Including Everyone, Everywhere: Understanding Opportunities and Challenges of Geographic Gender-Inclusion in OSS
The gender gap is a significant concern facing the software industry as development becomes more geographically distributed. Widely shared reports indicate that gender differences may be specific to each region. However, how complete can these reports be with little to no research reflective of the Open Source Software (OSS) process and the communities in which software is now commonly developed? Our study presents a multi-region geographical analysis of gender inclusion on GitHub. This mixed-methods approach includes quantitatively investigating differences in gender inclusion in projects across geographic regions and investigating these trends over time using data from contributions to 21,456 project repositories. We also qualitatively examine the unique experiences of developers contributing to these projects through a survey strategically targeted to developers in various regions worldwide. Our findings indicate that gender diversity is low across all parts of the world, with no substantial difference across regions. However, there has been statistically significant improvement in diversity worldwide since 2014, with certain regions such as Africa improving at a faster pace. We also find that most motivations and barriers to contributions (e.g., lack of resources to contribute and poor working environment) were shared across regions; however, some insightful differences, such as how to make projects more inclusive, did arise. From these findings, we derive and present implications for tools that can foster inclusion in open source software communities and empower contributions from everyone, everywhere
Representational Learning Approach for Predicting Developer Expertise Using Eye Movements
The thesis analyzes an existing eye-tracking dataset collected while software developers were solving bug-fixing tasks in an open-source system. The analysis is performed using a representational learning approach, namely a Multi-layer Perceptron (MLP). The novel aspect of the analysis is the introduction of a new feature engineering method based on the eye-tracking data. This is then used to predict developer expertise on the data. The dataset used in this thesis is inherently more complex because it was collected in a very dynamic environment, i.e., the Eclipse IDE using an eye-tracking plugin, iTrace. Previous work in this area only worked on short code snippets that do not represent how developers usually program in a realistic setting.
A comparative analysis between representational learning and non-representational learning (Support Vector Machine, Naive Bayes, Decision Tree, and Random Forest) is also presented. The results are obtained from an extensive set of experiments (with an 80/20 training and testing split), which show that representational learning (MLP) works well on our dataset, with accuracy on average 30% higher across all tasks. Furthermore, a state-of-the-art method for feature engineering is proposed to extract features from the eye-tracking data. The average accuracy on all the tasks is 93.4% with a recall of 78.8% and an F1 score of 81.6%. We discuss the implications of these results on the future of automated prediction of developer expertise.
Adviser: Bonita Shari