14 research outputs found

    Features that Predict the Acceptability of Java and JavaScript Answers on Stack Overflow

    Context: It is not uncommon for a new team member to join an existing Agile software development team, even after development has started. This new team member faces a number of challenges before they are integrated into the team and can contribute productively to team progress. Ideally, each newcomer should be supported in this transition through an effective team onboarding program, although prior evidence suggests that this is challenging for many organisations. Objective: We seek to understand how Agile teams address the challenge of team onboarding in order to inform future onboarding design. Method: We conducted an interview survey of eleven participants from eight organisations to investigate what onboarding activities are common across Agile software development teams. We also identify common goals of onboarding from a synthesis of literature. A repertory grid instrument is used to map the contributions of onboarding techniques to onboarding goals. Results: Our study reveals that a broad range of team onboarding techniques, both formal and informal, are used in practice. It also shows that particular techniques have high contributions to a given goal or set of goals. Conclusions: In presenting a set of onboarding goals to consider and an evidence-based mechanism for selecting techniques to achieve the desired goals, it is expected that this study will contribute to better-informed onboarding design and planning. An increase in practitioner awareness of the options for supporting new team members is also an expected outcome. Comment: Conference paper, 11 pages, 3 figures, 2 tables
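The repertory-grid mapping described above can be illustrated with a small sketch: a grid of techniques rated for their contribution to each goal, from which the highest-contributing technique per goal is selected. The techniques, goals, and ratings below are hypothetical, not the study's data.

```python
# Hypothetical repertory-grid data: onboarding techniques rated for
# their contribution (1-5) to each onboarding goal. Names and numbers
# are illustrative only.
grid = {
    "pair programming":   {"role clarity": 3, "codebase familiarity": 5, "social integration": 4},
    "mentoring":          {"role clarity": 5, "codebase familiarity": 3, "social integration": 5},
    "documentation tour": {"role clarity": 2, "codebase familiarity": 4, "social integration": 1},
}

def best_technique(grid, goal):
    """Return the technique rated as contributing most to `goal`."""
    return max(grid, key=lambda technique: grid[technique][goal])

print(best_technique(grid, "codebase familiarity"))  # pair programming
```

In this toy grid, selecting per goal would recommend mentoring for role clarity but pair programming for codebase familiarity, which is the kind of evidence-based technique selection the abstract describes.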

    Mining Knowledge Bases for Question & Answers Websites

    We studied the problem of searching for answers to questions on a Question-and-Answer website from knowledge bases. A number of research efforts have used Stack Overflow data, which is available to the public. Surprisingly, only a few papers have tried to improve the search for better answers. Furthermore, current approaches for searching a Question-and-Answer website are usually limited to the question database, which is usually the website's own content. We showed it is feasible to use knowledge bases as sources for answers. We implemented both vector-space and topic-space representations for our datasets and compared these distinct techniques. Finally, we proposed a hybrid ranking approach that took advantage of a machine-learned classifier to incorporate tag information into the ranking, and showed that it was able to improve retrieval performance.
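The vector-space representation mentioned above can be sketched in a few lines: documents and the question become TF-IDF weight vectors, and candidates are ranked by cosine similarity. This is a minimal illustrative sketch, not the paper's actual system; the stopword list and example documents are assumptions.

```python
import math
from collections import Counter

# Minimal vector-space retrieval sketch: rank knowledge-base documents
# against a question using TF-IDF weighting and cosine similarity.
STOPWORDS = {"how", "to", "a", "in", "with", "from", "the"}

docs = [
    "how to sort a list in java",                   # the "question"
    "sort a java list with comparators",            # relevant source
    "connect to a postgres database from python",   # irrelevant source
]

def tf_idf_vectors(docs):
    """Build a sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [[t for t in d.split() if t not in STOPWORDS] for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in tokenized]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = tf_idf_vectors(docs)
scores = [cosine(vecs[0], v) for v in vecs[1:]]
print(scores)  # the java-related document scores higher than the python one
```

The hybrid ranking the paper proposes would then combine such similarity scores with a classifier's output over tag features, rather than using cosine similarity alone.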

    Determinants of quality, latency, and amount of Stack Overflow answers about recent Android APIs.

    Stack Overflow is a popular crowdsourced question and answer website for programming-related issues. It is an invaluable resource for software developers; on average, questions posted there get answered in minutes to an hour. Questions about well-established topics, e.g., the coercion operator in C++, or the difference between canonical and class names in Java, get asked often in one form or another, and answered very quickly. On the other hand, questions on previously unseen or niche topics take a while to get a good answer. This is particularly the case with questions about recent updates to, or the introduction of, new application programming interfaces (APIs). In a hyper-competitive online market, getting good answers to current programming questions sooner could increase the chances of an app getting released and used. So, can developers somehow hasten the arrival of good answers to questions about new APIs? Here, we empirically study Stack Overflow questions pertaining to new Android APIs and their associated answers. We contrast the interest in these questions, their answer quality, and the timeliness of their answers with questions about old APIs. We find that Stack Overflow answerers in general prioritize with respect to currentness: questions about new APIs do get more answers, but good-quality answers take longer. We also find that incentives in terms of question bounties, if used appropriately, can significantly shorten the time and increase answer quality. Interestingly, no operationalization of bounty amount shows significance in our models. In practice, our findings confirm the value of bounties in enhancing expert participation. In addition, they show that the Stack Overflow style of crowdsourcing, for all its glory in providing answers about established programming knowledge, is less effective with new API questions.
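The kind of latency comparison the study describes can be sketched as follows. The numbers below are illustrative toy values only, not the study's data; a real analysis would use regression models over the full question corpus, as the abstract indicates.

```python
from statistics import median

# Hypothetical hours-until-accepted-answer for two groups of questions
# about new APIs: with and without a bounty. Values are illustrative.
no_bounty_hours = [2, 5, 8, 24, 48, 72, 120]
bounty_hours = [1, 2, 3, 6, 10, 12, 30]

# Compare central tendency of time-to-answer between the two groups.
print(median(no_bounty_hours), median(bounty_hours))  # 24 6
```

On these toy values the bounty group's median latency is a quarter of the other group's, mirroring the direction of the effect reported, though the abstract also notes that the bounty amount itself showed no significance in their models.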

    Using Crowd-Based Software Repositories to Better Understand Developer-User Interactions

    Software development is a complex process. To serve the final software product to the end user, developers need to rely on a variety of software artifacts throughout the development process. The term software repository used to denote only containers of source code such as version control systems; more recent usage has generalized the concept to include a plethora of software development artifact kinds and their related meta-data. Broadly speaking, software repositories include version control systems, technical documentation, issue trackers, question and answer sites, distribution information, etc. Software repositories can be based on a specific project (e.g., the bug tracker for Firefox) or be crowd-sourced (e.g., questions and answers on technical Q&A websites). Crowd-based software artifacts are created as by-products of developer-user interactions, which are sometimes referred to as communication channels. In this thesis, we investigate three distinct crowd-based software repositories that follow different models of developer-user interactions. We believe that through a better understanding of crowd-based software repositories, we can identify challenges in software development and provide insights to improve the software development process. In our first study, we investigate Stack Overflow. It is the largest collection of programming-related questions and answers. On Stack Overflow, developers interact with other developers to create crowd-sourced knowledge in the form of questions and answers. The results of the interactions (i.e., the question threads) become valuable information to the entire developer community. Prior research on Stack Overflow tacitly assumes that questions receive answers directly on the platform and that no interaction is required during the process. Meanwhile, the platform allows attaching comments to questions, which form discussions of the question.
Our study found that question discussions occur for 59.2% of questions on Stack Overflow. For discussed and solved questions on Stack Overflow, 80.6% of the questions have the discussion begin before the accepted answer is submitted. The results of our study show the importance and nuances of interactions in technical Q&A. We then study dotfiles, a set of publicly shared user-specific configuration files for software tools. There is a culture of sharing dotfiles within the developer community, where the idea is to learn from other developers’ dotfiles and share your variants. The interaction of dotfiles sharing can be viewed as developers sourcing information from other developers, adapting the information to their own needs, and sharing their adaptations back to the community. Our study on dotfiles suggests that it is a common practice among developers to share dotfiles, with 25.8% of the most-starred users on GitHub having a dotfiles repository. We provide a taxonomy of the commonly tracked dotfiles and a qualitative study of the commits in dotfiles repositories. We also leveraged a state-of-the-art time-series clustering technique (K-Shape) to identify code churn patterns for dotfile edits. This study is the first step towards understanding the practices of maintaining and sharing dotfiles. Finally, we study app stores, the platforms that distribute software products and contain many non-technical attributes (e.g., ratings and reviews) of software products. Three major stakeholders interact with each other in app stores: the app store owner, who governs the operation of the app store; developers, who publish applications on the app store; and users, who browse and download applications in the app store. App stores often provide means of interaction between all three actors (e.g., app reviews, store policy) and sometimes interactions within the same actor (e.g., developer forums). We surveyed existing app stores to extract key features of app store operation.
We then labeled a representative set of app stores collected via web queries. K-means is applied to the labeled app stores to detect natural groupings. We observed a diverse set of app stores through the process. Rather than finding a single model that describes all app stores, our observations show that, fundamentally, app stores operate differently. This study provides insights into how app stores can affect software development. In summary, we investigated software repositories containing software artifacts created from different developer-user interactions. These software repositories are essential to software development in providing reference information (i.e., Stack Overflow), improving development productivity (i.e., dotfiles), and helping distribute software products to end users (i.e., app stores).
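The K-means grouping step can be sketched in a few lines. This is a minimal pure-Python K-means over hypothetical app-store feature vectors (e.g., number of interaction channels, moderation strictness score), not the thesis's actual encoding or implementation.

```python
import random

# Minimal K-means sketch: group app stores encoded as numeric feature
# vectors. The vectors below are hypothetical, well-separated toy points.
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers at random points
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster's mean.
        centers = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

stores = [(5, 5), (6, 5), (0, 0), (1, 0)]  # two natural groupings
clusters = kmeans(stores, k=2)
print(sorted(map(sorted, clusters)))
```

With well-separated points like these, the algorithm recovers the two natural groupings regardless of which points are drawn as initial centers; on real labeled app-store data the number and shape of groupings is what the study set out to discover.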

    Towards Semantic Clone Detection, Benchmarking, and Evaluation

    Developers copy and paste their code to speed up the development process. Sometimes, they copy code from other systems or look up code online to solve a complex problem. Developers reuse copied code with or without modifications. The resulting similar or identical code fragments are called code clones. Sometimes clones are unintentionally written when a developer implements the same or similar functionality. Even when the resulting code fragments are not textually similar but implement the same functionality, they are still considered to be clones and are classified as semantic clones. Semantic clones are defined as code fragments that perform the exact same computation and are implemented using different syntax. Software cloning research indicates that code clones exist in all software systems; on average, 5% to 20% of software code is cloned. Due to the potential impact of clones, whether positive or negative, it is essential to locate, track, and manage clones in the source code. Considerable research has been conducted on all types of code clones, including clone detection, analysis, management, and evaluation. Despite the great interest in code clones, there has been considerably less work conducted on semantic clones. As described in this thesis, I advance the state-of-the-art in semantic clone research in several ways. First, I conducted an empirical study to investigate the status of code cloning in and across open-source game systems and the effectiveness of different normalization, filtering, and transformation techniques for detecting semantic clones. Second, I developed an approach to detect clones across .NET programming languages using an intermediate language. Third, I developed a technique using an intermediate language and an ontology to detect semantic clones. Fourth, I mined Stack Overflow answers to build a semantic code clone benchmark that represents real semantic code clones in four programming languages: C, C#, Java, and Python. 
Fifth, I defined a comprehensive taxonomy that identifies semantic clone types. Finally, I implemented an injection framework that uses the benchmark to compare and evaluate semantic code clone detectors by automatically measuring recall.
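The notion of a semantic clone, same computation under different syntax, can be illustrated with a lightweight dynamic check: run two fragments on shared random inputs and compare outputs. This is an illustrative sketch only, not the thesis's intermediate-language or ontology-based approach, and agreement on sampled inputs is evidence, not proof, of equivalence.

```python
import random

# Two syntactically different fragments computing the same thing:
# the sum of even numbers in a list.
def sum_even_loop(xs):
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x
    return total

def sum_even_comprehension(xs):
    return sum(x for x in xs if x % 2 == 0)

def likely_semantic_clones(f, g, trials=100, seed=42):
    """Heuristic check: do f and g agree on random list inputs?"""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]
        if f(xs) != g(xs):
            return False  # a witness input distinguishes them
    return True  # agreement on all sampled inputs (not a proof)

print(likely_semantic_clones(sum_even_loop, sum_even_comprehension))
```

Benchmarks such as the one mined from Stack Overflow answers pair fragments like these across languages, so that detectors can be scored on whether they report the pair as a clone.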

    Finding high-quality grey literature for use as evidence in software engineering research.

    Background: Software engineering research often uses practitioners as a source of evidence in their studies. This evidence is usually gathered through empirical methods such as surveys, interviews and ethnographic research. The web has brought with it the emergence of the social programmer. Software practitioners are publishing their opinions online through blog articles, discussion boards and Q&A sites. Mining these online sources of information could provide a new source of evidence which complements traditional evidence sources. There are benefits to the adoption of grey literature in software engineering research (such as bridging the gap between the state-of-the-art where research typically operates and the state-of-practice), but also significant challenges. The main challenge is finding grey literature which is of high quality to the researcher, given the vast volume of grey literature available on the web. The thesis defines the quality of grey literature in terms of its relevance to the research being undertaken and its credibility. The thesis also focuses on a particular type of grey literature that has been written by software practitioners. A typical example of such grey literature is blog articles, which are specifically used as examples throughout the thesis. Objectives: There are two main objectives to the thesis: to investigate the problems of finding high-quality grey literature, and to make progress in addressing those problems. In working towards these objectives, we investigate our main research question: how can researchers more effectively and efficiently search for and then select the higher-quality blog-like content relevant to their research? 
We divide this question into twelve sub-questions, and more formally define what we mean by ‘blog-like content’. Method: To achieve the objectives, we first investigate how software engineering researchers define and assess quality when working with grey literature, and then work towards a methodology and a tool-suite which can semi-automate the identification and quality assessment of relevant grey literature for use as evidence in the researcher’s study. To investigate how software engineering researchers define and assess quality, we first conduct a literature review of credibility assessment to gather a set of credibility criteria. We then validate those criteria through a survey of software engineering researchers. This gives us an overall model of credibility assessment within software engineering research. We next investigate the empirical challenges of measuring quality and develop a methodology, adapted from the case survey methodology, which aims to address the problems and challenges identified. Along with the methodology is a suggested tool-suite which is intended to help researchers automate the application of a subset of the credibility model. The tool-suite supports the methodology by, for example, automating tasks in order to scale the analysis. The use of the methodology and tool-suite is then demonstrated through three examples, which include a partial evaluation of the methodology and tool-suite. Results: Our literature review of credibility assessment identified a set of criteria that have been used in previous research. However, we also found a lack of definitions for both the criteria and, more generally, the term credibility. Credibility assessment is a difficult and subjective task that is particular to each individual. Research has addressed this subjectivity by conducting studies that look at how particular user groups assess credibility, e.g. 
pensioners, university students, and the visually impaired; however, none of the studies reviewed software engineering researchers. Informed by the literature review, we conducted a survey which we believe is the first study on the credibility assessment of software engineering researchers. The results of the survey are a more refined set of criteria, but also a set that many (approximately 60%) of the survey participants believed generalises to other types of media (both practitioner-generated and researcher-generated). We found that there are significant challenges in using blog-like content as evidence in research. For example, there are the challenges of identifying the high-quality content among the vast quantity available on the web, and then creating methods of analysis which are scalable to handle that vast quantity. In addressing these challenges, we produce: a set of heuristics which can help in finding higher-quality results when searching using traditional search engines; a validated list of reasoning markers that can aid in assessing the amount of reasoning within a document; a review of the current state of the experience mining domain; and a modifiable classification schema for classifying the source of URLs. With credibility assessment being such a subjective task, there can be no one-size-fits-all method of automating quality assessment. Instead, our methodology is intended to be used as a framework in which the researcher can swap out and adapt the criteria that we assess for their own criteria, based on the context of the study being undertaken and the personal preference of the researcher. We find from the survey that there is a variety of attitudes towards using grey literature in software engineering research, and not all respondents view the use of grey literature as evidence in the way that we do (i.e., as having the same benefits and threats as other traditional methods of evidence gathering). 
Conclusion: The work presented in this thesis makes significant progress towards answering our research question, and the thesis provides a foundation for future research on automated quality assessment and credibility. Adoption of the tools and methodology presented in this thesis can help researchers more effectively and efficiently search for and select higher-quality blog-like content, but there is a need for more substantial research on the credibility assessment of software engineering researchers, and for a more extensive credibility model to be produced. This can be achieved by replicating the literature review systematically, accepting more studies for analysis, and conducting a more extensive survey with a greater number, and more representative selection, of survey respondents. With a more robust credibility model, we can have more confidence in the criteria that we choose to include within the methodology and tools, as well as automating the assessment of more criteria. Throughout the research, there has been a challenge in aggregating the results after assessing each criterion. Future research should look towards the adoption of machine learning methods to aid with this aggregation. We believe that the criteria and measures used by our tools can serve as features for machine learning classifiers which will be able to more accurately assess quality. However, before such work can take place, there is a need for annotated datasets to be developed.
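The reasoning-marker idea, estimating how much explicit reasoning a document contains by counting marker phrases, can be sketched as follows. The marker list here is a small hypothetical sample, not the thesis's validated list.

```python
import re

# Count occurrences of reasoning-marker phrases in a text as a crude
# proxy for the amount of explicit reasoning it contains.
MARKERS = ["because", "therefore", "however", "for example", "as a result"]

def reasoning_score(text):
    """Total count of marker phrases, matched as whole words, case-insensitively."""
    text = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(m) + r"\b", text))
               for m in MARKERS)

post = ("We chose Postgres because it supports JSON natively. "
        "However, write throughput dropped; as a result we added caching.")
print(reasoning_score(post))  # 3
```

In the methodology's framing, such a score would be one swappable criterion among several; the subjective criteria would still need human assessment or a trained classifier, as the conclusion suggests.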

    Helping as participation in an open online community : an exploratory study

    The study explores the issues of participation, and to an extent learning, in an open online community of independent game developers, GameSalad.com. GameSalad is a firm-hosted online support forum for a desktop application of the same name. It is geared to provide members and users with a platform for sharing information pertaining to their game development, and a place to seek and provide help. It is a large community with over 114,000 registered members (as of March 2015), an average of 106,000 monthly active unique users, and a high degree of activity, such as the posting of tutorials and tips, sharing game development progress, and announcing the launch of new games. However, the majority of the interactions on the forum are concerned with seeking and providing help. This study focuses on issues around community, participation, and learning within online networks and is underpinned by a concern for participatory and social experiential perspectives on learning. In order to explore participation, an exploratory mixed-method approach was used, involving a three-phase data collection procedure. First, observation of interaction in the community was carried out (noting the pattern of threads opened, weekly leader boards, resources, and general practices), coupled with document analysis to identify threads that reflected high participation or were deemed beneficial by interviewees. Second, an online survey of 35 items, comprising five demographic items, twenty forced 2-point semantic differential scale items, and ten 5-point Likert scale items, was carried out to measure members’ perceptions of the community and identity (n = 110 responses). Third, semi-structured sequential interviews were carried out with 21 volunteer interviewees online, using the forum’s own private messaging system, over a period from August 2014 to March 2015. 
Although originally conceived as an overarching study of online participation, the study became focused on the more active members of the community, and on the question of why and how some members of online communities appear to take on helping roles. The findings from both survey and interviews showed a strong sense of community among active members, and that active members saw their identity in the online community as an extension of their offline self. Although the study was open to all members, participants who volunteered to be interviewed tended to be among the more active members, and many had adopted a ‘caretaker’ or helper role in the community. The interviews showed that giving help was motivated by a mix of extrinsic and intrinsic elements; in particular, helpers were aware of the need to sustain the community and in many cases felt an obligation to offer help as a return, or to ‘pay it forward’, for the help they had received in the past. They were motivated by community mindedness, empathy, self-confidence and sense of identity. The giving of help depends on ‘mood’; this mood is generated not only when helpers feel they have the available time and relevant expertise in order to help, but also when those asking for help have asked in an appropriate manner and provided sufficient contextualisation. In part, learning in the community is seen as a social exchange, and members put a value on the discussions they saw as useful. However, this study also reveals some of the problems experienced by the company behind the community, tensions among some members of the community, and issues pertaining to shared knowledge and artefacts. This study improves our understanding of communities of practice, the provision of help, the motivation for helping, and the dynamics of participation in an open online community. It gives insight into the sustainability of online communities by showing the motivations, strategies for, and consequences of helping. 
It also gives insight into how informal learning is embedded in social interactions and perceived value. The study is not a unique case, but it is one of an underreported area: a highly participative community. Methodologically, this study offers a mixed-method approach with a strong focus on qualitative data and analysis methods, with an innovative way of triangulating data.

    Discussing writing: peer feedback on writing essays in an online forum for learners of English

    This case study investigated feedback, interaction, and knowledge creation in an asynchronous discussion forum in which learners of English provided peer feedback on short argument essays for the IELTS test, a gatekeeper English exam used for immigration or university entrance. Over eleven months, a small but active group of intermediate and advanced learners from many countries changed participation from seeking feedback to giving complex macro-level feedback on each other’s writing, changing their perceptions of peer editing and improving their own writing, while a much larger group engaged primarily in lurking. The research was exploratory at first, since it was not known whether learners would join or provide feedback, but as members joined, peer feedback loops and varying patterns of interaction emerged. To investigate these processes, both content and structure were examined, with forum posts examined using thematic units as the unit of analysis, and server logs providing structural data such as membership duration and posting patterns. Semi-structured interviews were carried out to gain further insight into member perceptions. Feedback was viewed as a process with benefits for both givers and receivers, rather than as a product given by an expert. Lurking was a key form of participation for both active and less-active members, while changes in roles and participation were mainly associated with longer membership and more feedback. Because of the informal learning setting and high turnover, models from outside educational settings were used as theoretical lenses: organizational citizenship (Bateman & Organ, 1983) and organizational commitment (Meyer & Allen, 1991), to investigate roles and behavior; and Nonaka’s SECI framework (1994), to examine knowledge conversion and creation. 
Applying citizenship behavior to online settings posed problems due to the difficulty of distinguishing between discretionary or supra-role behavior and the core intent of a knowledge community. In contrast, a modified SECI framework appeared to be a useful metaphor, emphasizing peer feedback as socially constructed knowledge.

    Market Engineering

    This open access book provides a broad range of insights on market engineering and information management. It covers topics like auctions, stock markets, electricity markets, the sharing economy, information and emotions in markets, smart decision-making in cities and other systems, and methodological approaches to conceptual modeling and taxonomy development. Overall, this book is a source of inspiration for everybody working on the vision of advancing the science of engineering markets and managing information to contribute to a bright, sustainable, digital world. Markets are powerful and extremely efficient mechanisms for coordinating individuals’ and organizations’ behavior in a complex, networked economy. Thus, designing, monitoring, and regulating markets is an essential task of today’s society. This task does not derive from a purely economic point of view alone. Leveraging market forces can also help to tackle pressing social and environmental challenges. Moreover, markets process, generate, and reveal information. This information is a production factor and a valuable economic asset. In an increasingly digital world, it is more essential than ever to understand the life cycle of information from its creation and distribution to its use. Neither markets nor the flow of information should emerge and develop arbitrarily, based on individual, profit-driven actors. Instead, they should be engineered to best serve the goals of society as a whole. This motivation drives the research fields of market engineering and information management. With this book, the editors and authors honor Professor Dr. Christof Weinhardt for his enormous and ongoing contribution to market engineering and information management research and practice. It was presented to him on the occasion of his sixtieth birthday in April 2021. Thank you very much, Christof, for so many years of cooperation, support, inspiration, and friendship.