25 research outputs found

    A New Model for Increasing Information Access and Literacy in the Global South

    Rapid advances in technology infrastructure and increasing access to constant connectivity are paving the way for more innovative methods to support knowledge sharing in Global South countries. We present a model for knowledge sharing that is open, interactive, draws on diverse expertise and experience, and builds a searchable information repository that can be used by multiple communities and organizations. In addition, we envision that this model can be an effective and low-cost stepping stone to improved information literacy across the developing world.

    Question Quality in Community Question Answering Forums: A survey

    Community Question Answering (CQA) websites offer a new opportunity for users to provide, search and share knowledge. Although the idea of receiving a direct, targeted response to a question sounds very attractive, the quality of the question itself can have an important effect on the likelihood of getting useful answers. High-quality questions improve the CQA experience, and therefore it is essential for CQA forums to better understand what characterizes questions that are more appealing to the forum community. In this survey, we review existing research on question quality in CQA websites. We discuss the possible measures of question quality and the question features that have been shown to influence question quality.

    "Medium" LMs of Code in the Era of LLMs: Lessons From StackOverflow

    Large pre-trained neural language models have brought immense progress to both NLP and software engineering. Models in OpenAI's GPT series now dwarf Google's BERT and Meta's RoBERTa, which previously set new benchmarks on a wide range of NLP applications. These models are trained on massive corpora of heterogeneous data from web crawls, which enables them to learn general language patterns and semantic relationships. However, the largest models are both expensive to train and deploy and are often closed-source, so we lack access to their data and design decisions. We argue that this trend towards large, general-purpose models should be complemented with single-purpose, more modestly sized pre-trained models. In this work, we take StackOverflow (SO) as a domain example in which large volumes of rich aligned code and text data are available. We adopt standard practices for pre-training large language models, including using a very large context size (2,048 tokens), batch size (0.5M tokens) and training set (27B tokens), coupled with a powerful toolkit (Megatron-LM), to train two models: SOBertBase, with 109M parameters, and SOBertLarge, with 762M parameters, at a budget of just $187 and $800, respectively. We compare the performance of our models with both the previous SOTA model trained on SO data exclusively as well as general-purpose BERT models and OpenAI's ChatGPT on four SO-specific downstream tasks: question quality prediction, closed question prediction, named entity recognition and obsoletion prediction (a new task we introduce). Not only do our models consistently outperform all baselines, but the smaller model is often sufficient for strong results. Both models are released to the public. These results demonstrate that pre-training both extensively and properly on in-domain data can yield a powerful and affordable alternative to leveraging closed-source general-purpose models.

    Understanding Architecture Erosion: The Practitioners' Perceptive

    As software systems evolve, their architecture is meant to adapt accordingly by following the changes in requirements, the environment, and the implementation. However, in practice, the evolving system often deviates from the architecture, causing severe consequences to system maintenance and evolution. This phenomenon of architecture erosion has been studied extensively in research, but has not yet been examined from the point of view of developers. In this exploratory study, we look into how developers perceive the notion of architecture erosion, its causes and consequences, as well as tools and practices to identify and control architecture erosion. To this end, we searched through several popular online developer communities to collect data from discussions related to architecture erosion. In addition, we identified developers involved in these discussions and conducted a survey with 10 participants and held interviews with 4 participants. Our findings show that: (1) developers either focus on the structural manifestation of architecture erosion or on its effect on run-time qualities, maintenance and evolution; (2) alongside technical factors, architecture erosion is caused to a large extent by non-technical factors; (3) despite the lack of dedicated tools for detecting architecture erosion, developers usually identify erosion through a number of symptoms; and (4) there are effective measures that can help to alleviate the impact of architecture erosion. Comment: The 29th IEEE/ACM International Conference on Program Comprehension (ICPC