Retrieving curated Stack Overflow Posts of similar project tasks
Software development depends on diverse technologies and methods, so development teams often face issues outside their members' expertise. To fill this gap, developers typically search web-based Q&A sites such as Stack Overflow, a well-known place to find solutions to specific technology-related problems. Access to these sites is currently not integrated into the software development environment, and the associations between software development projects and the supporting sources of known solutions, usually referred to as knowledge, are not explicitly recorded. As a result, developers often search for solutions to similar recurring issues multiple times. This lack of integration hinders reuse of the knowledge obtained and forces developers to repeat the effort of searching for and selecting (curating) that knowledge. This research studies the explicit association of project elements (such as project tasks) with Stack Overflow posts that developers have already curated, and presents a study of suggesting Stack Overflow posts to developers based on the similarity of project tasks.
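The idea of suggesting curated posts by task similarity can be illustrated with a minimal sketch. It assumes that tasks are short text descriptions and that similarity is measured with TF-IDF weighted cosine similarity; the abstract does not state which similarity measure the study uses, and the function names, task texts, and post identifiers below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_posts(new_task, curated, top_k=3):
    """Rank past tasks by similarity to new_task and return their posts.

    curated maps a past task description to the list of post identifiers
    that a developer already associated with that task (hypothetical data).
    """
    past_tasks = list(curated)
    vectors = tfidf_vectors(past_tasks + [new_task])
    query = vectors[-1]
    ranked = sorted(
        ((cosine(query, vec), task) for vec, task in zip(vectors, past_tasks)),
        reverse=True,
    )
    # Drop past tasks that share no terms with the new task.
    return [(task, curated[task], score)
            for score, task in ranked[:top_k] if score > 0]


if __name__ == "__main__":
    curated = {
        "Fix login timeout in OAuth flow": ["post-A"],
        "Add pagination to REST endpoint": ["post-B"],
    }
    for task, posts, score in suggest_posts("OAuth login fails with timeout", curated):
        print(f"{score:.2f}  {task}  ->  {posts}")
```

Recording the task-to-post association explicitly (here, the `curated` mapping) is what allows a later, similar task to reuse the earlier curation instead of repeating the search.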
Towards Usable API Documentation
The learning and usage of an API is supported by documentation. Like source code, API documentation is itself a software product. Several research results show that bad design in API documentation can make the reuse of API features difficult. Indeed, similar to code smells, poorly designed API documentation can also exhibit 'smells'. Such documentation smells can be described as bad documentation styles that do not necessarily produce incorrect documentation but make the documentation difficult to understand and use. This thesis aims to enhance API documentation usability by addressing such documentation smells in three phases. In the first phase, we developed a catalog of five API documentation smells by consulting literature on API documentation issues and online developer discussions. We validated their presence in the real world by creating a benchmark of 1K official Java API documentation units and conducting a survey of 21 developers. The developers confirmed that these smells hinder their productivity and called for automatic detection and fixing. In the second phase, we developed machine-learning models to detect the smells using the 1K benchmark; however, they performed poorly when evaluated on larger and more diverse documentation sources. We explored more advanced models and employed re-training and hyperparameter tuning to further improve performance. Our best-performing model, RoBERTa, achieved F1-scores of 0.71-0.93 in detecting different smells. In the third phase, we first evaluated the feasibility and impact of fixing various smells in the eyes of practitioners. Through a second survey of 30 practitioners, we found that fixing the lazy smell was perceived as the most feasible and impactful. However, there was no universal consensus on whether and how other smells can or should be fixed.
Finally, we proposed a two-stage pipeline for fixing lazy documentation, involving additional textual description and documentation-specific code example generation. Our approach used a large language model, GPT-3, to generate enhanced documentation based on non-lazy examples and to produce code examples. The generated code examples were refined iteratively until they were error-free. Our technique demonstrated a high success rate, with a significant number of lazy documentation instances being fixed and error-free code examples being generated.
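The iterative refinement step can be pictured as a generate-check-retry loop. The sketch below is only an illustration under stated assumptions: the `generate` callable is a hypothetical stand-in for the language-model call, the check is a Python syntax check via the built-in `compile`, and the round limit is invented. The thesis works with Java API documentation, and the abstract does not describe how errors were detected or fed back.

```python
def refine_until_valid(generate, max_rounds=5):
    """Ask `generate` for code until it passes a syntax check.

    generate: hypothetical callable that takes the previous error message
              (or None on the first round) and returns candidate source code.
    Returns (code, rounds_used), or (None, max_rounds) if no candidate passed.
    """
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)
        try:
            # Syntax check only; this assumed check does not run the code.
            compile(candidate, "<candidate>", "exec")
        except SyntaxError as err:
            # Pass the error back so the next attempt can address it.
            feedback = f"line {err.lineno}: {err.msg}"
            continue
        return candidate, round_no
    return None, max_rounds


if __name__ == "__main__":
    # Canned attempts standing in for model output: the first is broken.
    attempts = iter(["print('hi'", "print('hi')"])
    code, rounds = refine_until_valid(lambda feedback: next(attempts))
    print(rounds, code)
```

Bounding the number of rounds keeps the loop from running indefinitely when no candidate ever passes the check.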
Knowledge Graphs Evolution and Preservation -- A Technical Report from ISWS 2019
One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge
Graphs: New Directions for Knowledge Representation on the Semantic Web" and
described in its report is that of a: "Public FAIR Knowledge Graph of
Everything: We increasingly see the creation of knowledge graphs that capture
information about the entirety of a class of entities. [...] This grand
challenge extends this further by asking if we can create a knowledge graph of
"everything" ranging from common sense concepts to location based entities.
This knowledge graph should be "open to the public" in a FAIR manner
democratizing this mass amount of knowledge." Although linked open data (LOD)
is one knowledge graph, it is the closest realisation (and probably the only
one) to a public FAIR Knowledge Graph (KG) of everything. Surely, LOD provides
a unique testbed for experimenting and evaluating research hypotheses on open
and FAIR KG. One of the most neglected FAIR issues about KGs is their ongoing
evolution and long term preservation. We want to investigate this problem, that
is to understand what preserving and supporting the evolution of KGs means and
how these problems can be addressed. Clearly, the problem can be approached
from different perspectives and may require the development of different
approaches, including new theories, ontologies, metrics, strategies,
procedures, etc. This document reports a collaborative effort performed by 9
teams of students, each guided by a senior researcher as their mentor,
attending the International Semantic Web Research School (ISWS 2019). Each team
provides a different perspective to the problem of knowledge graph evolution
substantiated by a set of research questions as the main subject of their
investigation. In addition, they provide their working definition for KG
preservation and evolution.
Identifying reusable knowledge in developer instant messaging communication.
Context and background: Software engineering is a complex and knowledge-intensive
activity. Required knowledge (e.g., about technologies, frameworks, and design decisions)
changes fast and the knowledge needs of those who design, code, test and maintain
software constantly evolve. On the other hand, software developers use a wide range of
processes, practices and tools where developers explicitly and implicitly “produce” and
capture different types of knowledge.
Problem: Software developers use instant messaging tools (e.g., Slack, Microsoft
Teams and Gitter) to discuss development-related problems, share experiences and to
collaborate in projects. This communication takes place in chat rooms that accumulate
potentially relevant knowledge to be reused by other developers. Therefore, in this
research we analyze whether there is reusable knowledge in developer instant messaging
communication by exploring (a) which instant messaging platforms can be a source
of reusable knowledge, and (b) software engineering themes that represent the main
discussions of developers in instant messaging communication. We also analyze how
this reusable knowledge can be identified with the use of topic modeling (a natural
language processing technique to discover abstract topics in text) by (c) surveying the
literature on how topic modeling has been applied in software engineering research, and
(d) evaluating how topic models perform with developer instant messages.
Method: First, we conducted a Field Study through an exploratory case study and a
reflexive thematic analysis to check whether there is reusable knowledge in developer
instant messaging communication, and if so, what this knowledge (main themes discussed)
is. Then, we conducted a Sample Study to explore how reusable knowledge in
developer instant messaging communication can be identified. In this study, we applied
a literature survey and software repository mining (i.e. short text topic modeling).
Findings and contributions: We (a) developed a comparison framework for instant
messaging tools, (b) identified a map of the main themes discussed in chat rooms of an
instant messaging tool (Gitter, a platform used by software developers), (c) provided a
comprehensive literature review that offers insights and references on the use of topic
modeling in software engineering, and (d) provided an evaluation of the performance of
topic models applied to developer instant messages based on topic coherence metrics
and human judgment for topic quality.
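To make the notion of a topic coherence metric concrete, the following sketch computes a UMass-style coherence score from document co-occurrence counts. UMass coherence is one widely used metric; the abstract does not say which coherence metrics were used, so this choice, the function name, and the toy chat messages are assumptions.

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic.

    top_words: the topic's most probable words, ordered from most to least.
    documents: list of tokenized documents (lists of words).
    Higher (less negative) scores mean the words co-occur more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # Skip words that never occur in the corpus.
            d_ij = doc_freq(top_words[i], top_words[j])
            # Add-one smoothing avoids taking log(0) for unseen pairs.
            score += math.log((d_ij + 1) / d_j)
    return score


if __name__ == "__main__":
    messages = [
        ["build", "error", "gradle"],
        ["build", "error"],
        ["pizza", "lunch"],
    ]
    print(umass_coherence(["build", "error"], messages))
    print(umass_coherence(["build", "pizza"], messages))
```

In this toy corpus the topic whose words appear together in the same messages scores higher than the topic whose words never co-occur. Short chat messages give sparse co-occurrence counts, which is one reason an automatic score is usefully paired with human judgment.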
Foundation Models and Fair Use
Existing foundation models are trained on copyrighted material. Deploying
these models can pose both legal and ethical risks when data creators fail to
receive appropriate attribution or compensation. In the United States and
several other countries, copyrighted content may be used to build foundation
models without incurring liability due to the fair use doctrine. However, there
is a caveat: If the model produces output that is similar to copyrighted data,
particularly in scenarios that affect the market of that data, fair use may no
longer apply to the output of the model. In this work, we emphasize that fair
use is not guaranteed, and additional work may be necessary to keep model
development and deployment squarely in the realm of fair use. First, we survey
the potential risks of developing and deploying foundation models based on
copyrighted content. We review relevant U.S. case law, drawing parallels to
existing and potential applications for generating text, source code, and
visual art. Experiments confirm that popular foundation models can generate
content considerably similar to copyrighted material. Second, we discuss
technical mitigations that can help foundation models stay in line with fair
use. We argue that more research is needed to align mitigation strategies with
the current state of the law. Lastly, we suggest that the law and technical
mitigations should co-evolve. For example, coupled with other policy
mechanisms, the law could more explicitly consider safe harbors when strong
technical tools are used to mitigate infringement harms. This co-evolution may
help strike a balance between intellectual property and innovation, which
speaks to the original goal of fair use. But we emphasize that the strategies
we describe here are not a panacea and more work is needed to develop policies
that address the potential harms of foundation models.
Crossroads of Cuisine
Crossroads of Cuisine provides a history of foods and foodways in terms of exchanges taking place in Central Asia and in surrounding areas such as China, Korea, or Iran during the last 5000 years, stressing the manner in which East and West, West and East grew together through food. It provides a discussion of geographical foundations and an interlocking historical and cultural overview going down to the present day, with a comparative country-by-country survey of foods and recipes. An ethnographic photo essay embracing all parts of the book binds it all together and helps make the topics discussed vivid and approachable. The book is important for explaining key relationships that have not always been made clear in past scholarship.