Simplifying Deep-Learning-Based Model for Code Search
To accelerate software development, developers frequently search and reuse
existing code snippets from a large-scale codebase, e.g., GitHub. Over the
years, researchers proposed many information retrieval (IR) based models for
code search, which match keywords in the query with code text. But they fail to
bridge the semantic gap between query and code. To conquer this challenge, Gu
et al. proposed a deep-learning-based model named DeepCS. It jointly embeds
method code and natural language description into a shared vector space, where
methods related to a natural language query are retrieved according to their
vector similarities. However, DeepCS' working process is complicated and
time-consuming. To overcome this issue, we proposed a simplified model
CodeMatcher that leverages the IR technique but maintains many features in
DeepCS. Generally, CodeMatcher combines query keywords in their original order,
performs a fuzzy search on the name and body strings of methods, and returns the
best-matched methods with the longer sequence of used keywords. We verified its
effectiveness on a large-scale codebase with about 41k repositories.
Experimental results showed the simplified model CodeMatcher outperforms DeepCS
by 97% in terms of MRR (a widely used accuracy measure for code search), and it
is over 66 times faster than DeepCS. Besides, compared with the
state-of-the-art IR-based model CodeHow, CodeMatcher also improves the MRR by
73%. We also observed that: fusing the advantages of IR-based and
deep-learning-based models is promising because they complement each other
by nature; improving the quality of method naming helps code search, since
the method name plays an important role in connecting query and code.
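
The matching steps described in this abstract (keep the query keywords in their original order, fuzzy-match them against method names and bodies, prefer methods that use a longer keyword sequence) can be illustrated with a minimal sketch. This is not the authors' CodeMatcher implementation: the function names, the prefix-only notion of "keyword sequence", the name-before-body ranking, and the sample methods are all assumptions made for illustration.

```python
import re


def longest_ordered_match(keywords, text):
    """Length of the longest keyword prefix found in `text` in query order."""
    # Try all keywords first, then drop trailing keywords one at a time.
    for n in range(len(keywords), 0, -1):
        # ".*" between keywords allows arbitrary text in between (fuzzy match).
        pattern = ".*".join(re.escape(k) for k in keywords[:n])
        if re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL):
            return n
    return 0


def search(query, methods, top_k=3):
    """Rank (name, body) pairs; name matches outrank body matches (assumption)."""
    keywords = query.lower().split()
    scored = []
    for name, body in methods:
        name_hits = longest_ordered_match(keywords, name)
        body_hits = longest_ordered_match(keywords, body)
        if name_hits or body_hits:
            scored.append((name_hits, body_hits, name))
    # Longer matched sequences first; method name breaks ties deterministically.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [name for _, _, name in scored[:top_k]]


# Hypothetical mini codebase, for illustration only.
methods = [
    ("readFileToString", "return new String(Files.readAllBytes(path));"),
    ("readLines", "BufferedReader reader = open(file); ..."),
    ("closeStream", "stream.close();"),
]
print(search("read file", methods))
```

Here `readFileToString` ranks first because both keywords appear in its name in query order, while `readLines` only matches both keywords in its body; `closeStream` matches nothing and is dropped.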
CoNCRA: A Convolutional Neural Network Code Retrieval Approach
Software developers routinely search for code using general-purpose search
engines. However, these search engines cannot find code semantically unless it
has an accompanying description. We propose a technique for semantic code
search: A Convolutional Neural Network approach to code retrieval (CoNCRA). Our
technique aims to find the code snippet that most closely matches the
developer's intent, expressed in natural language. We evaluated our approach's
efficacy on a dataset composed of questions and code snippets collected from
Stack Overflow. Our preliminary results showed that our technique, which
prioritizes local interactions (words nearby), improved the state-of-the-art
(SOTA) by 5% on average, retrieving the most relevant code snippets in the top
3 (three) positions almost 80% of the time. Therefore, our technique is
promising and can improve the efficacy of semantic code retrieval.
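
For readers unfamiliar with rank-based retrieval measures such as the top-3 rate quoted here, the following is a small sketch of how a top-k success rate and the related mean reciprocal rank (MRR) are commonly computed. The function names and the sample ranks are hypothetical and are not taken from the paper's evaluation.

```python
def top_k_rate(ranks, k):
    """Fraction of queries whose first relevant result is within the top k.

    `ranks` holds the 1-based rank of the first relevant result per query,
    or None when no relevant result was retrieved.
    """
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; a miss (None) contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical ranks for four queries; the third query had no relevant hit.
ranks = [1, 2, None, 4]
print(top_k_rate(ranks, 3))          # 0.5
print(mean_reciprocal_rank(ranks))   # 0.4375
```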
Assessing Practitioner Beliefs about Software Defect Prediction
Just because software developers say they believe in "X", that does not
necessarily mean that "X" is true. As shown here, there exist numerous beliefs
listed in the recent Software Engineering literature which are only supported
by small portions of the available data. Hence we ask: what is the source of
this disconnect between beliefs and evidence? To answer this question we look
for evidence for ten beliefs within 300,000+ changes seen in dozens of
open-source projects. Some of those beliefs had strong support across all the
projects; specifically, "A commit that involves more added and removed lines is
more bug-prone" and "Files with fewer lines contributed by their owners (who
contribute most changes) are bug-prone". Most of the widely-held beliefs
studied are only sporadically supported in the data; i.e. large effects can
appear in project data and then disappear in subsequent releases. Such sporadic
support explains why developers believe things that were relevant to their
prior work, but not necessarily their current work. Our conclusion is that
we need to change the nature of the debate within Software Engineering.
Specifically, while it is important to report the effects that hold right now,
it is also important to report on what effects change over time.
Comment: 9 pages, 3 Figures, 4 Tables, ICSE SEIP 202
Mapping the Information Journey: Unveiling the Documentation Experience of Software Developers in China
This research delves into understanding the behaviors and characteristics of
Chinese developers in relation to their use of technical documentation, which
is crucial for creating high-quality developer documentation. We conducted
interviews with 25 software developers and surveyed 177 participants, using the
preliminary interview findings to inform the survey design. Our approach
encompassed traditional user research methods, including persona and user
journey mapping, to develop typical personas and information journeys based on
the qualitative data from the interviews and quantitative results from the
survey. Our results revealed distinct characteristics and differences between
junior and senior developers in terms of their use of technical documentation,
broadly categorized into personality traits, learning habits, and working
habits. We observed that the information journey of both groups typically
encompasses four stages: Exploration, Understanding, Practice, and Application.
Consequently, we created two distinct personas and information journey maps to
represent these two developer groups. Our findings highlight that developers
prioritize the content, organization, and maintenance aspects of documentation.
In conclusion, we recommend organizing documentation content to align with
developers' information journeys, tailoring documentation to meet the needs of
developers at various levels, and focusing on the content, organization, and
maintenance aspects of documentation.
Comment: 27 page