Lessons Learned from Applying Social Network Analysis on an Industrial Free/Libre/Open Source Software Ecosystem
Many software projects are no longer developed in-house by a single organization.
Instead, we are in a new age in which software is developed by a networked
community of individuals and organizations that base their relations to each
other on mutual interest. Paradoxically, recent research suggests that software
can actually be jointly developed by rival firms. For instance, it is known
that the mobile-device makers Apple and Samsung kept collaborating in open
source projects while fighting expensive patent wars in court. Taking a case
study approach, we explore how rival firms collaborate in the open source
arena, employing a multi-method approach that combines qualitative analysis
of archival data (QA) with mining software repositories (MSR) and Social
Network Analysis (SNA). By exploring collaborative processes within the
OpenStack ecosystem, our research contributes to Software Engineering research
by examining the role of groups, sub-communities, and business models within a
highly networked open source ecosystem. Surprisingly, the results indicate that
competition for the same revenue model (i.e., operating conflicting business
models) does not necessarily affect collaboration within the ecosystem.
Moreover, when detecting the different sub-communities of the OpenStack
community, we found that the expected social tendency of developers to work
with developers from the same firm (i.e., homophily) did not hold within the
OpenStack ecosystem. Furthermore, by addressing a novel, complex, and
unexplored open source case, this research also contributes to the management
literature on coopetition strategy and high-tech entrepreneurship with a rich
description of how heterogeneous actors within a highly networked ecosystem
(involving individuals, startups, established firms, and public organizations)
jointly develop a complex big-data infrastructure in the open source arena.
Comment: As accepted by the Journal of Internet Services and Applications (JISA)
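The homophily test described in the abstract can be sketched in a few lines: compare the observed fraction of same-firm collaboration edges against a chance baseline. The developer names, firm labels, and edges below are purely illustrative assumptions, not OpenStack data.

```python
# Hypothetical sketch of a firm-level homophily check on a collaboration
# network, in the spirit of the SNA step described above. All data here
# is made up for illustration.
from itertools import combinations

# developer -> employing firm (hypothetical)
firm = {
    "alice": "RedHat", "bob": "RedHat",
    "carol": "IBM", "dave": "IBM",
    "erin": "Rackspace", "frank": "Rackspace",
}

# undirected collaboration edges (e.g., co-commits to the same module)
edges = [
    ("alice", "carol"), ("alice", "erin"), ("bob", "dave"),
    ("carol", "frank"), ("dave", "erin"), ("alice", "bob"),
]

def homophily_index(edges, firm):
    """Observed same-firm edge fraction minus the chance baseline.

    Positive values indicate homophily (developers cluster by firm);
    values near zero or below suggest homophily does not hold.
    """
    same = sum(1 for u, v in edges if firm[u] == firm[v])
    observed = same / len(edges)
    # chance baseline: same-firm fraction over all possible developer pairs
    pairs = list(combinations(firm, 2))
    baseline = sum(1 for u, v in pairs if firm[u] == firm[v]) / len(pairs)
    return observed - baseline

print(round(homophily_index(edges, firm), 3))
```

In this toy network only one of six edges is intra-firm, so the index comes out slightly negative, mirroring the paper's finding that same-firm clustering need not dominate.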
Stack Overflow in Github: Any Snippets There?
When programmers look for how to achieve certain programming tasks, Stack
Overflow is a popular destination in search engine results. Over the years,
Stack Overflow has accumulated an impressive knowledge base of amply
documented code snippets. We are interested in studying how programmers use
these snippets in their projects. Can we find Stack Overflow snippets in real
projects? When snippets are used, are they copied literally, or do they
undergo adaptations? And are these adaptations specializations required by the
idiosyncrasies of the target artifact, or are they motivated by specific
requirements of the programmer? The large-scale study presented in this paper
analyzes 909k non-fork Python projects hosted on GitHub, which contain 290M
function definitions, and 1.9M Python snippets captured on Stack Overflow.
Results are presented as a quantitative analysis of block-level code cloning
within and across Stack Overflow and GitHub, and as an analysis of programming
behaviors through the qualitative analysis of our findings.
Comment: 14th International Conference on Mining Software Repositories, 11 pages
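Block-level clone detection of the kind the study performs is often bootstrapped by fingerprinting normalized code blocks. The sketch below is a minimal, assumed version of that first step (the normalization rules and the toy snippet/function pair are illustrative, not the paper's pipeline).

```python
# Minimal sketch of textual block-level clone detection between a Stack
# Overflow snippet and a project function. Normalization and corpus are
# assumptions for illustration only.
import hashlib

def normalize(block: str) -> str:
    """Strip indentation and blank lines so literal copies match despite
    reformatting (a common first step in textual clone detection)."""
    lines = [ln.strip() for ln in block.splitlines()]
    return "\n".join(ln for ln in lines if ln)

def fingerprint(block: str) -> str:
    """Hash the normalized block; equal hashes flag candidate clones."""
    return hashlib.sha1(normalize(block).encode()).hexdigest()

# hypothetical Stack Overflow snippet and a GitHub function body
so_snippet = """
for i in range(10):
    print(i)
"""
gh_function = """
for i in range(10):
        print(i)
"""

# identical fingerprints flag a (whitespace-insensitive) literal copy
print(fingerprint(so_snippet) == fingerprint(gh_function))
```

At the paper's scale (290M functions, 1.9M snippets), hashing like this makes all-pairs comparison feasible: only blocks in the same hash bucket need closer inspection, e.g. to classify adaptations.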
Simplifying Deep-Learning-Based Model for Code Search
To accelerate software development, developers frequently search and reuse
existing code snippets from large-scale codebases such as GitHub. Over the
years, researchers have proposed many information retrieval (IR) based models
for code search, which match keywords in a query against code text. However,
they fail to bridge the semantic gap between query and code. To address this
challenge, Gu et al. proposed a deep-learning-based model named DeepCS. It
jointly embeds method code and natural language descriptions into a shared
vector space, where methods related to a natural language query are retrieved
according to their vector similarities. However, DeepCS's working process is
complicated and time-consuming. To overcome this issue, we propose a
simplified model, CodeMatcher, that leverages IR techniques while retaining
many features of DeepCS. In general, CodeMatcher keeps the query keywords in
their original order, performs a fuzzy search over the name and body strings
of methods, and returns the best-matched methods, i.e., those matching the
longest sequence of query keywords. We verified its effectiveness on a
large-scale codebase of about 41k repositories. Experimental results show that
the simplified model CodeMatcher outperforms DeepCS by 97% in terms of MRR (a
widely used accuracy measure for code search) and is over 66 times faster than
DeepCS. Moreover, compared with the state-of-the-art IR-based model CodeHow,
CodeMatcher also improves MRR by 73%. We also observed that fusing the
advantages of IR-based and deep-learning-based models is promising because
they complement each other by nature, and that improving the quality of method
naming helps code search, since method names play an important role in
connecting queries and code.