17,960 research outputs found
C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework
Sparse modeling is a powerful framework for data analysis and processing.
Traditionally, encoding in this framework is performed by solving an
L1-regularized linear regression problem, commonly referred to as Lasso or
Basis Pursuit. In this work we combine the sparsity-inducing property of the
Lasso model at the individual feature level, with the block-sparsity property
of the Group Lasso model, where sparse groups of features are jointly encoded,
obtaining a sparsity pattern hierarchically structured. This results in the
Hierarchical Lasso (HiLasso), which shows important practical modeling
advantages. We then extend this approach to the collaborative case, where a set
of simultaneously coded signals share the same sparsity pattern at the higher
(group) level, but not necessarily at the lower (inside the group) level,
obtaining the collaborative HiLasso model (C-HiLasso). Such signals then share
the same active groups, or classes, but not necessarily the same active set.
This model is very well suited for applications such as source identification
and separation. An efficient optimization procedure, which guarantees
convergence to the global optimum, is developed for these new models. The
underlying presentation of the new framework and optimization approach is
complemented with experimental examples and theoretical results regarding
recovery guarantees for the proposed models
Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects
Exaggeration or context changes can render maintainability experience into
prejudice. For example, JavaScript is often seen as least elegant language and
hence of lowest maintainability. Such prejudice should not guide decisions
without prior empirical validation. We formulated 10 hypotheses about
maintainability based on prejudices and test them in a large set of open-source
projects (6,897 GitHub repositories, 402 million lines, 5 programming
languages). We operationalize maintainability with five static analysis
metrics. We found that JavaScript code is not worse than other code, Java code
shows higher maintainability than C# code and C code has longer methods than
other code. The quality of interface documentation is better in Java code than
in other code. Code developed by teams is not of higher and large code bases
not of lower maintainability. Projects with high maintainability are not more
popular or more often forked. Overall, most hypotheses are not supported by
open-source data.Comment: 20 page
A Decade of Code Comment Quality Assessment: A Systematic Literature Review
Code comments are important artifacts in software systems and play a
paramount role in many software engineering (SE) tasks related to maintenance
and program comprehension. However, while it is widely accepted that high
quality matters in code comments just as it matters in source code, assessing
comment quality in practice is still an open problem. First and foremost, there
is no unique definition of quality when it comes to evaluating code comments.
The few existing studies on this topic rather focus on specific attributes of
quality that can be easily quantified and measured. Existing techniques and
corresponding tools may also focus on comments bound to a specific programming
language, and may only deal with comments with specific scopes and clear goals
(e.g., Javadoc comments at the method level, or in-body comments describing
TODOs to be addressed). In this paper, we present a Systematic Literature
Review (SLR) of the last decade of research in SE to answer the following
research questions: (i) What types of comments do researchers focus on when
assessing comment quality? (ii) What quality attributes (QAs) do they consider?
(iii) Which tools and techniques do they use to assess comment quality?, and
(iv) How do they evaluate their studies on comment quality assessment in
general? Our evaluation, based on the analysis of 2353 papers and the actual
review of 47 relevant ones, shows that (i) most studies and techniques focus on
comments in Java code, thus may not be generalizable to other languages, and
(ii) the analyzed studies focus on four main QAs of a total of 21 QAs
identified in the literature, with a clear predominance of checking consistency
between comments and the code. We observe that researchers rely on manual
assessment and specific heuristics rather than the automated assessment of the
comment quality attributes
- …