321,227 research outputs found
The Thermodynamics of Network Coding, and an Algorithmic Refinement of the Principle of Maximum Entropy
The principle of maximum entropy (Maxent) is often used to obtain prior
probability distributions as a method to obtain a Gibbs measure under some
restriction giving the probability that a system will be in a certain state
compared to the rest of the elements in the distribution. Because classical
entropy-based Maxent collapses cases confounding all distinct degrees of
randomness and pseudo-randomness, here we take into consideration the
generative mechanism of the systems considered in the ensemble to separate
objects that may comply with the principle under some restriction and whose
entropy is maximal but may be generated recursively from those that are
actually algorithmically random offering a refinement to classical Maxent. We
take advantage of a causal algorithmic calculus to derive a thermodynamic-like
result based on how difficult it is to reprogram a computer code. Using the
distinction between computable and algorithmic randomness we quantify the cost
in information loss associated with reprogramming. To illustrate this we apply
the algorithmic refinement to Maxent on graphs and introduce a Maximal
Algorithmic Randomness Preferential Attachment (MARPA) Algorithm, a
generalisation over previous approaches. We discuss practical implications of
evaluation of network randomness. Our analysis provides insight in that the
reprogrammability asymmetry appears to originate from a non-monotonic
relationship to algorithmic probability. Our analysis motivates further
analysis of the origin and consequences of the aforementioned asymmetries,
reprogrammability, and computation.Comment: 30 page
Software Verification and Graph Similarity for Automated Evaluation of Students' Assignments
In this paper we promote introducing software verification and control flow
graph similarity measurement in automated evaluation of students' programs. We
present a new grading framework that merges results obtained by combination of
these two approaches with results obtained by automated testing, leading to
improved quality and precision of automated grading. These two approaches are
also useful in providing a comprehensible feedback that can help students to
improve the quality of their programs We also present our corresponding tools
that are publicly available and open source. The tools are based on LLVM
low-level intermediate code representation, so they could be applied to a
number of programming languages. Experimental evaluation of the proposed
grading framework is performed on a corpus of university students' programs
written in programming language C. Results of the experiments show that
automatically generated grades are highly correlated with manually determined
grades suggesting that the presented tools can find real-world applications in
studying and grading
On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used for both deriving
the topics that run through the system as well as their clustering. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that inferred topics
through performing topic analysis on the contextual representations are more
meaningful compared to the plain representation of the documents. The proposed
approach in introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization and
topic analysis
Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis
Database theory and database practice are typically the domain of computer
scientists who adopt what may be termed an algorithmic perspective on their
data. This perspective is very different than the more statistical perspective
adopted by statisticians, scientific computers, machine learners, and other who
work on what may be broadly termed statistical data analysis. In this article,
I will address fundamental aspects of this algorithmic-statistical disconnect,
with an eye to bridging the gap between these two very different approaches. A
concept that lies at the heart of this disconnect is that of statistical
regularization, a notion that has to do with how robust is the output of an
algorithm to the noise properties of the input data. Although it is nearly
completely absent from computer science, which historically has taken the input
data as given and modeled algorithms discretely, regularization in one form or
another is central to nearly every application domain that applies algorithms
to noisy data. By using several case studies, I will illustrate, both
theoretically and empirically, the nonobvious fact that approximate
computation, in and of itself, can implicitly lead to statistical
regularization. This and other recent work suggests that, by exploiting in a
more principled way the statistical properties implicit in worst-case
algorithms, one can in many cases satisfy the bicriteria of having algorithms
that are scalable to very large-scale databases and that also have good
inferential or predictive properties.Comment: To appear in the Proceedings of the 2012 ACM Symposium on Principles
of Database Systems (PODS 2012
Amorphous slicing of extended finite state machines
Slicing is useful for many Software Engineering applications and has been widely studied for three decades, but there has been comparatively little work on slicing Extended Finite State Machines (EFSMs). This paper introduces a set of dependency based EFSM slicing algorithms and an accompanying tool. We demonstrate that our algorithms are suitable for dependence based slicing. We use our tool to conduct experiments on ten EFSMs, including benchmarks and industrial EFSMs. Ours is the first empirical study of dependence based program slicing for EFSMs. Compared to the only previously published dependence based algorithm, our average slice is smaller 40% of the time and larger only 10% of the time, with an average slice size of 35% for termination insensitive slicing
Quasiconvex Programming
We define quasiconvex programming, a form of generalized linear programming
in which one seeks the point minimizing the pointwise maximum of a collection
of quasiconvex functions. We survey algorithms for solving quasiconvex programs
either numerically or via generalizations of the dual simplex method from
linear programming, and describe varied applications of this geometric
optimization technique in meshing, scientific computation, information
visualization, automated algorithm analysis, and robust statistics.Comment: 33 pages, 14 figure
- …