Visualising the structure of document search results: A comparison of graph theoretic approaches
This is the post-print of the article - Copyright @ 2010 Sage Publications. Previous work has shown that distance-similarity visualisation, or 'spatialisation', can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or 'cluster growing' strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structure in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods for geodesic distance estimation are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion.
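The comparison described above can be sketched with scikit-learn, which provides both Isomap (geodesic distances over a K-neighbour graph) and non-metric MDS. The corpus here is a random stand-in for document feature vectors, and the dimensionality and n_neighbors values are illustrative assumptions, not settings from the paper.

```python
import numpy as np
from sklearn.manifold import Isomap, MDS

rng = np.random.default_rng(0)
# Hypothetical document-feature matrix: 100 "documents", 50 features.
X = rng.random((100, 50))

# Isomap: estimates geodesic distances over a K-neighbour graph
# (the traditional K-neighbour pruning criterion), then embeds in 2D.
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)

# Non-metric MDS on the same data, for comparison.
mds = MDS(n_components=2, metric=False, random_state=0)
X_mds = mds.fit_transform(X)

print(X_iso.shape, X_mds.shape)  # both (100, 2)
```

Local-structure preservation of the two embeddings could then be compared, for example, by checking how many of each point's feature-space nearest neighbours remain nearest neighbours in the 2D layout.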
AutoPruner: Transformer-Based Call Graph Pruning
Constructing a static call graph requires trade-offs between soundness and
precision. Program analysis techniques for constructing call graphs are
unfortunately usually imprecise. To address this problem, researchers have
recently proposed call graph pruning empowered by machine learning to
post-process call graphs constructed by static analysis. A machine learning
model is built to capture information from the call graph by extracting
structural features for use in a random forest classifier. It then removes
edges that are predicted to be false positives. Despite the improvements shown
by machine learning models, they are still limited as they do not consider the
source code semantics and thus often are not able to effectively distinguish
true and false positives. In this paper, we present a novel call graph pruning
technique, AutoPruner, for eliminating false positives in call graphs via both
statistical semantic and structural analysis. Given a call graph constructed by
traditional static analysis tools, AutoPruner takes a Transformer-based
approach to capture the semantic relationships between the caller and callee
functions associated with each edge in the call graph. To do so, AutoPruner
fine-tunes a model of code that was pre-trained on a large corpus to represent
source code based on descriptions of its semantics. Next, the model is used to
extract semantic features from the functions related to each edge in the call
graph. AutoPruner uses these semantic features together with the structural
features extracted from the call graph to classify each edge via a feed-forward
neural network. Our empirical evaluation on a benchmark dataset of real-world
programs shows that AutoPruner outperforms the state-of-the-art baselines,
improving on F-measure by up to 13% in identifying false-positive edges in a
static call graph.
Comment: Accepted to ESEC/FSE 2022, Research Track
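The classification stage described above can be sketched as follows: for each call-graph edge, a semantic feature vector (here random values standing in for the fine-tuned code model's output) is concatenated with structural features and passed through a feed-forward network. The dimensions and single hidden layer are illustrative assumptions, not AutoPruner's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def edge_score(semantic, structural, W1, b1, W2, b2):
    """Score one edge: near 1.0 = likely true edge, near 0.0 = false positive."""
    x = np.concatenate([semantic, structural])
    h = np.maximum(0.0, W1 @ x + b1)              # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid output

sem = rng.standard_normal(32)   # hypothetical semantic embedding of caller/callee
strc = rng.standard_normal(4)   # hypothetical structural features (e.g. degrees)
W1, b1 = rng.standard_normal((16, 36)), np.zeros(16)
W2, b2 = rng.standard_normal(16), 0.0

p = edge_score(sem, strc, W1, b1, W2, b2)
keep = p >= 0.5   # prune the edge if the predicted probability is low
print(round(float(p), 3), keep)
```

In the real pipeline the weights would of course be trained on labelled edges; this only illustrates how the two feature types are combined into a single per-edge decision.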
Rate adaptive binary erasure quantization with dual fountain codes
In this contribution, duals of fountain codes are introduced and their use for lossy source compression is investigated. It is shown both theoretically and experimentally that the source coding dual of the binary erasure channel coding problem, binary erasure quantization, is solved at a nearly optimal rate with application of duals of LT and raptor codes by a belief propagation-like algorithm which amounts to a graph pruning procedure. Furthermore, this quantizing scheme is rate adaptive, i.e., its rate can be modified on-the-fly in order to adapt to the source distribution, very much like LT and raptor codes are able to adapt their rate to the erasure probability of a channel.
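The BP-like graph pruning above is the quantization dual of the classic peeling decoder for LT codes on the erasure channel. As a grounding analogy only (this is the channel-coding side, not the paper's quantizer), here is a minimal peeling decoder: repeatedly find an encoded symbol whose neighbourhood contains exactly one unknown source symbol, solve for it by XOR, and continue until no such symbol remains. The toy code and received values are illustrative.

```python
def peel(checks, received, n_sources):
    """checks[i]: set of source indices XORed into encoded symbol i."""
    src = [None] * n_sources
    checks = [set(c) for c in checks]
    vals = list(received)
    progress = True
    while progress:
        progress = False
        for i, c in enumerate(checks):
            unknown = [s for s in c if src[s] is None]
            if len(unknown) == 1:
                s = unknown[0]
                v = vals[i]
                # XOR out the already-known neighbours to recover source s.
                for t in c:
                    if t != s:
                        v ^= src[t]
                src[s] = v
                progress = True
    return src

# 3 source bits, 4 encoded symbols (a rateless-style toy example).
checks = [{0}, {0, 1}, {1, 2}, {2}]
received = [1, 0, 1, 0]
print(peel(checks, received, 3))  # → [1, 1, 0]
```

The quantization dual runs a structurally similar pruning on the dual graph to find a codeword close to the source sequence, which is what makes the scheme's complexity comparable to erasure decoding.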
The b-matching problem on bipartite graphs
The b-matching problem on bipartite graphs is studied with a local algorithm. A b-matching (b ≥ 1) on a bipartite graph is a set of matched edges, in which each vertex of one type is adjacent to at most 1 matched edge and each vertex of the other type is adjacent to at most b matched edges. The b-matching problem on a given bipartite graph concerns finding b-matchings with the maximum size. Our approach to this combinatorial optimization problem is twofold. From an algorithmic perspective, we adopt a local algorithm as a linear approximate solver to find b-matchings on general bipartite graphs, whose basic component is a generalized version of the greedy leaf removal procedure in graph theory. From an analytical perspective, in the case of random bipartite graphs with the same size of two types of vertices, we develop a mean-field theory for the percolation phenomenon underlying the local algorithm, leading to a theoretical estimation of b-matching sizes on coreless graphs. We hope that our results can shed light on further studies of algorithms and the computational complexity of the optimization problem.
Comment: 15 pages, 3 figures
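The matching constraint described above can be illustrated with a plain greedy heuristic: each vertex of one type is matched at most once, each vertex of the other type at most b times. This is a baseline sketch for exposition, not the paper's generalized leaf-removal local algorithm; the example graph and b = 2 are assumptions.

```python
def greedy_b_matching(edges, n_left, n_right, b):
    """Greedily pick edges; left capacity 1, right capacity b."""
    left_used = [False] * n_left
    right_cap = [b] * n_right
    matched = []
    for u, v in edges:
        if not left_used[u] and right_cap[v] > 0:
            left_used[u] = True
            right_cap[v] -= 1
            matched.append((u, v))
    return matched

# Four left vertices, two right vertices; right vertex 0 can absorb
# at most b = 2 matched edges, so edge (2, 0) is rejected.
edges = [(0, 0), (1, 0), (2, 0), (3, 1)]
print(greedy_b_matching(edges, 4, 2, b=2))  # → [(0, 0), (1, 0), (3, 1)]
```

The leaf-removal idea in the paper improves on this by preferentially resolving degree-1 vertices, whose locally optimal matching choice is forced.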
Informed RRT*: Optimal Sampling-based Path Planning Focused via Direct Sampling of an Admissible Ellipsoidal Heuristic
Rapidly-exploring random trees (RRTs) are popular in motion planning because
they find solutions efficiently to single-query problems. Optimal RRTs (RRT*s)
extend RRTs to the problem of finding the optimal solution, but in doing so
asymptotically find the optimal path from the initial state to every state in
the planning domain. This behaviour is not only inefficient but also
inconsistent with their single-query nature.
For problems seeking to minimize path length, the subset of states that can
improve a solution can be described by a prolate hyperspheroid. We show that
unless this subset is sampled directly, the probability of improving a solution
becomes arbitrarily small in large worlds or high state dimensions. In this
paper, we present an exact method to focus the search by directly sampling this
subset.
The advantages of the presented sampling technique are demonstrated with a
new algorithm, Informed RRT*. This method retains the same probabilistic
guarantees on completeness and optimality as RRT* while improving the
convergence rate and final solution quality. We present the algorithm as a
simple modification to RRT* that could be further extended by more advanced
path-planning algorithms. We show experimentally that it outperforms RRT* in
rate of convergence, final solution cost, and ability to find difficult
passages while demonstrating less dependence on the state dimension and range
of the planning problem.
Comment: 8 pages, 11 figures. Videos available at
https://www.youtube.com/watch?v=d7dX5MvDYTc and
https://www.youtube.com/watch?v=nsl-5MZfwu
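The direct sampling described above uses the standard prolate-hyperspheroid transform: draw a sample uniformly from the unit ball, scale it by the ellipsoid's radii (transverse radius c_best/2, conjugate radii sqrt(c_best² − c_min²)/2), rotate the first axis onto the start-goal axis, and translate to the midpoint. A 2D sketch with illustrative start, goal, and cost values:

```python
import numpy as np

def sample_informed(x_start, x_goal, c_best, rng):
    x_start, x_goal = np.asarray(x_start, float), np.asarray(x_goal, float)
    c_min = np.linalg.norm(x_goal - x_start)   # theoretical minimum path cost
    centre = (x_start + x_goal) / 2.0
    # 2D rotation aligning the first axis with the transverse (start-goal) axis.
    a1 = (x_goal - x_start) / c_min
    C = np.array([[a1[0], -a1[1]],
                  [a1[1],  a1[0]]])
    # Ellipse radii: transverse c_best/2, conjugate sqrt(c_best^2 - c_min^2)/2.
    L = np.diag([c_best / 2.0, np.sqrt(c_best**2 - c_min**2) / 2.0])
    # Uniform sample from the unit disc.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    r = np.sqrt(rng.uniform())
    x_ball = r * np.array([np.cos(theta), np.sin(theta)])
    return C @ L @ x_ball + centre

rng = np.random.default_rng(0)
s = sample_informed([0.0, 0.0], [10.0, 0.0], c_best=12.0, rng=rng)
# Every returned sample could improve a 12-unit path between (0,0) and (10,0):
d = np.linalg.norm(s - [0, 0]) + np.linalg.norm(s - [10, 0])
print(d <= 12.0)  # → True
```

Because the foci of the ellipse sit at the start and goal, the sum of a sample's distances to the two states never exceeds c_best, which is exactly the condition for the sample to be able to shorten the current solution.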