Two algorithms for LCS Consecutive Suffix Alignment
The problem of aligning two sequences A and B to determine their similarity is one of the fundamental problems in pattern matching. A challenging, basic variation of the sequence similarity problem is the incremental string comparison problem, denoted Consecutive Suffix Alignment, which is, given two strings A and B, to compute the alignment solution of each suffix of A versus B. Here, we present two solutions to the Consecutive Suffix Alignment Problem under the LCS (Longest Common Subsequence) metric, where the LCS metric measures the subsequence of maximal length common to A and B. The first solution is an O(nL) time and space algorithm for constant alphabets, where the size of the compared strings is O(n) and L ⩽ n denotes the size of the LCS of A and B. The second solution is an O(nL + n log|Σ|) time and O(n) space algorithm for general alphabets, where Σ denotes the alphabet of the compared strings.
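For intuition only, the LCS length of every suffix of A against the whole of B can be read off a single backward dynamic-programming table. The sketch below is a plain quadratic-time baseline and is not either of the two algorithms the abstract describes; the function name `suffix_lcs_lengths` is hypothetical, and it reports lengths only, not full alignments.

```python
def suffix_lcs_lengths(a: str, b: str) -> list[int]:
    """Return [LCS length of a[i:] and b for i in 0..len(a)].

    Quadratic-time baseline, assumed here for illustration only.
    """
    n, m = len(a), len(b)
    # row[j] holds the LCS length of a[i:] and b[j:] for the current i.
    # We start from i = n, where a[i:] is empty and every entry is 0.
    row = [0] * (m + 1)
    result = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        new_row = [0] * (m + 1)
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                # Matching characters extend the LCS of the remaining suffixes.
                new_row[j] = row[j + 1] + 1
            else:
                # Otherwise drop one character from either string.
                new_row[j] = max(row[j], new_row[j + 1])
        row = new_row
        # Column 0 compares the suffix a[i:] with all of b.
        result[i] = row[0]
    return result
```

Keeping only two rows at a time uses O(n) extra space, at the cost of not retaining enough information to reconstruct the alignments themselves.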
On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used for both deriving
the topics that run through the system as well as their clustering. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that inferred topics
through performing topic analysis on the contextual representations are more
meaningful compared to the plain representation of the documents. The proposed
approach in introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization and
topic analysis.
The Longest Common Subsequence via Generalized Suffix Trees
Given two strings S1 and S2, finding the longest common subsequence (LCS) is a classical problem in computer science. Many algorithms have been proposed to find the longest common subsequence between two strings. The most common and widely used method is the dynamic programming approach, which runs in quadratic time and takes quadratic space. Other algorithms have been introduced later to solve the LCS problem in less time and space. In this work, we present a new algorithm to find the longest common subsequence using the generalized suffix tree and directed acyclic graph. The generalized suffix tree (GST) is the combined suffix tree for a set of strings {S1, S2, ..., Sn}. Both the suffix tree and the generalized suffix tree can be calculated in linear time and linear space. One application of the generalized suffix tree is to find the longest common substring between two strings. But finding the longest common subsequence is not straightforward using the generalized suffix tree. Here we describe how we can use the GST to find the common substrings between two strings and introduce a new approach to calculate the longest common subsequence (LCS) from the common substrings. This method takes a different view of the LCS problem, shedding more light on novel applications of the LCS. We also show how this method can motivate the development of new compression techniques for genome resequencing data.
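The dynamic programming approach mentioned in this abstract can be sketched as follows. This is the textbook quadratic-time, quadratic-space method, shown as a point of comparison; it is not the generalized-suffix-tree algorithm the work proposes, and the function name `lcs` is chosen here for illustration.

```python
def lcs(s1: str, s2: str) -> str:
    """Return one longest common subsequence of s1 and s2 (textbook DP)."""
    n, m = len(s1), len(s2)
    # table[i][j] is the LCS length of the prefixes s1[:i] and s2[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            out.append(s1[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

When several subsequences share the maximal length, the traceback's tie-breaking rule decides which one is returned.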
Fitness Proportionate Niching: Harnessing The Power Of Evolutionary Algorithms For Evolving Cooperative Populations And Dynamic Clustering
Evolutionary algorithms work on the criterion that the best fit will survive. This makes evolving a cooperative and diverse population in a competing environment via evolutionary algorithms a challenging task. Analogies to species interactions in natural ecological systems have been used to develop methods for maintaining diversity in a population. One such area that mimics species interactions in natural systems is the use of niching. Niching methods extend the application of EAs to areas that seek to embrace multiple solutions to a given problem. The conventional fitness sharing technique has limitations when the multimodal fitness landscape has unequal peaks: higher peaks are strong population attractors, and the technique suffers from the curse of population size in attempting to discover all optimum points. The use of a large population makes the technique computationally expensive, especially when there is a big jump in the fitness values of the peaks. This work introduces a novel bio-inspired niching technique, termed Fitness Proportionate Niching (FPN), based on the analogy of a finite resource model where individuals share the resource of a niche in proportion to their actual fitness. FPN makes the search algorithm unbiased to the variation in fitness values of the peaks and hence mitigates the drawbacks of conventional fitness sharing. FPN extends the global search ability of Genetic Algorithms (GAs) for evolving hierarchical cooperation in genetics-based machine learning and dynamic clustering. To this end, this work introduces FPN-based resource sharing, which leads to the formation of a viable default hierarchy in classifiers for the first time. It results in the co-evolution of default and exception rules, which leads to a robust and concise model description. The work also explores the feasibility and success of FPN for dynamic clustering.
Unlike most other clustering techniques, FPN-based clustering does not require any a priori information on the distribution of the data.
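As background, the conventional fitness sharing scheme that this abstract contrasts with FPN can be sketched as below. This is the standard formulation (raw fitness divided by a niche count built from a triangular sharing function), not FPN itself, whose details the abstract does not give. The function name, the one-dimensional positions, and the parameters `sigma_share` and `alpha` are assumptions made for illustration.

```python
def shared_fitness(positions, raw_fitness, sigma_share, alpha=1.0):
    """Conventional fitness sharing on 1-D positions (illustrative sketch).

    Each individual's raw fitness is divided by its niche count, the sum of
    sharing-function values over all individuals within sigma_share of it.
    """
    shared = []
    for i, xi in enumerate(positions):
        niche_count = 0.0
        for xj in positions:
            d = abs(xi - xj)  # assumed distance measure: absolute difference
            if d < sigma_share:
                # Sharing function: 1 at distance 0, falling to 0 at sigma_share.
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # niche_count >= 1 because every individual shares fully with itself.
        shared.append(raw_fitness[i] / niche_count)
    return shared
```

For example, two individuals crowded on the same peak each see their fitness halved, while an isolated individual on a lower peak keeps its raw fitness.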
On Algorithms and Complexity for Sets with Cardinality Constraints
Typestate systems ensure many desirable properties of imperative programs,
including initialization of object fields and correct use of stateful library
interfaces. Abstract sets with cardinality constraints naturally generalize
typestate properties: relationships between the typestates of objects can be
expressed as subset and disjointness relations on sets, and elements of sets
can be represented as sets of cardinality one. Motivated by these applications,
this paper presents new algorithms and new complexity results for constraints
on sets and their cardinalities. We study several classes of constraints and
demonstrate a trade-off between their expressive power and their complexity.
Our first result concerns a quantifier-free fragment of Boolean Algebra with
Presburger Arithmetic. We give a nondeterministic polynomial-time algorithm for
reducing the satisfiability of sets with symbolic cardinalities to constraints
on constant cardinalities, and give a polynomial-space algorithm for the
resulting problem.
In a quest for more efficient fragments, we identify several subclasses of
sets with cardinality constraints whose satisfiability is NP-hard. Finally, we
identify a class of constraints that has polynomial-time satisfiability and
entailment problems and can serve as a foundation for efficient program
analysis. Comment: 20 pages, 12 figures
A Configuration Management System for Software Product Lines
Software product line engineering (SPLE) is a methodology for developing a family of software products in a particular domain by systematic reuse of shared code in order to improve product quality and reduce development time and cost. Currently, there are no software configuration management (SCM) tools that support software product line evolution. Conventional SCM tools are designed to support single product development.
The use of conventional SCM tools forces developers to treat a software product line as a single software project by introducing new programming language constructs or using conditional compilation. We propose a research configuration management prototype called Molhado SPL that is designed specifically to support the evolution of software product lines. Molhado SPL addresses the evolution problem at the configuration level instead of at the code level. We studied the type of operations needed to support the evolution of software product lines and proposed a versioning model and eight cases of change propagation.
Molhado SPL supports independent evolution of core assets and products, the sharing of code and the tracking of relationships between products and shared code, and the eight cases of change propagation. Molhado SPL consists of four layers, with each layer providing a different type of service. At the heart of Molhado SPL are the versioning model, component object, shared component object, and project objects that allow for independent evolution of products and shared artifacts, for sharing, and for supporting change propagation. Furthermore, they allow product-specific changes to shared code without interfering with the core asset that is shared. Products can also introduce product-specific assets that only exist in that product.
In order for Molhado SPL to support product lines, we implemented XML merging, feature model editing and debugging, and version-aware XML documents. To support merging of XML documents, we implemented a 3-way XML document merging algorithm that uses versioned data structures, change detection, and node identity. To support software product line derivation or modeling of software product lines, we implemented support for feature models, including editing and debugging. Finally, we created the version-aware XML document framework to support collaborative editing of XML documents without requiring a version repository. The version history is embedded in the documents using XML namespaces, so that the documents remain valid under the XML specification. The version-aware XML framework can also be used to export documents from the Molhado SPL repository so that they can be edited outside it, and to import back the change history made to the documents.
We evaluated Molhado SPL with two product lines: a document product line and a graph data structures product line. This evaluation showed that Molhado SPL supports the independent evolution of products and core assets and the eight change propagation cases. We did not evaluate Molhado SPL in terms of scalability or usability.
The main contributions of this dissertation research are: 1) Molhado SPL, which supports the evolution of product lines, 2) a fast 3-way XML merge algorithm, 3) a version-aware XML document framework, and 4) a feature model editor and debugger.
Cutset Sampling for Bayesian Networks
The paper presents a new sampling methodology for Bayesian networks that
samples only a subset of variables and applies exact inference to the rest.
Cutset sampling is a network structure-exploiting application of the
Rao-Blackwellisation principle to sampling in Bayesian networks. It improves
convergence by exploiting memory-based inference algorithms. It can also be
viewed as an anytime approximation of the exact cutset-conditioning algorithm
developed by Pearl. Cutset sampling can be implemented efficiently when the
sampled variables constitute a loop-cutset of the Bayesian network and, more
generally, when the induced width of the network's graph conditioned on the
observed sampled variables is bounded by a constant w. We demonstrate
empirically the benefit of this scheme on a range of benchmarks.
Participants' Proceedings on the Workshop: Types for Program Analysis
As a satellite meeting of the TAPSOFT'95 conference we organized a small workshop on program analysis. The title of the workshop, "Types for Program Analysis", was motivated by the recent trend of letting the presentation and development of program analyses be influenced by annotated type systems, effect systems, and more general logical systems. The contents of the workshop were intended to be somewhat broader; consequently the call for participation listed the following areas of interest:
- specification of specific analyses for programming languages,
- the role of effects, polymorphism, conjunction/disjunction types, dependent types etc. in specification of analyses,
- algorithmic tools and methods for solving general classes of type-based analyses,
- the role of unification, semi-unification etc. in implementations of analyses,
- proof techniques for establishing the safety of analyses,
- relationship to other approaches to program analysis, including abstract interpretation and constraint-based methods,
- exploitation of analysis results in program optimization and implementation.
The submissions were not formally refereed; however, each submission was read by several members of the program committee and received detailed comments and suggestions for improvement. We expect that several of the papers, in slightly revised forms, will show up at future conferences. The workshop took place at Aarhus University on May 26 and May 27 and lasted two half days.