Pattern matching and pattern discovery algorithms for protein topologies
We describe algorithms for pattern matching and pattern
learning in TOPS diagrams (formal descriptions of protein topologies).
These problems can be reduced to checking for subgraph isomorphism
and finding maximal common subgraphs in a restricted class of ordered
graphs. We have developed a subgraph isomorphism algorithm for
ordered graphs, which performs well on the given set of data. The
maximal common subgraph problem then is solved by repeated
subgraph extension and checking for isomorphisms. Despite its
apparent inefficiency, this approach gives an algorithm with time
complexity proportional to the number of graphs in the input set and is
still practical on the given set of data. As a result, we obtain fast
methods which can be used for building a database of protein
topological motifs, and for the comparison of a given protein of known
secondary structure against a motif database.
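The reduction to subgraph isomorphism on ordered graphs can be illustrated with a small backtracking sketch. It is not the authors' algorithm. It assumes a simplified, hypothetical encoding in which vertices are numbered in sequence order, each vertex carries a one-letter label ("E" for strand, "H" for helix), and each edge carries a label such as "A" or "P". The names OrderedGraph and find_embedding are invented for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


class OrderedGraph:
    """Graph whose vertices 0..n-1 are totally ordered (assumed encoding)."""

    def __init__(self, labels: str, edges: Dict[Edge, str]):
        self.labels = labels
        # Store every edge under the key (low, high) so lookups are uniform.
        self.edges = {(min(u, v), max(u, v)): kind
                      for (u, v), kind in edges.items()}


def find_embedding(pattern: OrderedGraph,
                   target: OrderedGraph) -> Optional[List[int]]:
    """Return target indices (strictly increasing), one per pattern vertex,
    such that labels agree and every pattern edge is present in the target
    with the same edge label. Return None if no such embedding exists."""
    mapping: List[int] = []

    def consistent(p: int, t: int) -> bool:
        if pattern.labels[p] != target.labels[t]:
            return False
        # Check edges from already-mapped pattern vertices to p.
        for q in range(p):
            kind = pattern.edges.get((q, p))
            if kind is not None and target.edges.get((mapping[q], t)) != kind:
                return False
        return True

    def extend(p: int, start: int) -> bool:
        if p == len(pattern.labels):
            return True
        # Leave enough room for the pattern vertices still to be placed.
        last = len(target.labels) - (len(pattern.labels) - p)
        for t in range(start, last + 1):
            if consistent(p, t):
                mapping.append(t)
                if extend(p + 1, t + 1):  # order preservation: only look right
                    return True
                mapping.pop()
        return False

    return list(mapping) if extend(0, 0) else None


# Hypothetical example: two strands joined by an "A" edge.
pattern = OrderedGraph("EE", {(0, 1): "A"})
target = OrderedGraph("EHEE", {(0, 2): "P", (2, 3): "A"})
embedding = find_embedding(pattern, target)
```

Because the mapping must preserve vertex order, each pattern vertex is only tried at positions to the right of the previous one, which prunes the search considerably compared with unordered subgraph isomorphism.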
Efficient Pattern Matching in Python
Pattern matching is a powerful tool for symbolic computations. Applications
include term rewriting systems, as well as the manipulation of symbolic
expressions, abstract syntax trees, and XML and JSON data. It also allows for
an intuitive description of algorithms in the form of rewrite rules. We present
the open source Python module MatchPy, which offers functionality and
expressiveness similar to the pattern matching in Mathematica. In particular,
it includes syntactic pattern matching, as well as matching for commutative
and/or associative functions, sequence variables, and matching with
constraints. MatchPy uses new and improved algorithms to efficiently find
matches for large pattern sets by exploiting similarities between patterns. The
performance of MatchPy is investigated on several real-world problems.
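To make the notion of syntactic pattern matching concrete, the following is a minimal sketch in plain Python. It deliberately does not use MatchPy and does not reflect its API. The Var class, the match function, and the tuple encoding of terms as (head, arg1, arg2, ...) are assumptions made for this illustration. Associative or commutative matching, sequence variables, and constraints are not covered.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Var:
    """A pattern variable that matches any single term (hypothetical helper)."""
    name: str


def match(pattern: Any, subject: Any,
          subst: Optional[Dict[str, Any]] = None) -> Optional[Dict[str, Any]]:
    """Syntactic one-way matching.

    Terms are atoms (str, int, ...) or tuples (head, arg1, arg2, ...).
    Returns a substitution mapping variable names to subterms of `subject`,
    or None if the pattern does not match.
    """
    subst = dict(subst or {})  # copy so callers never see partial bindings
    if isinstance(pattern, Var):
        if pattern.name in subst:
            # A repeated variable must match the same term every time.
            return subst if subst[pattern.name] == subject else None
        subst[pattern.name] = subject
        return subst
    if isinstance(pattern, tuple):
        if not isinstance(subject, tuple) or len(pattern) != len(subject):
            return None
        for p, s in zip(pattern, subject):
            subst = match(p, s, subst)
            if subst is None:
                return None
        return subst
    # Atoms match only themselves.
    return subst if pattern == subject else None


# Pattern add(x, mul(x, y)) against the term add(2, mul(2, z)).
rule = ("add", Var("x"), ("mul", Var("x"), Var("y")))
bindings = match(rule, ("add", 2, ("mul", 2, "z")))
no_match = match(rule, ("add", 2, ("mul", 3, "z")))
```

Here bindings maps x to 2 and y to "z", while no_match is None because the two occurrences of x would have to bind to different terms.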
Designing optimal- and fast-on-average pattern matching algorithms
Given a pattern $w$ and a text $t$, the speed of a pattern matching algorithm
over $t$ with regard to $w$ is the ratio of the length of $t$ to the number of
text accesses performed to search for $w$ in $t$. We first propose a general
method for computing the limit of the expected speed of pattern matching
algorithms, with regard to $w$, over iid texts. Next, we show how to determine
the greatest speed which can be achieved among a large class of algorithms,
together with an algorithm achieving this speed. Since the complexity of this
determination makes it impossible to deal with patterns of length greater than
4, we propose a polynomial heuristic. Finally, our approaches are compared with
9 pre-existing pattern matching algorithms from both a theoretical and a
practical point of view, i.e. both in terms of limit expected speed on iid
texts, and in terms of observed average speed on real data. In all cases, the
pre-existing algorithms are outperformed.
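The definition of speed given above (text length divided by the number of text accesses) can be measured empirically on a single text. The sketch below does this for two textbook algorithms, a naive scan and Horspool's algorithm. It is an illustration only: it does not implement the method or heuristic proposed in the abstract, and the function names are invented here. It assumes a non-empty pattern that is no longer than the text.

```python
from typing import Callable, List, Tuple

SearchResult = Tuple[List[int], int]  # (match positions, text accesses)


def naive_search(w: str, t: str) -> SearchResult:
    """Try every alignment left to right, counting each read of t."""
    hits, accesses = [], 0
    for i in range(len(t) - len(w) + 1):
        j = 0
        while j < len(w):
            accesses += 1  # one read of t[i + j]
            if t[i + j] != w[j]:
                break
            j += 1
        if j == len(w):
            hits.append(i)
    return hits, accesses


def horspool_search(w: str, t: str) -> SearchResult:
    """Horspool: compare right to left, shift by the last window character."""
    m = len(w)
    # Shift for each character of w[:-1]; the rightmost occurrence wins.
    shift = {c: m - 1 - k for k, c in enumerate(w[:-1])}
    hits, accesses, i = [], 0, 0
    while i + m <= len(t):
        j = m - 1
        while j >= 0:
            accesses += 1  # one read of t[i + j]
            if t[i + j] != w[j]:
                break
            j -= 1
        if j < 0:
            hits.append(i)
        # t[i + m - 1] was already read by the first comparison of this
        # window, so reusing it for the shift is not counted again.
        i += shift.get(t[i + m - 1], m)
    return hits, accesses


def speed(search: Callable[[str, str], SearchResult], w: str, t: str) -> float:
    """Observed speed: len(t) divided by the number of text accesses."""
    _, accesses = search(w, t)
    return len(t) / accesses


w, t = "abra", "abracadabra"
naive_speed = speed(naive_search, w, t)
horspool_speed = speed(horspool_search, w, t)
```

A speed above 1 means the algorithm skipped some text positions entirely, which only algorithms with shifts larger than one position can achieve.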