1,097 research outputs found

    DFA Minimization Algorithms in Map-Reduce

    Map-Reduce has been a highly popular parallel-distributed programming model. In this thesis, we study the problem of minimizing Deterministic Finite State Automata (DFA). We focus our attention on two well-known (serial) algorithms, namely the algorithms of Moore (1956) and of Hopcroft (1971). The central cost parameter in Map-Reduce is communication cost, i.e., the amount of data that has to be communicated between the processes. Using techniques from Communication Complexity, we derive an O(kn log n) lower bound and an O(kn^3 log n) upper bound for the problem, where n is the number of states in the DFA to be minimized and k is the size of its alphabet. We then develop Map-Reduce versions of both Moore's and Hopcroft's algorithms, and show that their communication cost is O(kn^2 (log n + log k)). Both methods have been implemented and tested on large DFAs with 131,072 states. The experiments verify our theoretical analysis, and also reveal that Hopcroft's algorithm, considered superior in the sequential framework, is very sensitive to skew in the topology of the graph of the DFA, whereas Moore's algorithm handles skew without major efficiency loss.
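
    For orientation, here is a minimal serial sketch of Moore-style partition refinement, the textbook idea that the abstract's Map-Reduce version builds on. It is not the thesis's distributed implementation. The function name `moore_minimize`, the dictionary encoding of the transition function, and the assumption that the DFA is complete (every state has a transition on every symbol) are illustrative choices, not taken from the source.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

State = Hashable
Symbol = Hashable


def moore_minimize(
    states: Iterable[State],
    alphabet: Iterable[Symbol],
    delta: Dict[Tuple[State, Symbol], State],
    accepting: Set[State],
) -> Tuple[Dict[State, int], int]:
    """Return (block_of, rounds) for a complete DFA.

    block_of maps each state to the id of its equivalence class;
    rounds counts refinement passes, including the final pass that
    detects that nothing changed. Unreachable states are NOT removed
    (assumption: the caller has already pruned them).
    """
    state_list: List[State] = list(states)
    symbols: List[Symbol] = list(alphabet)  # fix one symbol order for signatures

    # Initial partition: accepting vs. non-accepting states.
    block_of = {q: int(q in accepting) for q in state_list}
    num_blocks = len(set(block_of.values()))
    rounds = 0

    while True:
        rounds += 1
        ids: Dict[tuple, int] = {}
        new_block_of: Dict[State, int] = {}
        for q in state_list:
            # Signature = own block plus the blocks reached on each symbol.
            signature = (block_of[q],) + tuple(
                block_of[delta[(q, a)]] for a in symbols
            )
            new_block_of[q] = ids.setdefault(signature, len(ids))

        # The new partition always refines the old one, so an unchanged
        # block count means the partition itself is unchanged.
        if len(ids) == num_blocks:
            return new_block_of, rounds
        block_of, num_blocks = new_block_of, len(ids)
```

    Each pass touches every (state, symbol) pair once, which is the quantity a round-based distributed version would have to communicate; how the thesis actually partitions that work across mappers and reducers is not stated in the abstract.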

    Finite Automata Algorithms in Map-Reduce

    In this thesis, the intersection of several large nondeterministic finite automata (NFAs) and the minimization of a large deterministic finite automaton (DFA) in map-reduce are studied. We have derived a lower bound on the replication rate for computing NFA intersections and provided three concrete algorithms for the problem. Our investigation of the replication rate of each of the three algorithms, through detailed experiments on large datasets of finite automata, shows where each algorithm could be applied. Denoting by n the number of states in a DFA A, we propose an algorithm that minimizes A in n map-reduce rounds in the worst case. Our experiments, however, indicate that the number of rounds is, in practice, much smaller than n for all DFAs we examined. In other words, this algorithm converges in d iterations by computing the equivalence classes of each state, where d is the diameter of the input DFA.
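
    As background for the NFA-intersection problem, the sketch below shows the standard serial product construction, restricted to reachable product states. It is not one of the three map-reduce algorithms from the thesis, which the abstract does not describe. The tuple encoding `(starts, delta, accepting)`, the names `intersect_nfas` and `accepts`, and the absence of epsilon transitions are assumptions made for illustration.

```python
from collections import deque
from itertools import product


def intersect_nfas(nfas, alphabet):
    """Product construction for a list of NFAs without epsilon moves.

    Each NFA is a tuple (starts, delta, accepting), where delta maps
    (state, symbol) -> set of successor states (hypothetical encoding).
    Returns an NFA in the same encoding whose states are tuples holding
    one component state per input automaton.
    """
    starts = set(product(*(nfa[0] for nfa in nfas)))
    delta = {}
    seen = set(starts)
    queue = deque(starts)

    # Breadth-first exploration, so only reachable tuples are built.
    while queue:
        tup = queue.popleft()
        for a in alphabet:
            succ_sets = [nfa[1].get((q, a), set()) for nfa, q in zip(nfas, tup)]
            # If any component has no successor, the product is empty.
            targets = set(product(*succ_sets))
            if targets:
                delta[(tup, a)] = targets
            for t in targets - seen:
                seen.add(t)
                queue.append(t)

    # A product state accepts only if every component accepts.
    accepting = {
        t for t in seen if all(q in nfa[2] for nfa, q in zip(nfas, t))
    }
    return starts, delta, accepting


def accepts(nfa, word):
    """Simulate an NFA (same encoding) on a sequence of symbols."""
    starts, delta, accepting = nfa
    current = set(starts)
    for a in word:
        current = {t for q in current for t in delta.get((q, a), set())}
    return bool(current & accepting)
```

    The number of product tuples can grow as the product of the input sizes, which gives some intuition for why the amount of replicated data matters when the construction is distributed.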

    CAIR: Using Formal Languages to Study Routing, Leaking, and Interception in BGP

    The Internet routing protocol BGP expresses topological reachability and policy-based decisions simultaneously in path vectors. A complete view of the Internet backbone routing is given by the collection of all valid routes, which is infeasible to obtain due to information hiding of BGP, the lack of omnipresent collection points, and data complexity. Commonly, graph-based data models are used to represent the Internet topology from a given set of BGP routing tables, but they fall short of explaining policy contexts. As a consequence, routing anomalies such as route leaks and interception attacks cannot be explained with graphs. In this paper, we use formal languages to represent the global routing system in a rigorous model. Our CAIR framework translates BGP announcements into a finite route language that allows for the incremental construction of minimal route automata. CAIR preserves route diversity, is highly efficient, and is well-suited to monitoring BGP path changes in real time. We formally derive implementable search patterns for route leaks and interception attacks. In contrast to the state of the art, we can detect these incidents. In practical experiments, we analyze public BGP data over the last seven years.

    Hyper-Minimization for Deterministic Weighted Tree Automata

    Hyper-minimization is a state reduction technique that allows a finite change in the semantics. The theory for hyper-minimization of deterministic weighted tree automata is provided. The presence of weights slightly complicates the situation in comparison to the unweighted case. In addition, the first hyper-minimization algorithm for deterministic weighted tree automata, weighted over commutative semifields, is provided together with some implementation remarks that enable an efficient implementation. In fact, the same run-time O(m log n) as in the unweighted case is obtained, where m is the size of the deterministic weighted tree automaton and n is its number of states. Comment: In Proceedings AFL 2014, arXiv:1405.527

    Pre-Reduction Graph Products: Hardnesses of Properly Learning DFAs and Approximating EDP on DAGs

    The study of graph products is a major research topic and typically concerns the term $f(G*H)$, e.g., to show that $f(G*H)=f(G)f(H)$. In this paper, we study graph products in a non-standard form $f(R[G*H])$ where $R$ is a "reduction", a transformation of any graph into an instance of an intended optimization problem. We resolve some open problems as applications. (1) A tight $n^{1-\epsilon}$-approximation hardness for the minimum consistent deterministic finite automaton (DFA) problem, where $n$ is the sample size. Due to Board and Pitt [Theoretical Computer Science 1992], this implies the hardness of properly learning DFAs assuming $NP\neq RP$ (the weakest possible assumption). (2) A tight $n^{1/2-\epsilon}$ hardness for the edge-disjoint paths (EDP) problem on directed acyclic graphs (DAGs), where $n$ denotes the number of vertices. (3) A tight hardness of packing vertex-disjoint $k$-cycles for large $k$. (4) An alternative (and perhaps simpler) proof for the hardness of properly learning DNF, CNF and intersection of halfspaces [Alekhnovich et al., FOCS 2004 and J. Comput. Syst. Sci. 2008].