Finding Frequent Subsequences in a Set of Texts
Given a set of strings, the Common Subsequence Automaton accepts all common subsequences of these strings. Such an automaton can be deduced from other automata, like the Directed Acyclic Subsequence Graph or the Subsequence Automaton. In this paper, we introduce some new issues in text algorithms on the basis of problems related to common subsequences. Firstly, we give an overview of the different existing automata, focusing on their similarities and differences. Secondly, we present a new automaton, the Constrained Subsequence Automaton, which extends the Common Subsequence Automaton by adding an integer parameter called the quorum.
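As a concrete reference point, the acceptance condition behind such automata can be stated without any automaton at all: a string is a common subsequence iff it is a subsequence of every string in the set, and the quorum constraint relaxes "every" to "at least q". A minimal Python sketch (function names are mine, not from the paper):

```python
def is_subsequence(t, s):
    """Greedy scan: t is a subsequence of s iff each character of t
    can be matched left-to-right in s."""
    it = iter(s)
    return all(c in it for c in t)  # 'in' advances the iterator

def is_common_subsequence(t, strings):
    """t is a common subsequence iff it is a subsequence of every string."""
    return all(is_subsequence(t, s) for s in strings)

def meets_quorum(t, strings, q):
    """Quorum variant: t need only be a subsequence of at least q strings."""
    return sum(is_subsequence(t, s) for s in strings) >= q
```

These checks run in linear time per string; the point of the automata in the paper is to answer many such queries without rescanning the input strings.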
Subsequence Automata with Default Transitions
Let $S$ be a string of length $n$ with characters from an alphabet of size
$\sigma$. The \emph{subsequence automaton} of $S$ (often called the
\emph{directed acyclic subsequence graph}) is the minimal deterministic finite
automaton accepting all subsequences of $S$. A straightforward construction
shows that the size (number of states and transitions) of the subsequence
automaton is $O(n\sigma)$ and that this bound is asymptotically optimal.
In this paper, we consider subsequence automata with \emph{default
transitions}, that is, special transitions to be taken only if none of the
regular transitions match the current character, and which do not consume the
current character. We show that with default transitions, much smaller
subsequence automata are possible, and provide a full trade-off between the
size of the automaton and the \emph{delay}, i.e., the maximum number of
consecutive default transitions followed before consuming a character.
Specifically, given any integer parameter $\tau$, $1 < \tau \leq \sigma$, we
present a subsequence automaton with default transitions of size
$O(n\tau\log_{\tau}\sigma)$ and delay $O(\log_{\tau}\sigma)$. Hence, with $\tau = 2$ we
obtain an automaton of size $O(n\log\sigma)$ and delay $O(\log\sigma)$. On
the other extreme, with $\tau = \sigma$, we obtain an automaton of size $O(n\sigma)$ and delay $O(1)$, thus matching the bound for the standard subsequence
automaton construction. Finally, we generalize the result to multiple strings.
The key component of our result is a novel hierarchical automata construction
of independent interest.
Comment: Corrected typo
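The straightforward construction mentioned above is commonly realized as a next-occurrence table: state $i$ means "the first $i$ characters of the string have been consumed", and the transition on character $c$ jumps just past the next occurrence of $c$. A sketch in Python (note that this table is the unminimized automaton; the true subsequence automaton additionally merges equivalent states):

```python
def subsequence_automaton(s):
    """Next-occurrence table: nxt[i][c] = smallest j > i with s[j-1] == c.
    States 0..n; total size is O(n * sigma), matching the straightforward
    bound from the abstract."""
    n = len(s)
    nxt = [dict() for _ in range(n + 1)]
    # Fill right-to-left: state n has no outgoing transitions.
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])
        nxt[i][s[i]] = i + 1
    return nxt

def accepts(nxt, t):
    """t is a subsequence iff every transition exists along the way."""
    state = 0
    for c in t:
        if c not in nxt[state]:
            return False
        state = nxt[state][c]
    return True
```

A default transition would replace many of the copied entries in `nxt[i]` with a single fallback pointer, which is exactly the size/delay trade-off the paper quantifies.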
Improving legibility of natural deduction proofs is not trivial
In formal proof checking environments such as Mizar it is not merely the
validity of mathematical formulas that is evaluated in the process of adoption
to the body of accepted formalizations, but also the readability of the proofs
that witness validity. As in the case of computer programs, such proof scripts may
sometimes be more and sometimes be less readable. To better understand the
notion of readability of formal proofs, and to assess and improve their
readability, we propose in this paper a method of improving proof readability
based on Behaghel's First Law of sentence structure. Our method maximizes the
number of local references to the directly preceding statement in a proof
linearisation. We show that the underlying optimization problem is NP-complete.
Comment: 33 pages
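To make the optimization target concrete, here is a toy model under my own assumptions (not the paper's Mizar machinery): a proof is a set of steps with dependencies, a linearisation is a topological order of the steps, and the score counts steps that reference the directly preceding statement. Since the optimization problem is NP-complete, the sketch simply brute-forces all orders, which is feasible only for tiny proofs:

```python
from itertools import permutations

def is_valid(order, deps):
    """A linearisation is valid if every referenced step comes earlier."""
    pos = {s: i for i, s in enumerate(order)}
    return all(pos[d] < pos[s] for s in deps for d in deps[s])

def local_refs(order, deps):
    """Count steps that reference the directly preceding statement."""
    pos = {s: i for i, s in enumerate(order)}
    return sum(1 for s in order if any(pos[d] == pos[s] - 1 for d in deps[s]))

def best_linearisation(deps):
    """Brute force over all topological orders (exponential in general)."""
    best = max((o for o in permutations(list(deps)) if is_valid(o, deps)),
               key=lambda o: local_refs(o, deps))
    return best, local_refs(best, deps)
```

In the diamond-shaped example below, at most one of the two middle steps can immediately follow the shared premise, so the optimum is 2 local references, not 3.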
Specifying ODP computational objects in Z
The computational viewpoint contained within the Reference Model of Open Distributed Processing (RM-ODP) shows how collections of objects can be configured within a distributed system to enable interworking. It prescribes certain capabilities that such objects are expected to possess and structuring rules that apply to how these objects can be configured with one another. This paper highlights how the specification language Z can be used to formalise these capabilities and the associated structuring rules, thereby enabling specifications of ODP systems from the computational viewpoint to be achieved.
Improving the smoothed complexity of FLIP for max cut problems
Finding locally optimal solutions for max-cut and max-$k$-cut are well-known
PLS-complete problems. An instinctive approach to finding such a locally
optimum solution is the FLIP method. Even though FLIP requires exponential time
in worst-case instances, it tends to terminate quickly in practical instances.
To explain this discrepancy, the run-time of FLIP has been studied in the
smoothed complexity framework. Etscheid and R\"{o}glin showed that the smoothed
complexity of FLIP for max-cut in arbitrary graphs is quasi-polynomial. Angel,
Bubeck, Peres, and Wei showed that the smoothed complexity of FLIP for max-cut
in complete graphs is $O(\varphi^5 n^{15.1})$, where $\varphi$ is an upper bound on
the random edge-weight density and $n$ is the number of vertices in the input
graph.
While Angel et al.'s result showed the first polynomial smoothed complexity,
they also conjectured that their run-time bound is far from optimal. In this
work, we make substantial progress towards improving the run-time bound. We
prove that the smoothed complexity of FLIP in complete graphs is $O(\varphi n^{7.83})$. Our results are based on a carefully chosen matrix whose rank
captures the run-time of the method along with improved rank bounds for this
matrix and an improved union bound based on this matrix. In addition, our
techniques provide a general framework for analyzing FLIP in the smoothed
framework. We illustrate this general framework by showing that the smoothed
complexity of FLIP for max-$3$-cut in complete graphs is polynomial and for
max-$k$-cut in arbitrary graphs is quasi-polynomial. We believe that our
techniques should also be of interest towards addressing the smoothed
complexity of FLIP for max-$k$-cut in complete graphs for larger constants $k$.
Comment: 36 pages
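For reference, the FLIP method itself is easy to state: starting from any bipartition, repeatedly move a single vertex to the other side whenever that strictly increases the cut value, and stop at a local optimum. A minimal sketch, assuming a symmetric weight matrix `w` with zero diagonal (smoothed analysis would draw these weights from bounded-density distributions):

```python
def cut_value(w, side):
    """Total weight of edges crossing the bipartition given by side[v] in {0,1}."""
    n = len(side)
    return sum(w[u][v] for u in range(n) for v in range(u + 1, n)
               if side[u] != side[v])

def flip(w, side):
    """FLIP local search: moving v across the cut changes the cut value by
    (weight to same-side neighbours) - (weight to cross neighbours);
    flip while some move is strictly improving."""
    n = len(side)
    improved = True
    while improved:
        improved = False
        for v in range(n):
            gain = sum(w[v][u] * (1 if side[u] == side[v] else -1)
                       for u in range(n) if u != v)
            if gain > 0:
                side[v] ^= 1
                improved = True
    return side
```

Each iteration strictly increases the cut value, so termination is guaranteed; the papers above bound how many such improving moves can occur under random perturbations of the weights.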
Interpreting and using CPDAGs with background knowledge
We develop terminology and methods for working with maximally oriented
partially directed acyclic graphs (maximal PDAGs). Maximal PDAGs arise from
imposing restrictions on a Markov equivalence class of directed acyclic graphs,
or equivalently on its graphical representation as a completed partially
directed acyclic graph (CPDAG), for example when adding background knowledge
about certain edge orientations. Although maximal PDAGs often arise in
practice, causal methods have been mostly developed for CPDAGs. In this paper,
we extend such methodology to maximal PDAGs. In particular, we develop
methodology to read off possible ancestral relationships, we introduce a
graphical criterion for covariate adjustment to estimate total causal effects,
and we adapt the IDA and joint-IDA frameworks to estimate multi-sets of
possible causal effects. We also present a simulation study that illustrates
the gain in identifiability of total causal effects as the background knowledge
increases. All methods are implemented in the R package pcalg.
Comment: 17 pages, 6 figures, UAI 2017
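One of the ancestral-relationship queries mentioned above can be phrased as simple graph reachability: x is a possible ancestor of y in a PDAG if there is a path from x to y on which every edge is either undirected or directed towards y. The sketch below is my simplification of that idea (not the paper's full criterion), using a hypothetical edge-set representation:

```python
from collections import deque

def possible_descendants(x, directed, undirected, nodes):
    """BFS along edges that are either directed away from the current node
    (a -> b) or undirected (a - b); x is then a possible ancestor of every
    node returned. 'directed' holds (tail, head) pairs, 'undirected' holds
    frozenset pairs."""
    seen = {x}
    queue = deque([x])
    while queue:
        a = queue.popleft()
        for b in nodes:
            if b not in seen and ((a, b) in directed
                                  or frozenset((a, b)) in undirected):
                seen.add(b)
                queue.append(b)
    seen.discard(x)
    return seen
```

Because undirected edges are traversable in both directions, background knowledge that orients an edge can only shrink these sets, which is the identifiability gain the simulation study measures.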
Minimally supervised induction of morphology through bitexts
A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment, and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. Consequently, there have been many attempts to reduce this cost through the development of unsupervised or minimally supervised algorithms and learning methods for the acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner, but one that is more linguistically informed than previous unsupervised approaches. That is, this study attempts to induce, from an unannotated text, clusters of words that are inflectional variants of each other. A set of inflectional suffixes by part of speech is then induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language (the source language) to another (the target language). This approach has the further advantage of allowing a reduction in the amount of training data without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target, for ease of evaluation and for certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed the current roster of unsupervised or minimally supervised approaches to morphology acquisition, this work attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.
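As an illustration of the second step (inducing suffixes from a cluster of inflectional variants), a deliberately crude sketch: take the longest common prefix of the cluster as the stem and the remainders as candidate suffixes. This is my simplification, not the dissertation's method, shown on a German verb paradigm since German is the target language:

```python
import os

def induce_suffixes(cluster):
    """Treat the cluster's longest common prefix as a crude stem and
    collect the remainders as candidate inflectional suffixes."""
    stem = os.path.commonprefix(cluster)
    return stem, sorted({w[len(stem):] for w in cluster})
```

Real stems are not always a literal common prefix (umlaut and stem alternations break this), which is one reason the clustering step needs the cross-lingual signal from the aligned bitext rather than string similarity alone.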