
62,023 research outputs found

Recommended from our members

Distributed Morphology as a regular relation

Author: Edmiston Daniel
Ermolaeva Marina
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2018

This research reorganizes the Distributed Morphology (DM) framework to work over strings. Typically, DM operates on binary trees, with the syntax-morphology interface implicitly treated as a tree transducer. We contend that using (binary) trees is overpowered and predicts patterns unattested in natural language. Assuming the standard Y-model, DM operating on trees presumes that the flattening of the derivation for PF takes place post-morphology. We instead flatten the structure above the morphological module, between the syntax and the morphology. By restricting the morphological component to working on strings, we correctly predict that morphology can be modeled with regular string languages.

ScholarWorks@UMass Amherst

Computational Perspectives on Phonological Constituency and Recursion

Author: Yu Kristine M.
Publication venue: Universitat Autonoma de Barcelona
Publication date: 01/01/2021

Whether or not phonology has recursion is often conflated with whether or not phonology has strings or trees as data structures. Taking a computational perspective from formal language theory and focusing on how phonological strings and trees are built, we disentangle these issues. We show that even considering the boundedness of words and utterances in physical realization and the lack of observable examples of potential recursive embedding of phonological constituents beyond a few layers, recursion is a natural consequence of expressing generalization in phonological grammars for strings and trees. While prosodically-conditioned phonological patterns can be represented using grammars for strings, e.g., with bracketed string representations, we show how grammars for trees provide a natural way to express these patterns and provide insight into the kinds of analyses that phonologists have proposed for them.

Diposit Digital de Documents de la UAB

String Indexing for Patterns with Wildcards

Author: A. Tam
B. Chazelle
D. Harel
D. Tsur
G. Chen
G. Landau
G. Navarro
H.L. Chan
K. Hofmann
L.P. Coelho
M. Lewenstein
M. Maas
M.L. Fredman
P. Bille
P. Clifford
T.-W. Lam
Z. Galil
Publication date: 01/01/2012

We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.
- A linear space index with query time $O(m+\sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
- An index with query time $O(m+j+occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
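To make the query in the abstract above concrete, here is a minimal sketch of wildcard pattern matching without any index. It is only a naive baseline that scans the text in time proportional to the text length times the pattern length, not one of the indexes described in the abstract. The function name `find_with_wildcards` and the use of `?` as the wildcard symbol are assumptions made for illustration.

```python
def find_with_wildcards(text, pattern, wildcard="?"):
    """Return every start position where `pattern` occurs in `text`.

    Each `wildcard` symbol in the pattern matches exactly one arbitrary
    character. This is a brute-force scan, not an index structure.
    """
    span = len(pattern)  # total pattern length: characters plus wildcards
    positions = []
    for start in range(len(text) - span + 1):
        window = text[start:start + span]
        # A position matches if every non-wildcard symbol agrees with the text.
        if all(pc == wildcard or pc == tc for pc, tc in zip(pattern, window)):
            positions.append(start)
    return positions


if __name__ == "__main__":
    print(find_with_wildcards("abracadabra", "a?a"))
```

An index trades preprocessing time and space for faster queries than this scan, which is the trade-off the three results in the abstract quantify.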

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Prospects and limitations of full-text index structures in genome analysis

Author: Dawyndt Peter
De Baets Bernard
Fack Veerle
Vyverman Michaël
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

The combination of continual advances in sequencing technology, which produce large amounts of data, and innovative bioinformatics approaches designed to cope with this data flood has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article presents a comprehensive, state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, and their practical limitations are explained and compared.
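As an illustration of what a full-text index structure does, the sketch below builds a suffix array and answers exact-match queries by binary search. The suffix array is chosen here only as a familiar example of such an index; the abstract does not say which structures the article covers. The construction sorts suffixes directly, which is simple but far slower than the specialised algorithms used on genome-scale data. The names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction for illustration only; practical tools use
    linear-time or near-linear-time algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions of `pattern` using two binary searches."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix at the given rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first rank whose prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    genome = "banana"
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "ana"))
```

Once the array is built, each query needs only a logarithmic number of string comparisons, at the cost of storing one integer per text position. This is a small instance of the memory-time trade-offs the abstract mentions.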

Ghent University Academic Bibliography

PubMed Central

Reverse-Safe Data Structures for Text Indexing

Author: Gabriele Fici
Giulia Bernardini
Grigorios Loukides
Huiping Chen
Solon P. Pissis
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2020

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
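The definition of z-reverse-safety can be checked by brute force on tiny inputs, as sketched below. This is not the construction algorithm from the abstract. It simply enumerates every candidate text and is exponential in the text length. It assumes that the stored answers are occurrence counts for every pattern of length at most d, and that candidate texts have the same length and alphabet as the original. Both are illustrative assumptions, as are the names `answer_profile`, `count_consistent_texts` and `max_safe_d`.

```python
from collections import Counter
from itertools import product


def answer_profile(text, d):
    """Occurrence counts of every substring of length 1..d (assumed query set)."""
    return Counter(
        text[i:i + length]
        for length in range(1, d + 1)
        for i in range(len(text) - length + 1)
    )


def count_consistent_texts(text, d):
    """Count same-length texts over the same alphabet with identical answers."""
    alphabet = sorted(set(text))
    target = answer_profile(text, d)
    return sum(
        1
        for candidate in product(alphabet, repeat=len(text))
        if answer_profile("".join(candidate), d) == target
    )


def max_safe_d(text, z):
    """Largest d for which at least z texts share the answers (0 if none)."""
    best = 0
    for d in range(1, len(text) + 1):
        # The count cannot grow with d, so stop at the first failure.
        if count_consistent_texts(text, d) < z:
            break
        best = d
    return best


if __name__ == "__main__":
    print(count_consistent_texts("aab", 1))
    print(max_safe_d("aab", 2))
```

For the text `aab`, letter counts alone are shared by three texts (`aab`, `aba`, `baa`), while adding the length-2 counts identifies the text uniquely. Under these assumptions the structure is 2-reverse-safe only for d = 1.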

Archivio istituzionale della ricerca - Università di Trieste

Crossref

CWI's Institutional Repository

University of Birmingham Research Portal

Archivio istituzionale della ricerca - Università di Palermo