Compressed Spaced Suffix Arrays

Author: Gagie Travis
Manzini Giovanni
Valenzuela Daniel
Publication date: 01/01/2014

Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, we keep a separate linear-size data structure for each seed, either a hash table or a spaced suffix array (SSA). In this paper we show how to compress SSAs relative to normal suffix arrays (SAs) and still support fast random access to them. We first prove a theoretical upper bound on the space needed to store an SSA when we already have the SA. We then present experiments indicating that our approach works even better in practice.
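As a rough illustration of what a spaced suffix array is (not the compressed representation the abstract describes), the following Python sketch builds one by brute force next to an ordinary suffix array. It rests on several assumptions made here, none taken from the abstract: a seed is a string of '1' (compare) and '0' (ignore) characters, positions are ordered by the text characters under the seed's 1s within a single window of the seed's length, and ties are broken by text position. The function names are hypothetical.

```python
def suffix_array(text):
    """Plain suffix array: positions sorted by the suffix starting there."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def masked_key(text, seed, i):
    """Characters of text under the seed's '1' positions, window starting at i.

    Positions that fall past the end of the text are simply dropped
    (an assumption of this sketch).
    """
    return "".join(
        text[i + k]
        for k, bit in enumerate(seed)
        if bit == "1" and i + k < len(text)
    )


def spaced_suffix_array(text, seed):
    """Brute-force spaced suffix array for one seed.

    Sorts positions by their masked key; ties are broken by position.
    Takes O(n log n) comparisons of keys of length at most len(seed).
    """
    return sorted(range(len(text)), key=lambda i: (masked_key(text, seed, i), i))


if __name__ == "__main__":
    text = "banana"
    print("SA :", suffix_array(text))
    print("SSA:", spaced_suffix_array(text, "101"))
```

Keeping one such array per seed is the linear-size cost per seed that the abstract mentions; the paper's contribution is to store each SSA in less space given the SA, which this sketch does not attempt.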

arXiv.org e-Print Archive

CiteSeerX

Archivio della Ricerca - Università di Pisa

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

An Algorithmic Study of Manufacturing Paperclips and Other Folded Structures

Author: Arkin Esther M.
Fekete Sandor P.
Mitchell Joseph S. B.
Publication date: 30/09/2002

We study algorithmic aspects of bending wires and sheet metal into a specified structure. Problems of this type are closely related to the question of deciding whether a simple non-self-intersecting wire structure (a carpenter's ruler) can be straightened, a problem that was open for several years and has only recently been solved in the affirmative. If we impose some of the constraints that arise in the manufacturing process, we obtain quite different results. In particular, we study the variant of the carpenter's ruler problem in which only one joint can be modified at a time. For a linkage that does not self-intersect or self-touch, the recent results of Connelly et al. and Streinu imply that it can always be straightened, modifying one joint at a time. However, we show that for a linkage with even a single vertex degeneracy, it becomes NP-hard to decide if it can be straightened while altering only one joint at a time. If we add the restriction that each joint can be altered at most once, we show that the problem is NP-complete even without vertex degeneracies. In the special case, arising in wire forming manufacturing, that each joint can be altered at most once and the joints must be processed sequentially from one or both ends of the linkage, we give an efficient algorithm to determine if a linkage can be straightened.

Comment: 28 pages, 14 figures, LaTeX, to appear in Computational Geometry - Theory and Applications

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Compressed Spaced Suffix Arrays

Author: Gagie Travis
Manzini Giovanni
Valenzuela Daniel
Publication date: 15/08/2016

As a first step in designing relatively-compressed data structures (i.e., such that storing an instance for one dataset helps us store instances for similar datasets), we consider how to compress spaced suffix arrays relative to normal suffix arrays and still support fast access to them. This problem is of practical interest when performing similarity search with spaced seeds because using several seeds in parallel significantly improves their performance, but with existing approaches we keep a separate linear-space hash table or spaced suffix array for each seed. We first prove a theoretical upper bound on the space needed to store a spaced suffix array when we already have the suffix array. We then present experiments indicating that our approach works even better in practice.

Peer reviewed

CiteSeerX

Archivio della Ricerca - Università di Pisa

Helsingin yliopiston digitaalinen arkisto

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Recombination between heterologous human acrocentric chromosomes

Author: Buonaiuto Silvia
Gomes de Lima Leonardo
Guarracino Andrea
Marco Santiago
Potapova Tamara
Rhie Arang
Publication venue: Nature Research
Publication date: 01/01/2023

The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications [1,2]. Although the resolution of these regions in the first complete assembly of a human genome, the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13), provided a model of their homology [3], it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium [4] (HPRC), we find that contigs from all of the SAACs form a community. A variation graph [5] constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination [6,7]. The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations [8], and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago [9].

Our work depends on the HPRC draft human pangenome resource established in the accompanying Article [4], and we thank the production and assembly groups for their efforts in establishing this resource. This work used the computational resources of the UTHSC Octopus cluster and NIH HPC Biowulf cluster. We acknowledge support in maintaining these systems that was critical to our analyses. The authors thank M. Miller for the development of a graphical synopsis of our study (Fig. 5); and R. Williams and N. Soranzo for support and guidance in the design and discussion of our work. This work was supported, in part, by National Institutes of Health/NIDA U01DA047638 (E.G.), National Institutes of Health/NIGMS R01GM123489 (E.G.), NSF PPoSS Award no. 2118709 (E.G. and C.F.), the Tennessee Governor’s Chairs programme (C.F. and E.G.), National Institutes of Health/NCI R01CA266339 (T.P., L.G.d.L. and J.L.G.), and the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (A.R., S.K. and A.M.P.). We acknowledge support from Human Technopole (A.G.), Consiglio Nazionale delle Ricerche, Italy (S.B. and V.C.), and Stowers Institute for Medical Research (T.P., L.G.d.L., B.R. and J.L.G.).

Peer Reviewed

Article signed by 13 authors: Andrea Guarracino, Silvia Buonaiuto, Leonardo Gomes de Lima, Tamara Potapova, Arang Rhie, Sergey Koren, Boris Rubinstein, Christian Fischer, Human Pangenome Reference Consortium, Jennifer L. Gerton, Adam M. Phillippy, Vincenza Colonna & Erik Garrison

Human Pangenome Reference Consortium: Haley J. Abel, Lucinda L. Antonacci-Fulton, Mobin Asri, Gunjan Baid, Carl A. Baker, Anastasiya Belyaeva, Konstantinos Billis, Guillaume Bourque, Silvia Buonaiuto, Andrew Carroll, Mark J. P. Chaisson, Pi-Chuan Chang, Xian H. Chang, Haoyu Cheng, Justin Chu, Sarah Cody, Vincenza Colonna, Daniel E. Cook, Robert M. Cook-Deegan, Omar E. Cornejo, Mark Diekhans, Daniel Doerr, Peter Ebert, Jana Ebler, Evan E. Eichler, Jordan M. Eizenga, Susan Fairley, Olivier Fedrigo, Adam L. Felsenfeld, Xiaowen Feng, Christian Fischer, Paul Flicek, Giulio Formenti, Adam Frankish, Robert S. Fulton, Yan Gao, Shilpa Garg, Erik Garrison, Nanibaa’ A. Garrison, Carlos Garcia Giron, Richard E. Green, Cristian Groza, Andrea Guarracino, Leanne Haggerty, Ira Hall, William T. Harvey, Marina Haukness, David Haussler, Simon Heumos, Glenn Hickey, Kendra Hoekzema, Thibaut Hourlier, Kerstin Howe, Miten Jain, Erich D. Jarvis, Hanlee P. Ji, Eimear E. Kenny, Barbara A. Koenig, Alexey Kolesnikov, Jan O. Korbel, Jennifer Kordosky, Sergey Koren, HoJoon Lee, Alexandra P. Lewis, Heng Li, Wen-Wei Liao, Shuangjia Lu, Tsung-Yu Lu, Julian K. Lucas, Hugo Magalhães, Santiago Marco-Sola, Pierre Marijon, Charles Markello, Tobias Marschall, Fergal J. Martin, Ann McCartney, Jennifer McDaniel, Karen H. Miga, Matthew W. Mitchell, Jean Monlong, Jacquelyn Mountcastle, Katherine M. Munson, Moses Njagi Mwaniki, Maria Nattestad, Adam M. Novak, Sergey Nurk, Hugh E. Olsen, Nathan D. Olson, Benedict Paten, Trevor Pesout, Adam M. Phillippy, Alice B. Popejoy, David Porubsky, Pjotr Prins, Daniela Puiu, Mikko Rautiainen, Allison A. Regier, Arang Rhie, Samuel Sacco, Ashley D. Sanders, Valerie A. Schneider, Baergen I. Schultz, Kishwar Shafin, Jonas A. Sibbesen, Jouni Sirén, Michael W. Smith, Heidi J. Sofia, Ahmad N. Abou Tayoun, Françoise Thibaud-Nissen, Chad Tomlinson, Francesca Floriana Tricomi, Flavia Villani, Mitchell R. Vollger, Justin Wagner, Brian Walenz, Ting Wang, Jonathan M. D. Wood, Aleksey V. Zimin & Justin M. Zook

Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

All-Pairs LCA in DAGs: Breaking through the $O(n^{2.5})$ barrier

Author: Grandoni Fabrizio
Italiano Giuseppe F.
Parotsidis Nikos
Uznański Przemysław
Łukasiewicz Aleksander
Publication date: 13/11/2020

Let $G=(V,E)$ be an $n$-vertex directed acyclic graph (DAG). A lowest common ancestor (LCA) of two vertices $u$ and $v$ is a common ancestor $w$ of $u$ and $v$ such that no descendant of $w$ has the same property. In this paper, we consider the problem of computing an LCA, if any, for all pairs of vertices in a DAG. The fastest known algorithms for this problem exploit fast matrix multiplication subroutines and have running times ranging from $O(n^{2.687})$ [Bender et al. SODA'01] down to $O(n^{2.615})$ [Kowaluk and Lingas ICALP'05] and $O(n^{2.569})$ [Czumaj et al. TCS'07]. Somewhat surprisingly, all those bounds would still be $\Omega(n^{2.5})$ even if matrix multiplication could be solved optimally (i.e., $\omega=2$). This appears to be an inherent barrier for all the currently known approaches, which raises the natural question of whether one could break through the $O(n^{2.5})$ barrier for this problem. In this paper, we answer this question affirmatively: in particular, we present an $\tilde O(n^{2.447})$ ($\tilde O(n^{7/3})$ for $\omega=2$) algorithm for finding an LCA for all pairs of vertices in a DAG, which represents the first improvement on the running times for this problem in the last 13 years. A key tool in our approach is a fast algorithm to partition the vertex set of the transitive closure of $G$ into a collection of $O(\ell)$ chains and $O(n/\ell)$ antichains, for a given parameter $\ell$. As usual, a chain is a path while an antichain is an independent set. We then find, for all pairs of vertices, a candidate LCA among the chain and antichain vertices, separately. The first set is obtained via a reduction to min-max matrix multiplication. The computation of the second set can be reduced to Boolean matrix multiplication similarly to previous results on this problem. We finally combine the two solutions together in a careful (non-obvious) manner.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

All-Pairs LCA in DAGs: Breaking through the $O(n^{2.5})$ barrier

Author: Łukasiewicz Aleksander
Grandoni Fabrizio
Italiano Giuseppe Francesco
Parotsidis Nikos
Uznański Przemysław
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2021

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Algorithms for Manufacturing Paperclips and Sheet Metal Structures

Author: Arkin Esther M.
Fekete Sándor P.
Mitchell Joseph S. B.
Publication date: 01/01/2001

DepositOnce

13th international workshop on expressiveness in concurrency

Author: Amadio R
Phillips I
Publication venue: Department of Computing, Imperial College London
Publication date: 01/01/2006

Spiral - Imperial College Digital Repository

Flexeme: Untangling Commits Using Lexical Flows

Author: Allamanis M
Barr ET
Dash SK
Pârtachi PP
Publication venue: Association for Computing Machinery (ACM)
Publication date: 13/11/2020

UCL Discovery

Space-Efficient Data Structures for Information Retrieval

Author: Claude Francisco
Publication venue: University of Waterloo
Publication date: 22/04/2013

The amount of data that people and companies store has grown exponentially over the last few years. Storing this information alone is not enough, because in order to make it useful we need to be able to efficiently search inside it. Furthermore, it is highly valuable to keep the historic data of each document stored, allowing us not only to access and search inside the newest version, but also to search over the whole history of the documents. Grammar-based compression has proven to be very effective for repetitive data, which is the case for versioned documents. In this thesis we present several results on representing textual information and searching in it. In particular, we present text indexes for grammar-based compressed text that support searching for a pattern and extracting substrings of the input text. These are the first general indexes for grammar-based compressed text that support searching in sublinear time. In order to build our indexes, we present new results on representing binary relations in a space-efficient manner, and construction algorithms that use little space to achieve their goal. These two results have a wide range of applications. In particular, the representations for binary relations can be used as a building block for several structures in computer science, such as graphs and inverted indexes. Finally, we present a new index, based on grammar-based compression, to solve the document listing problem. This problem deals with representing a collection of texts and searching for the documents that contain a given pattern. In spite of being similar to the classical text indexing problem, this problem has proven to be a challenge when we do not want to pay time proportional to the number of occurrences, but time proportional to the size of the result. Our proposal is designed particularly for versioned text, allowing the storage of a collection of documents with all their historic versions in little space. This is currently the smallest structure for such a purpose in practice.

University of Waterloo's Institutional Repository