9 research outputs found

Recommended from our members

Seedability: optimizing alignment parameters for sensitive sequence comparison

Author: Ayad LAK
Chikhi R
Pissis SP
Publication venue: Oxford University Press (OUP)
Publication date: 12/08/2023
Field of study

Data availability: The data underlying this article are available either at https://github.com/lorrainea/Seedability or in the Ensembl database at https://www.ensembl.org, where they can be accessed using the gene identifiers ENSPTRG00000044036 and ENSG00000174236, or in the NCBI database at https://www.ncbi.nlm.nih.gov, where they can be found using the reference sequence NC_000001.11.

Motivation: Most sequence alignment techniques use exact k-mer hits, called seeds, as anchors to speed up alignment. Many bioinformatics tools that employ seed-based alignment, such as Minimap2, use a single value of k per sequencing technology, without a strong guarantee that this is the best possible value. Given the ubiquity of sequence alignment, identifying values of k that lead to more sensitive alignments is therefore an important task. To this end, we present Seedability, a seed-based alignment framework for estimating an optimal seed (k-mer) length, as well as a minimal number of shared seeds, for a given alignment identity threshold. In particular, we were motivated to make Minimap2 more sensitive in the pairwise alignment of short sequences.

Results: The experimental results herein show improved alignments of short and divergent sequences when using the parameter values determined by Seedability, compared with the default values of Minimap2. We also show several pairs of real divergent sequences for which the default parameter values of Minimap2 yield no output alignments, while the values output by Seedability produce plausible alignments.

Availability and implementation: https://github.com/lorrainea/Seedability (distributed under GPL v3.0).

R.C. was supported by ANR Full-RNA, SeqDigger, Inception, and PRAIRIE grants (ANR-22-CE45-0007, ANR-19-CE45-0008, PIA/ANR16-CONV-0005, ANR-19-P3IA-0001).
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No. 872539 (PANGAIA) and 956229 (ALPACA)
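Why the choice of k matters for sensitivity can be illustrated with a toy calculation. The sketch below is not Seedability's method. It assumes two equal-length sequences that differ by substitutions only (no indels) and uses the simple counting argument that each mismatch destroys at most k of the L − k + 1 positional k-mers. Under that assumption, it counts surviving seeds and finds the largest k that still guarantees a minimum number of shared seeds. All function names are hypothetical.

```python
from typing import Optional


def positional_seed_hits(a: str, b: str, k: int) -> int:
    """Count positions i where a and b carry the same k-mer (no indels assumed)."""
    assert len(a) == len(b), "sketch assumes equal-length, gap-free sequences"
    return sum(1 for i in range(len(a) - k + 1) if a[i:i + k] == b[i:i + k])


def seed_lower_bound(length: int, mismatches: int, k: int) -> int:
    """Worst case: each substitution destroys at most k of the length - k + 1 k-mers."""
    return max(0, (length - k + 1) - mismatches * k)


def largest_guaranteed_k(length: int, mismatches: int, min_seeds: int) -> Optional[int]:
    """Largest k whose worst-case number of surviving seeds is still >= min_seeds."""
    best = None
    for k in range(1, length + 1):
        if seed_lower_bound(length, mismatches, k) >= min_seeds:
            best = k
    return best


a = "ACGTTGCAAC"
b = "ACGTAGCAAC"  # one substitution at position 4
print(positional_seed_hits(a, b, 3))     # 5
print(seed_lower_bound(len(a), 1, 3))    # 5
print(largest_guaranteed_k(100, 10, 1))  # 9
```

With 10 substitutions in 100 bp (90% identity), k = 9 is the largest seed length that still guarantees at least one hit under this worst-case model; a larger k can leave two such sequences without any guaranteed seed.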

Brunel University Research Archive

Circular pattern matching with k mismatches

Author: A Amir
C Barton
C Hazay
CS Iliopoulos
GM Landau
Karl Bringmann
Kimmo Fredriksson
KR Abrahamson
LAK Ayad
M Crochemore
M Ružić
MAR Azim
ML Fredman
P Bille
P Gawrychowski
R Grossi
Raphael Clifford
T Hirvola
T Kociumaka
V Palazón-González
WI Chang
Z Galil
Publication venue: Springer Science and Business Media LLC
Publication date: 10/07/2019

The k-mismatch problem consists in computing the Hamming distance between a pattern P of length m and every length-m substring of a text T of length n, if this distance is no more than k. In many real-world applications, any cyclic shift of P is a relevant pattern, and thus one is interested in computing the minimal distance between every length-m substring of T and any cyclic shift of P. This is the circular pattern matching with k mismatches problem.
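The problem statement above translates directly into a brute-force baseline. The sketch below is a naive O(n·m²) reference implementation, not the algorithm of the paper: for every length-m window of T it takes the minimum Hamming distance over all cyclic shifts of P and keeps only windows with distance at most k. Function names are hypothetical.

```python
from typing import Dict


def hamming(x: str, y: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(1 for c, d in zip(x, y) if c != d)


def circular_k_mismatch(text: str, pattern: str, k: int) -> Dict[int, int]:
    """Map each start position j of text to the minimal Hamming distance between
    text[j:j+m] and any cyclic shift of pattern, keeping only distances <= k."""
    m = len(pattern)
    # All distinct cyclic shifts (rotations) of the pattern.
    rotations = {pattern[i:] + pattern[:i] for i in range(m)}
    result = {}
    for j in range(len(text) - m + 1):
        window = text[j:j + m]
        best = min(hamming(window, rot) for rot in rotations)
        if best <= k:
            result[j] = best
    return result


print(circular_k_mismatch("abcabd", "cab", 1))  # {0: 0, 1: 0, 2: 0, 3: 1}
```

Windows 0 to 2 ("abc", "bca", "cab") are exact rotations of "cab"; window 3 ("abd") is one substitution away from the rotation "abc".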

arXiv.org e-Print Archive

Crossref

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Hal-Diderot

CNEFinder: Finding conserved non-coding elements in genomes

Author: Ayad LAK
Pissis SP
Polychronopoulos D
Publication venue: Oxford University Press
Publication date: 01/09/2018

Availability and implementation: Free software under the terms of the GNU GPL (https://github.com/lorrainea/CNEFinder).

Motivation: Conserved non-coding elements (CNEs) represent an enigmatic class of genomic elements which, despite being extremely conserved across evolution, do not encode proteins. Their functions are still largely unknown. Thus, there is a need to systematically investigate their roles in genomes. To this end, identifying sets of CNEs in a wide range of organisms is an important first step. Currently, there are no tools published in the literature for systematically identifying CNEs in genomes.

Results: We fill this gap by presenting CNEFinder, a tool for identifying CNEs between two given DNA sequences with user-defined criteria. The results presented here show the tool's ability to identify CNEs accurately and efficiently. CNEFinder is based on a k-mer technique for computing maximal exact matches. The tool thus does not require or compute whole-genome alignments or indexes, such as the suffix array or the Burrows-Wheeler Transform (BWT), which makes it flexible to use on a wide scale.

This work was supported by the Engineering and Physical Sciences Research Council [grant number EP/M50788X/1].
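A k-mer technique for maximal exact matches can be sketched without any suffix array or BWT. The version below is an illustration, not CNEFinder's code: it hashes every k-mer of the first sequence, uses k-mers of the second sequence as seeds, discards seeds that could be extended to the left (so each match is reported once, from its leftmost seed), and extends the rest to the right. It returns every maximal exact match of length at least k as (i, j, length); the function name is hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple


def maximal_exact_matches(a: str, b: str, k: int) -> List[Tuple[int, int, int]]:
    """Return (i, j, length) for each maximal exact match of length >= k
    between a and b, found by seeding with shared k-mers and extending."""
    index = defaultdict(list)  # k-mer -> start positions in a
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)

    mems = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            # Not left-maximal: the same match is reported from the seed where it starts.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = k
            # Extend to the right while the characters agree.
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            mems.append((i, j, length))
    return mems


print(maximal_exact_matches("GATTACA", "TTACAG", 3))  # [(2, 0, 5)]
```

The hash index costs space proportional to the first sequence only, which reflects the trade-off the abstract mentions: no full-text index is built, at the price of work that grows with the number of repeated k-mers.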

Brunel University Research Archive

Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast

Author: Ayad LAK
Loukides G
Pissis SP
Verbeek H
Publication venue: Springer Nature
Publication date: 20/12/2023

A preprint version of this article is available at arXiv:2310.09023v1 [cs.DS] (https://arxiv.org/abs/2310.09023) under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). It has not been certified by peer review.

Sparse suffix sorting is the problem of sorting b = o(n) suffixes of a string of length n. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for applications in text indexing, the existing algorithms have not been employed by practitioners. Arguably, this is because there are no simple, direct, and efficient algorithms for sparse suffix array construction. We provide two new algorithms for constructing the sparse suffix and LCP arrays that are simultaneously simple, direct, small, and fast. In particular, our algorithms are: simple in the sense that they can be implemented using only basic data structures; direct in the sense that the output arrays are not a byproduct of constructing the sparse suffix tree or an LCE data structure; fast in the sense that they run in O(n log b) time in the worst case, or in O(n) time when the total number of suffixes with an LCP value greater than 2^(⌊log(n/b)⌋+1) − 1 is in O(b/log b), matching the time of the optimal yet much more complicated algorithms [Gawrychowski and Kociumaka, SODA 2017; Birenzwige et al., SODA 2020]; and small in the sense that they can be implemented using only 8b + o(b) machine words. Our algorithms are simplified, yet non-trivial, space-efficient adaptations of the Monte Carlo algorithm by I et al. for constructing the sparse suffix tree in O(n log b) time [STACS 2014]. We also provide proof-of-concept experiments to justify our claims on simplicity and efficiency.

SPP and HV are supported by the PANGAIA project (GA 872539). SPP is supported by the ALPACA project (GA 956229). HV is supported by a Constance van Eeden Fellowship.
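To make the two output arrays concrete, the sketch below computes a sparse suffix array and its LCP array for a given set of positions. It is a simple Monte Carlo baseline, not the paper's algorithm: it answers longest-common-extension (LCE) queries with Karp-Rabin fingerprints and binary search, and sorts the sampled suffixes with those comparisons. Unlike the 8b + o(b)-word algorithms described above, it stores O(n) prefix fingerprints, and its output is correct only with high probability, since a fingerprint collision could mis-order suffixes. The modulus, the random base, and all names are illustrative assumptions.

```python
import functools
import random
from typing import List, Tuple

MOD = (1 << 61) - 1  # Mersenne prime modulus for the fingerprints


class Fingerprints:
    """Karp-Rabin prefix fingerprints of a text, for O(1) substring comparison."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.base = random.randrange(256, MOD - 1)
        self.prefix = [0] * (len(text) + 1)
        self.power = [1] * (len(text) + 1)
        for i, ch in enumerate(text):
            self.prefix[i + 1] = (self.prefix[i] * self.base + ord(ch)) % MOD
            self.power[i + 1] = (self.power[i] * self.base) % MOD

    def of(self, start: int, length: int) -> int:
        """Fingerprint of text[start:start+length]."""
        return (self.prefix[start + length] - self.prefix[start] * self.power[length]) % MOD

    def lce(self, i: int, j: int) -> int:
        """Longest common extension of suffixes i and j, by binary search."""
        lo, hi = 0, len(self.text) - max(i, j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.of(i, mid) == self.of(j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo


def sparse_suffix_and_lcp(text: str, positions: List[int]) -> Tuple[List[int], List[int]]:
    fp = Fingerprints(text)
    n = len(text)

    def compare(i: int, j: int) -> int:
        if i == j:
            return 0
        ell = fp.lce(i, j)
        if i + ell == n:  # suffix i is a proper prefix of suffix j
            return -1
        if j + ell == n:  # suffix j is a proper prefix of suffix i
            return 1
        return -1 if text[i + ell] < text[j + ell] else 1

    ssa = sorted(positions, key=functools.cmp_to_key(compare))
    lcp = [0] + [fp.lce(ssa[t - 1], ssa[t]) for t in range(1, len(ssa))]
    return ssa, lcp


print(sparse_suffix_and_lcp("banana", [1, 3, 5]))  # ([5, 3, 1], [0, 1, 3])
```

Each comparison costs O(log n) fingerprint checks, so this baseline runs in O(n + b log b log n) expected time, which is enough to cross-check outputs on small inputs.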

Brunel University Research Archive

IsoXpressor: A tool to assess transcriptional activity within isochores

Author: Arhondakis S
Ayad LAK
Dourou A-M
Pissis SP
Publication venue: Oxford University Press on behalf of the Society for Molecular Biology and Evolution
Publication date: 08/08/2020

Data availability: The data underlying this article are available in the article and in its Supplementary Material online at: https://academic.oup.com/gbe/article/12/9/1573/5898630#207856986.

Genomes are characterized by large regions of homogeneous base composition known as isochores. Isochores are divided into GC-poor and GC-rich classes linked to distinct functional and structural properties. Several studies have addressed how isochores shape function and structure. To aid in this important subject, we present IsoXpressor, a tool designed for the analysis of the functional property of transcription within isochores. IsoXpressor allows users to process RNA-Seq data in relation to the isochores, and it can be employed to investigate any biological question of interest for any species. The results presented herein as proof of concept focus on the preimplantation process in Homo sapiens (human) and Macaca mulatta (rhesus monkey).
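The basic bookkeeping behind relating RNA-Seq signal to GC-poor and GC-rich regions can be sketched as follows. This is not IsoXpressor's implementation: it assumes fixed-size consecutive windows, a single hypothetical GC cutoff separating the two classes, and read start positions already mapped to genome coordinates. Real isochores typically span hundreds of kilobases and are often split into several families, so the window size and cutoff below are placeholders.

```python
from collections import Counter
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def classify_windows(genome: str, window: int, gc_cutoff: float) -> List[str]:
    """Label each consecutive window as GC-poor or GC-rich (hypothetical cutoff)."""
    return [
        "GC-rich" if gc_fraction(genome[s:s + window]) >= gc_cutoff else "GC-poor"
        for s in range(0, len(genome), window)
    ]


def reads_per_class(labels: List[str], window: int, read_starts: List[int]) -> Counter:
    """Count mapped reads whose start position falls in each class of window."""
    return Counter(labels[pos // window] for pos in read_starts)


genome = "AAAATTTT" + "GGGGCCCC"
labels = classify_windows(genome, window=8, gc_cutoff=0.5)
print(labels)                                 # ['GC-poor', 'GC-rich']
print(reads_per_class(labels, 8, [1, 2, 9]))  # Counter({'GC-poor': 2, 'GC-rich': 1})
```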

Brunel University Research Archive

Range Shortest Unique Substring queries

Author: A Amir
A Ganguly
AM İleri
B Haubold
C Schleiermacher
CS Iliopoulos
D Harel
H Inoue
K Tsuruta
LAK Ayad
MA Bender
O Berkman
W Hon
Publication venue: Springer Science and Business Media LLC
Publication date: 07/10/2019

Let T be a string of length n and T[i..j] be the substring of T starting at position i and ending at position j. A substring T[i..j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and in information retrieval. Given a string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T that efficiently answers online queries of the following type: given a range [α, β], return a shortest substring of T[α..β] with exactly one occurrence in T[α..β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction that allows us to apply a recently introduced optimal geometric data structure [Chan et al. ICALP 2018].
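A brute-force query makes the problem definition concrete. The sketch below does no preprocessing and is far from the data structure described above: for a query range [α, β] (0-based, inclusive), it tries lengths 1, 2, … and returns the leftmost substring of T[α..β] that occurs exactly once in T[α..β]. The function name and the leftmost tie-breaking rule are assumptions for illustration.

```python
from collections import Counter
from typing import Optional, Tuple


def range_shortest_unique_substring(text: str, alpha: int, beta: int) -> Optional[Tuple[int, int]]:
    """Return (i, j) such that text[i..j] is a shortest substring of
    text[alpha..beta] occurring exactly once in text[alpha..beta]."""
    window = text[alpha:beta + 1]
    for length in range(1, len(window) + 1):
        # Count every (possibly overlapping) occurrence of each length-`length` substring.
        counts = Counter(window[s:s + length] for s in range(len(window) - length + 1))
        for s in range(len(window) - length + 1):
            if counts[window[s:s + length]] == 1:
                return alpha + s, alpha + s + length - 1
    return None  # empty range


print(range_shortest_unique_substring("abaababa", 0, 2))  # (1, 1)
print(range_shortest_unique_substring("abaababa", 0, 7))  # (2, 3)
```

In "abaababa" as a whole, both letters repeat, and "aa" is the only length-2 substring that occurs once, hence (2, 3). Each such query costs roughly quadratic time in the range length, which is what a precomputed structure avoids.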

Crossref

Report of a rare co-incidence of congenital factor V deficiency and thalassemia intermedia in a family

Author: Ali Shamseddine
Ali Taher
Ayad Hamdan
Chiu HC
Chiu HC
Giannini E
Girolami M
Lak M
Manotti C
Peyvandi F
Qatanani M
Susane Koussa
Tuddenham EGD
Yasser Abou Mourad
Publication venue: King Faisal Specialist Hospital and Research Centre

Crossref

Online Algorithms on Antipowers and Antiperiods

Author: G Badkobeh
G Fici
HW Lenstra
J Fischer
L Li
LAK Ayad
M Crochemore
M Dietzfelbinger
M Lothaire
P Bille
R Kolpakov
RM Karp
S Inenaga
SW Bae
T Kociumaka
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

The definition of antipower introduced by Fici et al. (ICALP 2016) captures the notion of being the opposite of a power: a sequence of k pairwise distinct blocks of the same length. Recently, Alamro et al. (CPM 2019) defined a string to have an antiperiod if it is a prefix of an antipower, and gave complexity bounds for the offline computation of the minimum antiperiod and all the antiperiods of a word. In this paper, we address the same problems in the online setting. Our solutions rely on new arrays that compactly and incrementally store antiperiods and antipowers as the word grows, obtaining in the process this information for all the word's prefixes. We show how to compute those arrays online in O(n log n) space, O(n log n) time, and o(n^ε) delay per character, for any constant ε > 0. Running times are worst-case and hold with high probability. We also discuss more space-efficient solutions that return the correct result with high probability, and small data structures that support random access to those arrays.
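As a concrete reading of the definition, a word w is a k-antipower when it splits into k blocks of length |w|/k that are pairwise distinct. The offline sketch below checks this naively and lists every k for which a word is a k-antipower. It makes no attempt at the online, per-character bounds of the paper, and the function names are hypothetical.

```python
from typing import List


def is_k_antipower(w: str, k: int) -> bool:
    """True if w is a concatenation of k pairwise distinct blocks of equal length."""
    if k <= 0 or not w or len(w) % k:
        return False
    block = len(w) // k
    blocks = [w[i:i + block] for i in range(0, len(w), block)]
    return len(set(blocks)) == k


def antipower_orders(w: str) -> List[int]:
    """All k such that w is a k-antipower."""
    return [k for k in range(1, len(w) + 1) if is_k_antipower(w, k)]


print(antipower_orders("abcabd"))  # [1, 2, 3]
print(is_k_antipower("aaaa", 2))   # False
```

For "abcabd", the splits abc|abd and ab|ca|bd have distinct blocks, while the split into single letters repeats "a" and "b"; "aaaa" = aa|aa is a power, the opposite case.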

Crossref

Archivio della Ricerca - Università di Pisa

HAL Descartes

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

King's Research Portal

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Hal-Diderot

The roles of hybridization and habitat fragmentation in the evolution of Brazil’s enigmatic longwing butterflies, Heliconius nattereri and H. hermathena

Author: A Larsson
A Martin
A McKenna
A-L Hikl
AL Price
AM Bolger
AVL Freitas
B Langmead
B Paten
BA Counterman
BR Larget
BS Weir
C Ané
C Pardo-Diaz
C Zhang
CD Jiggins
D Petkova
DH Alexander
DT Hoang
EL Westerman
EV Kriventseva
EY Durand
F Ronquist
F Tajima
G Marçais
GW Vurture
H Li
HW Bates
J Enciso-Romero
J Mallet
J Mavárez
J Rozas
J Terhorst
JB Lack
JJ Lewis
JK Pritchard
JW Leigh
K Katoh
KK Dasmahapatra
KM Kozak
KS Brown
L Excoffier
LAK Ayad
LEOC Aragão
LP Pryszcz
LT Nguyen
M Alonge
M Beltrán
M Joron
M Martin
MC Whitlock
MR Kronforst
N Dierckxsens
NB Edelman
NJ Nadeau
NM Haddad
P Cingolani
P Danecek
P Jay
PD Keightley
PJA Cock
PM Sheppard
R Kajitani
RD Reed
RE Green
RM Merrill
RM Waterhouse
RR Seixas
RWR Wallbank
S Kalyaanamoorthy
S Mirarab
SH Martin
SM Van Belleghem
TC Bruen
TJ Thurman
W Zhang
WC Hewitson
WF Laurance
WM Neukirchen
YB Simons
Publication venue: Springer Science and Business Media LLC

Crossref