317 research outputs found

A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment

Author: Brubach Brian
Ghurye Jay
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Annual Symposium on Combinatorial Pattern Matching (CPM 2018)
Publication date: 01/01/2018

The classical Four Russians speedup for computing edit distance (a.k.a. Levenshtein distance), due to Masek and Paterson [Masek and Paterson, 1980], involves partitioning the dynamic programming table into k-by-k square blocks and generating a lookup table in O(psi^{2k} k^2 |Sigma|^{2k}) time and O(psi^{2k} k |Sigma|^{2k}) space for block size k, where psi depends on the cost function (for unit costs psi = 3) and |Sigma| is the size of the alphabet. We show that the O(psi^{2k} k^2) and O(psi^{2k} k) factors can be improved to O(k^2 lg k) time and O(k^2) space. Thus, we improve the time and space complexity of that aspect compared to Masek and Paterson [Masek and Paterson, 1980] and remove the dependence on psi. We further show that for certain problems the O(|Sigma|^{2k}) factor can also be reduced. Using this technique, we show a new algorithm for the fundamental problem of one-against-many banded alignment. In particular, comparing one string of length m to n other strings of length m with maximum distance d can be performed in O(n m + m d^2 lg d + n d^3) time. When d is reasonably small, this approaches or meets the current best theoretical result of O(nm + n d^2) achieved by using the best known pairwise algorithm running in O(m + d^2) time [Myers, 1986][Ukkonen, 1985] while potentially being more practical. It also improves on the standard practical approach which requires O(n m d) time to iteratively run an O(md) time pairwise banded alignment algorithm. Regarding pairwise comparison, we extend the classic result of Masek and Paterson [Masek and Paterson, 1980] which computes the edit distance between two strings in O(m^2/log m) time to remove the dependence on psi even when edits have arbitrary costs from a penalty matrix. Crochemore, Landau, and Ziv-Ukelson [Crochemore, 2003] achieved a similar result, also allowing for unrestricted scoring matrices, but with variable-sized blocks.
In practical applications of the Four Russians speedup wherein space efficiency is important and smaller block sizes k are used (notably k < |Sigma|), Kim, Na, Park, and Sim [Kim et al., 2016] showed how to remove the dependence on the alphabet size for the unit cost version, generating a lookup table in O(3^{2k} (2k)! k^2) time and O(3^{2k} (2k)! k) space. Combining their work with our result yields an improvement to O((2k)! k^2 lg k) time and O((2k)! k^2) space.
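
The abstract above refers to an O(md)-time pairwise banded alignment as the standard practical baseline. A minimal sketch of that baseline for unit costs might look as follows. It is not the Four Russians method of the paper, and the function name `banded_edit_distance` and its interface are assumptions made for this illustration.

```python
from typing import Optional


def banded_edit_distance(a: str, b: str, d: int) -> Optional[int]:
    """Return the unit-cost edit distance of a and b if it is <= d, else None.

    Hypothetical helper for illustration. Only cells with |i - j| <= d are
    stored and filled, so time is O(len(a) * d) and space is O(d).
    """
    m, n = len(a), len(b)
    if abs(m - n) > d:
        return None  # the length difference alone already exceeds d
    inf = d + 1          # any value above d means "too far"
    width = 2 * d + 1    # band index k corresponds to column j = i + k - d

    # Row 0 of the DP table: distance from the empty prefix of a to b[:j] is j.
    prev = [inf] * width
    for k in range(d, width):
        j = k - d
        if j <= n:
            prev[k] = j

    for i in range(1, m + 1):
        cur = [inf] * width
        for k in range(width):
            j = i + k - d
            if j < 0 or j > n:
                continue  # outside the table
            if j == 0:
                cur[k] = i  # delete all of a[:i]
                continue
            # Diagonal neighbour (i-1, j-1) has the same band index k.
            best = prev[k] + (a[i - 1] != b[j - 1])
            if k + 1 < width:
                best = min(best, prev[k + 1] + 1)  # cell (i-1, j): deletion
            if k > 0:
                best = min(best, cur[k - 1] + 1)   # cell (i, j-1): insertion
            cur[k] = min(best, inf)                # saturate at d + 1
        prev = cur

    result = prev[n - m + d]
    return result if result <= d else None
```

For instance, `banded_edit_distance("kitten", "sitting", 3)` returns 3, while the same call with `d = 2` returns `None`. Running such a routine once per target string gives the O(n m d) one-against-many baseline that the abstract compares against.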

Dagstuhl Research Online Publication Server

GENOME ASSEMBLY AND VARIANT DETECTION USING EMERGING SEQUENCING TECHNOLOGIES AND GRAPH BASED METHODS

Author: Ghurye Jay
Publication date: 01/01/2018

The increased availability of genomic data and the increased ease and lower costs of DNA sequencing have revolutionized biomedical research. One of the critical steps in most bioinformatics analyses is the assembly of the genome sequence of an organism using the data generated from the sequencing machines. Despite the long length of sequences generated by third-generation sequencing technologies (tens of thousands of basepairs), the automated reconstruction of entire genomes continues to be a formidable computational task. Although long read technologies help in resolving highly repetitive regions, the contigs generated from long read assembly do not always span a complete chromosome or even an arm of the chromosome. Recently, new genomic technologies have been developed that can "bridge" across repeats or other genomic regions that are difficult to sequence or assemble and improve genome assemblies by "scaffolding" together large segments of the genome. The problem of scaffolding is vital in the context of both single genome assembly of large eukaryotic genomes and in metagenomics where the goal is to assemble multiple bacterial genomes in a sample simultaneously. First, we describe SALSA2, a method we developed to use interaction frequency between any two loci in the genome obtained using Hi-C technology to scaffold fragmented eukaryotic genome assemblies into chromosomes. SALSA2 can be used with either short or long read assembly to generate highly contiguous and accurate chromosome level assemblies. Hi-C data are known to introduce small inversion errors in the assembly, so we included the assembly graph in the scaffolding process and used the sequence overlap information to correct the orientation errors. Next, we present our contributions to metagenomics. We developed a scaffolding and variant detection method MetaCarvel for metagenomic datasets.
Several factors such as the presence of inter-genomic repeats, coverage ambiguities, and polymorphic regions in the genomes complicate the task of scaffolding metagenomes. Variant detection is also tricky in metagenomes because the different genomes within these complex samples are not known beforehand. We showed that MetaCarvel was able to generate accurate scaffolds and find genome-wide variations de novo in metagenomic datasets. Finally, we present EDIT, a tool for clustering millions of DNA sequence fragments originating from the highly conserved 16S rRNA gene in bacteria. We extended the classical Four Russians speedup to banded sequence alignment and showed that our method clusters highly similar sequences efficiently. This method can also be used to remove duplicate or near-duplicate sequences from a dataset. With the increasing data being generated in different genomic and metagenomic studies using emerging sequencing technologies, our software tools and algorithms are well timed with the need of the community.

Digital Repository at the University of Maryland

Better Greedy Sequence Clustering with Fast Banded Alignment

Author: Brubach Brian
Ghurye Jay
Pop Mihai
Srinivasan Aravind
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)
Publication date: 01/01/2017

Comparing a string to a large set of sequences is a key subroutine in greedy heuristics for clustering genomic data. Clustering 16S rRNA gene sequences into operational taxonomic units (OTUs) is a common method used in studying microbial communities. We present a new approach to greedy clustering using a trie-like data structure and Four Russians speedup. We evaluate the running time of our method in terms of the number of comparisons it makes during clustering and show in experimental results that the number of comparisons grows linearly with the size of the dataset as opposed to the quadratic running time of other methods. We compare the clusters output by our method to the popular greedy clustering tool UCLUST. We show that the clusters we generate can be both tighter and larger.
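
To make the comparison-count argument above concrete, here is a minimal sketch of a plain greedy center-based clustering loop that also counts its pairwise comparisons. It illustrates the kind of baseline whose comparisons can grow quadratically. It is not the trie-based method of the paper and does not reproduce UCLUST. The names `edit_distance` and `greedy_cluster`, the first-fit assignment rule, and the toy sequences are all assumptions made for this example.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain O(len(a) * len(b)) unit-cost edit distance (full DP table rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match / substitution
                           prev[j] + 1,               # deletion
                           cur[j - 1] + 1))           # insertion
        prev = cur
    return prev[-1]


def greedy_cluster(seqs: List[str], d: int) -> Tuple[List[List[str]], int]:
    """Assign each sequence to the first center within distance d.

    A sequence that matches no existing center becomes a new center.
    Returns the clusters and the number of pairwise comparisons made.
    """
    centers: List[str] = []
    clusters: List[List[str]] = []
    comparisons = 0
    for s in seqs:
        for idx, center in enumerate(centers):
            comparisons += 1
            if edit_distance(s, center) <= d:
                clusters[idx].append(s)
                break
        else:
            # No center was close enough: s starts a new cluster.
            centers.append(s)
            clusters.append([s])
    return clusters, comparisons


# Toy input (made up for illustration).
clusters, comparisons = greedy_cluster(
    ["ACGT", "ACGA", "TTTT", "TTTA", "ACGT"], d=1)
```

In the worst case every sequence is compared against every existing center, so the comparison count of this baseline is quadratic in the number of sequences when most sequences become centers.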

Dagstuhl Research Online Publication Server

Generalized Marshall-Olkin Distributions, and Related Bivariate Aging Properties

Author: Aczel
Asimit
Denuit
Deshpande
Franco Pellerey
Galambos
Ghurye
Hanagal
Hutchinson
Klein
Kotz
Lai
Li
Li
Lu
Lu
Marshall
Mulero
Muliere
Navarro
Navarro
Nelsen
Pellerey
Scarsini
Shaked
Wu
Xiaohu Li
Publication venue: Elsevier
Publication date: 01/01/2011

A class of generalized bivariate Marshall-Olkin distributions, which includes as special cases the Marshall-Olkin bivariate exponential distribution and the Marshall-Olkin type distribution due to Muliere and Scarsini (1987), is examined in this paper. Stochastic comparison results are derived, and bivariate aging properties, together with properties related to the evolution of dependence over time, are investigated for this class of distributions. Extensions of results previously presented in the literature are provided as well. Funding: National Natural Science Foundation of China [10771090]. © 2011 Elsevier Inc. All rights reserved.

Elsevier - Publisher Connector

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Xiamen University Institutional Repository

PORTO Publications Open Repository TOrino

Improved reference genome of the arboviral vector Aedes albopictus

Author: Akbari Omar
Antoshechkin Igor
Biedler James
Bonizzoni Mariangela
Caccone Adalgisa
Cosme Luciano
Crawford Jacob
Failloux Anna-Bella
Gamez Stephanie
Ghurye Jay
Halbach Rebecca
Jenrette Jeremy
Johnston J. Spencer
Karagodin Dmitry
Koren Sergey
Krsticevic Flavia
Marconcini Michele
Masri Reem
Masterson Patrick
Miesen Pascal
Palatini Umberto
Papathanos Philippos
Phillippy Adam
Pischedda Elisa
Powell Jeffrey
Rhie Arang
Sharakhova Maria
Sharma Atashi
Thibaud-Nissen Francoise
Tu Zhijian
van Rij Ronald
Publication venue: Springer Science and Business Media LLC
Publication date: 01/08/2020

Background: The Asian tiger mosquito Aedes albopictus is globally expanding and has become the main vector for human arboviruses in Europe. With limited antiviral drugs and vaccines available, vector control is the primary approach to prevent mosquito-borne diseases. A reliable and accurate DNA sequence of the Ae. albopictus genome is essential to develop new approaches that involve genetic manipulation of mosquitoes. Results: We use long-read sequencing methods and modern scaffolding techniques (PacBio, 10X, and Hi-C) to produce AalbF2, a dramatically improved assembly of the Ae. albopictus genome. AalbF2 reveals widespread viral insertions, novel microRNAs and piRNA clusters, the sex-determining locus, and new immunity genes, and enables genome-wide studies of geographically diverse Ae. albopictus populations and analyses of the developmental and stage-dependent network of expression data. Additionally, we build the first physical map for this species with 75% of the assembled genome anchored to the chromosomes. Conclusion: The AalbF2 genome assembly represents the most up-to-date collective knowledge of the Ae. albopictus genome. These resources represent a foundation to improve understanding of the adaptation potential and the epidemiological relevance of this species and foster the development of innovative control measures.

ZENODO

Directory of Open Access Journals

PubMed Central

HAL Descartes

eScholarship - University of California

Caltech Authors

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

HAL-Pasteur

MetaCarvel: linking assembly graph motifs to biological variants

Author: Fedarko Marcus
Ghurye Jay
Hervey W. Judson IV
Pop Mihai
Treangen Todd
Publication venue: Springer Nature
Publication date: 06/08/2019

Reconstructing genomic segments from metagenomics data is a highly complex task. In addition to general challenges, such as repeats and sequencing errors, metagenomic assembly needs to tolerate the uneven depth of coverage among organisms in a community and differences between nearly identical strains. Previous methods have addressed these issues by smoothing genomic variants. We present a variant-aware metagenomic scaffolder called MetaCarvel, which combines new strategies for repeat detection with graph analytics for the discovery of variants. We show that MetaCarvel can accurately reconstruct genomic segments from complex microbial mixtures and correctly identify and characterize several classes of common genomic variants. https://doi.org/10.1186/s13059-019-1791-

Digital Repository at the University of Maryland

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

De novo assembly of the cattle reference genome with single-molecule sequencing.

Author: Bickhart Derek M
Cole John B
Couldrey Christine
Dreischer Christian
Elsik Christine G
Ghurye Jay
Hagen Darren E
Hall Richard
Hammond John A
Hoffman Jinna
Koren Sergey
Li Wenli
Liu George
Low Wai Y
McDaneld Tara G
McKay Stephanie D
Medrano Juan F
Murdoch Brenda M
Nandolo Wilson
Phillippy Adam M
Rhie Arang
Rosen Benjamin D
Rowan Troy N
Schnabel Robert D
Schroeder Steven G
Schultheiss Sebastian J
Schwartz John C
Smith Timothy PL
Snelling Warren M
Thibaud-Nissen Françoise
Tseng Elizabeth
Van Tassell Curtis P
Zimin Aleksey
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Background: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. Results: We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. Conclusions: We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.
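
The contig N50 and L50 figures quoted in the abstract above are standard contiguity statistics. As a rough sketch of the usual definitions: N50 is the length of the shortest contig in the smallest set of longest contigs that covers at least half of the total assembled length, and L50 is the number of contigs in that set. The function name `n50_l50` and the toy contig lengths below are made up for illustration and are not taken from the ARS-UCD1.2 assembly.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    Contigs are visited from longest to shortest; the walk stops at the
    first contig where the running sum reaches half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, 1):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length, count
    raise AssertionError("unreachable: the full sum always reaches half")


# Toy contig lengths (hypothetical): total 54, half 27, and 10 + 9 + 8 = 27.
toy_n50, toy_l50 = n50_l50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

With the toy lengths above, the three longest contigs reach half of the total, so the function returns an N50 of 8 and an L50 of 3. A larger N50 together with a smaller L50 indicates a more contiguous assembly.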

eScholarship - University of California

Chromosome-scale, Haplotype-resolved Assembly of Human Genomes

Author: Aach J.
Carroll A.
Cheng H.
Chin C.
Chou M.
Church G.
Fungtammasan A.
Garg S.
Ghurye J.
Hatas E.
Heller D.
Li H.
Mac S.
Maguire J.
Mahmoud M.
Marschall T.
Moemke T.
Peluso P.
Schmitt A.
Sedlazeck F.
Zhou X.
Zook J.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

MPG.PuRe

The Place of Scripture in the Trajectories of a Distinct Religious Identity among Ravidassias in Britain: Guru Granth Sahib or Amritbani Guru Ravidass

This article highlights narratives, collected as informant testimonies, relating to trajectories of a distinct religious identity among the Ravidassia community in Britain. Current tensions surround the replacement of the Guru Granth Sahib with the Amritbani Guru Ravidass in Ravidassia places of worship. This is primarily in response to cartographies of the Ravidassia identity as distinct from Sikh identity. The opinions of Ravidassia individuals, from a varied age range, expressed in interviews conducted at various periods during 2010–2012, are considered in relation to dominant discourses emphasising the importance of one hegemonic ‘Ravidassia’ scripture. The interview data highlight three main positions among the followers of Guru Ravidass: (1) Ravidassias seeking a distinct identity but preferring to retain the Guru Granth Sahib in Ravidassia places of worship, (2) Ravidassias demanding a distinct identity by installing the Amritbani Guru Ravidass, (3) Ravidassias wanting to maintain their link with the Panth as Sikhs or as Ravidassi Sikhs.

Crossref

Wolverhampton Intellectual Repository and E-theses

Production and characterization of a biosurfactant produced by Streptomyces sp. DPUA 1559 isolated from lichens of the Amazon region

Author: Accorsini FR
Bondi CAM
Chen YC
Cooper DG
Davis BJ
DuBois M
Folch J
Ghurye GL
Gudina EJ
Gudiña EJ
Javaheri M
Khopade R
Khopade R
Kim SH
Laemmli UK
Lima CJB
Lima e Silva TA
Lima RA
Lowry OH
Luna JM
Luna JM
Meyer BN
Mulligan CN
Muth G
Nkadi PO
Pridham TG
Quinn PJ
Richter M
Rufino RD
Rufino RD
Rufino RD
Saeki H
Santos APP
Silva SNRL
Sobrinho HBS
Thavasi R
Tiquia SM
Publication venue: FapUNIFESP (SciELO)
Publication date: 01/01/2018

Crossref