
9 research outputs found

Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications

Author: Gabory Esteban
Mwaniki Moses Njagi
Pisanti Nadia
Pissis Solon P.
Radoszewski Jakub
Sweering Michelle
Zuba Wiktor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

An elastic-degenerate (ED) string $T$ is a sequence of $n$ sets $T[1], \ldots, T[n]$ containing $m$ strings in total whose cumulative length is $N$. We call $n$, $m$, and $N$ the length, the cardinality, and the size of $T$, respectively. The language of $T$ is defined as $\mathcal{L}(T) = \{S_1 \cdots S_n : S_i \in T[i] \text{ for all } i \in [1, n]\}$. ED strings have been introduced to represent a set of closely related DNA sequences, also known as a pangenome. The basic question we investigate here is: given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings $T_1$ and $T_2$ of lengths $n_1$ and $n_2$, cardinalities $m_1$ and $m_2$, and sizes $N_1$ and $N_2$, respectively, we show the following:
- There is no $O((N_1N_2)^{1-\epsilon})$-time algorithm, thus no $O((N_1m_2 + N_2m_1)^{1-\epsilon})$-time algorithm and no $O((N_1n_2 + N_2n_1)^{1-\epsilon})$-time algorithm, for any constant $\epsilon > 0$, for EDSI even when $T_1$ and $T_2$ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false.
- There is no combinatorial $O((N_1 + N_2)^{1.2-\epsilon} f(n_1, n_2))$-time algorithm, for any constant $\epsilon > 0$ and any function $f$, for EDSI even when $T_1$ and $T_2$ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false.
- An $O(N_1 \log N_1 \log n_1 + N_2 \log N_2 \log n_2)$-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when $T_1$ and $T_2$ are given in a compact representation, we show that the problem is NP-complete.
- An $O(N_1m_2 + N_2m_1)$-time algorithm for EDSI.
- An $\tilde{O}(N_1^{\omega-1}n_2 + N_2^{\omega-1}n_1)$-time algorithm for EDSI, where $\omega$ is the exponent of matrix multiplication; the $\tilde{O}$ notation suppresses factors that are polylogarithmic in the input size.
We also show that the techniques we develop have applications outside of ED string comparison.
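To make the EDSI question concrete, here is a small decision procedure for it. It is not any of the algorithms listed above: it is a plain product-automaton search, assuming each ED string is given as a Python list of non-empty sets of strings with $n \ge 1$. Every string in a set becomes a chain of states, the end of a string links to the start of each string in the next set, and a breadth-first search over pairs of states looks for a word accepted by both. All function names are hypothetical.

```python
from collections import deque
from typing import List, Optional, Set, Tuple

EDString = List[Set[str]]      # T[1..n] stored as a 0-based list of sets
State = Tuple[int, int, int]   # (set index i, string index j within T[i], chars read k)


def _alternatives(T: EDString) -> List[List[str]]:
    # Fix an order on each set so its strings can be addressed by index.
    assert T and all(T), "sketch assumes n >= 1 and non-empty sets"
    return [sorted(seg) for seg in T]


def _eps_moves(alts: List[List[str]], st: State) -> List[State]:
    # After finishing a string of T[i], jump (reading nothing) to the start
    # of every string of T[i+1].
    i, j, k = st
    if k == len(alts[i][j]) and i + 1 < len(alts):
        return [(i + 1, jj, 0) for jj in range(len(alts[i + 1]))]
    return []


def _next_char(alts: List[List[str]], st: State) -> Optional[str]:
    i, j, k = st
    s = alts[i][j]
    return s[k] if k < len(s) else None


def _accepting(alts: List[List[str]], st: State) -> bool:
    # A word is complete once the chosen string of the last set is fully read.
    i, j, k = st
    return i == len(alts) - 1 and k == len(alts[i][j])


def ed_intersect(T1: EDString, T2: EDString) -> bool:
    """Return True iff L(T1) and L(T2) share at least one word."""
    A, B = _alternatives(T1), _alternatives(T2)
    start = [((0, a, 0), (0, b, 0)) for a in range(len(A[0])) for b in range(len(B[0]))]
    seen = set(start)
    queue = deque(start)
    while queue:
        p, q = queue.popleft()
        if _accepting(A, p) and _accepting(B, q):
            return True
        # Either side may take an epsilon move on its own ...
        succ = [(p2, q) for p2 in _eps_moves(A, p)] + [(p, q2) for q2 in _eps_moves(B, q)]
        # ... but letters must be read in lockstep.
        c = _next_char(A, p)
        if c is not None and c == _next_char(B, q):
            succ.append(((p[0], p[1], p[2] + 1), (q[0], q[1], q[2] + 1)))
        for pair in succ:
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False


if __name__ == "__main__":
    T1 = [{"a", "b"}, {"c"}]        # L(T1) = {"ac", "bc"}
    T2 = [{"ac", "x"}]              # L(T2) = {"ac", "x"}
    T3 = [{"b"}, {"d", "cc"}]       # L(T3) = {"bd", "bcc"}
    print(ed_intersect(T1, T2))     # True ("ac" is shared)
    print(ed_intersect(T1, T3))     # False
```

An ED string of size $N$ and cardinality $m$ yields $N + m$ states in this construction, so the search visits at most $(N_1 + m_1)(N_2 + m_2)$ state pairs; it is meant only for small inputs.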

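For the unary case, the language of an ED string over a single letter `a` is determined by which total lengths are reachable: $\mathcal{L}(T) = \{a^k : k \in \Lambda_1 + \cdots + \Lambda_n\}$, where $\Lambda_i$ (a symbol introduced here) is the set of string lengths in $T[i]$ and $+$ denotes the sumset. Assuming both ED strings use the same letter, their intersection language is obtained by intersecting the two length sets. The sketch below computes this naively with quadratic sumsets, not with the near-linear method claimed in the abstract, and reports maximal runs of consecutive lengths as one plausible compact form; the exact RLE format used in the paper may differ. Function names are hypothetical.

```python
from typing import List, Set, Tuple

EDString = List[Set[str]]


def length_sumset(T: EDString) -> Set[int]:
    """Total lengths reachable by choosing one string from each set."""
    sums = {0}
    for seg in T:
        lengths = {len(s) for s in seg}
        sums = {x + y for x in sums for y in lengths}
    return sums


def unary_intersection(T1: EDString, T2: EDString) -> List[int]:
    """Sorted lengths k such that a^k lies in both L(T1) and L(T2)."""
    letters = {c for T in (T1, T2) for seg in T for s in seg for c in s}
    assert len(letters) <= 1, "sketch assumes both ED strings use the same single letter"
    return sorted(length_sumset(T1) & length_sumset(T2))


def to_runs(sorted_lengths: List[int]) -> List[Tuple[int, int]]:
    """Compress sorted lengths into maximal runs (first, last) of consecutive values."""
    runs: List[List[int]] = []
    for v in sorted_lengths:
        if runs and v == runs[-1][1] + 1:
            runs[-1][1] = v
        else:
            runs.append([v, v])
    return [(a, b) for a, b in runs]


if __name__ == "__main__":
    T1 = [{"a", "aa"}, {"", "aaa"}]     # lengths {1, 2} + {0, 3} = {1, 2, 4, 5}
    T2 = [{"aa", "aaaa"}, {"", "a"}]    # lengths {2, 4} + {0, 1} = {2, 3, 4, 5}
    common = unary_intersection(T1, T2)
    print(common)           # [2, 4, 5]
    print(to_runs(common))  # [(2, 2), (4, 5)]
```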
VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

Antimicrobial resistance and plasmid profiles of Aeromonas hydrophila isolated from River Njoro, Kenya

Author: Kiruki Silas
Limo Moses
Mbala M Jessica
Nathan Lawless
Njagi Eliud Nyaga Mwaniki
Njeru S Ngoci
Okemo Paul O
Publication venue: African Journals Online (AJOL)
Publication date: 03/02/2016

The purpose of this study was to investigate the presence of Aeromonas hydrophila at commonly used water collection points on the River Njoro and to determine the in-vitro antimicrobial susceptibility and plasmid profiles of the isolates. In total, 126 samples were collected, and 36.5% of them were positive for A. hydrophila. A. hydrophila isolates were recovered on membrane filters and cultured on Trypticase Soy agar, Bile aesculin agar and Aeromonas Medium agar. They were further characterized using cytochrome oxidase and API 20E tests. Drug susceptibility was determined using a modified disc diffusion method against ampicillin (25 μg), cefaclor (30 μg), ceftizoxime (30 μg), cefixime (5 μg), ceftazidime (30 μg), gentamicin (200 μg), streptomycin (25 μg), chloramphenicol (50 μg), nalidixic acid (30 μg) and ciprofloxacin (1 μg). Most of the isolates showed multi-drug resistance to two or more antibiotics. Chloramphenicol, nalidixic acid, ciprofloxacin, ceftazidime and cefixime were the most effective drugs, with 100% efficacy, whereas resistance was highest to ampicillin, cefaclor and streptomycin (100%, 67% and 50%, respectively). There was low resistance to ceftizoxime (16.7%) and gentamicin (23.3%). These results indicate that all A. hydrophila isolated from River Njoro were resistant to ampicillin and showed variable resistance to cefaclor, streptomycin, gentamicin and ceftizoxime. R-plasmids were extracted from multi-drug-resistant strains and separated by agarose gel (0.8%) electrophoresis for profiling. Plasmid profiling revealed that most of the multi-drug-resistant isolates contained one plasmid of 21.0 kb. Although some strains exhibited different antimicrobial resistance patterns, all of their plasmids were of the same size (21.0 kb). However, no plasmids were found in the antimicrobial-sensitive isolates. This study also indicates that the 21.0 kb plasmid is common in A. hydrophila and is important for antimicrobial resistance and virulence. Further studies are required to ascertain the role of this plasmid as a virulence marker.
Key words: Aeromonas hydrophila, antimicrobial resistance, plasmid profile

AJOL - African Journals Online

Antimicrobial resistance and plasmid profiles of Aeromonas hydrophila isolated from River Njoro, Kenya

Author: Eliud Nyaga
Kiruki Silas
Lawless Nathan
Limo Moses
Mbala M Jessica
Mwaniki Njagi
Njeru S Ngoci
Paul O Okemo
Publication date: 03/04/2020

The purpose of this study was to investigate the presence of Aeromonas hydrophila at commonly used water collection points on the River Njoro and to determine the in-vitro antimicrobial susceptibility and plasmid profiles of the isolates. In total, 126 samples were collected, and 36.5% of them were positive for A. hydrophila. A. hydrophila isolates were recovered on membrane filters and cultured on Trypticase Soy agar, Bile aesculin agar and Aeromonas Medium agar. They were further characterized using cytochrome oxidase and API 20E tests. Drug susceptibility was determined using a modified disc diffusion method against ampicillin (25 μg), cefaclor (30 μg), ceftizoxime (30 μg), cefixime (5 μg), ceftazidime (30 μg), gentamicin (200 μg), streptomycin (25 μg), chloramphenicol (50 μg), nalidixic acid (30 μg) and ciprofloxacin (1 μg). Most of the isolates showed multi-drug resistance to two or more antibiotics. Chloramphenicol, nalidixic acid, ciprofloxacin, ceftazidime and cefixime were the most effective drugs, with 100% efficacy, whereas resistance was highest to ampicillin, cefaclor and streptomycin (100%, 67% and 50%, respectively). There was low resistance to ceftizoxime (16.7%) and gentamicin (23.3%). These results indicate that all A. hydrophila isolated from River Njoro were resistant to ampicillin and showed variable resistance to cefaclor, streptomycin, gentamicin and ceftizoxime. R-plasmids were extracted from multi-drug-resistant strains and separated by agarose gel (0.8%) electrophoresis for profiling. Plasmid profiling revealed that most of the multi-drug-resistant isolates contained one plasmid of 21.0 kb. Although some strains exhibited different antimicrobial resistance patterns, all of their plasmids were of the same size (21.0 kb). However, no plasmids were found in the antimicrobial-sensitive isolates. This study also indicates that the 21.0 kb plasmid is common in A. hydrophila and is important for antimicrobial resistance and virulence. Further studies are required to ascertain the role of this plasmid as a virulence marker.

CiteSeerX

Fast Exact String to D-Texts Alignments

Author: Garrison Erik
Mwaniki Njagi Moses
Pisanti Nadia
Publication date: 07/06/2022

In recent years, aligning a sequence to a pangenome has become a central problem in genomics and pangenomics. A fast and accurate solution to this problem can serve as a building block for many crucial tasks, such as read correction, multiple sequence alignment (MSA), genome assembly and variant calling, to name a few. In this paper we propose a new, fast and exact method to align a string to a D-string, the latter possibly representing an MSA, a pangenome or a partial assembly. An implementation of our tool dsa is publicly available at https://github.com/urbanslug/ds
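The abstract does not describe how dsa works. As a hedged illustration of the simplest version of the task, exact matching, the sketch below assumes a D-string can be modeled like an ED string (a sequence of sets of alternative strings) and reports every set at which some occurrence of a pattern ends, using a naive scan that tracks which pattern prefixes are still "active" across set boundaries. It is not dsa's algorithm, and the function name is hypothetical.

```python
from typing import List, Set


def ed_occurrences(P: str, T: List[Set[str]]) -> List[int]:
    """0-based indices i such that some occurrence of P ends inside T[i]."""
    assert P, "pattern must be non-empty"
    m = len(P)
    # active holds j (0 < j < m) such that P[:j] is a suffix of some word
    # spelled by choosing one string from each of the sets read so far.
    active: Set[int] = set()
    ends: List[int] = []
    for i, seg in enumerate(T):
        new_active: Set[int] = set()
        found = False
        for s in seg:
            if P in s:                       # occurrence entirely inside s
                found = True
            for j in active:                 # continue a partial match into s
                rest = P[j:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True
                elif rest.startswith(s):     # s is consumed; still incomplete
                    new_active.add(j + len(s))
            for k in range(1, min(len(s), m - 1) + 1):
                if s[-k:] == P[:k]:          # a new partial match starts in s
                    new_active.add(k)
        if found:
            ends.append(i)
        active = new_active
    return ends


if __name__ == "__main__":
    T = [{"A", "AC"}, {"G", "TG"}, {"ACG"}]
    print(ed_occurrences("ACG", T))  # [1, 2]
```

Empty strings in a set simply carry active prefixes forward unchanged, which is what lets an occurrence span several sets.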

arXiv.org e-Print Archive

A draft human pangenome reference

Author: Abel Haley J.
Abou Tayoun Ahmad
Antonacci-Fulton Lucinda L.
Asri Mobin
Baid Gunjan
Baker Carl A.
Belyaeva Anastasiya
Billis Konstantinos
Bourque Guillaume
Buonaiuto Silvia
Carroll Andrew
Chaisson Mark
Chang Pi-Chuan
Chang Xian H.
Cheng Haoyu
Chu Justin
Cody Sarah
Colonna Vincenza
Cook Daniel E.
Cook-Deegan Robert M.
Cornejo Omar E.
Diekhans Mark
Doerr Daniel
Ebert Peter
Ebler Jana
Eichler Evan E.
Eizenga Jordan
Fairley Susan
Fedrigo Olivier
Felsenfeld Adam L.
Feng Xiaowen
Fischer Christian
Flicek Paul
Formenti Giulio
Frankish Adam
Fulton Robert S.
Gao Yan
Garg Shilpa
Garrison Erik
Garrison Nanibaa' A.
Giron Carlos Garcia
Green Richard E.
Groza Cristian
Guarracino Andrea
Haggerty Leanne
Hall Ira M.
Harvey William T.
Haukness Marina
Haussler David
Heumos Simon
Hickey Glenn
Hoekzema Kendra
Hourlier Thibaut
Howe Kerstin
Jain Miten
Jarvis Erich
Ji Hanlee P.
Kenny Eimear E.
Koenig Barbara A.
Kolesnikov Alexey
Korbel Jan O.
Kordosky Jennifer
Koren Sergey
Lee HoJoon
Lewis Alexandra P.
Li Heng
Liao Wen-Wei
Lu Shuangjia
Lu Tsung-Yu
Lucas Julian K.
Magalhães Hugo
Marco-Sola Santiago
Marijon Pierre
Markello Charles
Marschall Tobias
Martin Fergal J.
McCartney Ann
McDaniel Jennifer
Miga Karen H.
Mitchell Matthew W.
Monlong Jean
Mountcastle Jacquelyn
Munson Katherine M.
Mwaniki Moses Njagi
Nattestad Maria
Novak Adam M.
Nurk Sergey
Olsen Hugh E.
Olson Nathan D.
Paten Benedict
Pesout Trevor
Phillippy Adam M.
Popejoy Alice B.
Porubsky David
Prins Pjotr
Puiu Daniela
Rautiainen Mikko
Regier Allison A.
Rhie Arang
Sacco Samuel
Sanders Ashley D.
Schneider Valerie A.
Schultz Baergen I.
Shafin Kishwar
Sibbesen Jonas A.
Sirén Jouni
Smith Michael W.
Sofia Heidi J.
Thibaud-Nissen Françoise
Tomlinson Chad
Tricomi Francesca Floriana
Villani Flavia
Vollger Mitchell R.
Wagner Justin
Walenz Brian
Wang Ting
Wood Jonathan M. D.
Zimin Aleksey V.
Zook Justin M.
Publication date: 01/01/2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Diposit Digital de Documents de la UAB

Gaps and complex structurally variant loci in phased genome assemblies

Author: Abel Haley J
Antonacci-Fulton Lucinda L
Asri Mobin
Baid Gunjan
Baker Carl A
Belyaeva Anastasiya
Billis Konstantinos
Bourque Guillaume
Buonaiuto Silvia
Carroll Andrew
Chaisson Mark JP
Chang Pi-Chuan
Chang Xian H
Cheng Haoyu
Chu Justin
Cody Sarah
Colonna Vincenza
Consortium Human Pangenome Reference
Cook Daniel E
Cook-Deegan Robert M
Cornejo Omar E
Diekhans Mark
Doerr Daniel
Ebert Peter
Ebler Jana
Eichler Evan E
Eizenga Jordan M
Fairley Susan
Fedrigo Olivier
Felsenfeld Adam L
Feng Xiaowen
Fischer Christian
Flicek Paul
Formenti Giulio
Frankish Adam
Fulton Robert S
Gao Yan
Garg Shilpa
Garrison Erik
Garrison Nanibaa’ A
Giron Carlos Garcia
Green Richard E
Groza Cristian
Guarracino Andrea
Haggerty Leanne
Hall Ira M
Harvey William T
Hasenfeld Patrick
Haukness Marina
Haussler David
Heumos Simon
Hickey Glenn
Hoekzema Kendra
Hourlier Thibaut
Howe Kerstin
Jain Miten
Jarvis Erich D
Ji Hanlee P
Kenny Eimear E
Koenig Barbara A
Kolesnikov Alexey
Korbel Jan O
Kordosky Jennifer
Koren Sergey
Lee HoJoon
Lewis Alexandra P
Li Heng
Liao Wen-Wei
Lu Shuangjia
Lu Tsung-Yu
Lucas Julian K
Magalhães Hugo
Marco-Sola Santiago
Marijon Pierre
Markello Charles
Marschall Tobias
Martin Fergal J
McCartney Ann
McDaniel Jennifer
Miga Karen H
Mitchell Matthew W
Monlong Jean
Mountcastle Jacquelyn
Munson Katherine M
Mwaniki Moses Njagi
Nattestad Maria
Novak Adam M
Nurk Sergey
Paten Benedict
Porubsky David
Rozanski Allison N
Sanders Ashley D
Stober Catherine
Vollger Mitchell R
Publication venue: eScholarship, University of California
Publication date: 01/04/2023

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still contains more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6–7 Mbp of DNA are misoriented per haplotype, irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.
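The gene tallies in this abstract (genes overlapping gaps "in at least one haplotype" versus "five or more haplotypes") reduce to interval-overlap counting. The toy sketch below assumes genes and gaps are already expressed as half-open intervals in one shared coordinate system, which the abstract does not state, and it counts only overlap, not genes that are missing outright. Function names, gene names, contigs and coordinates are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end): 0-based, half-open


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open intervals on the same contig overlap iff each starts before the other ends.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def haplotypes_with_gap(genes: Dict[str, Interval],
                        gaps_by_hap: Dict[str, List[Interval]]) -> Dict[str, int]:
    """For each gene, count the haplotypes in which at least one gap overlaps it."""
    counts: Dict[str, int] = defaultdict(int)
    for gaps in gaps_by_hap.values():
        for name, gene in genes.items():
            if any(overlaps(gene, gap) for gap in gaps):
                counts[name] += 1
    return dict(counts)


def recurrent(counts: Dict[str, int], min_haps: int) -> List[str]:
    """Genes hit by gaps in at least min_haps haplotypes."""
    return sorted(g for g, c in counts.items() if c >= min_haps)


if __name__ == "__main__":
    genes = {"GENE_A": ("chr1", 100, 200), "GENE_B": ("chr1", 500, 600)}
    gaps = {f"hap{i}": [("chr1", 150, 160)] for i in range(6)}
    gaps["hap0"].append(("chr1", 590, 700))
    counts = haplotypes_with_gap(genes, gaps)
    print(recurrent(counts, 1))  # ['GENE_A', 'GENE_B']
    print(recurrent(counts, 5))  # ['GENE_A']
```

The nested loops are quadratic in the number of genes and gaps; genome-scale data would call for a sorted sweep or an interval tree instead.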

eScholarship - University of California

A draft human pangenome reference.

Author: Abel Haley J
Abou Tayoun Ahmad N
Antonacci-Fulton Lucinda L
Asri Mobin
Baid Gunjan
Baker Carl A
Belyaeva Anastasiya
Billis Konstantinos
Buonaiuto Silvia
Carroll Andrew
Chang Pi-Chuan
Chang Xian H
Cheng Haoyu
Chu Justin
Cody Sarah
Colonna Vincenza
Cook Daniel E
Cook-Deegan Robert M
Cornejo Omar E
Diekhans Mark
Doerr Daniel
Ebert Peter
Ebler Jana
Eizenga Jordan M
Fairley Susan
Fedrigo Olivier
Felsenfeld Adam L
Feng Xiaowen
Fischer Christian
Formenti Giulio
Frankish Adam
Fulton Robert S
Gao Yan
Garg Shilpa
Garrison Nanibaa' A
Giron Carlos Garcia
Green Richard E
Groza Cristian
Guarracino Andrea
Haggerty Leanne
Harvey William T
Haukness Marina
Heumos Simon
Hickey Glenn
Hoekzema Kendra
Hourlier Thibaut
Howe Kerstin
Jain Miten
Ji Hanlee P
Kenny Eimear E
Koenig Barbara A
Kolesnikov Alexey
Korbel Jan O
Kordosky Jennifer
Koren Sergey
Lee HoJoon
Lewis Alexandra P
Liao Wen-Wei
Lu Shuangjia
Lu Tsung-Yu
Lucas Julian K
Magalhães Hugo
Marco-Sola Santiago
Marijon Pierre
Markello Charles
Martin Fergal J
McCartney Ann
McDaniel Jennifer
Mitchell Matthew W
Monlong Jean
Mountcastle Jacquelyn
Munson Katherine M
Mwaniki Moses Njagi
Nattestad Maria
Novak Adam M
Nurk Sergey
Olsen Hugh E
Olson Nathan D
Pesout Trevor
Popejoy Alice B
Porubsky David
Prins Pjotr
Puiu Daniela
Rautiainen Mikko
Regier Allison A
Rhie Arang
Sacco Samuel
Sanders Ashley D
Schneider Valerie A
Schultz Baergen I
Shafin Kishwar
Sibbesen Jonas A
Sirén Jouni
Smith Michael W
Sofia Heidi J
Thibaud-Nissen Françoise
Tomlinson Chad
Tricomi Francesca Floriana
Villani Flavia
Vollger Mitchell R
Publication venue: eScholarship, University of California
Publication date: 01/05/2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

eScholarship - University of California

A draft human pangenome reference

Author: Abel Haley J.
Abou Tayoun Ahmad N.
Antonacci-Fulton Lucinda L.
Asri Mobin
Baid Gunjan
Baker Carl A.
Belyaeva Anastasiya
Billis Konstantinos
Bourque Guillaume
Buonaiuto Silvia
Carroll Andrew
Chaisson Mark J.P.
Chang Pi Chuan
Chang Xian H.
Cheng Haoyu
Chu Justin
Cody Sarah
Colonna Vincenza
Cook Daniel E.
Cook-Deegan Robert M.
Cornejo Omar E.
Diekhans Mark
Doerr Daniel
Ebert Peter
Ebler Jana
Eichler Evan E.
Eizenga Jordan M.
Fairley Susan
Fedrigo Olivier
Felsenfeld Adam L.
Feng Xiaowen
Fischer Christian
Flicek Paul
Formenti Giulio
Frankish Adam
Fulton Robert S.
Gao Yan
Garg Shilpa
Garrison Erik
Garrison Nanibaa’ A.
Giron Carlos Garcia
Green Richard E.
Groza Cristian
Guarracino Andrea
Haggerty Leanne
Hall Ira M.
Harvey William T.
Haukness Marina
Haussler David
Heumos Simon
Hickey Glenn
Hoekzema Kendra
Hourlier Thibaut
Howe Kerstin
Jain Miten
Jarvis Erich D.
Ji Hanlee P.
Kenny Eimear E.
Koenig Barbara A.
Kolesnikov Alexey
Korbel Jan O.
Kordosky Jennifer
Koren Sergey
Lee Ho Joon
Lewis Alexandra P.
Li Heng
Liao Wen Wei
Lu Shuangjia
Lu Tsung Yu
Lucas Julian K.
Magalhães Hugo
Marco-Sola Santiago
Marijon Pierre
Markello Charles
Marschall Tobias
Martin Fergal J.
McCartney Ann
McDaniel Jennifer
Miga Karen H.
Mitchell Matthew W.
Monlong Jean
Mountcastle Jacquelyn
Munson Katherine M.
Mwaniki Moses Njagi
Nattestad Maria
Novak Adam M.
Nurk Sergey
Olsen Hugh E.
Olson Nathan D.
Paten Benedict
Pesout Trevor
Phillippy Adam M.
Popejoy Alice B.
Porubsky David
Prins Pjotr
Puiu Daniela
Rautiainen Mikko
Regier Allison A.
Rhie Arang
Sacco Samuel
Sanders Ashley D.
Schneider Valerie A.
Schultz Baergen I.
Shafin Kishwar
Sibbesen Jonas A.
Sirén Jouni
Smith Michael W.
Sofia Heidi J.
Thibaud-Nissen Françoise
Tomlinson Chad
Tricomi Francesca Floriana
Villani Flavia
Vollger Mitchell R.
Wagner Justin
Walenz Brian
Wang Ting
Wood Jonathan M.D.
Zimin Aleksey V.
Zook Justin M.
Publication date: 01/01/2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Online Research Database In Technology