9 research outputs found
Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications
An elastic-degenerate (ED) string $T$ is a sequence of $n$ sets $T[1], \ldots, T[n]$ containing $m$ strings in total whose cumulative length is $N$. We call $n$, $m$, and $N$ the length, the cardinality, and the size of $T$, respectively. The language of $T$ is defined as $L(T) = \{S_1 \cdots S_n : S_i \in T[i] \text{ for all } i \in [1, n]\}$. ED strings have been introduced to represent a set of closely related DNA sequences, also known as a pangenome. The basic question we investigate here is: given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings $T_1$ and $T_2$ of lengths $n_1$ and $n_2$, cardinalities $m_1$ and $m_2$, and sizes $N_1$ and $N_2$, respectively, we show the following. (i) There is no $O((N_1 N_2)^{1-\epsilon})$-time algorithm, thus no $O((N_1 m_2 + N_2 m_1)^{1-\epsilon})$-time algorithm and no $O((N_1 n_2 + N_2 n_1)^{1-\epsilon})$-time algorithm, for any constant $\epsilon > 0$, for EDSI even when $T_1$ and $T_2$ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false. (ii) There is no combinatorial $O((N_1 + N_2)^{1.2-\epsilon} f(n_1, n_2))$-time algorithm, for any constant $\epsilon > 0$ and any function $f$, for EDSI even when $T_1$ and $T_2$ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false. (iii) There is an $O(N_1 \log N_1 \log n_1 + N_2 \log N_2 \log n_2)$-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when $T_1$ and $T_2$ are given in a compact representation, we show that the problem is NP-complete. (iv) There is an $O(N_1 m_2 + N_2 m_1)$-time algorithm for EDSI. (v) There is an $\tilde{O}(N_1^{\omega-1} n_2 + N_2^{\omega-1} n_1)$-time algorithm for EDSI, where $\omega$ is the exponent of matrix multiplication; the $\tilde{O}$ notation suppresses factors that are polylogarithmic in the input size. We also show that the techniques we develop have applications outside of ED string comparison.
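To make the definition of $L(T)$ and the EDSI question concrete, here is a minimal Python sketch. It is a brute-force illustration only, not any of the algorithms from the abstract: it enumerates both languages explicitly, which takes exponential time in general. The function names `language` and `edsi_bruteforce` and the two example ED strings are hypothetical, introduced purely for illustration.

```python
from itertools import product


def language(ed_string):
    """Enumerate L(T) for an ED string given as a list of sets of strings.

    Each element of L(T) is obtained by picking one string from every set
    T[1], ..., T[n] and concatenating the picks in order.
    """
    return {"".join(choice) for choice in product(*ed_string)}


def edsi_bruteforce(t1, t2):
    """Return the strings common to L(T1) and L(T2).

    Brute force by explicit enumeration (an assumption made for clarity);
    EDSI itself only asks whether this set is nonempty.
    """
    return language(t1) & language(t2)


# Hypothetical example ED strings (the empty string "" models a deletion).
T1 = [{"a", "ab"}, {"b", ""}]
T2 = [{"ab"}, {"b", "c"}]

print(sorted(language(T1)))            # ['a', 'ab', 'abb']
print(sorted(language(T2)))            # ['abb', 'abc']
print(bool(edsi_bruteforce(T1, T2)))   # True: "abb" lies in both languages
```

Here $T_1$ has length $n_1 = 2$ and cardinality $m_1 = 4$. The sketch is useful only for checking small cases by hand.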
Antimicrobial resistance and plasmid profiles of Aeromonas hydrophila isolated from River Njoro, Kenya
The purpose of this study was to investigate the presence of Aeromonas hydrophila at commonly used water collection points on the River Njoro and to determine the in-vitro antimicrobial susceptibility and plasmid profiles of isolates. In total, 126 samples were collected and 36.5% of them were positive for A. hydrophila. The A. hydrophila isolates were recovered on membrane filters and cultured on Trypticase Soy agar, Bile aesculin agar and Aeromonas Medium agar. They were further characterized using cytochrome oxidase and API 20E tests. Drug susceptibility was determined using a modified disc diffusion method against ampicillin (25 μg), cefaclor (30 μg), ceftizoxime (30 μg), cefixime (5 μg), ceftazidime (30 μg), gentamicin (200 μg), streptomycin (25 μg), chloramphenicol (50 μg), nalidixic acid (30 μg) and ciprofloxacin (1 μg). Most of the isolates were multi-drug resistant, that is, resistant to two or more antibiotics. Chloramphenicol, nalidixic acid, ciprofloxacin, ceftazidime and cefixime were the most effective drugs, with 100% efficacy, whereas ampicillin, cefaclor and streptomycin were the least effective, with 100, 67 and 50% resistance, respectively. There was low resistance against ceftizoxime (16.7%) and gentamicin (23.3%). These results indicate that all A. hydrophila isolated from River Njoro had complete resistance to ampicillin and showed variable resistance to cefaclor, streptomycin, gentamicin and ceftizoxime. R-plasmids were extracted from multi-drug resistant strains and separated by agarose gel (0.8%) electrophoresis for profiling. Plasmid profiling revealed that most of the multi-drug resistant isolates contained one plasmid of 21.0 kb. Although some strains exhibited different antimicrobial resistance patterns, all of their plasmids were of the same size (21.0 kb). However, there were no plasmids in the antimicrobial-sensitive isolates. This study also indicates that the 21.0 kb plasmid is common in A. hydrophila and is important for antimicrobial resistance and virulence. Further studies are required to ascertain the role of this plasmid as a virulence marker. Key words: Aeromonas hydrophila, antimicrobial resistance, plasmid profile.
Fast Exact String to D-Texts Alignments
In recent years, aligning a sequence to a pangenome has become a central
problem in genomics and pangenomics. A fast and accurate solution to this
problem can serve as a toolkit for many crucial tasks such as read correction,
Multiple Sequence Alignment (MSA), genome assembly, and variant calling, just to
name a few. In this paper we propose a new, fast and exact method to align a
string to a D-string, the latter possibly representing an MSA, a pan-genome or
a partial assembly. An implementation of our tool dsa is publicly available at
https://github.com/urbanslug/ds
A draft human pangenome reference
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Gaps and complex structurally variant loci in phased genome assemblies
There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation