Search CORE

36 research outputs found

genomeRxiv : a microbial whole-genome database and diagnostic marker design resource for classification, identification, and data sharing

Author: Brown C Titus
Harrington Bailey
Heath Lenwood
Irber Luiz
Mazloom Reza
Pierce Tessa
Pritchard Leighton
Sharma Parul
Vinatzer Boris
Publication venue: 'Microbiology Society'
Publication date: 27/05/2022
Field of study

genomeRxiv is a newly-funded US-UK collaboration to provide a public, web-accessible database of public genome sequences, accurately catalogued and classified by whole-genome similarity independent of their taxonomic affiliation. Our goal is to supply the basic and applied research community with rapid, precise and accurate identification of unknown isolates based on genome sequence alone, and with molecular tools for environmental analysis. The DNA sequencing revolution enabled the use of cultured and uncultured microorganism genomes for fast and precise identification. However, precise identification is impossible without 1. reference databases that precisely circumscribe classes of microorganisms, and label these with their uniquely-shared characteristics 2. fast algorithms that can handle the volumes of genome data Our approach integrates the highly-resolved classification framework of Life Identification Numbers (LINs) with the speed and computational efficiency of sourmash and k-mer hashing algorithms, and the precision and filtering of average nucleotide identity (ANI). We aim to construct a single genome-based indexing scheme that extends from phylum to strain, enabling the unique and consistent placement of any sequenced prokaryote genome. genomeRxiv includes protocols for confidentiality, allowing groups to identify and announce the identities of newly-sequenced organisms without sharing genome data directly. This protects communities working with commercially- and ethically-sensitive organisms (e.g. production engineering strains, potential bioweapons, and to enable benefit sharing with indigenous communities). genomeRxiv will also provide online capability to design molecular diagnostic tools for metabarcoding and qPCR, to enable tracking of specific groupings of bacteria directly in the environment

University of Strathclyde Institutional Repository

Biogeographic distribution of five Antarctic cyanobacteria using large-scale k-mer searching with sourmash branchwater

Author: Anne D. Jungblut
C. Titus Brown
Christen L. Grettenberger
Christen L. Grettenberger
Dawn Y. Sumner
Jessica Lumian
Luiz Irber
N. Tessa Pierce-Ward
Publication venue: Frontiers Media S.A.
Publication date: 01/02/2024
Field of study

Cyanobacteria form diverse communities and are important primary producers in Antarctic freshwater environments, but their geographic distribution patterns in Antarctica and globally are still unresolved. There are however few genomes of cultured cyanobacteria from Antarctica available and therefore metagenome-assembled genomes (MAGs) from Antarctic cyanobacteria microbial mats provide an opportunity to explore distribution of uncultured taxa. These MAGs also allow comparison with metagenomes of cyanobacteria enriched communities from a range of habitats, geographic locations, and climates. However, most MAGs do not contain 16S rRNA gene sequences, making a 16S rRNA gene-based biogeography comparison difficult. An alternative technique is to use large-scale k-mer searching to find genomes of interest in public metagenomes. This paper presents the results of k-mer based searches for 5 Antarctic cyanobacteria MAGs from Lake Fryxell and Lake Vanda, assigned the names Phormidium pseudopriestleyi FRX01, Microcoleus sp. MP8IB2.171, Leptolyngbya sp. BulkMat.35, Pseudanabaenaceae cyanobacterium MP8IB2.15, and Leptolyngbyaceae cyanobacterium MP9P1.79 in 498,942 unassembled metagenomes from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). The Microcoleus sp. MP8IB2.171 MAG was found in a wide variety of environments, the P. pseudopriestleyi MAG was found in environments with challenging conditions, the Leptolyngbyaceae cyanobacterium MP9P1.79 MAG was only found in Antarctica, and the Leptolyngbya sp. BulkMat.35 and Pseudanabaenaceae cyanobacterium MP8IB2.15 MAGs were found in Antarctic and other cold environments. The findings based on metagenome matches and global comparisons suggest that these Antarctic cyanobacteria have distinct distribution patterns ranging from locally restricted to global distribution across the cold biosphere and other climatic zones

Directory of Open Access Journals

Context-aware genomic surveillance reveals hidden transmission of a carbapenemase-producing Klebsiella pneumoniae

Author: Blumenscheit Christian
Brandt Christian
Gatermann Sören
Hans Jörg B
Holt Kathryn E
Hölzer Martin
Irber Luiz
König Brigitte
Lippmann Norman
Lübbert Christoph
Pletz Mathias W
Viehweger Adrian
Wyres Kelly L
Publication venue: 'Microbiology Society'
Publication date: 16/12/2021
Field of study

Genomic surveillance can inform effective public health responses to pathogen outbreaks. However, integration of non-local data is rarely done. We investigate two large hospital outbreaks of a carbapenemase-carrying Klebsiella pneumoniae strain in Germany and show the value of contextual data. By screening about 10 000 genomes, over 400 000 metagenomes and two culture collections using in silico and in vitro methods, we identify a total of 415 closely related genomes reported in 28 studies. We identify the relationship between the two outbreaks through time-dated phylogeny, including their respective origin. One of the outbreaks presents extensive hidden transmission, with descendant isolates only identified in other studies. We then leverage the genome collection from this meta-analysis to identify genes under positive selection. We thereby identify an inner membrane transporter (ynjC) with a putative role in colistin resistance. Contextual data from other sources can thus enhance local genomic surveillance at multiple levels and should be integrated by default when available

LSHTM Research Online

PubMed Central

Publikationsserver des Robert Koch-Instituts

The khmer software package: enabling efficient nucleotide sequence analysis

Author: Alameldin Hussein
Awad Sherine
Boucher Elmar
Brown C. Titus
Caldwell Adam
Cartwright Reed
Charbonneau Amanda
Constantinides Bede
Crusoe Michael
Edvenson Greg
Fay Scott
Fenton Jacob
Fenzl Thomas
Fish Jordan
Garcia-Gutierrez Leonor
Garland Phillip
Gluck Jonathan
González Iván
Guermond Sarah
Guo Jiarong
Gupta Aditi
Herr Joshua
Howe Adina
Howe Adina
Hyer Alex
Härpfer Andreas
Irber Luiz
Kidd Rhys
Lin David
Lippi Justin
Mansour Tamer
McA'Nulty Pamela
McDonald Erin
Mizzi Jessica
Murray Kevin
Nahum Joshua
Nanlohy Kaben
Nederbragt Alexander
Ortiz-Zuazaga Humberto
Ory Jeramia
Pell Jason
Pepe-Ranney Charles
Russ Zachary
Schwarz Erich
Scott Camille
Seaman Josiah
Sievert Scott
Simpson Jared
Skennerton Connor
Spencer James
Srinivasan Ramakrishnan
Standage Daniel
Stapleton James
Stein Joe
Steinman Susan
Taylor Benjamin
Tremble Will
Wiencko Heather
Wright Michael
Wyss Brian
Zhang Qingpeng
zyme en
Publication venue: Iowa State University Digital Repository
Publication date: 25/09/2015
Field of study

The khmer package is a freely available software library for working efficiently with fixed length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/

Digital Repository @ Iowa State University (ISU)

PubMed Central

eScholarship - University of California

The khmer software package: enabling efficient nucleotide sequence analysis [version 1; referees: 2 approved, 1 approved with reservations]

Author: Alameldin Hussien F.
Awad Sherine
Boucher Elmar
Brown C. Titus
Caldwell Adam
Cartwright Reed
Charbonneau Amanda
Constantinides Bede
Crusoe Michael R.
Edvenson Greg
Fay Scott
Fenton Jacob
Fenzl Thomas
Fish Jordan
Garcia-Gutierrez Leonor
Garland Phillip
Gluck Jonathan
González Iván
Guermond Sarah
Guo Jiarong
Gupta Aditi
Herr Joshua R.
Howe Adina
Hyer Alex
Härpfer Andreas
Irber Luiz
Kidd Rhys
Lin David
Lippi Justin
Mansour Tamer
McA\u27Nulty Pamela
McDonald Eric
Mizzi Jessica
Murray Kevin D.
Nahum Joshua R.
Nanlohy Kaben
Nederbragt Alexander Johan
Ortiz-Zuazaga Humberto
Ory Jeramia
Pell Jason
Pepe-Ranney Charles
Russ Zachary N.
Schwarz Erich
Scott Camille
Seaman Josiah
Sievert 38 Scott
Simpson Jared
Skennerton Connor T.
Spencer James
Srinivasan Ramakrishnan
Standage Daniel
Stapleton James A.
Stein Joe
Steinman Susan R.
Taylor Benjamin
Trimble Will
Wiencko Heather L.
Wright Michael
Wyss Brian
Zhang Qingpeng
zyme en
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 15/10/2015
Field of study

Qualifying Examination: Overlap graph-based sequence assembly in bioinformatics

Author: Luiz Irber (544408)
Publication venue
Publication date
Field of study

This is my qualifying examination final report and presentation for the oral exam, presented at Michigan State University on August 29th. It reviews how the genome assembly problem formulation changed since the foundation of genome sequencing, with special attention to overlap graph-based sequence assembly. Three assemblers implementing this theoretical framework are reviewed, and a fourth one based on de Bruijn graph serves as a comparison between shared methods and differences on each approach. Finally possible directions for future works are discussed.</p

FigShare

Decentralizing Indices for Genomic Data

Author: Irber Luiz Carlos, Jr.
Publication venue: University of California, Davis
Publication date: 01/01/2020
Field of study

Biology as a field is being transformed by the increasing availability of data, especially genomic sequencing data. Computational methods that can adapt and take advantage of this data deluge are essential for exploring and providing insights for new hypotheses, helping to unveil the biological processes that were previously expensive or even impossible to study. This dissertation introduces data structures and approaches for scaling data analysis to hundreds of thousands of DNA sequencing datasets using Scaled MinHash sketches, a reduced space representation of the original datasets that can lower computational requirements for similarity and containment estimation; MHBT and LCA indices, structures for indexing and searching large collections of Scaled MinHash sketches; gather, a new top-down approach for decomposing datasets into a collection of reference components that can be implemented efficiently with Scaled MinHash sketches and MHBT and LCA indices; wort, a distributed system for large scale sketch computation across heterogeneous systems, from laptops to academic clusters and cloud instances, including prototypes for containment searches across millions of datasets; as well as explorations on how to facilitate sharing and increase the resilience of sketches collections built from public genomic data

ProQuest OAI Repository

castelao/seabird: 0.10

Author: Guilherme Castelão
Luiz Irber
Publication venue
Publication date
Field of study

Python parser for Sea-Bird CTD outputs, usually .cnv files

ZENODO

Oceansound demonstration

Author: Arnaldo Russo (546336)
Luiz Irber (544408)
Publication venue
Publication date
Field of study

This is the code for the demonstration of OceanSound, using Flask to create a backend and OpenLayers + MIDI.js on the browser to select lat/lon and play music. Live demo at http://ocean.datasounds.org/</p

FigShare

ImSound - Access sound of 2D dataset with mouse movements.

Author: Arnaldo Russo (546336)
Luiz Irber (544408)
Publication venue
Publication date
Field of study

Access sound of 2D dataset with mouse movements. Program implemented in Python for free use inside other scripts.</p

FigShare