6 research outputs found

The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms

Author: Agustinho Daniel Paiva
Aliyev Elbay
Avdeyev Pavel
Barrozo Enrico R.
Behera Sairam
Billingsley Kimberley
Busby Ben
Chen Guangyi
Chong Li Chuin
Choubey Deepak
Dabbaghie Fawaz
De Coster Wouter
Fu Yilei
Gener Alejandro R.
Hefferon Timothy
Henke David Morgan
Höps Wolfram
Illarionova Anastasia
Jochum Michael D.
Jose Maria
Kalra Divya
Kesharwani Rupesh K.
Al Khleifat Ahmad
Kolora Sree Rohit Raj
Kubica Jedrzej
Lakra Priya
Lattimer Damaris
Liew Chia-Sin
Lo Bai-Wei
Lo Chunhsuan
Lowdon Rebecca
Lötter Anneri
Mahmoud Medhat
Majidian Sina
Mendem Suresh Kumar
Molik David
Mondal Rajarshi
Ohmiya Hiroko
Parvin Nasrin
Paulin Luis F.
Peralta Carolina
Pfeifer Susanne P.
Poon Chi-Lam
Prabhakaran Ramanandan
Raza Muhammad Sohail
Saitou Marie
Sammi Aditi
Sanio Philippe
Sapoval Nicolae
Sedlazeck Fritz J
Soto Daniela C.
Syed Najeeb
Treangen Todd
Walker Kimberly
Wang Gaojianyong
Xu Tiancheng
Yang Jianzhi
Zhang Shangzhe
Zhou Weiyu
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2022

Brage NMBU

PubMed Central

University of St. Andrews - Pure

UPSpace at the University of Pretoria

PanPA: generation and alignment of panproteome graphs

Author: Dabbaghie Fawaz
Kalinina Olga V.
Marschall Tobias
Srikakulam Sanjay K.
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2023

Motivation: Compared to eukaryotes, prokaryote genomes are more diverse through different mechanisms, including a higher mutation rate and horizontal gene transfer. Therefore, using a linear representative reference can cause a reference bias. Graph-based pangenome methods have been developed to tackle this problem. However, comparisons in DNA space are still challenging due to this high diversity. In contrast, amino acid sequences have higher similarity due to evolutionary constraints, whereby a single amino acid may be encoded by several synonymous codons. Coding regions cover the majority of the genome in prokaryotes. Thus, panproteomes present an attractive alternative, leveraging the higher sequence similarity while not losing much of the genome in non-coding regions. Results: We present PanPA, a method that takes a set of multiple sequence alignments of protein sequences, indexes them, and builds a graph for each multiple sequence alignment. In the querying step, it can align DNA or amino acid sequences back to these graphs. We first showcase that PanPA generates correct alignments on a panproteome from 1350 Escherichia coli. To demonstrate that panproteomes allow comparisons at longer phylogenetic distances, we compare DNA and protein alignments from 1073 Salmonella enterica assemblies against the E. coli reference genome, pangenome, and panproteome using BWA, GraphAligner, and PanPA, respectively, with PanPA aligning around 22% more sequences. We also aligned a DNA short-read whole-genome sequencing (WGS) sample from S. enterica against the E. coli reference with BWA and against the panproteome with PanPA, where PanPA was able to find alignments for 68% of the reads compared to 5% with BWA.
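
The abstract's point that synonymous codons make amino acid sequences more similar than the underlying DNA can be illustrated with a small sketch. This is not PanPA code: the two sequences, the helper names `translate` and `identity`, and the choice of the standard genetic code are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical coding sequences that differ only by synonymous substitutions.
seq_a = "ATGGCTCTGAGCAAA"
seq_b = "ATGGCATTATCTAAG"

dna_identity = identity(seq_a, seq_b)
protein_identity = identity(translate(seq_a), translate(seq_b))
print(f"DNA identity: {dna_identity:.2f}, protein identity: {protein_identity:.2f}")
```

Here the two DNA strings agree at only 8 of 15 positions, yet both translate to the same peptide, which is the effect a protein-level comparison relies on.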

MetaProFi: an ultrafast chunked Bloom filter for storing and querying protein and nucleotide sequence data for accurate identification of functionally relevant genetic variants

Author: Bals Robert
Dabbaghie Fawaz
Kalinina Olga V.
Keller Sebastian
Srikakulam Sanjay K.
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2023

Motivation: Bloom filters are a popular data structure that allows rapid searches in large sequence datasets. So far, all tools work with nucleotide sequences; however, protein sequences are conserved over longer evolutionary distances, and only mutations on the protein level may have any functional significance. Results: We present MetaProFi, a Bloom filter-based tool that, for the first time, offers the functionality to build indexes of amino acid sequences and query them with both amino acid and nucleotide sequences, thus bringing sequence comparison to the biologically relevant protein level. MetaProFi implements additional efficient engineering solutions, such as a shared memory system, chunked data storage and efficient compression. In addition to its conceptual novelty, MetaProFi demonstrates state-of-the-art performance and excellent memory consumption-to-speed ratio when applied to various large datasets. Availability and implementation: Source code in Python is available at https://github.com/kalininalab/metaprofi.
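
The general idea of indexing amino acid k-mers in a Bloom filter and querying with either protein or nucleotide input can be sketched as follows. This is a minimal illustration under stated assumptions, not MetaProFi's implementation or API: it has no chunking, compression, or shared memory, and the class `BloomFilter`, the helper functions, the SHA-256-based hashing, the six-frame translation of nucleotide queries, and the k-mer size are all hypothetical choices.

```python
import hashlib
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class BloomFilter:
    """Plain in-memory Bloom filter (hypothetical; no chunking or compression)."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """All overlapping substrings of length k (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def six_frame_translations(dna):
    """Yield the translation of each of the six reading frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    for strand in (dna, reverse):
        for offset in range(3):
            frame = strand[offset:]
            yield "".join(
                CODON_TABLE.get(frame[i:i + 3], "X")
                for i in range(0, len(frame) - 2, 3)
            )


def protein_hit_fraction(bloom, protein, k):
    """Fraction of the query's amino acid k-mers reported by the filter."""
    words = kmers(protein, k)
    return sum(w in bloom for w in words) / len(words) if words else 0.0


def nucleotide_hit_fraction(bloom, dna, k):
    """Best hit fraction over the six translated frames of a DNA query."""
    return max(protein_hit_fraction(bloom, p, k) for p in six_frame_translations(dna))


K = 4  # hypothetical k-mer size
index = BloomFilter()
for word in kmers("MALSKQWERTY", K):  # hypothetical indexed protein
    index.add(word)

protein_score = protein_hit_fraction(index, "MALSK", K)
dna_score = nucleotide_hit_fraction(index, "ATGGCTCTGAGCAAA", K)
print(protein_score, dna_score)
```

A Bloom filter never reports a stored item as absent, so k-mers drawn from the indexed protein are always found; unrelated k-mers may occasionally be reported as present, at a rate that depends on the filter size and number of hash functions.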

An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates

Author: Agustinho Daniel Paiva
Al Khleifat Ahmad
Albin Dreycey
Aliyev Elbay
Almabrazi Hakeem
Arslan Ahmed
Balaji Advait
Behera Sairam
Biederstedt Evan
Billingsley Kimberley
Busby Ben
Soto Daniela C.
Chaisson Mark
Chin Chen Shan
Dabbaghie Fawaz
Daw Joyjit
Dawood Moez
De Coster Wouter
Du Haowei
Dunn Christopher
English Adam
Esteban Rocio
Hefferon Timothy
Sedlazeck Fritz J.
Treangen Todd J.
Jochum Michael
Jolly Angad
Kesharwani Rupesh K.
Kalra Divya
Kille Bryce
Kronenberg Zev
Cameron Daniel L.
Liao Chunxiao
Liu Yunxi
Lu Tsung Yu
Havrilla James M.
Khayat Michael M.
Mahmoud Medhat
Marin Maximillian
McCartney Ann M.
Monlong Jean
Price Stephen
Gener Alejandro R.
Ren Jingwen
Sagayaradj Sagayamary
Sapoval Nicolae
Sinner Claude
Smolka Moritz
Soylev Arda
Subramaniyan Arun
Syed Najeeb
Dawson Eric T.
Tadimeti Neha
Tater Pamella
Vats Pankaj
Vaughn Justin
Walker Kimberly
Wang Gaojianyong
Zeng Qiandong
Zhang Shangzhe
Zhao Tingting
Zorman Barry
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2021

In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.

Directory of Open Access Journals

King's Research Portal

University of St. Andrews - Pure