ntLink: a toolkit for de novo genome assembly scaffolding and mapping using long reads

Author: Birol Inanc
Coombe Lauren
Nikolic Vladimir
Warren René L.
Wong Johnathan
Publication date: 20/01/2023

With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step to a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes. ntLink is a flexible and resource-efficient genome scaffolding tool that utilizes long-read sequencing data to improve upon draft genome assemblies built from any sequencing technologies, including the same long reads. Instead of using read alignments to identify candidate joins, ntLink utilizes minimizer-based mappings to infer how input sequences should be ordered and oriented into scaffolds. Recent improvements to ntLink have added important features such as overlap detection, gap-filling and in-code scaffolding iterations. Here, we present three basic protocols demonstrating how to use each of these new features to yield highly contiguous genome assemblies, while still maintaining ntLink's proven computational efficiency. Further, as we illustrate in the alternate protocols, the lightweight minimizer-based mappings that enable ntLink scaffolding can also be utilized for other downstream applications, such as misassembly detection. With its modularity and multiple modes of execution, ntLink has broad benefit to the genomics community, from genome scaffolding and beyond. ntLink is an open-source project and is freely available from https://github.com/bcgsc/ntLink. Comment: 23 pages, 2 figures

arXiv.org e-Print Archive

DIDA: Distributed Indexing Dispatched Alignment

Author: Birol Inanc
Breshears Clay P.
Chu Justin
Jackman Shaun D.
Mohamadi Hamid
Raymond Anthony
Vandervalk Benjamin P.
Publication venue: Simon Fraser University Library
Publication date: 01/01/2015

Summit Research Repository (Simon Fraser University)

Swarm v3: towards tera-scale amplicon clustering

Author: Birol Inanc
Czech Lucas
de Vargas Colomban
Dunthorn Micah
Mahé Frédéric
Quince Christopher
Rognes Torbjørn
Stamatakis Alexandros
Publication venue: Oxford University Press
Publication date: 13/12/2022

Motivation: Previously we presented swarm, an open-source amplicon clustering programme that produces fine-scale molecular operational taxonomic units (OTUs) that are free of arbitrary global clustering thresholds. Here, we present swarm v3 to address issues of contemporary datasets that are growing towards tera-byte sizes. Results: When compared with previous swarm versions, swarm v3 has modernized C++ source code, reduced memory footprint by up to 50%, optimized CPU-usage and multithreading (more than 7 times faster with default parameters), and it has been extensively tested for its robustness and logic

KITopen

Identifying cancer mutation targets across thousands of samples: MuteProc, a high throughput mutation analysis pipeline

Author: Alireza Hadj Khodabakhshi
Anthony P Fejes
Inanc Birol
Steven JM Jones
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

BACKGROUND: In the past decade, bioinformatics tools have matured enough to reliably perform sophisticated primary data analysis on Next Generation Sequencing (NGS) data, such as mapping, assemblies and variant calling; however, there is still a dire need for improvements in higher-level analysis such as NGS data organization, analysis of mutation patterns and Genome Wide Association Studies (GWAS). RESULTS: We present a high-throughput pipeline for identifying cancer mutation targets, capable of processing billions of variations across thousands of samples. This pipeline is coupled with our Human Variation Database to provide more complex downstream analysis on the variations hosted in the database. Most notably, these analyses include finding significantly mutated regions across multiple genomes and regions with mutational preferences within certain types of cancers. The results of the analysis are presented in HTML summary reports that incorporate gene annotations from various resources for the reported regions. CONCLUSION: MuteProc is available for download through the Vancouver Short Read Analysis Package on Sourceforge: http://vancouvershortr.sourceforge.net. Instructions for use and a tutorial are provided on the accompanying wiki pages at https://sourceforge.net/apps/mediawiki/vancouvershortr/index.php?title=Pipeline_introduction

Crossref

Springer - Publisher Connector

PubMed Central

The genetic landscape of high-risk neuroblastoma

Author: Ally Adrian
Asgharzadeh Shahab
Attiyeh Edward F.
Auclair Daniel
Auvil Jaime M. Guidry
Badgett Thomas
Birol Inanc
Carter Scott L.
Chiu Readman
Cibulskis Kristian
Cole Kristina A.
Corbett Richard D.
Diamond Maura
Diskin Sharon J.
Gabriel Stacey B.
Gastier-Foster Julie M.
Gerhard Daniela S.
Getz Gad
Hanna Megan
Hirst Martin
Hogarty Michael D.
Jackman Shaun D.
Ji Lingyun
Jones Steven J. M.
Kamoh Baljit
Khan Javed
Khodabakshi Alireza Hadj
Kiezun Adam
Kim Jaegil
Krzywinski Martin
Lander Eric S.
Lawrence Michael S.
Lichenstein Lee
Lo Allan
London Wendy B.
Maris John M.
Marra Marco A.
McKenna Aaron
Meyerson Matthew
Moore Richard A.
Morozova Olena
Mosse Yael P.
Moyer Yvonne
Mungall Karen L.
Pedamallu Chandra Sekhar
Pugh Trevor J.
Qian Jenny
Ramos Alex H.
Seeger Robert C.
Shefler Erica
Sivachenko Andrey
Smith Malcolm A.
Sougnez Carrie
Sposto Richard
Stewart Chip
Tam Angela
Thiessen Nina
Wei Jun S.
Wood Andrew C.
Zhao Yongjun
Publication venue: Springer Science and Business Media LLC
Publication date: 10/03/2014

Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50%. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 cases using a combination of whole exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per megabase (0.48 non-silent), and remarkably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, an additional 7.1% had focal deletions), MYCN (1.7%, a recurrent p.Pro44Leu alteration), and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1, and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies reliant upon frequently altered oncogenic drivers

Harvard University - DASH

Long-insert sequence capture detects high copy numbers in a defence-related beta-glucosidase gene βglu-1 with large variations in white spruce but not Norway spruce

Author: Birol Inanc
Bohlmann Joerg
Bousquet Jean
Clegg Sonya
Erbilgin Nadir
Hung Tin
Jansons Āris
MacKay John
Ullah Aziz
Wu Ernest Ting Yu
Zeltiņš Pauls
Publication venue: BioMed Central
Publication date: 27/01/2024

Conifers are long-lived and slow-evolving, thus requiring effective defences against their fast-evolving insect natural enemies. The copy number variation (CNV) of two key acetophenone biosynthesis genes Ugt5/Ugt5b and βglu-1 may provide a plausible mechanism underlying the constitutively variable defence in white spruce (Picea glauca) against its primary defoliator, spruce budworm. This study develops a long-insert sequence capture probe set (Picea_hung_p1.0) for quantifying copy number of βglu-1-like, Ugt5-like genes and single-copy genes on 38 Norway spruce (Picea abies) and 40 P. glauca individuals from eight and nine provenances across Europe and North America respectively. We developed local assemblies (Piabi_c1.0 and Pigla_c.1.0), full-length transcriptomes (PIAB_v1 and PIGL_v1), and gene models to characterise the diversity of βglu-1 and Ugt5 genes. We observed very large copy numbers of βglu-1, with up to 381 copies in a single P. glauca individual. We observed among-provenance CNV of βglu-1 in P. glauca but not P. abies. Ugt5b was predominantly single-copy in both species. This study generates critical hypotheses for testing the emergence and mechanism of extreme CNV, the dosage effect on phenotype, and the varying copy number of genes with the same pathway. We demonstrate new approaches to overcome experimental challenges in genomic research in conifer defences

Oxford University Research Archive

Conifers Concentrate Large Numbers of NLR Immune Receptor Genes on One Chromosome

Author: A’Hara Stuart
Birol Inanc
Bohlmann Joerg
Bousquet Jean
Cottrell Joan
Girardi Sebastien
Hung Tin Hang
Ilska Joana J
MacKay John J
McLean Paul
Tumas Hayley
van Ghelder Cyril
Woolliams John A
Woudstra Yannick
Publication venue: Oxford University Press
Publication date: 24/05/2024

Nucleotide-binding domain and leucine-rich repeat (NLR) immune receptor genes form a major line of defense in plants, acting in both pathogen recognition and resistance machinery activation. NLRs are reported to form large gene clusters in limber pine (Pinus flexilis), but it is unknown how widespread this genomic architecture may be among the extant species of conifers (Pinophyta). We used comparative genomic analyses to assess patterns in the abundance, diversity, and genomic distribution of NLR genes. Chromosome-level whole genome assemblies and high-density linkage maps in the Pinaceae, Cupressaceae, Taxaceae, and other gymnosperms were scanned for NLR genes using existing and customized pipelines. The discovered genes were mapped across chromosomes and linkage groups and analyzed phylogenetically for evolutionary history. Conifer genomes are characterized by dense clusters of NLR genes, highly localized on one chromosome. These clusters are rich in TNL-encoding genes, which seem to have formed through multiple tandem duplication events. In contrast to angiosperms and nonconiferous gymnosperms, genomic clustering of NLR genes is ubiquitous in conifers. NLR-dense genomic regions are likely to influence a large part of the plant's resistance, informing our understanding of adaptation to biotic stress and the development of genetic resources through breeding

Oxford University Research Archive

HAL: Hyper Article en Ligne

Draft Genome Of The Mountain Pine Beetle, Dendroctonus Ponderosae Hopkins, A Major Forest Pest

Author: Birol Inanc
Bohlmann Joerg
Chan Simon
Henderson Hannah
Jackman Shaun
Janes Jasmine
Jones Steven
Keeling Christopher
Li Maria
Liao Nancy
Moore Richard
Nguyen Anh
Palmquist Diana
Pandoh Pawan
Roderick Docking T.
Sperling Felix
Taylor Greg
W Huber Dezene
Yuen Macaire
Zhao Yongjun
Publication venue: Simon Fraser University Library
Publication date: 01/01/2013

Summit Research Repository (Simon Fraser University)