Assembling genomes using short-read sequencing technology

Author: Birol İnanç
Jackman Shaun D
Publication venue: BioMed Central
Publication date: 01/01/2010

Short-read sequencing technology can bring gigabase genome assemblies in under a million dollars

Crossref

PubMed Central

DIDA: Distributed Indexing Dispatched Alignment

Author: Birol Inanc
Breshears Clay P.
Chu Justin
Jackman Shaun D.
Mohamadi Hamid
Raymond Anthony
Vandervalk Benjamin P.
Publication date: 01/01/2015

One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as in an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida. The software is released under the British Columbia Cancer Agency License (BCCA), and is free for academic use

Directory of Open Access Journals

Simon Fraser University Institutional Repository

FigShare

The genetic landscape of high-risk neuroblastoma

Author: Ally Adrian
Asgharzadeh Shahab
Attiyeh Edward F.
Auclair Daniel
Auvil Jaime M. Guidry
Badgett Thomas
Birol Inanc
Carter Scott L.
Chiu Readman
Cibulskis Kristian
Cole Kristina A.
Corbett Richard D.
Diamond Maura
Diskin Sharon J.
Gabriel Stacey B.
Gastier-Foster Julie M.
Gerhard Daniela S.
Getz Gad
Hanna Megan
Hirst Martin
Hogarty Michael D.
Jackman Shaun D.
Ji Lingyun
Jones Steven J. M.
Kamoh Baljit
Khan Javed
Khodabakshi Alireza Hadj
Kiezun Adam
Kim Jaegil
Krzywinski Martin
Lander Eric S.
Lawrence Michael S.
Lichenstein Lee
Lo Allan
London Wendy B.
Maris John M.
Marra Marco A.
McKenna Aaron
Meyerson Matthew
Moore Richard A.
Morozova Olena
Mosse Yael P.
Moyer Yvonne
Mungall Karen L.
Pedamallu Chandra Sekhar
Pugh Trevor J.
Qian Jenny
Ramos Alex H.
Seeger Robert C.
Shefler Erica
Sivachenko Andrey
Smith Malcolm A.
Sougnez Carrie
Sposto Richard
Stewart Chip
Tam Angela
Thiessen Nina
Wei Jun S.
Wood Andrew C.
Zhao Yongjun
Publication venue: Springer Science and Business Media LLC
Publication date: 10/03/2014

Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50%. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 cases using a combination of whole exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per megabase (0.48 non-silent), and remarkably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, an additional 7.1% had focal deletions), MYCN (1.7%, a recurrent p.Pro44Leu alteration), and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1, and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies reliant upon frequently altered oncogenic drivers.

Harvard University - DASH

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Author: Élénie Godzaridis
Adam M. Phillippy
Alexey Sergushichev
Anton Alexandrov
Benedict Paten
Binghang Liu
Bruno M. Vieira
Carson Qu
Daniel S. Rokhsar
Dariusz Przybylski
David B. Jaffe
David C. Schwartz
David Haussler
Cristian Del Fabbro
Delphine Naquin
Dent Earl
Dominique Lavenier
Erich D. Jarvis
Fedor Tsarev
Filipe J. Ribeiro
François Laviolette
Francisco Pina Martins
Ganeshkumar Ganapathy
Giles Hall
Guillaume Chapuis
Guojie Zhang
Hamidreza Chitsaz
Hao Zhang
Henry Song
Huaiyang Jiang
Iain Maccallum
Ian F. Korf
Inanç Birol
Isaac Y. Ho
J. Ruby
Jacob O. Kitzman
Jacques Corbeil
James R. Knight
Jared T. Simpson
Jarrod A. Chapman
Jason Howard
Jay Shendure
Jianying Yuan
Joseph B. Hiatt
Joseph N. Fass
Jun Wang
Keith R. Bradnam
Kim C. Worley
Martin Hunt
Matthew D. Macmanes
Matthias Haimel
Michael C. Schatz
Michael Bechner
Michael Place
Nicolas Maillet
Nuno A. Fonseca
Octávio S. Paulo
Paul J. Kersey
Paul Baranay
Pavel Fedotov
Rayan Chikhi
Richard A. Gibbs
Richard Durbin
Ruibang Luo
Sébastien Boisvert
Sante Gnerre
Simone Scalabrin
Scott Emrich
Sergey Kazakov
Sergey Koren
Sergey Melnikov
Shaun D. Jackman
Shiguo Zhou
Shuangye Yin
Siu Ming Yiu
Stephen Richards
Steve Goldstein
T. Docking
Tak Wah Lam
Ted Sharpe
Thomas D. Otto
Timothy I. Shaw
Francesco Vezzi
Riccardo Vicedomini
Wen Chi Chou
Xiang Qin
Yingrui Li
Yue Liu
Yujian Shi
Zemin Ning
Zhenyu Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

Archivio istituzionale della ricerca - Università degli Studi di Udine

Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma

Author: A Brunet
A Giordano
A Han
A Mortazavi
A Shilatifard
Allen Delaney
Andrew J. Mungall
Angela Brooks-Wilson
Angela Tam
B Kreutz
Barbara Meissner
Bruce Woolcock
C Greenman
CJ Sneeringer
David W. Scott
DB Yap
DE Horsman
Diane L. Trinh
DJC Tai
Douglas E. Horsman
Duane Smailus
DW Parsons
E Canaani
ED Pleasance
ER Mardis
Eric Y. Zhao
G Lenz
G Robertson
G Wright
GL Dalgliesh
H Li
H Youn
Helen McDonald
I Issaeva
I Yusuf
Inanç Birol
Irmtraud M. Meyer
J Iqbal
J Mo
Jacqueline Schein
Jessica Tamura-Wells
JM Manganello
John J. Spinelli
Joseph M. Connors
JR Anderson
Kane Tse
Karen L. Mungall
KH Young
KJ Cheung
L Pasqualucci
Lisa Rimsza
M Compagno
M Kato
M Saito
Malachi Griffith
Marco A. Marra
Maria Mendez-Lago
Marlo R. Firme
Martin Hirst
Martin Krzywinski
Matthew Field
Merrill Boyle
Michelle Moksa
MQ Du
Nathalie A. Johnson
Oleksandr Yakovenko
P Wang
PA Futreal
R Bhattacharyya
R Goya
R Krumlauf
Randy D. Gascoyne
RD Morin
RE Davis
Readman Chiu
Richard D. Corbett
Richard Moore
Rob Holt
Rodrigo Goya
Ryan D. Morin
Sa Li
Sanja Rogic
Shaun Jackman
SP Shah
Steven J. M. Jones
Suganthi Chittaranjan
Susana Ben-Neriah
Susanna Chan
T Milne
Tesa M. Severson
Thomas Zeng
VN Ngo
Yaron Butterfield
Yongjun Zhao
Publication venue: Springer Science and Business Media LLC
Publication date: 27/07/2011

Follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) are the two most common non-Hodgkin lymphomas (NHLs). Here we sequenced tumour and matched normal DNA from 13 DLBCL cases and one FL case to identify genes with mutations in B-cell NHL. We analysed RNA-seq data from these and another 113 NHLs to identify genes with candidate mutations, and then re-sequenced tumour and matched normal DNA from these cases to confirm 109 genes with multiple somatic mutations. Genes with roles in histone modification were frequent targets of somatic mutation. For example, 32% of DLBCL and 89% of FL cases had somatic mutations in MLL2, which encodes a histone methyltransferase, and 11.4% and 13.4% of DLBCL and FL cases, respectively, had mutations in MEF2B, a calcium-regulated gene that cooperates with CREBBP and EP300 in acetylating histones. Our analysis suggests a previously unappreciated disruption of chromatin biology in lymphomagenesis

ResearchOnline@JCU

Crossref

ResearchOnline at James Cook University

PubMed Central

MDC Repository

Efficient assembly of large genomes

Author: Jackman Shaun Dunn
Publication venue: University of British Columbia Press
Publication date: 01/05/2019

Genome sequence assembly presents a fascinating and frequently changing challenge. As DNA sequencing technologies evolve, the bioinformatics methods used to assemble sequencing data must evolve along with them. Sequencing technology has evolved from slab gel sequencing, to capillary sequencing, to short-read sequencing by synthesis, to long-read and linked-read single-molecule sequencing. Each evolutionary jump in sequencing technology required developing new bioinformatic tools to address the unique characteristics of its sequencing data. This work reports the development of efficient methods to assemble short-read and linked-read sequencing data, named ABySS 2.0 and Tigmint. ABySS 2.0 reduces the memory requirements of short-read genome sequencing assembly tenfold compared to ABySS 1.0. It does so by using a Bloom filter probabilistic data structure to represent a de Bruijn graph. Tigmint uses linked reads to identify large-scale errors in a genome sequence assembly. Correcting assembly errors using Tigmint before scaffolding improves both the contiguity and correctness of a human genome assembly compared to scaffolding without correction. I have also applied these methods to assemble the 12 gigabase genome of western redcedar (Thuja plicata), which is four times the size of the human genome. Although numerous mitochondrial genomes of angiosperms are available, few mitochondria of gymnosperms have been sequenced. I assembled the plastid and mitochondrial genomes of white spruce (Picea glauca) using whole genome short-read sequencing. I assembled the mitochondrial genome of Sitka spruce (Picea sitchensis) using whole genome long-read sequencing, the largest complete genome assembly of a gymnosperm mitochondrion. The mitochondrial genomes of both species include a remarkable number of trans-spliced genes. I have developed two additional tools, UniqTag and ORCA. UniqTag assigns unique and stable gene identifiers to genes based on their sequence content. This gene labeling system addresses the inconvenience of gene identifiers changing between versions of a genome assembly. ORCA is a comprehensive bioinformatics computing environment, which includes hundreds of bioinformatics tools in a single easily installed Docker image, and is useful for education and research. The assembly of linked-read and long-read sequencing of large molecules of DNA has yielded substantial improvements in the quality of genome assembly projects.
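
The idea of representing a de Bruijn graph with a Bloom filter, mentioned in the abstract above, can be illustrated with a minimal Python sketch. This is not the ABySS 2.0 implementation. The class `BloomFilter`, the function `successors`, the choice of salted BLAKE2b hashes, and the filter sizes are all assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter over strings (illustration only)."""

    def __init__(self, size_bits, num_hashes):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every inserted item; may also be True for a few others
        # (false positives), never False for an inserted item.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def successors(bloom, kmer):
    """Return the k-mers that may follow kmer in the implicit de Bruijn graph.

    Edges are not stored. Each of the four possible one-base extensions is
    simply tested for membership in the filter.
    """
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bloom]
```

Only the bit array is kept in memory, so the graph is stored implicitly: a node exists if its k-mer tests positive, and its neighbours are found by querying the four possible extensions. A false positive in the filter appears as a spurious node or edge, which an assembler built this way would need to tolerate or filter out.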

University of British Columbia: cIRcle - UBC's Information Repository

Ethernet Communication in Lighting Control

Author: Shaun Jackman (99599)

Pathway Connectivity Inc. designs products to control entertainment and architectural lighting devices. Their products are typically installed in theatres, theme parks, and cruise ships. Lighting control devices currently use an industry standard protocol, DMX512, or digital multiplex 512, which allows 512 lighting fixtures to be controlled using a single cable. With the advent of more complex lighting fixtures, such as moving lights, this aging protocol is becoming less suitable. During my employ at Pathway Connectivity, the company designed the Pathport to serve as a bridge between the installed base of DMX products and today’s ubiquitous Ethernet networks. This thesis considers the design of the Pathport and measures a number of performance parameters such as network latency, dropped packet rate, and processor utilisation.

FigShare

Predicting Job Salaries from Text Descriptions

Author: Jackman Shaun
Reid Graham
Publication date: 17/09/2013

An online job listing web site has extensive data that is primarily unstructured text descriptions of the posted jobs. Many listings provide a salary, but as many as half do not. For those listings that do not provide a salary, it is useful to predict a salary based on the description of that job. We tested a variety of regression methods, including maximum-likelihood regression, lasso regression, artificial neural networks and random forests. We optimized the parameters of each of these methods, validated the performance of each model using cross validation and compared the performance of these methods on a withheld test data set.

University of British Columbia: cIRcle - UBC's Information Repository

UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation

Author: İnanç Birol (5660038)
Joerg Bohlmann (280238)
Shaun D. Jackman (746366)
Publication date: 28/05/2015

When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and code to reproduce it are available at https://github.com/sjackman/uniqtag-paper.
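
The abstract above describes each identifier as a representative k-mer selected from the gene's sequence, but does not state the selection rule. The following Python sketch assumes one plausible rule: choose the k-mer that occurs in the fewest genes, breaking ties lexicographically. The function names `kmers` and `uniqtags`, the fallback for sequences shorter than k, and the toy input are hypothetical and are not taken from the UniqTag packages.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def uniqtags(genes, k):
    """Map each gene name to a representative k-mer of its sequence.

    Assumed rule (not stated in the abstract): pick the k-mer found in the
    fewest genes, and break ties by taking the lexicographically smallest.
    """
    # Count, for every k-mer, the number of genes that contain it.
    counts = Counter()
    for seq in genes.values():
        counts.update(kmers(seq, k))

    tags = {}
    for name, seq in genes.items():
        candidates = kmers(seq, k)
        if not candidates:
            # Assumption: a sequence shorter than k is used whole as its tag.
            tags[name] = seq
            continue
        tags[name] = min(candidates, key=lambda km: (counts[km], km))
    return tags
```

For example, `uniqtags({"a": "ACGTAC", "b": "ACGTTT"}, 3)` assigns a 3-mer to each gene that the other gene does not contain. Because the tag depends only on sequence content, a gene whose sequence and k-mer neighbourhood are unchanged between builds keeps the same tag under this rule, which is the stability property the abstract describes.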

Directory of Open Access Journals

FigShare

Scaffolding large genomes using mate-pair sequencing and ABySS

Author: Anthony Raymond (277073)
Inanc Birol (277074)
Shaun Jackman (99599)

The sequencing and assembly of the white spruce genome (Picea glauca) using ABySS

FigShare