5,071 research outputs found

Fast and accurate short read alignment with Burrows–Wheeler transform

Author: H. Li
Langmead
Lippert
R. Durbin
Smith
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.

CiteSeerX

Crossref

PubMed Central

Efficient construction of an assembly string graph using the FM-index

Author: J. T. Simpson
Langmead
Myers
Pevzner
R. Durbin
Publication venue: Oxford University Press

Motivation: Sequence assembly is a difficult problem whose importance has grown again recently as the cost of sequencing has dramatically dropped. Most new sequence assembly software has started by building a de Bruijn graph, avoiding the overlap-based methods used previously because of the computational cost and complexity of these with very large numbers of short reads. Here, we show how to use suffix array-based methods that have formed the basis of recent very fast sequence mapping algorithms to find overlaps and generate assembly string graphs asymptotically faster than previously described algorithms

Crossref

PubMed Central

Engaging international students in employability activities: an innovative approach

Author: Huang R
Langmead C
Turner R
Walker S
Publication venue: University of Plymouth
Publication date: 01/01/2017

This project aimed to investigate whether it would be effective to adopt social media to disseminate training opportunities and engage international students to develop their employability while they study in the UK. More specifically, the three research objectives were: to examine international students’ opinions on usage of social media to engage them in different employability opportunities; to assess the effectiveness of the social media; to make recommendations to relevant student services for better engagement of international students. Background/context to project: Graduate employability has been widely debated by policy-makers and academics (Pegg et al., 2012). However, Waters (2009) points out that little reference is made in current literature to the increasingly international dimensions of higher education. Huang et al.’s (2014) research into graduate employability and Chinese international students in the UK argues that the students were fully aware of a range of opportunities available to support the development of their employability but their engagement with those opportunities could be better. Many authors recognise the importance of social media in engaging students in learning but few consider graduate employability. Furthermore, anecdotal evidence gathered through our previous research and our roles in supporting international students indicates that social media might address the current gap. PedRI

Plymouth Electronic Archive and Research Library

galign: A Tool for Rapid Genome Polymorphism Discovery

Author: B Langmead
EA Perens
H Li
Ilya Ruvinsky
R Li
S Sarin
Shai Shaham
TF Smith
Z Ning
Publication venue: Public Library of Science
Publication date: 25/09/2009

BACKGROUND: Highly parallel sequencing technologies have become important tools in the analysis of sequence polymorphisms on a genomic scale. However, the development of customized software to analyze data produced by these methods has lagged behind. METHODS/PRINCIPAL FINDINGS: Here I describe a tool, 'galign', designed to identify polymorphisms between sequence reads obtained using Illumina/Solexa technology and a reference genome. The 'galign' alignment tool does not use Smith-Waterman matrices for sequence comparisons. Instead, a simple algorithm comparing parsed sequence reads to parsed reference genome sequences is used. 'galign' output is geared towards immediate user application, displaying polymorphism locations, nucleotide changes, and relevant predicted amino-acid changes for ease of information processing. To do so, 'galign' requires several accessory files easily derived from an annotated reference genome. Direct sequencing as well as in silico studies demonstrate that 'galign' provides lesion predictions comparable in accuracy to available prediction programs, accompanied by greater processing speed and more user-friendly output. We demonstrate the use of 'galign' to identify mutations leading to phenotypic consequences in C. elegans. CONCLUSION/SIGNIFICANCE: Our studies suggest that 'galign' is a useful tool for polymorphism discovery, and is of immediate utility for sequence mining in C. elegans

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The Sequence Alignment/Map format and SAMtools

Author: A. Wysoker
B. Handsaker
G. Abecasis
G. Marth
H. Li
J. Ruan
Langmead
Mardis
N. Homer
R. Durbin
T. Fennell
Publication venue: Oxford University Press
Publication date: 30/01/2013

Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments

CiteSeerX

Crossref

Harvard University - DASH

PubMed Central

ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data

Author: Bentley
Gentleman
H. Pages
Langmead
M. Lawrence
M. Morgan
Mardis
Mortazavi
P. Aboyoun
R. Gentleman
S. Anders
Publication venue: Oxford University Press

Summary: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources

Crossref

PubMed Central

Exploring behaviors of stochastic differential equation models of biological systems using change of measures

Author: A Gelman
A Pnueli
A Wald
B Finkbeiner
B Øksendal
C Langmead
Christopher James Langmead
CJ Langmead
D Harel
EL Lehmann
EM Clarke
GA Edgar
H Jeffreys
HLS Younes
I Karatzas
IV Girsanov
J Berger
J Haigh
JR Faeder
K Gondi
K Sen
M Iosifescu
M Kwiatkowska
MC Wang
R Grosu
R Horhat
R Lassaigne
R Lefever
S Jha
SK Jha
SS Owicki
Sumit Kumar Jha
T Choi
T Hérault
V Kuznetsov
Publication venue: BioMed Central
Publication date: 01/01/2012

Stochastic Differential Equations (SDE) are often used to model the stochastic dynamics of biological systems. Unfortunately, rare but biologically interesting behaviors (e.g., oncogenesis) can be difficult to observe in stochastic models. Consequently, the analysis of behaviors of SDE models using numerical simulations can be challenging. We introduce a method for solving the following problem: given an SDE model and a high-level behavioral specification about the dynamics of the model, algorithmically decide whether the model satisfies the specification. While there are a number of techniques for addressing this problem for discrete-state stochastic models, the analysis of SDE and other continuous-state models has received less attention. Our proposed solution uses a combination of Bayesian sequential hypothesis testing, non-identically distributed samples, and Girsanov's theorem for change of measures to examine rare behaviors. We use our algorithm to analyze two SDE models of tumor dynamics. Our use of non-identically distributed samples contributes to the state of the art in statistical verification and model checking of stochastic models by providing an effective means for exposing rare events in SDEs, while retaining the ability to compute bounds on the probability that those events occur.

Crossref

Springer - Publisher Connector

PubMed Central

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

A robust SNP barcode for typing Mycobacterium tuberculosis complex strains

Author: A McKenna
A Stamatakis
B Langmead
B López
B Tessema
CB Ford
D Stucki
DR Zerbino
E Abadia
F Coll
G Thwaites
H Li
H Zhang
I Comas
I Filliol
J Felsenstein
J Wang
JH Bates
M Caws
M Kato-Maeda
N Casali
P Nahid
PF Barnes
PL Lin
R Firdessa
S Feuerriegel
S Gagneux
S Homolka
T Derrien
T Weniger
TA Rado
Y Blouin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Strain-specific genomic diversity in the Mycobacterium tuberculosis complex (MTBC) is an important factor in pathogenesis that may affect virulence, transmissibility, host response and emergence of drug resistance. Several systems have been proposed to classify MTBC strains into distinct lineages and families. Here, we investigate single-nucleotide polymorphisms (SNPs) as robust (stable) markers of genetic variation for phylogenetic analysis. We identify ~92k SNPs across a global collection of 1,601 genomes. The SNP-based phylogeny is consistent with the gold-standard regions of difference (RD) classification system. Of the ~7k strain-specific SNPs identified, 62 markers are proposed to discriminate known circulating strains. This SNP-based barcode is the first to cover all main lineages, and classifies a greater number of sublineages than current alternatives. It may be used to classify clinical isolates to evaluate tools to control the disease, including therapeutics and vaccines whose effectiveness may vary by strain type.

CiteSeerX

Crossref

LSHTM Research Online

Repositório da Universidade Nova de Lisboa

PubMed Central

Birkbeck Institutional Research Online

LSHTM Data Compass

HSRA: Hadoop-based spliced read aligner for RNA sequencing data

Author: A Dobin
A McKenna
A Mortazavi
A O’Driscoll
AD Smith
B Fjukstad
B Langmead
B Schmidt
D Decap
D Hong
D Kim
D Peters
G Baruzzo
H Li
H Nordberg
J Dean
J González-Domínguez
J Luo
J Sirén
JC Marioni
JM Abuín
JM Mullaney
Jorge González-Domínguez
Juan Touriño
K Wang
KR Kukurba
L Pireddu
M Niemenmaa
M Zaharia
MC Schatz
NL Bray
Q Zou
R Li
R Patro
Roberto R. Expósito
RR Expósito
Ruslan Kalendar
RV Pandey
S Ghemawat
S Huang
S Pepke
T Nguyen
TD Wu
U Ferraro Petrillo
Z Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2018

Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference genome or transcriptome is considered a crucial step that remains one of the most time-consuming. With the steady development of Next Generation Sequencing (NGS) technologies, unprecedented amounts of genomic data introduce significant challenges in terms of storage, processing and downstream analysis. As cost and throughput continue to improve, there is a growing need for new software solutions that minimize the impact of increasing data volume on RNA read alignment. In this work we introduce HSRA, a Big Data tool that takes advantage of the MapReduce programming model to extend the multithreading capabilities of a state-of-the-art spliced read aligner for RNA-seq data (HISAT2) to distributed memory systems such as multi-core clusters or cloud platforms. HSRA has been built upon the Hadoop MapReduce framework and supports both single- and paired-end reads from FASTQ/FASTA datasets, providing output alignments in SAM format. The design of HSRA has been carefully optimized to avoid the main limitations and major causes of inefficiency found in previous Big Data mapping tools, which cannot fully exploit the raw performance of the underlying aligner. On a 16-node multi-core cluster, HSRA is on average 2.3 times faster than previous Hadoop-based tools. Source code in Java as well as a user’s guide are publicly available for download at http://hsra.dec.udc.es. Ministerio de Economía, Industria y Competitividad; TIN2016-75845-P. Xunta de Galicia; ED431G/0

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

Protecting nationally important marine Biodiversity in Wales

Author: Ellis R
Evans J
Jackson EL
Langmead O
Tyler-Walters H
Publication venue: Marine Biological Association of the UK
Publication date: 01/11/2008

Plymouth Marine Science Electronic Archive (PlyMSEA)