
4,050 research outputs found

Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

Author: Cheng Yinhe
Ogasawara Takeshi
Tzeng Tzy-Hwa Kathy
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 05/08/2016

This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing (NGS) data. The sam2bam framework is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single-node system by 156-186x compared with de facto standard tools. The framework consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators if available. As a basic feature, sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM. Additional features such as analyzing, filtering, and converting the input data are provided by plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of NGS data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. It could also reduce the runtime for whole-genome sequencing data from about 20 hours to about nine minutes on the same system using up to 711 GB of memory.
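
The duplicate-marking step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not sam2bam's plug-in code: the `Read` record, its fields, and the grouping key (chromosome, position, strand) are assumptions made for illustration, and production tools typically also consider clipping and mate information.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical, simplified alignment record (not a real SAM parser).
    name: str
    chrom: str
    pos: int        # leftmost aligned position
    reverse: bool   # True if aligned to the reverse strand
    qual_sum: int   # sum of base qualities, used to pick the read to keep


def mark_duplicates(reads):
    """Return the set of read names flagged as duplicates.

    Reads sharing (chrom, pos, strand) form one group; the read with the
    highest qual_sum is kept and every other read in the group is flagged.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.reverse)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.qual_sum)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates
```

Grouping with a hash map keeps the pass linear in the number of reads, which is one reason this step benefits from the large memory the abstract mentions.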

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

Evolution of foot-and-mouth disease virus intra-sample sequence diversity during serial transmission in bovine hosts

Author: Haydon D.T.
Juleff N.
King D.P.
Knowles N.J.
Morelli M.J.
Paton D.J.
Wright C.F.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

RNA virus populations within samples are highly heterogeneous, containing a large number of minority sequence variants which can potentially be transmitted to other susceptible hosts. Consequently, consensus genome sequences provide an incomplete picture of the within- and between-host viral evolutionary dynamics during transmission. Foot-and-mouth disease virus (FMDV) is an RNA virus that can spread from primary sites of replication, via the systemic circulation, to found distinct sites of local infection at epithelial surfaces. Viral evolution in these different tissues occurs independently, each of them potentially providing a source of virus to seed subsequent transmission events. This study employed the Illumina Genome Analyzer platform to sequence 18 FMDV samples collected from a chain of sequentially infected cattle. These data generated snap-shots of the evolving viral population structures within different animals and tissues. Analyses of the mutation spectra revealed polymorphisms at frequencies >0.5% at between 21 and 146 sites across the genome for these samples, while 13 sites acquired mutations in excess of consensus frequency (50%). Analysis of polymorphism frequency revealed that a number of minority variants were transmitted during host-to-host infection events, while the size of the intra-host founder populations appeared to be smaller. These data indicate that viral population complexity is influenced by small intra-host bottlenecks and relatively large inter-host bottlenecks. The dynamics of minority variants are consistent with the actions of genetic drift rather than strong selection. These results provide novel insights into the evolution of FMDV that can be applied to reconstruct both intra- and inter-host transmission routes

Crossref

Springer - Publisher Connector

Enlighten

Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data

Author: Haydon Daniel T.
King David J.
King Donald
Morelli Marco J.
Orton Richard J.
Paton David
Wright Caroline F.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/03/2015

There is a high prevalence of coronary artery disease (CAD) in patients with left bundle branch block (LBBB); however there are many other causes for this electrocardiographic abnormality. Non-invasive assessment of these patients remains difficult, and all commonly used modalities exhibit several drawbacks. This often leads to these patients undergoing invasive coronary angiography which may not have been necessary. In this review, we examine the uses and limitations of commonly performed non-invasive tests for diagnosis of CAD in patients with LBBB

Springer - Publisher Connector

PubMed Central

Enlighten

Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

Author: Carroll Chester C.
Saha Aindam
Youngblood John N.

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel systems to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.

NASA Technical Reports Server

Genome-culture coevolution promotes rapid divergence of killer whale ecotypes.

Author: A Dornburg
A Keinan
A Powell
A Varki
AD Foote
AE Moura
DW Huang
EL Saulitis
ET Wang
GT Marth
H Li
J Hawks
J Jurka
J Reynolds
JD Finkelstein
JE Pool
JK Pickrell
JKB Ford
JW Durban
JW Locasale
JW Poelstra
KN Laland
L Excoffier
L Rendell
L Skotte
LJN Brent
LL Cavalli-Sforza
LM Gattepaille
M Fumagalli
M Kimura
M Rebhan
N Patterson
O Mazet
OP Forman
PA Morin
Q Atkinson
R Burri
R Lanfear
R Nielsen
R Riesch
RE Green
RL Pitman
S Lindgreen
S Liu
S Ptak
S Zhan
T Miyata
T Satoh
TE Cruickshank
TS Korneliussen
V Sousa
WJ Swanson
X Liu
Yi
Z Shao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Analysing population genomic data from killer whale ecotypes, which we estimate have globally radiated within less than 250,000 years, we show that genetic structuring including the segregation of potentially functional alleles is associated with socially inherited ecological niche. Reconstruction of ancestral demographic history revealed bottlenecks during founder events, likely promoting ecological divergence and genetic drift resulting in a wide range of genome-wide differentiation between pairs of allopatric and sympatric ecotypes. Functional enrichment analyses provided evidence for regional genomic divergence associated with habitat, dietary preferences and post-zygotic reproductive isolation. Our findings are consistent with expansion of small founder groups into novel niches by an initial plastic behavioural response, perpetuated by social learning imposing an altered natural selection regime. The study constitutes an important step towards an understanding of the complex interaction between demographic history, culture, ecological adaptation and evolution at the genomic level

Crossref

Publikationer från Uppsala Universitet

Open Access LMU

PubMed Central

Copenhagen University Research Information System

Spiral - Imperial College Digital Repository

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Bern Open Repository and Information System (BORIS)

espace@Curtin

MPG.PuRe

Demography and the age of rare variants

Author: Mathieson Iain
McVean Gil
Publication date: 06/06/2014

Large whole-genome sequencing projects have provided access to much of the rare variation in human populations, which is highly informative about population structure and recent demography. Here, we show how the age of rare variants can be estimated from patterns of haplotype sharing and how these ages can be related to historical relationships between populations. We investigate the distribution of the age of variants occurring exactly twice (f2 variants) in a worldwide sample sequenced by the 1000 Genomes Project, revealing enormous variation across populations. The median age of haplotypes carrying f2 variants is 50 to 160 generations across populations within Europe or Asia, and 170 to 320 generations within Africa. Haplotypes shared between continents are much older with median ages for haplotypes shared between Europe and Asia ranging from 320 to 670 generations. The distribution of the ages of f2 haplotypes is informative about their demography, revealing recent bottlenecks, ancient splits, and more modern connections between populations. We see the signature of selection in the observation that functional variants are significantly younger than nonfunctional variants of the same frequency. This approach is relatively insensitive to mutation rate and complements other nonparametric methods for demographic inference.
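
As a small illustration of the definition used in this abstract (variants occurring exactly twice in the sample), the Python sketch below finds f2 sites in a toy haplotype matrix. The 0/1 encoding and the function name `f2_variants` are assumptions for illustration; the sketch covers only the identification of f2 sites, not the age estimation from haplotype sharing that the paper describes.

```python
def f2_variants(haplotypes):
    """Map each f2 site to the pair of haplotype indices carrying it.

    haplotypes: list of equal-length sequences of 0/1 values, where 1
    marks the variant allele (an assumed encoding). A site is "f2" when
    exactly two haplotypes in the sample carry the variant.
    """
    if not haplotypes:
        return {}
    n_sites = len(haplotypes[0])
    result = {}
    for site in range(n_sites):
        carriers = [i for i, hap in enumerate(haplotypes) if hap[site] == 1]
        if len(carriers) == 2:
            result[site] = tuple(carriers)
    return result
```

The returned carrier pairs are the natural starting point for asking how long a haplotype the two carriers share around each site.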

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

PubMed Central

FigShare


A tool of "barcoded viruses" to study influenza virus transmission dynamics

Author: Fu Jinqi
Publication venue: University of Cambridge
Publication date: 22/07/2019

The aim of this study was to establish a novel version of powerful “barcode viruses” as a tool for studying the replication and transmission dynamics of influenza virus in vitro and in vivo. Five barcoded APR8 viruses were first used to investigate infection kinetics (e.g. single- and multi-hit events, particle clumping and temporal aspects of co-infection) in vitro. This work demonstrated that the majority of infectious events in cell culture were single-hit events, but a significant number of infections were initiated by more than one virus particle (consistent with virus aggregation during release). Reassortment was found to occur efficiently and ubiquitously when near-isogenic viruses co-infected cells. The timing of asynchronous co-infection revealed that super-infection was possible if the second virus encountered the cell within 4 hr of the first virus. The super-infecting virus showed accelerated replication and enhanced yield, suggesting the second virus can take advantage of the already initiated replication machinery. Beyond this time point (coincident with the onset of progeny release from the first virus) the second virus was blocked by the initial infecting viruses. Five virus libraries carrying ~2000 individually identifiable variants were then generated for in vivo study. Amplification of the viral libraries in Madin-Darby canine kidney (MDCK) cells was achieved without substantial bottlenecking or preferential selection of specific sequences. Thirdly, two pilot studies in pigs demonstrated that intranasal inoculation resulted in substantial bottlenecking and a relatively small proportion of the inoculum gave rise to productive infection. Consequently, distinct viral populations were found in different nostrils and could persist over the course of the infection due to anatomical partitioning. Distinct sub-populations could be distinguished in other tissue sites (e.g. trachea and lung).
Super-infection of individual pigs could occur around 2 days following primary exposure. The identity of the donor pigs could be determined by the barcode identities. In the first pilot study, around 600 variants were seen in each donor pig directly inoculated with approximately 6000 variants of the barcoded viruses. When a pig was co-housed with 3 donors, a typical transmission dose of 73-151 variants was seen. In a further study of transmission between a single donor and recipient, a transmission dose of 30-60 variants at 2 days post contact (d.p.c.) and 20-50 variants at 3 d.p.c. was observed. To conclude, my PhD project has developed a powerful tool with a wide range of applications in influenza biology, particularly for studying transmission dynamics in a natural host system.

Apollo (Cambridge)

Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer [version 1; peer review: 1 approved]

Author: Capitanchik C
Luscombe N
Ule J
Wilkins O
Publication date: 07/06/2021

BACKGROUND: The first step of virtually all next generation sequencing analysis involves the splitting of the raw sequencing data into separate files using sample-specific barcodes, a process known as “demultiplexing”. However, we found that existing software for this purpose was either too inflexible or too computationally intensive for fast, streamlined processing of raw, single-end FASTQ files containing combinatorial barcodes. RESULTS: Here, we introduce a fast and uniquely flexible demultiplexer, named Ultraplex, which splits a raw FASTQ file containing barcodes either at a single end or at both 5’ and 3’ ends of reads, trims the sequencing adaptors and low-quality bases, and moves unique molecular identifiers (UMIs) into the read header, allowing subsequent removal of PCR duplicates. Ultraplex is able to perform such single or combinatorial demultiplexing on both single- and paired-end sequencing data, and can process an entire Illumina HiSeq lane, consisting of nearly 500 million reads, in less than 20 minutes. CONCLUSIONS: Ultraplex greatly reduces computational burden and pipeline complexity for the demultiplexing of complex sequencing libraries, such as those produced by various CLIP and ribosome profiling protocols, and is also very user friendly, enabling streamlined, robust data processing. Ultraplex is available on PyPI and Conda and via GitHub.
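
The core idea of demultiplexing described here (match a sample barcode, trim it, and move the UMI into the read header) can be sketched in a few lines of Python. This is not Ultraplex's implementation or interface: the function name, the exact-match rule for 5' barcodes, the fixed-length UMI placed directly after the barcode, and the `umi=` header suffix are all assumptions chosen for illustration.

```python
def demultiplex(records, barcodes, umi_len=0):
    """Assign FASTQ records to samples by an exact 5' barcode match.

    records:  iterable of (header, sequence, quality) tuples
    barcodes: dict mapping barcode sequence -> sample name
    umi_len:  number of bases directly after the barcode treated as a UMI
    Returns a dict: sample name -> list of trimmed (header, seq, qual).
    Reads matching no barcode are stored untrimmed under "unassigned".
    """
    out = {}
    for header, seq, qual in records:
        for barcode, sample in barcodes.items():
            cut = len(barcode) + umi_len
            if seq.startswith(barcode) and len(seq) >= cut:
                umi = seq[len(barcode):cut]
                # Hypothetical header convention for carrying the UMI.
                new_header = f"{header} umi={umi}" if umi else header
                out.setdefault(sample, []).append(
                    (new_header, seq[cut:], qual[cut:])
                )
                break
        else:
            out.setdefault("unassigned", []).append((header, seq, qual))
    return out
```

Keeping the UMI in the header means the trimmed read can be aligned as usual while the UMI stays available for later PCR-duplicate removal, which is the motivation the abstract gives.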

UCL Discovery

Supervised cross-modal factor analysis for multiple modal data classification

Author: Bensmail Halima
Duan Kanghong
Wang Jim Jing-Yan
Wang Jingbin
Zhou Yihua
Publication date: 18/08/2015

In this paper we study the problem of learning from multiple modal data for the purpose of document classification. In this problem, each document is composed of two different modalities of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two modalities to a shared data space, so that the classification of an image or a text can be performed directly in this space. A disadvantage of CFA is that it ignores the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both the image and text modalities of documents. We project both image and text data to a shared data space by factor analysis, and then train a class label predictor in the shared space to use the class label information. The factor analysis parameter and the predictor parameter are learned jointly by solving a single objective function. With this objective function, we minimize both the distance between the projections of the image and text of the same document and the classification error of the projections, measured by the hinge loss function. The objective function is optimized by an alternate optimization strategy in an iterative algorithm. Experiments on two different multiple modal document data sets show the advantage of the proposed algorithm over other CFA methods.
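
One way to write down a joint objective of the kind this abstract describes is sketched below. The notation is assumed, not taken from the paper: $x_i$ and $t_i$ are the image and text features of document $i$, $y_i \in \{-1, +1\}$ is a binary class label, $U$ and $V$ are the projection matrices into the shared space, $w$ is a linear predictor, and $C$ is a trade-off weight. The paper's exact formulation (constraints, regularization, multi-class handling) may differ.

```latex
\min_{U,\,V,\,w}\;
\sum_{i=1}^{n} \bigl\| U^{\top} x_i - V^{\top} t_i \bigr\|_2^2
\;+\; C \sum_{i=1}^{n} \Bigl[
  \max\bigl(0,\; 1 - y_i\, w^{\top} U^{\top} x_i\bigr)
  + \max\bigl(0,\; 1 - y_i\, w^{\top} V^{\top} t_i\bigr)
\Bigr]
```

The first sum pulls the two projections of the same document together, and the second is the hinge loss on each projection. Under this sketch, an alternating scheme would update $(U, V)$ with $w$ fixed and then $w$ with $(U, V)$ fixed, repeating until the objective stops decreasing.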

arXiv.org e-Print Archive

CiteSeerX

Crossref

Host-selected mutations converging on a global regulator drive an adaptive leap towards symbiosis in bacteria

Author: Cooper Vaughn S.
Coyle Matthew
Donner Rachel A.
Foxall Randi L.
Pankey Molly Sabrina
Perry Lauren A.
Schuster Brian M.
Ster Ian M.
Whistler Cheryl A.
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 27/04/2017

Host immune and physical barriers protect against pathogens but also impede the establishment of essential symbiotic partnerships. To reveal mechanisms by which beneficial organisms adapt to circumvent host defenses, we experimentally evolved ecologically distinct bioluminescent Vibrio fischeri by colonization and growth within the light organs of the squid Euprymna scolopes. Serial squid passaging of bacteria produced eight distinct mutations in the binK sensor kinase gene, which conferred an exceptional selective advantage that could be demonstrated through both empirical and theoretical analysis. Squid-adaptive binK alleles promoted colonization and immune evasion that were mediated by cell-associated matrices including symbiotic polysaccharide (Syp) and cellulose. binK variation also altered quorum sensing, raising the threshold for luminescence induction. Preexisting coordinated regulation of symbiosis traits by BinK presented an efficient solution where altered BinK function was the key to unlock multiple colonization barriers. These results identify a genetic basis for microbial adaptability and underscore the importance of hosts as selective agents that shape emergent symbiont populations

UNH Scholars' Repository