31 research outputs found

FAAST: Flow-space Assisted Alignment Search Tool

Author: Bengt Persson
Björn Andersson
DJ Lipman
Fredrik Lysholm
J Jerlström-Hultqvist
M Droege
M Margulies
MO Dayhoff
O Gotoh
R Kofler
S Balzer
SB Needleman
SF Altschul
TF Smith
V Vacic
WR Pearson
Z Ning
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: High-throughput pyrosequencing (454 sequencing) is the major sequencing platform for producing long-read high-throughput data. While most other sequencing techniques produce read errors that mainly resemble substitutions, pyrosequencing produces errors that mainly resemble gaps. Most conventional alignment programs detect these errors less efficiently, which may produce inaccurate alignments. Results: We suggest a novel algorithm for calculating the optimal local alignment which utilises flowpeak information in order to improve alignment accuracy. Flowpeak information can be retained from a 454 sequencing run through interpretation of the binary SFF file format. This novel algorithm has been implemented in a program named FAAST (Flow-space Assisted Alignment Search Tool). Conclusions: We present and discuss the results of simulations that show that FAAST, through the use of the novel algorithm, can gain several percentage points of accuracy compared to Smith-Waterman-Gotoh alignments, depending on the 454 data quality. Furthermore, through an efficient multi-thread aware implementation, FAAST is able to perform these high-quality alignments at high speed. The tool is available at http://www.ifm.liu.se/bioinfo/.

Publikationer från Linköpings universitet

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Characterization of the Viral Microbiome in Patients with Severe Lower Respiratory Tract Infections, Using Metagenomic Sequencing

Author: A Djikeng
A. Michael Lindberg
AFA Smit
Anna Wetterbom
Annelie Bjerkner
B Chevreux
Bengt Persson
BG van den Hoogen
Björn Andersson
C Li
Cecilia Lindau
CL McIntyre
EL Delwart
EV Vasilyev
EW Sayers
Fredrik Lysholm
G Perrière
Hamid Darban
J Bizzintino
J Raes
JD Thompson
JG Victoria
K Rosario
K Thom
Kristina Fahlander
M Margulies
M Peiris
MB Hershenson
MJ Gibbs
MK Iwane
NS Young
P Froussard
P Simmonds
RA Edwards
RK Gunnarsson
S Goh
S Hino
Sarah K. Highlander
SF Altschul
SR Finkbeiner
T Allander
T Heikkinen
T Jartti
T Nishizawa
T Zhang
TG Ksiazek
Tobias Allander
Z Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2012

The human respiratory tract is heavily exposed to microorganisms. Viral respiratory tract pathogens, such as RSV, influenza viruses and rhinoviruses, cause major morbidity and mortality from respiratory tract disease. Furthermore, as viruses have limited means of transmission, viruses that are pathogenic in other tissues may be transmitted through the respiratory tract. It is therefore important to chart the human virome in this compartment. We have studied nasopharyngeal aspirate samples submitted to the Karolinska University Laboratory, Stockholm, Sweden, from March 2004 to May 2005 for diagnosis of respiratory tract infections. We used a metagenomic sequencing strategy to characterize viruses, as this provides the most unbiased view of the samples. Virus enrichment followed by 454 sequencing yielded a total of 703,790 reads, of which 110,931 were found to be of viral origin by an automated classification pipeline. This snapshot of the respiratory tract virome of these 210 patients revealed 39 species and many more strains of viruses. Most of the viral sequences were classified into one of three major families: Paramyxoviridae, Picornaviridae or Orthomyxoviridae. The study also identified one novel type of Rhinovirus C and a number of previously undescribed viral genetic fragments of unknown origin.

Public Library of Science (PLOS)

Publikationer från Linköpings universitet

Crossref

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

FigShare

Bioinformatic methods for characterization of viral pathogens in metagenomic samples

Author: Fredrik Lysholm
Publication venue: Linköping University Electronic Press
Publication date: 01/01/2013

Virus infections impose a huge disease burden on humanity, and new viruses are continuously found. As most studies of viral disease are limited to the investigation of known viruses, it is important to characterize all circulating viruses. Thus, a broad and unselective exploration of the virus flora would be the most productive development of modern virology. Fueled by the reduction in sequencing costs and the unbiased nature of shotgun sequencing, viral metagenomics has rapidly become the strategy of choice for this exploration. This thesis mainly focuses on improving key methods used in viral metagenomics, as well as on the complete viral characterization of two sets of samples using these methods. The major methods developed are an efficient automated analysis pipeline for metagenomics data and two novel, more accurate alignment algorithms for 454 sequencing data. The automated pipeline facilitates rapid, complete and effortless analysis of metagenomics samples, which in turn enables detection of potential pathogens, for instance in patient samples. The two new alignment algorithms cover comparisons against both nucleotide and protein databases, while retaining the underlying 454 data representation. Furthermore, a simulator for 454 data was developed in order to evaluate these methods. This simulator is currently the fastest and most complete simulator of 454 data, which enables further development of algorithms and methods. Finally, we have successfully used these methods to fully characterize a multitude of samples, including samples collected from children suffering from severe lower respiratory tract infections and from patients diagnosed with chronic fatigue syndrome, both of which are presented in this thesis. In these studies, a complete viral characterization has revealed the presence of both expected and unexpected viral pathogens, as well as many potential novel viruses.

Publikationer från Linköpings universitet

Highly improved homopolymer aware nucleotide-protein alignments with 454 data

Author: Fredrik Lysholm
Publication venue: BMC
Publication date: 01/01/2012

Background: Roche 454 sequencing is the leading sequencing technology for producing long-read high-throughput sequence data. Unlike most methods, where sequencing errors translate to base uncertainties, 454 sequencing inaccuracies create nucleotide gaps. These gaps are particularly troublesome for translated search tools such as BLASTx, where they introduce frame-shifts and result in regions of decreased identity and/or terminated alignments, which affect further analysis. Results: To address this issue, the Homopolymer Aware Cross Alignment Tool (HAXAT) was developed. HAXAT uses a novel dynamic programming algorithm for solving the optimal local alignment between a 454 nucleotide sequence and a protein sequence by allowing frame-shifts, guided by 454 flowpeak values. The algorithm is an efficient minimal extension of the Smith-Waterman-Gotoh algorithm that fits easily into other tools. Experiments using HAXAT demonstrate that the introduction of 454-specific frame-shift penalties significantly increases the accuracy of alignments spanning homopolymer sequence errors. The full effect of the new parameters introduced with this novel alignment model is explored. Experimental results evaluating homopolymer inaccuracy through alignments show a two- to five-fold increase in Matthews Correlation Coefficient over previous algorithms for 454-derived data. Conclusions: The increased accuracy provided by HAXAT not only results in improved homologue estimations, but also provides uninterrupted reading frames, which greatly facilitate further analysis of protein space, for example phylogenetic analysis. The alignment tool is available at http://bioinfo.ifm.liu.se/454tools/haxat.

Publikationer från Linköpings universitet

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Structural characterization of overrepresented

Author: Fredrik Lysholm
Publication venue: Linköpings universitet, Institutionen för fysik, kemi och biologi
Publication date: 01/01/2008

Background: Over the last decades, vast amounts of sequence information have been produced by various protein sequencing projects, which enables studies of sequential patterns. One of the best-known efforts to chart short peptide sequences is the Prosite pattern data bank. While sequential patterns like those of Prosite have proved very useful for classifying protein families, functions, etc., structural analysis may provide more information and possibly crucial clues linked to protein folding. Today PDB, which is the main repository for protein structures, contains more than 50,000 entries, which enables structural protein studies. Result: Strongly folded pentapeptides, defined as pentapeptides which retained a specific conformation in several significantly structurally different proteins, were extracted from PDB and studied. Among these, several groups were found. Possibly the most well defined is the "double Cys" pentapeptide group, with two amino acids in between (CXXCX|XCXXC), which was found to form backbone loops where the two cysteine residues formed a possible Cys-Cys bridge. Other structural motifs were found both in helices and in sheets, like "ECSAM" and "TIKIW", respectively. Conclusion: There is much information to be extracted by structural analysis of pentapeptides and other oligopeptides. There is no doubt that some pentapeptides are more likely to adopt a specific fold than others and that there are many strongly folded pentapeptides. By using such patterns in a protein folding model, such as the hydrophobic-polar model, improvements in speed and accuracy can be obtained. Comparing structural conformations for important overrepresented pentapeptides can also help identify and refine both structural information data banks such as SCOP and sequential pattern data banks such as Prosite.

Publikationer från Linköpings universitet

An efficient simulator of 454 data using configurable statistical models

Author: Björn Andersson
Fredrik Lysholm
Bengt Persson
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Background: Roche 454 is one of the major second-generation sequencing platforms. The particular characteristics of 454 sequence data pose new challenges for bioinformatic analyses, e.g. assembly and alignment search algorithms. Simulation of these data is therefore useful in order to further assess how bioinformatic applications and algorithms handle 454 data. Findings: We developed a new application named 454sim for simulation of 454 data at high speed and accuracy. The program is multi-thread capable and is available as C++ source code or pre-compiled binaries. Sequence reads are simulated by 454sim using a set of statistical models for each chemistry. 454sim simulates recorded peak intensities and peak quality deterioration, and it calculates quality values. All three generations of the Roche 454 chemistry ('GS20', 'GS FLX' and 'Titanium') are supported and defined in external text files for easy access and tweaking. Conclusions: We present a new platform-independent application named 454sim. 454sim is generally 200 times faster than previous programs and it allows for simple adjustments of the statistical models. These improvements make it possible to carry out more complex and rigorous algorithm evaluations on a reasonable time scale.

Publikationer från Linköpings universitet

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line