Parallel CLUSTAL W for PC clusters

Author: Cheetham J. (James J.)
Dehne F. (Frank)
Pitre S. (Sylvain)
Rau-Chaplin A. (Andrew)
Taillon P.J. (Peter J.)
Publication date: 01/12/2003

Carleton University's Institutional Repository

MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

Author: Caballero J.A.
Dorado G.
Díaz David
Esteban F.J.
Guevara A
Gálvez Sergio
Hernández Pilar
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

We have developed MC64-ClustalWP2, a new implementation of the Clustal W algorithm that integrates a novel parallelization strategy and significantly increases performance when aligning long sequences on many-core architectures. In such a process, a detailed analysis of both software and hardware features and peculiarities is essential to reveal the key points at which the full potential of parallelism in many-core CPU systems can be exploited and optimized. The new parallelization approach focuses on the most time-consuming stages of the algorithm. In particular, the performance of the progressive alignment stage has improved drastically, thanks to a fine-grained approach in which the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm on a hybrid computing system that integrates an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy on many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm and more than 7x faster than the best x86 parallel implementation to date, and it is publicly available through a web service.
In addition, these developments have been deployed on cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies, and the development of molecular markers for paternity testing, germplasm management and protection, breeding assistance, illegal traffic control, fraud prevention, and the protection of intellectual property (identification/traceability), including the protected designation of origin, among other applications.

Repositorio Institucional de la Universidad de Córdoba

Directory of Open Access Journals

PubMed Central

Digital.CSIC

Cellulose expression in Pseudomonas fluorescens SBW25 and other environmental Pseudomonads

Author: Deeni Yusuf Y.
Folorunso Ayorinde O.
Koza Anna
Moshynets Olena
Spiers Andrew J.
Zawadzki Kamil
Publication venue: InTech
Publication date: 29/08/2013

IntechOpen

Abertay Research Portal

Crossref

Computational strategies for dissecting the high-dimensional complexity of adaptive immune repertoires

Author: Afzal
Ahmed
Albert
Amanna
Andrews
Angermueller
Apeltsin
Arden
Atchley
Avnir
Barabási
Barak
Bashford-Rogers
Bastian
Baum
Becattini
Ben-Hamo
Berger
Betz
Boc
Bolen
Bolkhovskaya
Bolotin
Bouckaert
Boyd
Boyd
Breden
Brown
Burnet
Bürckert
Calis
Castro
Chang
Chao
Chen
Ching
Cobey
Collins
Corcoran
Covacu
Csardi
Cui
Dash
de Bourcy
DeKosky
DeKosky
DeWitt
Dziubianau
Elhanati
Elhanati
Elhanati
Ellebedy
Emerson
Felsenstein
Friedensohn
Gadala-Maria
Galson
Galson
Geering
Georgiou
Ghraichy
Giribet
Giudicelli
Glanville
Glanville
Glanville
Good
Granato
Greiff
Greiff
Greiff
Greiff
Greiff
Grigaityte
Guindon
Gupta
Gupta
Hagberg
Halliley
Hammarlund
Heather
Hershberg
Hochreiter
Hoehn
Hoehn
Hoehn
Horns
Howie
Iversen
Jackson
Janeway
Jiang
Johnston
Jost
Jurtz
Kaplinsky
Kaplinsky
Kendall
Khavrutskii
Kidd
Kidd
Kidera
Kirik
Konishi
Kumar
Landsverk
Larkin
Lavinder
Laydon
Laydon
Lee
Lee
Lewitus
Li
Lindeman
Lindner
Liu
Love
Lozupone
Madi
Madi
Malissen
Mamoshina
Mangul
Manz
Martin
Meng
Miho
Mora
Morisita
Murugan
Nazarov
Nouri
Oakes
Ostmeyer
Paradis
Parameswaran
Parola
Pinheiro
Pollok
Ralph
Ravn
Reddy
Rempala
Rempala
Revell
Rizzetto
Robinson
Ronquist
Roybal
Rubelt
Rubelt
Safonova
Schliep
Schramm
Schwab
Shannon
Sheng
Sheng
Shlemov
Shugay
Shugay
Shugay
Snir
Snir
Stamatakis
Stern
Strauli
Stubbington
Stubbington
Sun
Sun Cinelli
Swofford
Thomas
Tickotsky
Tonegawa
Torkamani
Trepel
VanDuijn
Venturi
Venturi
Vieira
Vita
Wang
Wardemann
Warren
Watson
Watson
Watson
Weinstein
Wine
Wine
Wu
Yaari
Yaari
Yang
Yeap
Yermanos
Yokota
Yu
Zhu
Publication venue: 'Frontiers Media SA'
Publication date: 29/11/2017

The adaptive immune system recognizes antigens via an immense array of antigen-binding antibodies and T-cell receptors, the immune repertoire. The interrogation of immune repertoires is of high relevance for understanding the adaptive immune response in disease and infection (e.g., autoimmunity, cancer, HIV). Adaptive immune receptor repertoire sequencing (AIRR-seq) has driven the quantitative and molecular-level profiling of immune repertoires, thereby revealing the high-dimensional complexity of the immune receptor sequence landscape. Several methods for the computational and statistical analysis of large-scale AIRR-seq data have been developed to resolve immune repertoire complexity in order to understand the dynamics of adaptive immunity. Here, we review the current research on (i) diversity, (ii) clustering and network, (iii) phylogenetic and (iv) machine learning methods applied to dissect, quantify and compare the architecture, evolution, and specificity of immune repertoires. We summarize outstanding questions in computational immunology and propose future directions for systems immunology towards coupling AIRR-seq with the computational discovery of immunotherapeutics, vaccines, and immunodiagnostics.
Comment: 27 pages, 2 figures

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

NORA - Norwegian Open Research Archives

MISHIMA - a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

Author: ACC Shih
AL Delcher
B Morgenstern
C Notredame
D Mikhailov
DF Feng
DG Higgins
DJ Lipman
F Corpet
GJ Barton
J Cheetham
J Stoye
JD Thompson
K Katoh
K Kryukov
K Reinert
KB Li
Kirill Kryukov
M Brudno
M Brudno
M Brudno
M Kimura
N Bray
Naruya Saitou
O Gotoh
RC Edgar
U Tonges
WR Taylor
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences.
Results: We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences.
Conclusions: MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.
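The anchor-finding idea in this abstract (rare oligonucleotides shared by all sequences, used to split the problem) can be illustrated with a deliberately simplified Python sketch. It is not the MISHIMA algorithm itself: it assumes "rare" means "occurs exactly once in each sequence", uses a fixed word length `k`, and the function names are hypothetical.

```python
from collections import Counter


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def shared_anchors(seqs, k):
    """Return (k-mer, positions) pairs for k-mers that are unique in every
    sequence and present in all of them, ordered by position in the first
    sequence. Assumption: such k-mers are acceptable split points."""
    tables = [unique_kmer_positions(s, k) for s in seqs]
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    anchors = [(kmer, [table[kmer] for table in tables]) for kmer in common]
    return sorted(anchors, key=lambda item: item[1][0])


if __name__ == "__main__":
    for kmer, positions in shared_anchors(["AAACGTAAA", "TTACGTTT"], 4):
        print(kmer, positions)
```

Fragments between consecutive anchors could then be handed to an external aligner independently. A practical implementation would also need to check that the anchors appear in the same order in every sequence, which this sketch does not do.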

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Domain Decomposition Strategy for Alignment of Multiple Biological Sequences on Multiprocessor Platforms

Author: Ashfaq Khokhar
Berger
Cline
Crandall
Do
Edgar
Edgar
Edgar
Fahad Saeed
Hambrusch
Hambrusch
Hanmao
Jones
Kaddoura
Kumar
Lassmann
Lassmann
Mikhailov
Morgenstern
Muller
Notredame
Notredame
Pilkington
Ronaghi
Saeed
Sauder
Schmollinger
Schwartz
SF
Smith
Stoye
Sze
Thompson
Thompson
Wang
Willebeek-LeMair
Publication venue: 'Elsevier BV'
Publication date: 11/05/2009

Multiple sequence alignment (MSA) of biological sequences is a fundamental problem in computational biology due to its critical significance in wide-ranging applications including haplotype reconstruction, sequence homology, phylogenetic analysis, and prediction of evolutionary origins. The MSA problem is considered NP-hard, and known heuristics for the problem do not scale well with an increasing number of sequences. On the other hand, with the advent of a new breed of fast sequencing techniques, it is now possible to generate thousands of sequences very quickly. For rapid sequence analysis, it is therefore desirable to develop fast MSA algorithms that scale well with the increase in dataset size. In this paper, we present a novel domain-decomposition-based technique to solve the MSA problem on multiprocessing platforms. The domain-decomposition-based technique, in addition to yielding better quality, gives an enormous advantage in terms of execution time and memory requirements. The proposed strategy makes it possible to decrease the time complexity of any known heuristic of O(N)^x complexity by a factor of O(1/p)^x, where N is the number of sequences, x depends on the underlying heuristic approach, and p is the number of processing nodes. In particular, we propose a highly scalable algorithm, Sample-Align-D, for aligning biological sequences using the MUSCLE system as the underlying heuristic. The proposed algorithm has been implemented on a cluster of workstations using the MPI library. Experimental results for different problem sizes are analyzed in terms of quality of alignment, execution time, and speed-up.
Comment: 36 pages, 17 figures, accepted manuscript in Journal of Parallel and Distributed Computing (JPDC)
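The complexity claim in this abstract can be made concrete with a small arithmetic sketch in Python. It assumes an idealized even split of N sequences over p nodes and ignores the sampling, communication, and merging costs that a real run would incur; the function names are hypothetical and the figures are illustrative, not results from the paper.

```python
def per_node_cost(n_sequences, p_nodes, x):
    """Idealized work per node when each node aligns N/p sequences
    with a heuristic whose cost grows as N**x."""
    return (n_sequences / p_nodes) ** x


def ideal_reduction(p_nodes, x):
    """Ratio of serial cost N**x to per-node cost (N/p)**x, which is p**x."""
    return p_nodes ** x


if __name__ == "__main__":
    n, p, x = 1000, 10, 2
    print("serial cost units:  ", n ** x)
    print("per-node cost units:", per_node_cost(n, p, x))
    print("ideal reduction:    ", ideal_reduction(p, x))
```

With N = 1000, p = 10 and x = 2, the per-node work drops from 1,000,000 to 10,000 cost units, a factor of p^x = 100 under these assumptions.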

arXiv.org e-Print Archive

Crossref

Quantifying Wing Shape and Size of Saturniid Moths with Geometric Morphometrics

Author: Barber Jesse R.
Hill Geena M.
Kawahara Akito Y.
Zhong Minjia
Publication venue: 'IUScholarWorks'
Publication date: 01/06/2016

Butterflies and moths exhibit a spectacular diversity of wing shape and size. The extent of wing variation is particularly evident in wild silk moths (Saturniidae), which have large wing shape and size variation. Some species have jagged wing margins, rounded forewing apical lobes, or narrow hind wings with long tails, while others lack these traits entirely. Surprisingly, very little work has been done to formally quantify wing variation within the family. We analyzed the hind wing shape and size of 76 saturniid species representing 52 genera across five subfamilies using geometric morphometrics. We identified fifteen landmarks that we predict can be applied to families across Lepidoptera. Principal component analyses grouped saturniid hind wings into six distinct morphological clusters. These groups did not appear to follow species relatedness: some phylogenetically and genetically distantly related taxa clustered in the same morphological group. We discuss ecological factors that might have led to the extraordinary wing variation within Saturniidae.

Boise State University - ScholarWorks

PRT: Parallel program for a full backtranslation of oligopeptides

Author: Gouinaud Christophe
Hill David R.C.
Militon Cécile
Missaoui Mohieddine
Peyret Pierre
Publication venue: HAL CCSD
Publication date: 07/06/2007

DNA hybridization methods have become the most widely used tools in molecular biology to identify organisms and evaluate gene expression levels. PCR (polymerase chain reaction)-based methods, fluorescent in situ hybridization (FISH), and the recent development of DNA microarrays as a high-throughput technology all require efficient primer or probe design. Evaluating the metabolic capacities of complex microbial communities found in terrestrial or aquatic environments requires new probe design algorithms that reflect genetic diversity. As only a small part of microbial diversity is known, gene sequences deposited in international databases do not reflect the entire diversity. In this context, we propose to use oligopeptide sequences to design complete sets of DNA probes that are able to target the entire genetic diversity of genes encoding enzymes. Because the genetic code is degenerate, backtranslation must be managed efficiently. To our knowledge, no software has been developed that offers a full backtranslation. This complexity is tractable since we only need to focus on short oligopeptides for DNA probe design. We propose new algorithms that perform high-performance backtranslation of oligopeptides into all potential nucleic acid sequences. We use efficient techniques such as memory mapping to perform this computation. We also propose an MPI parallel implementation that reduces the overall execution time using data load balancing and network file stream distribution on a cluster architecture.
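To show what a full backtranslation involves, here is a minimal Python sketch that enumerates every DNA sequence encoding a short oligopeptide. It assumes the standard genetic code and is not the PRT implementation: it uses no memory mapping or MPI, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG codon ordering
# (first base varies slowest, third base fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS_FOR = {}
for codon_bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon_bases))


def count_backtranslations(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    total = 1
    for aa in peptide:
        total *= len(CODONS_FOR[aa])
    return total


def backtranslate(peptide):
    """Lazily yield every DNA sequence that encodes the peptide."""
    for codons in product(*(CODONS_FOR[aa] for aa in peptide)):
        yield "".join(codons)


if __name__ == "__main__":
    peptide = "MKL"
    print(count_backtranslations(peptide), "candidate sequences")
    for dna in backtranslate(peptide):
        print(dna)
```

The count is the product of the codon multiplicities of the residues, so it grows exponentially with peptide length. This is why the abstract restricts the problem to short oligopeptides, and why the sketch uses a generator instead of building the full list in memory.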

HAL Clermont Université