27 research outputs found

Applications on emerging paradigms in parallel computing

Author: Sarje Abhinav
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2010

The area of computing is seeing parallelism increasingly being incorporated at various levels: from the lowest levels of vector processing units following Single Instruction Multiple Data (SIMD) processing, Simultaneous Multi-threading (SMT) architectures, and multi/many-cores with thread-level shared memory and SIMT parallelism, to the higher levels of distributed memory parallelism as in supercomputers and clusters, and scaling them to large distributed systems as server farms and clouds. All together these form a large hierarchy of parallelism. Developing high-performance parallel algorithms and efficient software tools, which make use of the available parallelism, is inevitable in order to harness the raw computational power these emerging systems have to offer. In the work presented in this thesis, we develop architecture-aware parallel techniques on such emerging paradigms in parallel computing, specifically, parallelism offered by the emerging multi- and many-core architectures, as well as the emerging area of cloud computing, to target large scientific applications. First, we develop efficient parallel algorithms to compute optimal pairwise alignments of genomic sequences on heterogeneous multi-core processors, and demonstrate them on the IBM Cell Broadband Engine. Then, we develop parallel techniques for scheduling all-pairs computations on heterogeneous systems, including clusters of Cell processors, and NVIDIA graphics processors. We compare the performance of our strategies on Cell, GPU and Intel Nehalem multi-core processors. 
Further, we apply our algorithms to specific applications taken from the areas of systems biology, fluid dynamics and materials science: pairwise Mutual Information computations for reconstruction of gene regulatory networks; pairwise Lp-norm distance computations for coherent structures discovery in the design of flapping-wing Micro Air Vehicles; and construction of stochastic models for a set of properties of heterogeneous materials. Lastly, in the area of cloud computing, we propose and develop an abstract framework to enable computations in parallel on large tree structures, to facilitate easy development of a class of scientific applications based on trees. Our framework, in the style of Google's MapReduce paradigm, is based on two generic user-defined functions through which a user writes an application. We implement our framework as a generic programming library for a large cluster of homogeneous multi-core processors, and demonstrate its applicability through two applications: all-k-nearest neighbors computations, and Fast Multipole Method (FMM) based simulations.

Digital Repository @ Iowa State University (ISU)

Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants

Author: Barrington Christopher
Baxter Laura
Beynon Jim
Buchanan-Wollaston Vicky
Denby Katherine J.
Dyer Nigel
Hickman R. D. G.
Jironkin Aleksey
Krusche Peter
Moore Jonathan D.
Ott Sascha
Tiskin Alexander
Publication venue: 'American Society of Plant Biologists (ASPB)'
Publication date: 01/10/2012

Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNSs is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research.

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Alteromonas Myovirus V22 Represents a New Genus of Marine Bacteriophages Requiring a Tail Fiber Chaperone for Host Recognition

Author: Dunne Matthew
Gonzalez-Serrano Rafael
Grosboillot Virginie
Loessner Martin J.
Martín Cuadrado Ana Belén
Roda-Garcia Juan J.
Rodriguez-Valera Francisco
Rosselli Riccardo
Zinsli Léa V.
Publication venue: 'American Society for Microbiology'
Publication date: 01/06/2020

Marine phages play a variety of critical roles in regulating the microbial composition of our oceans. Despite constituting the majority of genetic diversity within these environments, there are relatively few isolates with complete genome sequences or in-depth analyses of their host interaction mechanisms, such as characterization of their receptor binding proteins (RBPs). Here, we present the 92,760-bp genome of the Alteromonas-targeting phage V22. Genomic and morphological analyses identify V22 as a myovirus; however, due to a lack of sequence similarity to any other known myoviruses, we propose that V22 be classified as the type phage of a new Myoalterovirus genus within the Myoviridae family. V22 shows gene homology and synteny with two different subfamilies of phages infecting enterobacteria, specifically within the structural region of its genome. To improve our understanding of the V22 adsorption process, we identified putative RBPs (gp23, gp24, and gp26) and tested their ability to decorate the V22 propagation strain, Alteromonas mediterranea PT11, as recombinant green fluorescent protein (GFP)-tagged constructs. Only GFP-gp26 was capable of bacterial recognition and identified as the V22 RBP. Interestingly, production of functional GFP-gp26 required coexpression with the downstream protein gp27. GFP-gp26 could be expressed alone but was incapable of host recognition. By combining size-exclusion chromatography with fluorescence microscopy, we reveal how gp27 is not a component of the final RBP complex but instead is identified as a new type of phage-encoded intermolecular chaperone that is essential for maturation of the gp26 RBP.
This work was supported by grants ‘VIREVO’ CGL2016‐76273‐P (MCI/AEI/FEDER, EU) (cofunded with FEDER funds) from the Spanish Ministerio de Ciencia e Innovación and ‘HIDRAS3’ PROMETEU/2019/009 from Generalitat Valenciana. R.G.-S. was supported by a predoctoral fellowship from the Valencian Consellería de Educació, Investigació, Cultura i Esport (ACIF/2016/050) and was also a beneficiary of the BEFPI 2019 fellowship for predoctoral stays from Generalitat Valenciana and The European Social Fund. F.R.-V. was a beneficiary of the 5top100 program of the Ministry for Science and Education of Russia.

Repositorio Institucional de la Universidad de Alicante

Repository for Publications and Research Data

Homology sequence analysis using GPU acceleration

Author: Truong Huan
Publication venue: 'University of Missouri Libraries'
Publication date

A number of problems in the fields of bioinformatics, systems biology and computational biology require abstracting physical entities to mathematical or computational models. In such studies, the computational paradigms often involve algorithms that can be solved by the Central Processing Unit (CPU). Historically, those algorithms benefit from the advancements of computing power in the serial processing capabilities of individual CPU cores. However, the growth has slowed down over recent years, as scaling out CPU has been shown to be both cost-prohibitive and insecure. To overcome this problem, parallel computing approaches that employ the Graphics Processing Unit (GPU) have gained attention as complementing or replacing traditional CPU approaches. The premise of this research is to investigate the applicability of various parallel computing platforms to several problems in the detection and analysis of homology in biological sequences. I hypothesize that by exploiting the sheer amount of computation power and sequencing data, it is possible to deduce information from raw sequences without supplying the underlying prior knowledge to come up with an answer. I have developed such tools to perform analysis at scales that are traditionally unattainable with general-purpose CPU platforms. I have developed a method to accelerate sequence alignment on the GPU, and I used the method to investigate whether the Operational Taxonomic Unit (OTU) classification problem can be improved with such a sheer amount of computational power. I have developed a method to accelerate pairwise k-mer comparison on the GPU, and I used the method to further develop PolyHomology, a framework to scaffold shared sequence motifs across large numbers of genomes to illuminate the structure of the regulatory network in yeasts.
The results suggest that such an approach to heterogeneous computing could help to answer questions in biology and is a viable path to new discoveries in the present and the future. Includes bibliographical references.

University of Missouri: MOspace

Histone variants in archaea

Author: Stevens Kathryn
Publication venue: Institute of Clinical Sciences, Imperial College London
Publication date: 01/11/2022

Eukaryotic histone variants are involved in a wide range of processes and play a key role in altering nucleosome dynamics to shape the architecture of chromatin. The importance of individual variants has been studied extensively in many eukaryotes. In comparison, we know relatively little about histones in archaea. Despite sequence variation and evidence for potential functional differences between histone paralogs in the same species, whether archaea have histone variants, and therefore the potential for complex histone-based chromatin, has not been comprehensively explored. In this work, I apply structural and sequence-based approaches and present evidence that histone variants exist in archaea. In silico modelling suggests that, similarly to some eukaryotic variants, paralogs in archaea can be identified by unique structural properties. In particular, I describe one such variant, a “capstone”, that can drastically alter histone-based chromatin by limiting oligomerisation. Other paralogs have less extreme structural properties but are shared between species which separated hundreds of millions of years ago, on par with some eukaryotic histone variants. Although there are shared features between the two, histones in archaea appear to have explored a different sequence space to eukaryotic histones, evolving separately and in parallel. Open Access.

Spiral - Imperial College Digital Repository

Network and multi-scale signal analysis for the integration of large omic datasets: applications in Populus trichocarpa

Author: Weighill Deborah Ann
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2019

Poplar species are promising sources of cellulosic biomass for biofuels because of their fast growth rate, high cellulose content and moderate lignin content. There is an increasing movement toward integrating multiple layers of ’omics data in a systems biology approach to understand gene-phenotype relationships and assist in plant breeding programs. This dissertation involves the use of network and signal processing techniques for the combined analysis of these various data types, for the goals of (1) increasing fundamental knowledge of P. trichocarpa and (2) facilitating the generation of hypotheses about target genes and phenotypes of interest. A data integration “Lines of Evidence” method is presented for the identification and prioritization of target genes involved in functions of interest. A new post-GWAS method, Pleiotropy Decomposition, is presented, which extracts pleiotropic relationships between genes and phenotypes from GWAS results, allowing for identification of genes with signatures favorable to genome editing. Continuous wavelet transform signal processing analysis is applied in the characterization of genome distributions of various features (including variant density, gene density, and methylation profiles) in order to identify chromosome structures such as the centromere. This resulted in the approximate centromere locations on all P. trichocarpa chromosomes, which had previously not been adequately reported in the scientific literature. Discrete wavelet transform signal processing followed by correlation analysis was applied to genomic features from various data types including transposable element density, methylation density, SNP density, gene density, centromere position and putative ancestral centromere position. Subsequent correlation analysis of the resulting wavelet coefficients identified scale-specific relationships between these genomic features, and provided insights into the evolution of the genome structure of P. trichocarpa.
These methods have provided strategies both to increase fundamental knowledge about the P. trichocarpa system and to identify new target genes related to biofuels targets. We intend that these approaches will ultimately be used in the design of better plants for more efficient and sustainable production of bioenergy.

University of Tennessee, Knoxville: Trace

A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset

Author: A. Zuccolo
C. D. Green
D. Chebotarov
D. Ware
J. Zhang
K. Chougule
K. L. McNally
K. Manickam
L. F. Rivera
M. Thimma
N. Kathiresan
R. A. Wing
R. Mauleon
S. Wei
T. Gao
W. Xie
Y. Yang
Y. Zhou
Z. Yu
Publication venue
Publication date: 01/01/2024

Background: Single-nucleotide polymorphisms (SNPs) are the most widely used form of molecular genetic variation studies. As reference genomes and resequencing data sets expand exponentially, tools must be in place to call SNPs at a similar pace. The genome analysis toolkit (GATK) is one of the most widely used SNP calling software tools publicly available, but unfortunately, high-performance computing versions of this tool have yet to become widely available and affordable. Results: Here we report an open-source high-performance computing genome variant calling workflow (HPC-GVCW) for GATK that can run on multiple computing platforms from supercomputers to desktop machines. We benchmarked HPC-GVCW on multiple crop species for performance and accuracy with comparable results with previously published reports (using GATK alone). Finally, we used HPC-GVCW in production mode to call SNPs on a “subpopulation aware” 16-genome rice reference panel with ~ 3000 resequenced rice accessions. The entire process took ~ 16 weeks and resulted in the identification of an average of 27.3 M SNPs/genome and the discovery of ~ 2.3 million novel SNPs that were not present in the flagship reference genome for rice (i.e., IRGSP RefSeq). Conclusions: This study developed an open-source pipeline (HPC-GVCW) to run GATK on HPC platforms, which significantly improved the speed at which SNPs can be called. The workflow is widely applicable as demonstrated successfully for four major crop species with genomes ranging in size from 400 Mb to 2.4 Gb. Using HPC-GVCW in production mode to call SNPs on a 25 multi-crop-reference genome data set produced over 1.1 billion SNPs that were publicly released for functional and breeding studies. For rice, many novel SNPs were identified and were found to reside within genes and open chromatin regions that are predicted to have functional consequences. 
Combined, our results demonstrate the usefulness of combining a high-performance SNP calling architecture solution with a subpopulation-aware reference genome panel for rapid SNP discovery and public deployment. © 2024, The Author(s). Open access journal.

Directory of Open Access Journals

The University of Arizona

Archivio della ricerca della Scuola Superiore Sant'Anna

Organization And Introgression Mechanics Of Phaseolus Vulgaris (Common Bean)

Author: Munholland Seth
Publication venue: 'University of Windsor Leddy Library'
Publication date: 12/03/2020

Phaseolus vulgaris is a major food crop grown and consumed around the world. A new world vegetable, the common bean underwent two separate domestication events, both pre-Columbus. These events generated two different land races, the Mesoamerican and Andean, named for the area where the domestication took place. Since the initial domestications the land races have been generally evenly cultivated, but despite its popularity the common bean has only very recently been fully sequenced. One of the issues faced by bean growers worldwide is Common Bacterial Blight (CBB). A disease caused by Xanthomonas axonopodis, CBB causes crop losses ranging from 20–40% every year but does not affect all species within Phaseolus evenly; P. acutifolius, for example, shows an innate resistance to CBB. To leverage this advantage, researchers at the University of Guelph, in partnership with the Ontario Agricultural College, developed a cultivar of Mesoamerican P. vulgaris that was introgressed with PI440795, a P. acutifolius accession, and backcrossed repeatedly with several other Mesoamerican P. vulgaris accessions to generate ‘OAC-Rex’, a plant that displays the crop-desired traits of P. vulgaris and the disease resistance traits of P. acutifolius. Genetic introgression is the process of crossing distantly related organisms followed by repeated backcrossing, resulting in a viable offspring that displays characteristics of each parent. Though rarely occurring, it can be observed in both plants and animals and is often exploited in a crop development context to generate new cultivars.
Unfortunately, though regularly observed, introgression has been followed on a predominantly phenotypic level, usually many generations after the event, and as such molecular aspects of this phenomenon are largely unknown. By studying OAC-Rex, PI440795, and G-19833 (an Andean cultivar whose whole genome has been published) introgression was examined directly and a method for the detection of regions within the introgressed genome uniquely donated from either paren

Scholarship at UWindsor

Evolutionary genomics: statistical and computational methods

Author: Anisimova Maria
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

ZHAW digitalcollection

Directory of Open Access Books (DOAB)

Evolutionary Genomics

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date

OAPEN Library