
    Accelerating pairwise sequence alignment on GPUs using the Wavefront Algorithm

    Advances in genomics and sequencing technologies demand faster and more scalable analysis methods that can process longer sequences with higher accuracy. However, classical pairwise alignment methods, based on dynamic programming (DP), impose impractical computational requirements when aligning the long and noisy sequences produced by PacBio and Nanopore technologies. The recently proposed Wavefront Alignment (WFA) algorithm paves the way for more efficient alignment tools, improving time and memory complexity over previous methods. Notwithstanding the advantages of the WFA algorithm, modern high performance computing (HPC) platforms rely on accelerator-based architectures that exploit parallel computing resources to improve on classical CPUs. Hence, a GPU-enabled implementation of the WFA could exploit the hardware resources of modern GPUs and further accelerate sequence alignment in current genome analysis pipelines. This thesis presents two GPU-accelerated implementations based on the WFA for fast pairwise DNA sequence alignment: eWFA-GPU and WFA-GPU. Our first proposal, eWFA-GPU, computes the exact edit-distance alignment between two short sequences (up to a few thousand bases), taking full advantage of the massive parallel capabilities of modern GPUs. We propose a succinct representation of the alignment data that reduces the overall amount of memory required, allowing the exploitation of the fast on-chip memory of a GPU. Our results show that eWFA-GPU outperforms by 3-9X the edit-distance WFA implementation running on a 20-core machine. Compared to other state-of-the-art tools computing the edit distance, eWFA-GPU is up to 265X faster than CPU tools and up to 56X faster than other GPU-enabled implementations. Our second contribution, the WFA-GPU tool, extends the work of eWFA-GPU to compute the exact gap-affine distance (i.e., a more general alignment problem) between arbitrarily long sequences. In this work, we propose a CPU-GPU co-design capable of performing inter- and intra-sequence parallel alignment of multiple sequences, combining a succinct WFA data representation with an efficient GPU implementation. As a result, we demonstrate that our implementation outperforms the original WFA implementation by 1.5-7.7X when computing the alignment path, and by 2.6-16X when computing only the alignment score. Moreover, compared to other state-of-the-art tools, WFA-GPU is up to 26.7X faster than other GPU implementations and up to four orders of magnitude faster than other CPU implementations.
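The core of the WFA approach described above, in the edit-distance case, is a wavefront of furthest-reaching offsets per diagonal that is extended along exact matches for free. The sketch below is a minimal, single-threaded Python illustration of that recurrence under the standard furthest-reaching-offset formulation; it is not the eWFA-GPU or WFA-GPU code, and the function name and sentinel handling are this sketch's own.

```python
def wfa_edit_distance(a: str, b: str) -> int:
    """Edit (Levenshtein) distance via score-ordered wavefronts of
    furthest-reaching offsets, one per diagonal k = i - j, where i and j
    count consumed characters of `a` and `b` respectively."""
    n, m = len(a), len(b)
    NEG = -(10 ** 9)  # sentinel for diagonals not reached yet

    def extend(i: int, k: int) -> int:
        # Matches along a diagonal cost nothing: advance while characters agree.
        j = i - k
        while i < n and j < m and a[i] == b[j]:
            i, j = i + 1, j + 1
        return i

    target = n - m                  # diagonal containing the end cell (n, m)
    wavefront = {0: extend(0, 0)}   # score-0 wavefront
    score = 0
    while wavefront.get(target, -1) < n:
        score += 1
        nxt = {}
        for k in range(-score, score + 1):
            best = max(wavefront.get(k, NEG) + 1,      # mismatch (substitution)
                       wavefront.get(k - 1, NEG) + 1,  # deletion (skip a char of `a`)
                       wavefront.get(k + 1, NEG))      # insertion (skip a char of `b`)
            best = min(best, n, m + k)                 # stay inside the DP matrix
            if best < 0 or best - k < 0:
                continue                               # diagonal not reachable yet
            nxt[k] = extend(best, k)
        wavefront = nxt
    return score

assert wfa_edit_distance("GATTACA", "GCATGCU") == 4
```

The work per score step is proportional to the number of active diagonals rather than to the full DP matrix, which is what makes the algorithm attractive for the massively parallel per-diagonal processing the thesis maps onto GPU threads.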

    High performance reconfigurable architectures for biological sequence alignment

    Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignment. Sequence alignment is a fundamental tool in molecular biology for searching for homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on quality of life, for example by facilitating early disease diagnosis, identifying the characteristics of a newly discovered sequence, and supporting drug engineering. With the vast growth of genomic data, searching for sequence homology over huge databases (often measured in gigabytes) cannot produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases resulting from the Human Genome Project (HGP), supercomputers and other parallel architectures such as special-purpose Very Large Scale Integration (VLSI) chips, Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-offs between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads. However, they suffer from a relatively low-level programming model compared with off-the-shelf processors such as standard microprocessors and GPUs. Due to these limitations, the need has arisen for optimized FPGA core implementations, which are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely used sequence alignment algorithms: the Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The three novel aspects of this research are as follows. Firstly, the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state of the art. Secondly, an efficient scheduling strategy based on the double-buffering technique is adopted in the hardware architectures: when the alignment matrix computation task is overlapped with the PE configuration in a folded systolic array, the overall throughput of the core is significantly increased, owing to the bounded PE configuration time and a parallel PE configuration approach that is independent of the number of PEs in the systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised which facilitates the effective comparison of design performance between different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) factors out the area and lithography advantages of any given FPGA, resulting in fairer comparisons. The cores have been designed using Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA.
The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2, respectively, for the acceleration of the Smith-Waterman with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm. In terms of speed-up improvements, the performance of the designed cores was compared against the corresponding software and previously reported FPGA implementations. In the case of comparison with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x compared to the SSEARCH 35 software. For the profile HMM-based sequence alignment, the designed core achieved speed-ups of 103x and 8.3x against HMMER 2.0 and the latest version of HMMER (version 3.0), respectively. On the other hand, the implementation of the gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. In terms of comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than 50 percent improvement, while acceleration of the profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of the gapped BLAST with the two-hit method, the designed core achieved an 11x speed-up after factoring out the advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in terms of cost and power performance; it was noted that the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high performance computation, offering a smaller area footprint and representing an economical ‘green’ solution compared to other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort.
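For reference, the affine-gap Smith-Waterman recurrence (Gotoh's formulation) that such a systolic core evaluates for every matrix cell is the standard one below, with gap-open penalty o, gap-extension penalty e and substitution score s(a_i, b_j); each GCUPS figure above counts how many of these cell updates the core completes per second.

```latex
\begin{aligned}
E_{i,j} &= \max\bigl(E_{i,j-1} - e,\; H_{i,j-1} - o - e\bigr) \\
F_{i,j} &= \max\bigl(F_{i-1,j} - e,\; H_{i-1,j} - o - e\bigr) \\
H_{i,j} &= \max\bigl(0,\; H_{i-1,j-1} + s(a_i, b_j),\; E_{i,j},\; F_{i,j}\bigr)
\end{aligned}
```

In a systolic array, each processing element typically owns one query residue and updates its H, E and F values once per clock cycle, which is why throughput scales with the number of PEs that can be configured and kept busy.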

    Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements

    Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency of change of each amino acid by another one and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn → Asp, Phe → Tyr, Lys → Arg, Gln → Glu, Ile → Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr as structurally idiosyncratic residues. We also found a high average correlation ($\overline{R} = 0.85$) between thirty amino acid mutability scales and the mutational inertia ($I_X$), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally efficient principles: (a) amino acid interchangeability privileges secondary structural similarity, and (b) amino acid mutability depends directly on biosynthetic energy cost and inversely on frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)
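The abstract does not specify the distance measure used between Ramachandran distributions, so the snippet below is only an illustration of the general idea: bin each residue type's (phi, psi) dihedrals into a normalised 2-D histogram and compare two residues with a symmetric divergence. The Jensen-Shannon distance and the bin width are assumed stand-ins, not the paper's actual choices.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def ramachandran_histogram(phi_deg, psi_deg, bins=36):
    """Normalised 2-D histogram of backbone (phi, psi) angles in degrees,
    flattened so it can be treated as a probability vector."""
    h, _, _ = np.histogram2d(phi_deg, psi_deg, bins=bins,
                             range=[[-180.0, 180.0], [-180.0, 180.0]])
    return (h / h.sum()).ravel()

def structural_distance(hist_a, hist_b):
    """Symmetric distance between two residues' Ramachandran distributions
    (Jensen-Shannon distance, used here purely for illustration)."""
    return jensenshannon(hist_a, hist_b)
```

Small distances between, say, the Asn and Asp histograms would then mirror the favourable substitution scores such pairs receive in matrices like BLOSUM.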

    Conformational Control of Modular Proteins


    Molecular Dynamics for Synthetic Biology

    Synthetic biology is the field concerned with the design, engineering, and construction of organisms and biomolecules. Biomolecules such as proteins are nature's nano-bots, and provide both a shortcut to the construction of nano-scale tools and insight into the design of abiotic nanotechnology. A fundamental technique in protein engineering is protein fusion, the concatenation of two proteins so that they form domains of a new protein. The resulting fusion protein generally retains both functions, especially when a linker sequence is introduced between the two domains to allow them to fold independently. Fusion proteins can have features absent from all of their components; for example, FRET biosensors are fusion proteins of two fluorescent proteins with a binding domain. When the binding domain forms a complex with a ligand, its dynamics translate the concentration of the ligand into the ratio of fluorescence intensities via FRET. Despite these successes, protein engineering remains laborious and expensive. Computer modelling has the potential to improve the situation by enabling some design work to occur virtually. Synthetic biologists commonly use fast, heuristic structure prediction tools like ROSETTA, I-TASSER and FoldX, despite their inaccuracy. By contrast, molecular dynamics with modern force fields has proven itself accurate, but sampling sufficiently to solve problems accurately and quickly enough to be relevant to experimenters remains challenging. In this thesis, I introduce molecular dynamics to a structural biology audience, and discuss the challenges and theory behind the technique. With this knowledge, I introduce synthetic biology through a review of fluorescent sensors. I then develop a simple computational tool, Rangefinder, for the design of one variety of these sensors, and demonstrate its ability to predict sensor performance experimentally. I demonstrate the importance of the choice of linker with another sensor whose performance depends critically on it. In chapter 6, I investigate the structure of a conserved, repeating linker sequence connecting two domains of the malaria circumsporozoite protein. Finally, I develop a multi-scale enhanced sampling molecular dynamics approach to predicting the structure and dynamics of fusion proteins. It is my hope that this work contributes to the structural biology community's understanding of molecular dynamics and inspires the development of new techniques for protein engineering.
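As background to the sensing mechanism described above: the quantity a FRET biosensor's conformational change modulates is the Förster transfer efficiency, which depends steeply on donor-acceptor distance. The snippet below is generic textbook physics rather than anything from the thesis or its Rangefinder tool; the default Förster radius of 5 nm is merely a typical value for fluorescent-protein pairs.

```python
def fret_efficiency(distance_nm: float, r0_nm: float = 5.0) -> float:
    """Förster transfer efficiency for a donor-acceptor separation
    `distance_nm`, given the pair's Förster radius `r0_nm` (both in nm)."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)

# A ligand-induced shift from 6 nm to 4 nm roughly triples the efficiency,
# which appears experimentally as a change in the acceptor/donor intensity ratio.
print(fret_efficiency(6.0), fret_efficiency(4.0))
```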

    In-silico discovery and experimental verification of excipients for biologics

    Protein-based pharmaceuticals such as monoclonal antibodies are the fastest growing class of therapeutic agent. As with all protein therapeutics, antibody aggregation must be avoided during production, storage and use. With recent advances in computing power, it is becoming feasible to simulate protein-protein interactions in-silico. Combining computational and experimental studies may offer a platform solution for designing specific in-process stabilisers and excipients to accelerate the development of aggregation-resistant formulations. An antibody Fv fragment was first evaluated to understand the early stages of aggregate formation by identifying aggregation-prone regions. Three-dimensional structural information and protein-protein docking were used to identify exposed hydrophobic patches. Virtual screening was used to identify compounds that bind to the exposed hydrophobic patches as a means of preventing the Fv-Fv interactions that could result in aggregation. The excipient with the highest calculated binding affinity was found to prevent Fv-Fv interactions, as determined with the diffusion interaction parameter (kD) measured by dynamic light scattering (DLS). Excipient performance was then evaluated using coarse-grained molecular dynamics (MD) simulations with the MARTINI force field to provide a more in-depth view of Fv fragment dimer complex formation. Simulation results were further evaluated with free-energy calculations, but these were found to produce highly variable and therefore unreliable results. This coarse-grained MD approach was also used to virtually screen a library of dipeptides to identify peptide excipients. The results revealed a positive correlation between the calculated mean interaction energies and the diffusion interaction parameter measured with DLS. The MD approach was further extended, through homology modelling, to a challenging antibody without published structural data, and was used to suggest possible excipients to prevent high-affinity antibody-antibody interactions. Therefore, this MD approach could potentially be used as a first step in the selection of excipients for antibodies.
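The diffusion interaction parameter referred to above is conventionally extracted from the concentration dependence of the mutual diffusion coefficient measured by DLS, D(c) = D0(1 + kD·c). A minimal sketch of that fit is shown below; the function name and units (mg/mL, hence kD in mL/mg) are this sketch's own choices, not a protocol from the thesis.

```python
import numpy as np

def diffusion_interaction_parameter(conc_mg_ml, d_measured):
    """Fit D(c) = D0 * (1 + kD * c) as a straight line in c and return kD.
    A negative kD indicates net attractive protein-protein interactions,
    the behaviour an effective excipient should suppress."""
    slope, d0 = np.polyfit(conc_mg_ml, d_measured, 1)  # slope = D0 * kD
    return slope / d0

# Example with made-up numbers: diffusion decreasing with concentration
# (attractive interactions) gives a negative kD.
c = np.array([1.0, 5.0, 10.0, 20.0])           # mg/mL
d = np.array([4.00, 3.90, 3.78, 3.55]) * 1e-7  # cm^2/s, illustrative only
print(diffusion_interaction_parameter(c, d))
```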

    Programming and parallelising applications for distributed infrastructures

    The last decade has witnessed unprecedented changes in parallel and distributed infrastructures. Due to the diminished gains in processor performance from increasing clock frequency, manufacturers have moved from uniprocessor architectures to multicores; as a result, clusters of computers have incorporated such new CPU designs. Furthermore, the ever-growing need of scientific applications for computing and storage capabilities has motivated the appearance of grids: geographically-distributed, multi-domain infrastructures based on sharing of resources to accomplish large and complex tasks. More recently, clouds have emerged by combining virtualisation technologies, service-orientation and business models to deliver IT resources on demand over the Internet. The size and complexity of these new infrastructures pose a challenge for programmers to exploit them. On the one hand, some of the difficulties are inherent to concurrent and distributed programming themselves, e.g. dealing with thread creation and synchronisation, messaging, data partitioning and transfer, etc. On the other hand, other issues are related to the singularities of each scenario, like the heterogeneity of Grid middleware and resources or the risk of vendor lock-in when writing an application for a particular Cloud provider. In the face of such a challenge, programming productivity - understood as a trade-off between programmability and performance - has become crucial for software developers. There is a strong need for high-productivity programming models and languages, which should provide simple means for writing parallel and distributed applications that can run on current infrastructures without sacrificing performance. In that sense, this thesis contributes Java StarSs, a programming model and runtime system for developing and parallelising Java applications on distributed infrastructures. The model has two key features: first, the user programs in a fully-sequential, standard-Java fashion - no parallel construct, API call or pragma needs to be included in the application code; second, it is completely infrastructure-unaware, i.e. programs do not contain any details about deployment or resource management, so that the same application can run on different infrastructures with no changes. The only requirement for the user is to select the application tasks, which are the model's unit of parallelism. Tasks can be either regular Java methods or web service operations, and they can handle any data type supported by the Java language, namely files, objects, arrays and primitives. For the sake of simplicity of the model, Java StarSs shifts the burden of parallelisation from the programmer to the runtime system. The runtime is responsible for modifying the original application to make it create asynchronous tasks and synchronise data accesses from the main program. Moreover, the implicit inter-task concurrency is automatically discovered as the application executes, thanks to a data dependency detection mechanism that covers all the Java data types. This thesis provides a fairly comprehensive evaluation of Java StarSs in three different distributed scenarios: Grid, Cluster and Cloud. For each of them, a runtime system was designed and implemented to exploit their particular characteristics as well as to address their issues, while keeping the infrastructure unawareness of the programming model.
The evaluation compares Java StarSs against state-of-the-art solutions in terms of both programmability and performance, and demonstrates how the model can bring remarkable productivity to programmers of parallel distributed applications.
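The data-dependency detection described above can be illustrated with a toy tracker that records, for every piece of data, the task that last wrote it and turns later reads into graph edges. This is a schematic sketch of the principle only, not the Java StarSs runtime or its API; every name in it is invented for illustration.

```python
class TaskGraph:
    """Toy read-after-write dependency tracker: tasks declare what they
    read and write, and edges are inferred between a datum's last writer
    and its subsequent readers. Tasks with no path between them are free
    to run concurrently."""

    def __init__(self):
        self.last_writer = {}   # datum -> id of the task that last wrote it
        self.edges = []         # (predecessor, successor) pairs

    def add_task(self, task_id, reads=(), writes=()):
        for datum in reads:
            if datum in self.last_writer:
                self.edges.append((self.last_writer[datum], task_id))
        for datum in writes:
            self.last_writer[datum] = task_id

# t2 reads what t1 wrote, so it must wait for t1; t3 is independent of both.
g = TaskGraph()
g.add_task("t1", writes=["part1.dat"])
g.add_task("t2", reads=["part1.dat"])
g.add_task("t3", writes=["part2.dat"])
assert g.edges == [("t1", "t2")]
```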