277 research outputs found

    Applications Development for the Computational Grid

    Get PDF

    FPGA acceleration of sequence analysis tools in bioinformatics

    Full text link
    Thesis (Ph.D.)--Boston UniversityWith advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. It is a complex highly-optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on FPGA. Utilizing our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved 12x speedup over a highly optimized multithread reference code. Multiple Sequence Alignment (MSA)--the extension of pairwise Sequence Alignment to multiple Sequences--is critical to solve many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of from 8Ox to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately many software heuristics do not fall into this category and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study is to introduce the pros and cons of these acceleration models for biosequence analysis tools

    Ontwerp en evaluatie van content distributie netwerken voor multimediale streaming diensten.

    Get PDF
    Traditionele Internetgebaseerde diensten voor het verspreiden van bestanden, zoals Web browsen en het versturen van e-mails, worden aangeboden via één centrale server. Meer recente netwerkdiensten zoals interactieve digitale televisie of video-op-aanvraag vereisen echter hoge kwaliteitsgaranties (QoS), zoals een lage en constante netwerkvertraging, en verbruiken een aanzienlijke hoeveelheid bandbreedte op het netwerk. Architecturen met één centrale server kunnen deze garanties moeilijk bieden en voldoen daarom niet meer aan de hoge eisen van de volgende generatie multimediatoepassingen. In dit onderzoek worden daarom nieuwe netwerkarchitecturen bestudeerd, die een dergelijke dienstkwaliteit kunnen ondersteunen. Zowel peer-to-peer mechanismes, zoals bij het uitwisselen van muziekbestanden tussen eindgebruikers, als servergebaseerde oplossingen, zoals gedistribueerde caches en content distributie netwerken (CDN's), komen aan bod. Afhankelijk van de bestudeerde dienst en de gebruikte netwerktechnologieën en -architectuur, worden gecentraliseerde algoritmen voor netwerkontwerp voorgesteld. Deze algoritmen optimaliseren de plaatsing van de servers of netwerkcaches en bepalen de nodige capaciteit van de servers en netwerklinks. De dynamische plaatsing van de aangeboden bestanden in de verschillende netwerkelementen wordt aangepast aan de heersende staat van het netwerk en aan de variërende aanvraagpatronen van de eindgebruikers. Serverselectie, herroutering van aanvragen en het verspreiden van de belasting over het hele netwerk komen hierbij ook aan bod

    Graph analytics on modern massively parallel systems

    Get PDF
    Graphs provide a very flexible abstraction for understanding and modeling complex systems in many fields such as physics, biology, neuroscience, engineering, and social science. Only in the last two decades, with the advent of Big Data era, supercomputers equipped by accelerators –i.e., Graphics Processing Unit (GPUs)–, advanced networking, and highly parallel file systems have been used to analyze graph properties such as reachability, diameter, connected components, centrality, and clustering coefficient. Today graphs of interest may be composed by millions, sometimes billions, of nodes and edges and exhibit a highly irregular structure. As a consequence, the design of efficient and scalable graph algorithms is an extraordinary challenge due to irregular communication and memory access patterns, high synchronization costs, and lack of data locality. In the present dissertation, we start off with a brief and gentle introduction for the reader to graph analytics and massively parallel systems. In particular, we present the intersection between graph analytics and parallel architectures in the current state-of-the-art and discuss the challenges encountered when solving such problems on large-scale graphs on these architectures (Chapter 1). In Chapter 2, some preliminary definitions and graph-theoretical notions are provided together with a description of the synthetic graphs used in the literature to model real-world networks. In Chapters 3-5, we present and tackle three different relevant problems in graph analysis: reachability (Chapter 3), Betweenness Centrality (Chapter 4), and clustering coefficient (Chapter 5). In detail, Chapter 3 tackles reachability problems by providing two scalable algorithms and implementations which efficiently solve st-connectivity problems on very large-scale graphs Chapter 4 considers the problem of identifying most relevant nodes in a network which plays a crucial role in several applications, including transportation and communication networks, social network analysis, and biological networks. In particular, we focus on a well-known centrality metrics, namely Betweenness Centrality (BC), and present two different distributed algorithms for the BC computation on unweighted and weighted graphs. For unweighted graphs, we present a new communication-efficient algorithm based on the combination of bi-dimensional (2D) decomposition and multi-level parallelism. Furthermore, new algorithms which exploit the underlying graph topology to reduce the time and space usage of betweenness centrality computations are described as well. Concerning weighted graphs, we provide a scalable algorithm based on an algebraic formulation of the problem. Finally, thorough comprehensive experimental results on synthetic and real- world large-scale graphs, we show that the proposed techniques are effective in practice and achieve significant speedups against state-of-the-art solutions. Chapter 5 considers clustering coefficients problem. Similarly to Betweenness Centrality, it is a fundamental tool in network analysis, as it specifically measures how nodes tend to cluster together in a network. In the chapter, we first extend caching techniques to Remote Memory Access (RMA) operations on distributed-memory system. The caching layer is mainly designed to avoid inter-node communications in order to achieve similar benefits for irregular applications as communication-avoiding algorithms. We also show how cached RMA is able to improve the performance of a new distributed asynchronous algorithm for the computation of local clustering coefficients. Finally, Chapter 6 contains a brief summary of the key contributions described in the dissertation and presents potential future directions of the work

    Semantic search and composition in unstructured peer-to-peer networks

    Get PDF
    This dissertation focuses on several research questions in the area of semantic search and composition in unstructured peer-to-peer (P2P) networks. Going beyond the state of the art, the proposed semantic-based search strategy S2P2P offers a novel path-suggestion based query routing mechanism, providing a reasonable tradeoff between search performance and network traffic overhead. In addition, the first semantic-based data replication scheme DSDR is proposed. It enables peers to use semantic information to select replica numbers and target peers to address predicted future demands. With DSDR, k-random search can achieve better precision and recall than it can with a near-optimal non-semantic replication strategy. Further, this thesis introduces a functional automatic semantic service composition method, SPSC. Distinctively, it enables peers to jointly compose complex workflows with high cumulative recall but low network traffic overhead, using heuristic-based bidirectional haining and service memorization mechanisms. Its query branching method helps to handle dead-ends in a pruned search space. SPSC is proved to be sound and a lower bound of is completeness is given. Finally, this thesis presents iRep3D for semantic-index based 3D scene selection in P2P search. Its efficient retrieval scales to answer hybrid queries involving conceptual, functional and geometric aspects. iRep3D outperforms previous representative efforts in terms of search precision and efficiency.Diese Dissertation bearbeitet Forschungsfragen zur semantischen Suche und Komposition in unstrukturierten Peer-to-Peer Netzen(P2P). Die semantische Suchstrategie S2P2P verwendet eine neuartige Methode zur Anfrageweiterleitung basierend auf Pfadvorschlägen, welche den Stand der Wissenschaft übertrifft. Sie bietet angemessene Balance zwischen Suchleistung und Kommunikationsbelastung im Netzwerk. Außerdem wird das erste semantische System zur Datenreplikation genannt DSDR vorgestellt, welche semantische Informationen berücksichtigt vorhergesagten zukünftigen Bedarf optimal im P2P zu decken. Hierdurch erzielt k-random-Suche bessere Präzision und Ausbeute als mit nahezu optimaler nicht-semantischer Replikation. SPSC, ein automatisches Verfahren zur funktional korrekten Komposition semantischer Dienste, ermöglicht es Peers, gemeinsam komplexe Ablaufpläne zu komponieren. Mechanismen zur heuristischen bidirektionalen Verkettung und Rückstellung von Diensten ermöglichen hohe Ausbeute bei geringer Belastung des Netzes. Eine Methode zur Anfrageverzweigung vermeidet das Feststecken in Sackgassen im beschnittenen Suchraum. Beweise zur Korrektheit und unteren Schranke der Vollständigkeit von SPSC sind gegeben. iRep3D ist ein neuer semantischer Selektionsmechanismus für 3D-Modelle in P2P. iRep3D beantwortet effizient hybride Anfragen unter Berücksichtigung konzeptioneller, funktionaler und geometrischer Aspekte. Der Ansatz übertrifft vorherige Arbeiten bezüglich Präzision und Effizienz

    Semantic search and composition in unstructured peer-to-peer networks

    Get PDF
    This dissertation focuses on several research questions in the area of semantic search and composition in unstructured peer-to-peer (P2P) networks. Going beyond the state of the art, the proposed semantic-based search strategy S2P2P offers a novel path-suggestion based query routing mechanism, providing a reasonable tradeoff between search performance and network traffic overhead. In addition, the first semantic-based data replication scheme DSDR is proposed. It enables peers to use semantic information to select replica numbers and target peers to address predicted future demands. With DSDR, k-random search can achieve better precision and recall than it can with a near-optimal non-semantic replication strategy. Further, this thesis introduces a functional automatic semantic service composition method, SPSC. Distinctively, it enables peers to jointly compose complex workflows with high cumulative recall but low network traffic overhead, using heuristic-based bidirectional haining and service memorization mechanisms. Its query branching method helps to handle dead-ends in a pruned search space. SPSC is proved to be sound and a lower bound of is completeness is given. Finally, this thesis presents iRep3D for semantic-index based 3D scene selection in P2P search. Its efficient retrieval scales to answer hybrid queries involving conceptual, functional and geometric aspects. iRep3D outperforms previous representative efforts in terms of search precision and efficiency.Diese Dissertation bearbeitet Forschungsfragen zur semantischen Suche und Komposition in unstrukturierten Peer-to-Peer Netzen(P2P). Die semantische Suchstrategie S2P2P verwendet eine neuartige Methode zur Anfrageweiterleitung basierend auf Pfadvorschlägen, welche den Stand der Wissenschaft übertrifft. Sie bietet angemessene Balance zwischen Suchleistung und Kommunikationsbelastung im Netzwerk. Außerdem wird das erste semantische System zur Datenreplikation genannt DSDR vorgestellt, welche semantische Informationen berücksichtigt vorhergesagten zukünftigen Bedarf optimal im P2P zu decken. Hierdurch erzielt k-random-Suche bessere Präzision und Ausbeute als mit nahezu optimaler nicht-semantischer Replikation. SPSC, ein automatisches Verfahren zur funktional korrekten Komposition semantischer Dienste, ermöglicht es Peers, gemeinsam komplexe Ablaufpläne zu komponieren. Mechanismen zur heuristischen bidirektionalen Verkettung und Rückstellung von Diensten ermöglichen hohe Ausbeute bei geringer Belastung des Netzes. Eine Methode zur Anfrageverzweigung vermeidet das Feststecken in Sackgassen im beschnittenen Suchraum. Beweise zur Korrektheit und unteren Schranke der Vollständigkeit von SPSC sind gegeben. iRep3D ist ein neuer semantischer Selektionsmechanismus für 3D-Modelle in P2P. iRep3D beantwortet effizient hybride Anfragen unter Berücksichtigung konzeptioneller, funktionaler und geometrischer Aspekte. Der Ansatz übertrifft vorherige Arbeiten bezüglich Präzision und Effizienz

    Bioinformatics

    Get PDF
    This book is divided into different research areas relevant in Bioinformatics such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, molecular modeling and intelligent data analysis. Each book section introduces the basic concepts and then explains its application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here

    High-Performance Modelling and Simulation for Big Data Applications

    Get PDF
    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)“ project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction raises to have a better discernment of the domain at hand, their representation gets increasingly demanding for computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. It is then arguably required to have a seamless interaction of High Performance Computing with Modelling and Simulation in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for their members and distinguished guests to openly discuss novel perspectives and topics of interests for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications

    Information management applied to bioinformatics

    Get PDF
    Bioinformatics, the discipline concerned with biological information management is essential in the post-genome era, where the complexity of data processing allows for contemporaneous multi level research including that at the genome level, transcriptome level, proteome level, the metabolome level, and the integration of these -omic studies towards gaining an understanding of biology at the systems level. This research is also having a major impact on disease research and drug discovery, particularly through pharmacogenomics studies. In this study innovative resources have been generated via the use of two case studies. One was of the Research & Development Genetics (RDG) department at AstraZeneca, Alderley Park and the other was of the Pharmacogenomics Group at the Sanger Institute in Cambridge UK. In the AstraZeneca case study senior scientists were interviewed using semi-structured interviews to determine information behaviour through the study scientific workflows. Document analysis was used to generate an understanding of the underpinning concepts and fonned one of the sources of context-dependent information on which the interview questions were based. The objectives of the Sanger Institute case study were slightly different as interviews were carried out with eight scientists together with the use of participation observation, to collect data to develop a database standard for one process of their Pharmacogenomics workflow. The results indicated that AstraZeneca would benefit through upgrading their data management solutions in the laboratory and by development of resources for the storage of data from larger scale projects such as whole genome scans. These studies will also generate very large amounts of data and the analysis of these will require more sophisticated statistical methods. At the Sanger Institute a minimum information standard was reported for the manual design of primers and included in a decision making tree developed for Polymerase Chain Reactions (PCRs). This tree also illustrates problems that can be encountered when designing primers along with procedures that can be taken to address such issues.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    • …
    corecore