
    Prospects and limitations of full-text index structures in genome analysis

    The combination of incessant advances in sequencing technology, producing large amounts of data, and innovative bioinformatics approaches designed to cope with this data flood has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and power of these data structures, as well as their properties and limitations, are less well understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive, state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, and their practical limitations are explained and compared.
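    As a concrete illustration of what these index structures do, here is a minimal Python sketch of backward search on the FM-index, one of the most popular structures in this family; the naive construction and linear-scan rank are assumptions for exposition, not how production indexes are built.

```python
# Minimal sketch of FM-index backward search (illustrative only).
# Construction is naive; real indexes use linear-time suffix-array
# construction and compressed rank/select structures.

def bwt(text):
    """Burrows-Wheeler transform of `text` via a naive suffix array."""
    text += "$"  # sentinel, lexicographically smallest character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)  # char preceding each suffix

def fm_count(bwt_str, pattern):
    """Count occurrences of `pattern` using backward search."""
    # C[c] = number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def rank(c, i):
        # Occurrences of c in bwt_str[:i]; O(n) here, O(1) with rank tables.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo  # interval size = number of matches

print(fm_count(bwt("mississippi"), "issi"))  # -> 2
```

    A single backward-search pass counts all occurrences of a pattern in time proportional to the pattern length (given constant-time rank), which is one reason FM-index variants underpin many sequence-analysis tools.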

    High Performance Computing for DNA Sequence Alignment and Assembly

    Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyses, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine, but their use is complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high-performance computing. Two fundamental computations in computational biology are read alignment and genome assembly. Read alignment maps short DNA sequences to a reference genome to discover conserved and polymorphic regions of the genome. Genome assembly computes the sequence of a genome from many short DNA sequences. Both computations benefit from recent advances in high-performance computing to efficiently process the huge datasets involved, including the use of highly parallel graphics processing units (GPUs) as high-performance desktop processors, and the use of the MapReduce framework coupled with cloud computing to parallelize computation across large compute grids. This dissertation demonstrates how these technologies can be used to accelerate these computations by orders of magnitude, and how they have the potential to make otherwise infeasible computations practical.
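    To make the read-alignment computation concrete, the sketch below shows the seed-and-extend strategy common to many short-read mappers; the k-mer index, seeding policy, and mismatch threshold are illustrative simplifications, not the dissertation's actual pipeline.

```python
# Illustrative seed-and-extend read alignment: exact k-mer seeds from
# an index, then mismatch-tolerant verification of each candidate site.

from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def align_read(read, reference, index, k, max_mismatches=2):
    """Seed with the read's first k-mer, then verify the whole read."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the reference end
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

ref = "ACGTACGTTAGCACGTACGA"
idx = build_kmer_index(ref, k=4)
print(align_read("ACGTACGA", ref, idx, k=4))  # -> [(0, 1), (12, 0)]
```

    Because every read is verified independently, this candidate-checking step parallelizes naturally, whether across GPU threads or across MapReduce workers.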

    Hardware Acceleration of Network Intrusion Detection System Using FPGA

    This thesis presents new algorithms and hardware designs for the optimisation of a Signature-based Network Intrusion Detection System (SB-NIDS), exploiting a hybrid hardware-software co-designed embedded processing platform. The work described concentrates on optimising a complete SB-NIDS Snort application on an FPGA-based hardware-software target, rather than on implementing a single functional unit for hardware acceleration. A Pattern Matching Hardware Accelerator (PMHA) based on a Bloom filter was designed to optimise SB-NIDS performance for execution on a Xilinx MicroBlaze soft-core processor. The Bloom filter approach enables the potentially large number of network intrusion attack patterns to be represented compactly and searched primarily using accesses to FPGA on-chip memory. The strengths and weaknesses of the hardware accelerators and algorithms are analysed, and experimental results are examined to determine the effectiveness of the implementation. The results confirm that the PMHA is capable of performing packet analysis at gigabit network rates, and indicate that the SB-NIDS prototype, implemented on a relatively low-clock-rate embedded processing platform, performs approximately 1.7 times better than Snort executing on a general-purpose PC processor when comparing processor cycles rather than wall-clock time. The thesis thus demonstrates the viability of the hybrid hardware-software co-design approach for SB-NIDS; future work is required to investigate later-generation FPGA technology and multi-core processors in order to establish the benefits over conventional processor platforms more fully.
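    The data structure at the heart of the PMHA is straightforward to sketch in software. The illustrative Python Bloom filter below shows the membership test that the accelerator realises with FPGA on-chip memory lookups; the bit-array size and hash construction here are assumptions for exposition, not the thesis's hardware parameters.

```python
# Illustrative Bloom filter for signature membership testing.
# Sizes and hash choices are for exposition only.

import hashlib

class BloomFilter:
    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-1 digests; hardware
        # designs typically use cheap universal hash functions instead.
        for i in range(self.num_hashes):
            h = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False positives are possible; false negatives are not, so a
        # "hit" must be confirmed by an exact match in slower memory.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

signatures = BloomFilter()
signatures.add("GET /etc/passwd")
print(signatures.might_contain("GET /etc/passwd"))  # True
print(signatures.might_contain("GET /index.html"))  # False (almost surely)
```

    The appeal for hardware is that each query touches only a handful of bit positions, all of which can live in fast on-chip memory regardless of how many attack patterns are stored.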

    Personal genome editing algorithms to identify increased variant-induced off-target potential

    Clustered regularly interspaced short palindromic repeats (CRISPR) technologies allow for facile genomic modification in a site-specific manner. A key step in this process is the in-silico design of single guide RNAs (sgRNAs) that efficiently and specifically target a site of interest. To this end, it is necessary to enumerate all potential off-target sites within a given genome that could be inadvertently altered by nuclease-mediated cleavage. Off-target sites are quasi-complementary regions of the genome to which the specified sgRNA can bind even without a perfectly complementary nucleotide sequence. This problem, known as off-target site enumeration, became prominent after the discovery of CRISPR technology. Many in-silico solutions have been proposed in recent years, but currently available software for this task is limited in computational efficiency, variant support, genetic annotation, assessment of the functional impact of potential off-target effects at the population and individual level, and the availability of a user-friendly graphical interface usable by non-informaticians without programming knowledge. This thesis addresses all these topics by proposing two software tools that directly answer the off-target enumeration problem and perform all the related analyses. In detail, the thesis proposes CRISPRitz, a tool designed and developed to perform fast and exhaustive searches on reference and alternative genomes, enumerating all possible off-targets for a user-defined set of sgRNAs with specified thresholds of mismatches (non-complementary base pairs in RNA-DNA binding) and bulges (bubbles that alter the physical structure of the RNA and DNA, limiting binding activity). The thesis also proposes CRISPRme, a tool developed starting from CRISPRitz, which answers the requests of professionals and technicians by implementing a comprehensive and easy-to-use interface for off-target enumeration, analysis, and assessment, with graphical reports, a graphical interface, and the capability of performing real-time queries on the resulting data to extract desired targets, with a focus on individual and personalized genome analysis.
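    A minimal sketch of the core computation, under simplifying assumptions (Hamming-distance mismatches only, no bulges, a fixed NGG PAM), conveys what off-target enumeration asks; CRISPRitz itself uses far faster index-based search and a richer model.

```python
# Illustrative off-target enumeration by Hamming distance only.
# Real tools also model bulges, variants, and both DNA strands.

def enumerate_off_targets(genome, guide, max_mismatches, pam="GG"):
    """Report sites within `max_mismatches` of the guide, followed by
    an NGG PAM immediately 3' of the protospacer (SpCas9 convention)."""
    n, m = len(genome), len(guide)
    hits = []
    for i in range(n - m - 2):
        if genome[i + m + 1:i + m + 3] != pam:  # the "GG" of NGG
            continue
        site = genome[i:i + m]
        mismatches = sum(a != b for a, b in zip(site, guide))
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits

genome = "TTACGTACGTACGTACGTACGTAGGCATACGAACGTACGTACGTACGTTGGC"
guide = "ACGTACGTACGTACGTACGT"
for pos, site, mm in enumerate_off_targets(genome, guide, max_mismatches=3):
    print(pos, site, mm)  # perfect site at 2, 1-mismatch site at 28
```

    Scanning every position like this is quadratic in practice; the point of tools such as CRISPRitz is to answer the same question exhaustively on whole genomes, and on alternative genomes carrying individual variants, in reasonable time.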

    Sublinear Computation Paradigm

    This open access book gives an overview of cutting-edge work on a new paradigm called the “sublinear computation paradigm,” which was proposed in the large multiyear academic research project “Foundations of Innovative Algorithms for Big Data.” That project ran in Japan from October 2014 to March 2020. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, innovative changes in algorithm theory for big data are being pursued. For example, polynomial-time algorithms have thus far been regarded as “fast,” but if a quadratic-time algorithm is applied to a petabyte-scale or larger big data set, problems are encountered in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, linear-, sublinear-, and constant-time algorithms are required. The sublinear computation paradigm is proposed here in order to support innovation in the big data era. A foundation of innovative algorithms has been created by developing computational procedures, data structures, and modelling techniques for big data. The project is organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modelling. The work has provided high-level academic research results of strong computational and algorithmic interest, which are presented in this book. The book consists of five parts: Part I, which consists of a single chapter on the concept of the sublinear computation paradigm; Parts II, III, and IV, which review results on sublinear algorithms, sublinear data structures, and sublinear modelling, respectively; and Part V, which presents application results. The information presented here will inspire the researchers who work in the field of modern algorithms.
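    A small example conveys the flavour of the paradigm: the sample-based estimator below reads a number of items that depends only on the desired accuracy, not on the input size. The Hoeffding-style sample bound is standard, but the example is ours, not one of the book's algorithms.

```python
# Illustrative constant-time (sample-based) estimation: the number of
# samples depends on the accuracy target, not on len(data).

import math
import random

def estimate_fraction(data, predicate, eps=0.05, delta=0.01):
    """Estimate the fraction of items satisfying `predicate` to within
    +/- eps with probability >= 1 - delta. By Hoeffding's inequality,
    s >= ln(2/delta) / (2 * eps^2) samples suffice."""
    s = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = sum(predicate(random.choice(data)) for _ in range(s))
    return hits / s

data = list(range(10_000_000))  # "big data"; only ~1060 items are read
print(estimate_fraction(data, lambda x: x % 3 == 0))  # close to 1/3
```

    Whether the input holds ten million or ten trillion items, the sample size stays fixed, which is exactly the property that makes sublinear algorithms viable at petabyte scale.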

    A Modular Approach to Adaptive Reactive Streaming Systems

    The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and on abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects.
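    The following Python sketch is purely illustrative (it is not ReShape's metadata format or tooling); it conveys the idea of type-checking a replacement module against the interface metadata of the slot it is dropped into.

```python
# Illustrative metadata-driven type check for module replacement.
# Port and module definitions are hypothetical, for exposition only.

from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    name: str
    direction: str   # "in" or "out"
    protocol: str    # e.g. a streaming or memory-access convention
    width: int       # data width in bits

@dataclass
class ModuleMeta:
    name: str
    ports: frozenset

def compatible(slot: ModuleMeta, replacement: ModuleMeta) -> bool:
    """A replacement is type-correct if its ports match the slot's."""
    return slot.ports == replacement.ports

stream_in = Port("s_in", "in", "stream", 64)
stream_out = Port("s_out", "out", "stream", 64)

slot = ModuleMeta("filter_slot", frozenset({stream_in, stream_out}))
candidate = ModuleMeta("new_filter", frozenset({stream_in, stream_out}))
print(compatible(slot, candidate))  # True: safe to swap at runtime
```

    The design point is that once wiring and signaling are captured as metadata, checks like this can be automated, so a module swap via DPR cannot silently violate the system's interface contracts.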

    Reading as Active Sensing: A Computational Model of Gaze Planning in Word Recognition

    We offer a computational model of gaze planning during reading that consists of two main components: a lexical representation network, which acquires lexical representations from input texts (a subset of the Italian CHILDES database), and a gaze planner, designed to recognize written words by mapping strings of characters onto lexical representations. The model implements an active sensing strategy that selects which characters of the input string are to be fixated, depending on the predictions dynamically made by the lexical representation network. We analyze the developmental trajectory of the system in performing the word recognition task as a function of increasing lexical competence and, correspondingly, increasing lexical prediction ability. We conclude by discussing how our approach can be scaled up in the context of an active sensing strategy applied to a robotic setting.
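    As an illustration of the active sensing idea (this is a toy policy, not the paper's lexical representation network), the sketch below fixates whichever character position carries the most information about the remaining lexical candidates.

```python
# Illustrative entropy-driven fixation policy: fixate the character
# position that best discriminates among the lexical candidates still
# consistent with what has been seen so far.

import math
from collections import Counter

def position_entropy(candidates, pos):
    """Shannon entropy of the character distribution at `pos`."""
    counts = Counter(w[pos] for w in candidates)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def recognize(word, lexicon):
    """Assumes `word` is in `lexicon`; returns it and the fixated positions."""
    candidates = [w for w in lexicon if len(w) == len(word)]
    fixated = []
    while len(candidates) > 1:
        unseen = [p for p in range(len(word)) if p not in fixated]
        # Active sensing step: pick the most informative unseen position.
        pos = max(unseen, key=lambda p: position_entropy(candidates, p))
        fixated.append(pos)
        candidates = [w for w in candidates if w[pos] == word[pos]]
    return candidates[0], fixated

lexicon = ["cane", "case", "casa", "cena", "core"]
print(recognize("casa", lexicon))  # -> ('casa', [2, 3])
```

    Note that the word is recognized after fixating only two of its four characters; richer lexical prediction shrinks the candidate set faster, which is the developmental effect the model studies.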