1,743 research outputs found

    Ab initio RNA folding

    RNA molecules are essential cellular machines performing a wide variety of functions for which a specific three-dimensional structure is required. Over the last several years, experimental determination of RNA structures through X-ray crystallography and NMR seems to have reached a plateau in the number of structures resolved each year, but as more and more RNA sequences are being discovered, the need for structure prediction tools to complement experimental data is strong. Theoretical approaches to RNA folding have been developed since the late nineties, when the first algorithms for secondary structure prediction appeared. Over the last 10 years a number of prediction methods for 3D structures have been developed, first based on bioinformatics and data-mining, and more recently based on a coarse-grained physical representation of the systems. In this review we present the challenges of RNA structure prediction and the main ideas behind bioinformatic approaches and physics-based approaches. We focus on the description of the more recent physics-based phenomenological models and on how they are built to include the specificity of the interactions of RNA bases, whose role is critical in folding. Through examples from different models, we point out the strengths of physics-based approaches, which are able not only to predict equilibrium structures but also to investigate dynamical and thermodynamical behavior, and the open challenges of including more of the key interactions that rule RNA folding. Comment: 28 pages, 18 figures

    Fast search of third-order epistatic interactions on CPU and GPU clusters

    [Abstract] Genome-Wide Association Studies (GWASs), analyses that try to find a link between a given phenotype (such as a disease) and genetic markers, have been growing in popularity in recent years. Relations between phenotypes and genotypes are not easy to identify, as most phenotypes are a product of the interaction between multiple genes, a phenomenon known as epistasis. Many authors have resorted to different approaches and hardware architectures in order to mitigate the exponential time complexity of the problem. However, these studies make some compromises in order to keep a reasonable execution time, such as limiting the number of genetic markers involved in the interaction, or discarding some of these markers in an initial filtering stage. This work presents MPI3SNP, a tool that implements a three-way exhaustive search for cluster architectures with the aim of mitigating the exponential growth of the run-time. Modern cluster solutions usually incorporate GPUs. Thus, MPI3SNP includes implementations for both multi-CPU and multi-GPU clusters. To contextualize the performance achieved, MPI3SNP is able to analyze an input of 6300 genetic markers and 3200 samples in less than 6 min using 768 CPU cores or 4 min using 8 NVIDIA K80 GPUs. The source code is available at https://github.com/chponte/mpi3snp. Funding: Ministerio de Economía y Competitividad and FEDER; TIN2016-75845-P. Xunta de Galicia and FEDER funds; ED431G/01. Consolidation Program of Competitive Research; ED431C 2017/04. Ministerio de Educación; FPU16/0133
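    To give a sense of scale for a three-way exhaustive search, the sketch below counts the marker triples for the 6300-marker input mentioned above (on the order of 10^10 combinations) and tallies a genotype/phenotype contingency table for a single triple. It is only an illustration under stated assumptions: the function names, the 0/1/2 genotype coding, and the data layout are hypothetical and are not taken from MPI3SNP.

```python
from collections import Counter
from math import comb


def count_triples(n_markers: int) -> int:
    """Number of unordered marker triples an exhaustive third-order search visits."""
    return comb(n_markers, 3)


def contingency_table(genotypes, triple, phenotype):
    """Tally samples by (genotype at i, genotype at j, genotype at k, phenotype).

    Assumed (hypothetical) layout: `genotypes` is a list with one entry per
    sample, each a list of genotype codes 0/1/2 per marker; `phenotype` is a
    list of 0/1 labels (control/case), one per sample.
    """
    i, j, k = triple
    table = Counter()
    for sample, label in zip(genotypes, phenotype):
        table[(sample[i], sample[j], sample[k], label)] += 1
    return table


if __name__ == "__main__":
    # Search-space size for the input quoted in the abstract.
    print(count_triples(6300))

    # Toy data: 3 samples, 4 markers.
    toy_genotypes = [[0, 1, 2, 0], [0, 1, 2, 1], [1, 1, 0, 2]]
    toy_phenotype = [1, 1, 0]
    print(contingency_table(toy_genotypes, (0, 1, 2), toy_phenotype))
```

    Each such table would then be scored with an association measure; which measure to use, and how triples are distributed across cores or GPUs, is left open here.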

    Decoding genomic information

    Our work here outlines and follows some trends of research that analyze and interpret (i.e., decode) genomic information by treating the genome as a book encrypted in an unknown language. This analysis is performed by alignment-free sequence methods, based on information-theoretic concepts, in order to convert the genomic information into a comprehensible mathematical form and understand its complexity.

    Biodiversity assessment of marine benthic communities with COI metabarcoding: methods and applications

    [eng] Ecosystem biomonitoring is crucial for proper management of natural communities during the Anthropocene era. With the advent of new sequencing technologies, DNA metabarcoding has been proposed as a game-changing tool for biomonitoring. In this Thesis we plead for the use of metabarcoding of a highly variable marker to infer not only the interspecies but also the intraspecies variability, in order to assess both biogeographic patterns, at the species level, and metaphylogeographic patterns, at the haplotype level. We focused on highly complex hard-substratum benthic littoral communities. The term "Metaphylogeography", coined in this Thesis, refers to the study of phylogeographic patterns of many species at the same time using metabarcoding data. However, as of the start of this Thesis, only a few studies had tested the metabarcoding method to directly characterize the whole eukaryotic community in highly diverse benthic ecosystems. This required setting up and calibrating methods for these communities as a prior step. We first evaluated both the sampling methods and the bioinformatic pipelines. We assessed the viability of detecting the environmental DNA released from the benthic community into the adjacent water layer using metabarcoding of COI with highly degenerated primers targeting the whole eukaryotic community. We sampled water from 0 to 20 m from shallow rocky benthic communities and compared the DNA signal with the results obtained from directly metabarcoding the benthic communities by traditional quadrat sampling. We also designed a pipeline combining clustering and denoising methods to treat metabarcoding data of COI. We considered the entropy of each codon position of this coding fragment both to improve the detection of spurious sequences and to calibrate the best performing parameters of the software used. In addition, we created our own denoising program, DnoisE, to incorporate information on the codon position.
This new code and parameter calibration were required as the commonly used bioinformatic pipelines had been designed and tested mostly for less variable ribosomal fragments and, particularly, in prokaryotes. Results showed that the DNA signal from the benthos decreased with distance but was too weak for a correct assessment of benthic biodiversity. The proportion of eukaryotic DNA sequenced was also very low in water samples due to the amplification of prokaryotic DNA. We thus concluded that the benthos must be sampled directly to properly assess its biodiversity composition. The new bioinformatic developments allowed us to propose new methods for processing metabarcoding reads, combining clustering and denoising steps, and to set optimal values for the parameters used at each step. These contributions effectively expanded the field to the novel analysis of inter- and intraspecies genetic variability with metabarcoding data. Finally, we applied this methodology to 12 localities of the Western Iberian Coast along two well-studied fronts, the Almeria-Oran Front (AOF) and the Ibiza Channel (IC). We analysed the species and haplotypes using the COI barcode. From a biogeographical perspective, the AOF had a strong effect in separating regions, while the IC effect was less marked, but still half of the MOTUs were found on only one side of this divide. For the metaphylogeographic analysis, only 10% of the MOTUs could be used. However, they showed a good separation between populations of the three regions with a strong effect of the AOF break. The IC, on the other hand, seemed to be more a transitional zone than a fixed break. This Thesis laid the ground for the efficient use of metabarcoding in the biomonitoring of benthic reef habitats, allowing community composition, β-diversity, and biogeographic patterns to be analysed in a fast, repeatable, and cost-efficient way.
We also developed the metaphylogeography approach as a new tool to assess population genetic structure at the community-wide level.
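The idea of measuring entropy per codon position, mentioned in the abstract above, can be sketched as follows. This is a minimal illustration and not the DnoisE implementation: it assumes equal-length, already aligned, in-frame sequences, and the function names and the `frame_offset` parameter are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(column):
    """Shannon entropy (in bits) of the symbols observed at one alignment site."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def codon_position_entropy(seqs, frame_offset=0):
    """Mean per-site entropy for codon positions 1, 2 and 3.

    Assumptions (hypothetical, for illustration): all sequences have the same
    length and are aligned; `frame_offset` is the index of the first site that
    sits at codon position 1.
    """
    sums = [0.0, 0.0, 0.0]
    sites = [0, 0, 0]
    for site in range(len(seqs[0])):
        pos = (site - frame_offset) % 3
        sums[pos] += shannon_entropy([s[site] for s in seqs])
        sites[pos] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, sites)]


if __name__ == "__main__":
    # Toy reads that differ only at the third position of the second codon.
    reads = ["ATGAAA", "ATGAAC", "ATGAAG", "ATGAAT"]
    print(codon_position_entropy(reads))
```

In a coding fragment such as COI, the third codon position typically varies most, so a sequence whose differences concentrate at positions 1 or 2 may deserve more suspicion; how such values are weighted in a denoising step is a design choice not specified here.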

    MOLECULAR DIET ANALYSES OF NORTH AMERICAN BATS

    A food web is a model of the feeding relationships among organisms in an environment. The fidelity of this model is limited principally by the ability to detect these interactions. Researchers who study cryptic interactions such as nocturnal insectivory in bats typically rely on fecal samples to identify trophic connections. Historically these diet analyses were limited to morphological inspection of arthropod fragments; however, modern metabarcoding techniques have improved the richness and specificity of consumed prey: rather than bats foraging for a few arthropod orders, we observe hundreds of species among guano samples. Animal metabarcoding is not without bias; nevertheless, a decade of improvements upon such biases has focused largely on molecular portions while bioinformatic considerations remain unresolved. When researchers use distinct software to perform their analyses (tools that have not yet been compared in animal metabarcoding studies), it is unclear whether distinct perspectives between two experiments represent meaningful biological differences or arise from the alternative programs and parameters deployed. We investigated three fundamental bioinformatic tasks that impact a metabarcoding experiment: sequence processing, database construction, and classification (Chapter I). These comparisons offer guidance regarding which steps are most sensitive to parameterization and therefore need optimizing for individual experiments, and they highlight areas that need critical improvement. We applied these bioinformatic lessons to a molecular diet analysis of Indiana bats, the first ever for this endangered species (Chapter II). While management decisions currently focus on protecting roosting habitat, our molecular analyses provide evidence that site-specific data is needed to more effectively inform conservation practices.
For example, while these bats forage a broad swath of the arthropod community, the molecular data suggests they rely on particular aquatic habitats that are not currently protected. Finally, we investigated the diets of New Hampshire bats by collaborating with citizen scientist volunteers throughout the state to perform an extensive sampling regime that spanned 20 locations over 2015 and 2016, and sequenced more than 900 guano samples (Chapter III). Molecular analysis of these data suggested these bats are foraging hundreds of arthropod species, including some turf and forest pests, demonstrating that our local bats provide ecosystem services. Individual diets varied across season and site, providing evidence of highly flexible and local foraging behaviors.

    Computational strategies for dissecting the high-dimensional complexity of adaptive immune repertoires

    The adaptive immune system recognizes antigens via an immense array of antigen-binding antibodies and T-cell receptors, the immune repertoire. The interrogation of immune repertoires is of high relevance for understanding the adaptive immune response in disease and infection (e.g., autoimmunity, cancer, HIV). Adaptive immune receptor repertoire sequencing (AIRR-seq) has driven the quantitative and molecular-level profiling of immune repertoires thereby revealing the high-dimensional complexity of the immune receptor sequence landscape. Several methods for the computational and statistical analysis of large-scale AIRR-seq data have been developed to resolve immune repertoire complexity in order to understand the dynamics of adaptive immunity. Here, we review the current research on (i) diversity, (ii) clustering and network, (iii) phylogenetic and (iv) machine learning methods applied to dissect, quantify and compare the architecture, evolution, and specificity of immune repertoires. We summarize outstanding questions in computational immunology and propose future directions for systems immunology towards coupling AIRR-seq with the computational discovery of immunotherapeutics, vaccines, and immunodiagnostics. Comment: 27 pages, 2 figures