5,735 research outputs found

    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
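As a rough illustration of the two patterns the abstract singles out, hashing and sorting, the sketch below counts k-mers with a hash table and then ranks them by frequency. It is a serial toy example, not the paper's method: the function name `count_kmers`, the value of `k`, and the two reads are assumptions made up for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads (hashing pattern)."""
    counts = Counter()  # Counter is a dict subclass, i.e. a hash table
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Toy input (hypothetical reads, not real data).
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, 3)

# Sorting pattern: rank k-mers by descending count, ties broken alphabetically.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

In a distributed setting the hash table would typically be partitioned across nodes, which is where the irregular, asynchronous updates mentioned in the abstract would arise.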

    CRISPRstrand: predicting repeat orientations to determine the crRNA-encoding strand at CRISPR loci

    Motivation: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, interspaced by stretches of variable-length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs. Results: We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operating characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0. Availability: CRISPRmap and CRISPRstrand are available at http://rna.informatik.uni-freiburg.de/CRISPRmap. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.

    Selective solid phase extraction of JWH synthetic cannabinoids by using computationally designed peptides

    The objective of the present work is to demonstrate a rational way to prepare selective sorbents able to extract several structural analogs simultaneously. For this purpose, the binding specificity of two computationally designed hexapeptides (VYWLVW and YYIGGF) versus four synthetic cannabinoids, naphthalen-1-yl-(1-pentylindol-3-yl)methanone (JWH 018), naphthalen-1-yl-(1-butylindol-3-yl)methanone (JWH 073), (R)-(1-((1-methylpiperidin-2-yl)methyl)-1H-indol-3-yl)(naphthalen-1-yl)methanone (AM 1220) and (R)-(+)-[2,3-dihydro-5-methyl-3-(4-morpholinylmethyl)pyrrolo[1,2,3-de]-1,4-benzoxazin-6-yl]-1-naphthalenylmethanone (WIN 55), was computationally studied and then experimentally tested by solid-phase extraction (SPE) clean-up and ultra-high performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) analysis. The two peptides were chosen using a semi-combinatorial virtual technique by generating 4 cycles of peptide libraries (around 2.3×10⁴ elements). To select the two peptides, the simulated binding scores between synthetic cannabinoids and peptides were used, maximizing the recognition properties of the amino acid motif between the two JWH compounds and the other synthetic cannabinoids. In particular, the peptide YYIGGF, which also has affinity for AM 1220, was selected as a control because it was the only one without tryptophan residues among the best peptides obtained from simulation. Experimentally, the two hexapeptides were tested as SPE sorbents using nanomolar solutions of the four drugs. After optimization of retention, the binding constants were calculated by loading synthetic cannabinoid solutions at different concentrations. The results indicated a strong interaction between hexapeptide VYWLVW and JWH 018 (15.58 ± 2.03×10⁶ M⁻¹), 3-fold larger than for the analog JWH 073 and 40-fold larger than for both AM 1220 and WIN 55. A similar trend was observed for the hexapeptide YYIGGF, but the binding constants were at least three times lower, highlighting the key role of tryptophan. To demonstrate the specific interaction of the hexapeptides with synthetic cannabinoids only, a cross-reactivity study was carried out using other drugs (cocaine, morphine, phencyclidine and methamphetamine) under the same SPE conditions. Finally, the practical utility of these peptide-modified sorbent materials was further demonstrated by detecting the synthetic cannabinoids in real samples using hair matrix. (Depto. de Química Analítica, Fac. de Ciencias Químicas; Unión Europea, H2020; NBCR)

    Smart systems related to polypeptide sequences

    Interest in applying polypeptide-based smart systems in the biomedical field has grown because of the advantages conferred by the peptidic sequence. These include biocompatibility, potential control of degradation, the capability to provide a rich repertoire of biologically specific interactions, the feasibility of self-assembly, the possibility of combining different functionalities, and the capability to give an environmentally responsive behavior. Recently, applications involving these systems have received greater attention because a targeted and programmable release of drugs (e.g. anti-cancer agents) can be achieved. Block copolymers are discussed because of their capability to form differently assembled architectures. Hybrid systems based on silica nanoparticles are also discussed. In both cases, the selected systems must be able to undergo fast changes in properties such as solubility, shape, and dissociation or swelling capabilities. This review is structured in chapters that explain the most recent advances in smart systems according to the stimuli to which they are sensitive. Amphiphilic block copolymers based on polyanionic or polycationic peptides are, for example, typically employed to obtain pH-responsive systems. Elastin-like polypeptides are usually used as thermoresponsive polymers, but performance can be increased by techniques that use layer-by-layer electrostatic self-assembly. This approach offers great potential to create multilayered systems, including nanocapsules, with different functionality. Recent strategies developed to obtain redox-, magnetic-, ultrasound-, enzyme-, light- and electric-responsive systems are extensively discussed. Finally, some indications concerning the possibilities of multi-responsive systems are discussed. Postprint (published version)

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
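The idea of searching over covering hyperspheres can be sketched as a two-stage, coarse-then-fine search. The example below is only a toy under stated assumptions, not the implementation behind Ammolite, MICA, or esFragBag: the Hamming distance, the greedy clustering, the radii, and all names (`hamming`, `build_clusters`, `search`) are choices made here for illustration.

```python
def hamming(a, b):
    """Hamming distance between two equal-length strings (a true metric)."""
    return sum(x != y for x, y in zip(a, b))


def build_clusters(points, r_c, dist):
    """Greedily cover the data with balls of radius r_c around chosen centers."""
    clusters = []  # list of (center, members)
    for p in points:
        for center, members in clusters:
            if dist(p, center) <= r_c:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))  # p starts a new ball
    return clusters


def search(query, clusters, r, r_c, dist):
    """Return all points within distance r of query."""
    hits = []
    for center, members in clusters:
        # Coarse step: by the triangle inequality, a member within r of the
        # query implies its center is within r + r_c, so other balls are skipped.
        if dist(query, center) <= r + r_c:
            # Fine step: scan only the members of the surviving balls.
            hits.extend(m for m in members if dist(query, m) <= r)
    return hits


# Toy data (hypothetical sequences).
points = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC", "CCCC"]
clusters = build_clusters(points, 1, hamming)
hits = search("AAAC", clusters, 1, 1, hamming)
```

The coarse step touches one center per ball, so its cost grows with the number of balls rather than with the number of points, which is the intuition behind the time bound stated in the abstract.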