5,882 research outputs found

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: The Royal Society
Publication date: 20/01/2020

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

arXiv.org e-Print Archive

eScholarship - University of California

CRISPRstrand: predicting repeat orientations to determine the crRNA-encoding strand at CRISPR loci

Author: Alkhnbashi Omer S.
Backofen Rolf
Costa Fabrizio
Garrett Roger Antony
Saunders Sita J.
Shah Shiraz Ali
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2014

Motivation: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, inter-spaced by stretches of variable length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs. Results: We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operator characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0. Availability: CRISPRmap and CRISPRstrand are available at http://rna.informatik.uni-freiburg.de/CRISPRmap. 
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
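
For readers unfamiliar with the AUC ROC figure quoted in the abstract above, the following is a minimal sketch of how such a score can be computed for a binary classifier. It relies on the standard equivalence between AUC and the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The function name `auc_roc`, the toy scores, and the 1/0 labels are illustrative assumptions and are not taken from CRISPRstrand.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier outputs, higher meaning "more likely positive".
    labels: 1 for positive examples, 0 for negative examples
            (hypothetical encoding, e.g. 1 = forward orientation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: three of the four positive/negative pairs are ordered correctly.
    print(auc_roc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of examples; it is written this way for clarity rather than speed.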

Copenhagen University Research Information System

PubMed Central

Selective solid phase extraction of JWH synthetic cannabinoids by using computationally designed peptides

Author: Compagnone Dario
Mascini Marcello
Montesano Camilla
Perez German
Sergi Manuel
Wang Joseph
Publication venue: Elsevier
Publication date: 01/01/2017

The objective of the present work is to demonstrate a rational way to prepare selective sorbents able to extract several structural analogs simultaneously. For this purpose, the binding specificity of two computationally designed hexapeptides (VYWLVW and YYIGGF) versus four synthetic cannabinoids, naphthalen-1-yl-(1-pentylindol-3-yl)methanone (JWH 018), naphthalen-1-yl-(1-butylindol-3-yl)methanone (JWH 073), (R)-(1-((1-methylpiperidin-2-yl)methyl)-1H-indol-3-yl)(naphthalen-1-yl)methanone (AM 1220) and (R)-(+)-[2,3-dihydro-5-methyl-3-(4-morpholinylmethyl)pyrrolo[1,2,3-de]-1,4-benzoxazin-6-yl]-1-naphthalenylmethanone (WIN 55), was computationally studied and then experimentally tested by solid-phase extraction (SPE) clean-up and ultra-high performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) analysis. The two peptides were chosen using a semi-combinatorial virtual technique by generating 4 cycles of peptide libraries (around 2.3 × 10⁴ elements). To select the two peptides, the simulated binding scores between synthetic cannabinoids and peptides were used, maximizing the recognition properties of the amino acid motif between the two JWH compounds and the other synthetic cannabinoids. In particular, the peptide YYIGGF, which also has affinity for AM 1220, was selected as a control because it was the only one without tryptophan residues among the best peptides obtained from simulation. Experimentally, the two hexapeptides were tested as SPE sorbents using nanomolar solutions of the four drugs. After optimization of retention, the binding constants were calculated by loading synthetic cannabinoid solutions at different concentrations. The results indicated a strong interaction between hexapeptide VYWLVW and JWH 018 (15.58 ± 2.03 × 10⁶ M⁻¹), 3-fold and 40-fold larger than for the analog JWH 073 and for both AM 1220 and WIN 55, respectively. A similar trend was observed for the hexapeptide YYIGGF, but the binding constants were at least three times lower, highlighting the key role of tryptophan. To demonstrate the hexapeptides' specific interaction with synthetic cannabinoids only, a cross-reactivity study was carried out using other drugs (cocaine, morphine, phencyclidine and methamphetamine) under the same SPE conditions. Finally, the practical utility of these peptide-modified sorbent materials was further demonstrated by detecting the synthetic cannabinoids in real samples using hair matrix.
Department of Analytical Chemistry, Faculty of Chemical Sciences. European Union, H2020. NBCR.

Docta Complutense

Crossref

Archivio della Ricerca - Università degli Studi di Teramo

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Archivio della ricerca- Università di Roma La Sapienza

Smart systems related to polypeptide sequences

Author: Franco García María Lourdes
Puiggalí Bellalta Jordi
Valle Mendoza Luis Javier del
Publication venue: American Institute of Mathematical Sciences (AIMS)
Publication date: 01/01/2016

Increasing interest in the application of polypeptide-based smart systems in the biomedical field has developed due to the advantages given by the peptidic sequence. These advantages include biocompatibility, potential control of degradation, capability to provide a rich repertoire of biologically specific interactions, feasibility to self-assemble, possibility to combine different functionalities, and capability to give an environmentally responsive behavior. Recently, applications concerning the development of these systems have been receiving greater attention, since a targeted and programmable release of drugs (e.g. anti-cancer agents) can be achieved. Block copolymers are discussed due to their capability to render differently assembled architectures. Hybrid systems based on silica nanoparticles are also discussed. In both cases, the selected systems must be able to undergo fast changes in properties like solubility, shape, and dissociation or swelling capabilities. This review is structured in different chapters which explain the most recent advances on smart systems depending on the stimuli to which they are sensitive. Amphiphilic block copolymers based on polyanionic or polycationic peptides are, for example, typically employed for obtaining pH-responsive systems. Elastin-like polypeptides are usually used as thermoresponsive polymers, but performance can be increased by using techniques which utilize layer-by-layer electrostatic self-assembly. This approach offers great potential to create multilayered systems, including nanocapsules, with different functionality. Recent strategies developed to obtain redox-, magnetic-, ultrasound-, enzyme-, light- and electric-responsive systems are extensively discussed. Finally, some indications concerning the possibilities of multi-responsive systems are discussed.
Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Entropy-scaling search of massive biological data

Author: Berger Bonnie
Daniels Noah M.
Danko David Christian
Yu Y. William
Publication venue: Elsevier BV
Publication date: 01/06/2015

Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology.
Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
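
The idea of searching over "covering hyperspheres" mentioned in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes points in Euclidean space, a greedy covering with a fixed cluster radius `r_c`, and a two-stage query (a coarse pass over cluster centers, then a fine pass inside the surviving clusters). The names `build_clusters`, `range_search`, and `r_c` are hypothetical. The coarse filter follows from the triangle inequality: any point within `r` of the query lies in a cluster whose center is within `r + r_c` of the query.

```python
import math


def build_clusters(points, r_c):
    """Greedy cover: each point joins the first center within r_c,
    otherwise it becomes the center of a new cluster."""
    clusters = []  # list of (center, members) pairs
    for p in points:
        for center, members in clusters:
            if math.dist(p, center) <= r_c:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(clusters, query, r, r_c):
    """Return all stored points within distance r of query."""
    hits = []
    for center, members in clusters:
        # Coarse step: skip clusters that cannot contain a hit.
        if math.dist(query, center) <= r + r_c:
            # Fine step: check only members of the surviving clusters.
            hits.extend(m for m in members if math.dist(query, m) <= r)
    return hits


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (10.0, 0.0)]
    clusters = build_clusters(data, r_c=1.0)
    print(len(clusters))
    print(range_search(clusters, (0.05, 0.0), r=0.2, r_c=1.0))
```

In this sketch the coarse pass costs one distance computation per cluster, so the saving grows when the data can be covered by few clusters relative to the number of points.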

arXiv.org e-Print Archive

Elsevier - Publisher Connector

DSpace@MIT

Crossref

PubMed Central