The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.

arXiv.org e-Print Archive

eScholarship - University of California

Comprehensive structural classification of ligand binding motifs in proteins

Author: Akira R. Kinjo
Altschul
Andreeva
Bachhawat
Barber
Berman
Berry
Beuth
Brakoulias
Carvalho
Chen
Davies
Diamond
Dias
Du
Dunn
Friedberg
Garcia-Molina
Gold
Goldstein
Gonzalez
Grishin
Gross
Guilloteau
Gutteridge
Haruki Nakamura
Herter
Hoff
Ikura
Jonassen
Jones
Kawabata
Kinjo
Kinoshita
Kobayashi
Kolodny
Krishna
Krissinel
Lang
Laronde-Leblanc
Lawler
Lee
Malikayil
Minai
Murzin
Nagano
Orengo
Pattabhi
Polacco
Porter
Ridder
Rognan
Russell
Schubert
Shulman-Peleg
Standley
Stark
Stewart
Stoll
Tari
Taylor
Wallace
Wangikar
Watts
Westbrook
Whitlow
Wolfson
Xiao
Xie
Publication venue: 'Elsevier BV'
Publication date: 07/10/2008

Comprehensive knowledge of protein-ligand interactions should provide a useful basis for annotating protein functions, studying protein evolution, engineering enzymatic activity, and designing drugs. To investigate the diversity and universality of ligand binding sites in protein structures, we conducted an all-against-all atomic-level structural comparison of over 180,000 ligand binding sites found in all the known structures in the Protein Data Bank by using a recently developed database search and alignment algorithm. By applying a hybrid top-down-bottom-up clustering analysis to the comparison results, we determined approximately 3000 well-defined structural motifs of ligand binding sites. Apart from a handful of exceptions, most structural motifs were found to be confined within single families or superfamilies, and to be associated with particular ligands. Furthermore, we analyzed the components of the similarity network and enumerated more than 4000 pairs of ligand binding sites that were shared across different protein folds. Comment: 13 pages, 8 figures

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks

Author: Li Zhen
Yu Yizhou
Publication date: 01/01/2016

Protein secondary structure prediction is an important problem in bioinformatics. Inspired by the recent successes of deep neural networks, in this paper, we propose an end-to-end deep network that predicts protein secondary structures from integrated local and global contextual features. Our deep architecture leverages convolutional neural networks with different kernel sizes to extract multiscale local contextual features. In addition, considering long-range dependencies existing in amino acid sequences, we set up a bidirectional neural network consisting of gated recurrent units to capture global contextual features. Furthermore, multi-task learning is utilized to predict secondary structure labels and amino-acid solvent accessibility simultaneously. Our proposed deep network demonstrates its effectiveness by achieving state-of-the-art performance, i.e., 69.7% Q8 accuracy on the public benchmark CB513, 76.9% Q8 accuracy on CASP10 and 73.1% Q8 accuracy on CASP11. Our model and results are publicly available. Comment: 8 pages, 3 figures, Accepted by International Joint Conferences on Artificial Intelligence (IJCAI

arXiv.org e-Print Archive

HKU Scholars Hub

Coevolved mutations reveal distinct architectures for two core proteins in the bacterial flagellar motor

Author: A Pandini
AC Lowenthal
Alessandro Pandini
AM Waterhouse
Anna Roujeinikova
AS Vartanian
B Ruhnau
BJ Grant
BJ Lowder
CJ Tsai
CM Dyer
D de Juan
D Stock
DL Guzman
DR Livesay
DR Thomas
DS Bischoff
DT Jones
F Pazos
H Ashkenazy
H Shimodaira
H Sockett
H Szurmant
HC Berg
J Friedman
J Yuan
Jens Kleinjung
JP Armitage
JS Parkinson
K Paul
KA Reynolds
KH Lam
L Cavallo
LK Lee
M Punta
MK Sarkar
MN Price
NA Rosenberg
NJ Delalez
P Cluzel
PN Brown
Q Ma
R Saito
RC Edgar
RD Finn
RW Branch
S Chen
S Pronk
SA Lloyd
SD Dunn
Shafqat Rasool
Shahid Khan
SM Van Way
SY Park
T Minamino
T Pilizota
TA Duke
VM Irikura
WR Taylor
X Zhao
Y Tu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

Switching of bacterial flagellar rotation is caused by large domain movements of the FliG protein triggered by binding of the signal protein CheY to FliM. FliG and FliM form adjacent multi-subunit arrays within the basal body C-ring. The movements alter the interaction of the FliG C-terminal (FliGC) "torque" helix with the stator complexes. Atomic models based on the Salmonella entrovar C-ring electron microscopy reconstruction have implications for switching, but lack consensus on the relative locations of the FliG armadillo (ARM) domains (amino-terminal (FliGN), middle (FliGM) and FliGC) as well as changes during chemotaxis. The generality of the Salmonella model is challenged by the variation in motor morphology and response between species. We studied coevolved residue mutations to determine the unifying elements of switch architecture. Residue interactions, measured by their coevolution, were formalized as a network, guided by structural data. Our measurements reveal a common design with dedicated switch and motor modules. The FliM middle domain (FliMM) has extensive connectivity most simply explained by conserved intra- and inter-subunit contacts. In contrast, FliG has patchy, complex architecture. Conserved structural motifs form interacting nodes in the coevolution network that wire FliMM to the FliGC C-terminal, four-helix motor module (C3-6). FliG C3-6 coevolution is organized around the torque helix, differently from other ARM domains. The nodes form separated, surface-proximal patches that are targeted by deleterious mutations as in other allosteric systems. The dominant node is formed by the EHPQ motif at the FliMM-FliGM contact interface and adjacent helix residues at a central location within FliGM. The node interacts with nodes in the N-terminal FliGC α-helix triad (ARM-C) and FliGN. ARM-C, separated from C3-6 by the MFVF motif, has poor intra-network connectivity consistent with its variable orientation revealed by structural data. ARM-C could be the convertor element that provides mechanistic and species diversity.

JK was supported by Medical Research Council grant U117581331. SK was supported by seed funds from Lahore University of Management Sciences (LUMS) and the Molecular Biology Consortium.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Brunel University Research Archive

FigShare

Kernel methods in genomics and computational biology

Author: Vert Jean-Philippe
Publication date: 17/10/2005

Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and a strong modularity that makes them suitable for a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension and to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data, are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.

arXiv.org e-Print Archive

HAL-MINES ParisTech