PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions

Author: Arvestad
Blanchette
Brent
Butler
Clark
Goldman
Guttman
Guttman
Holmes
I. Jungreis
Kellis
Lin
M. F. Lin
M. Kellis
Ota
Ozsolak
Stark
Whelan
Yang
Publication date: 17/08/2010

As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species _Drosophila_ genome alignments exceeds that of all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE.

Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version

Author: Sahinalp S. Cenk
Salari Raheleh
Schönhuth Alexander
Publication date: 11/06/2010

Although computationally aligning sequences is a crucial step in the vast majority of comparative genomics studies, our understanding of alignment biases still needs to be improved. To infer true structural or homologous regions, computational alignments need further evaluation. It has been shown that the accuracy of aligned positions can drop substantially, in particular around gaps. Here we focus on re-evaluation of score-based alignments with affine gap penalty costs. We exploit their relationships with pair hidden Markov models and develop efficient algorithms by which to identify gaps that are significant in terms of length and multiplicity. We evaluate our statistics with respect to the well-established structural alignments from SABmark and find that indel reliability substantially increases with their significance, in particular in worst-case twilight zone alignments. This indicates that our statistics can reliably complement other methods, which mostly focus on the reliability of match positions. Comment: 17 pages, 7 figures
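
The abstract above refers to alignments scored with affine gap penalty costs. As a reminder of what that scoring scheme means, here is a minimal sketch. The function name, the default penalty values, and the convention used (the opening cost is charged once and the extension cost for each additional gap position) are assumptions for illustration and are not taken from the paper; some tools instead charge the extension cost for every gap position.

```python
def affine_gap_cost(length, gap_open=10.0, gap_extend=0.5):
    """Cost of one gap of `length` positions under an affine penalty.

    Assumed convention: pay `gap_open` for the first gap position and
    `gap_extend` for each further position.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


# One gap of length 5 versus five separate gaps of length 1.
one_long_gap = affine_gap_cost(5)
five_short_gaps = 5 * affine_gap_cost(1)
```

Under these illustrative values a single long gap is much cheaper than several short ones, which is the usual motivation for affine penalties.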

arXiv.org e-Print Archive

CWI's Institutional Repository

In silico comparative genomics analysis of Plasmodium falciparum for the identification of putative essential genes and therapeutic candidates.

Author: Altschul
Andrews
Anishetty
Arama
Berman
Bernstein
Bhasin
Boeckmann
Chang
Chawley
Chen
Colovos
Corpet
David Charles Warhurst
Eisenberg
Finn
Franceschini
Ghosh
Irwin
Irwin
Kanehisa
Konc
Krogh
Kushwaha
Larsen
Laskowski
Ludin
Miller
Morris
Mrutyunjay Suar
Mulder
Pieper
Pierleoni
Pontius
Rajani Kanta Mahapatra
Sali
Subhashree Rout
Sun
Tam
von Mering
Wallner
Wiederstein
World Health Organization
Yang
Yeh
Zhexin
Publication venue: Elsevier BV
Publication date: 05/12/2014

A sequence of computational methods was used to predict novel drug targets against the drug-resistant malaria parasite Plasmodium falciparum. Comparative genomics, orthologous protein analysis among the same and other malaria parasites, and protein-protein interaction studies provide new insights into determining the essential genes and novel therapeutic candidates. Among the predicted list of 21 essential proteins from unique pathways, 11 proteins were prioritized as anti-malarial drug targets. As a case study, we built homology models of two uncharacterized proteins using MODELLER v9.13 software from possible templates. Functional annotation of these proteins was done using the InterPro databases and the ProBiS server, by comparison of predicted binding site residues. The model has been subjected to an in silico docking study with screened potent lead compounds from the ZINC database by Dock Blaster software using AutoDock 4. Results from this study facilitate the selection of proteins and putative inhibitors for entry into drug design production pipelines.

Crossref

LSHTM Research Online

The Protein Model Portal

Author: Arnold Konstantin
Battey James
Berman Helen
Bordoli Lorenza
Kiefer Florian
Kopp Jürgen
Podvinec Michael
Schwede Torsten
Westbrook John
Publication date: 18/06/2018

Structural Genomics has been successful in determining the structures of many unique proteins in a high throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been to access all models available for a specific protein in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be leveraged from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.

RERO DOC Digital Library

Faster than Neutral Evolution of Constrained Sequences: The Complex Interplay of Mutational Biases and Weak Selection

Author: Berglund
Birney
Bustamante
Bustamante
Comeron
Cooper
David S. Lawrie
Davydov
Dmitri A. Petrov
Doniger
Durbin
Duret
Duret
Duret
Eory
Eyre-Walker
Felsenstein
Galtier
Goode
Haag-Liautard
Halpern
Hardison
Hasegawa
Hershberg
Hildebrand
Jukes
Keightley
Kimura
Lio
Lipatov
Lu
Lynch
Margulies
McVean
Messer
Montooth
Montooth
Moses
Nagylaki
Ohta
Ossowski
Pheasant
Philipp W. Messer
Pollard
Pollard
Pollard
Rodrigue
Siepel
Waterston
Yang
Publication venue: Oxford University Press

Comparative genomics has become widely accepted as the major framework for the ascertainment of functionally important regions in genomes. The underlying paradigm of this approach is that most of the functional regions are assumed to be under selective constraint, which in turn reduces the rate of evolution relative to neutrality. This assumption allows detection of functional regions through sequence conservation. However, constraint does not always lead to sequence conservation. When purifying selection is weak and mutation is biased, constrained regions can even evolve faster than neutral sequences and thus can appear to be under positive selection. Moreover, conservation estimates also depend on the orientation of selection relative to mutational biases and can vary over time. In light of recent data on the ubiquity of mutational biases and weak selective forces, these effects should reduce the power of conservation analyses to define functional regions using comparative genomics data. We argue that the estimation of true mutational biases and the use of explicit evolutionary models are essential to improve methods inferring the action of natural selection and functionality in genome sequences.

Crossref

PubMed Central

The Protein Model Portal

Author: A Bairoch
A Hillisch
A Tramontano
AM Jenkinson
AR Ortiz
C Yeats
D Baker
D Chivian
EA Merritt
Florian Kiefer
H Berman
H Berman
H Huang
Helen M. Berman
HM Berman
HM Berman
J Kopp
J Kopp
J Kopp
James N. D. Battey
JN Battey
John D. Westbrook
Jürgen Kopp
Konstantin Arnold
Lorenza Bordoli
MC Peitsch
Michael Podvinec
MJ Hartshorn
N Mirkovic
PJ Kraulis
RD Finn
S Yooseph
SF Altschul
T Schwede
T Schwede
Torsten Schwede
U Pieper
Y Zhang
Publication venue: Springer Netherlands
Publication date: 01/01/2008

Structural Genomics has been successful in determining the structures of many unique proteins in a high throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been to access all models available for a specific protein in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be leveraged from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.

Crossref

Springer - Publisher Connector

edoc

PubMed Central

Reranking candidate gene models with cross-species comparison for improved gene prediction

Author: Crammer Koby
Liu Qian
Pereira Fernando CN
Roos David S
Publication venue: ScholarlyCommons
Publication date: 14/10/2008

Background: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
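
The reranking step described in this abstract (take the K best candidates from a gene finder, then reorder them using cross-species evidence) can be illustrated with a minimal sketch. Everything here is hypothetical: the `Candidate` fields, the single `ortholog_similarity` feature, the additive scoring rule, and the `weight` parameter are assumptions chosen for illustration and are not the scoring rules or features used by the authors or by Evigan.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    finder_score: float         # score from the original gene finder (higher is better)
    ortholog_similarity: float  # hypothetical similarity to a probable ortholog, in [0, 1]


def rerank(candidates, weight=1.0):
    """Order candidates by finder score plus a weighted similarity bonus.

    Assumed rule: combined = finder_score + weight * ortholog_similarity.
    """
    return sorted(
        candidates,
        key=lambda c: c.finder_score + weight * c.ortholog_similarity,
        reverse=True,
    )


# K = 3 made-up candidate gene models for one locus.
candidates = [
    Candidate("model_a", 0.90, 0.20),
    Candidate("model_b", 0.85, 0.95),
    Candidate("model_c", 0.60, 0.50),
]

best_with_comparison = rerank(candidates)[0].name
best_finder_only = rerank(candidates, weight=0.0)[0].name
```

With these made-up numbers, the candidate ranked second by the gene finder alone moves to the top once the similarity term is included, which mirrors the abstract's point that a lower-scoring model may be selected when it closely matches probable orthologs.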

ScholarlyCommons@Penn

MODBASE, a database of annotated comparative protein structure models and associated resources.

Author: Barkan David T
Carter Hannah
Davis Fred P
Eramian David
Eswar Narayanan
Karchin Rachel
Kelly Libusha
Mankoo Parminder
Marti-Renom Marc A
Pieper Ursula
Sali Andrej
Webb Ben M
Publication venue: eScholarship, University of California
Publication date: 23/10/2008

MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).

PubMed Central

eScholarship - University of California

Comparative Genomic Characterization of the Multimammate Mouse Mastomys coucha.

Author: Aaron Hardin
Andersen
Bao
Besemer
Bickhart
Blanchette
Bolger
Bonwitt
Booker
Bourque
Camacho
Capra
Chapman
Chen
Chikhi
Chu
Colangelo
Dewey
Dierckxsens
Eblaghie
Grabherr
Hayssen
Heinz
Helfrich
Holloway
Holt
Jiang
Kannan
Kiełbasa
Kim
Kimberly A Nevonen
Kolmogorov
Korf
Krueger
Lecompte
Li
Li
Lok
Lowe
Lucia Carbone
MacManes
McLean
Modlin
Nadav Ahituv
Nagy
Nilsson
Närhi
Pennacchio
Pertea
Pimentel
Pollard
Sands
Schep
Scott
Siepel
Siepel
Simão
Smit
Snell
Song
Stanke
UniProt Consortium
Van der Auwera
Veltmaat
Veltmaat
Walter L Eckalbar
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

Mastomys are the most widespread African rodents and carriers of various diseases such as the plague or Lassa virus. In addition, mastomys have rapidly gained a large number of mammary glands. Here, we generated a genome, variome, and transcriptomes for Mastomys coucha. As mastomys diverged at similar times from mouse and rat, we demonstrate their utility as a comparative genomic tool for these commonly used animal models. Furthermore, we identified over 500 mastomys accelerated regions, often residing near important mammary developmental genes or within their exons, leading to protein sequence changes. Functional characterization of a noncoding mastomys accelerated region, located in the HoxD locus, showed enhancer activity in developing mouse mammary glands. Combined, our results provide genomic resources for mastomys and highlight their potential both as a comparative genomic tool and for the identification of factors determining mammary gland number.

Crossref

eScholarship - University of California