164 research outputs found

Users Guide for SnadiOpt: A Package Adding Automatic Differentiation to Snopt

Author: Gertz E. Michael
Gill Philip E.
Muetherig Julia
Publication date: 21/06/2001

SnadiOpt is a package that supports the use of the automatic differentiation package ADIFOR with the optimization package Snopt. Snopt is a general-purpose system for solving optimization problems with many variables and constraints. It minimizes a linear or nonlinear function subject to bounds on the variables and sparse linear or nonlinear constraints. It is suitable for large-scale linear and quadratic programming and for linearly constrained optimization, as well as for general nonlinear programs. The method used by Snopt requires the first derivatives of the objective and constraint functions to be available. The SnadiOpt package allows users to avoid the time-consuming and error-prone process of evaluating and coding these derivatives. Given Fortran code for evaluating only the values of the objective and constraints, SnadiOpt automatically generates the code for evaluating the derivatives and builds the relevant Snopt input files and sparse data structures.

Comment: pages i-iv, 1-2
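
The idea of obtaining derivatives from code that only evaluates function values can be illustrated with a minimal forward-mode sketch. This is not how ADIFOR or SnadiOpt work internally (ADIFOR transforms Fortran source); the `Dual` class, the `objective` function, and `value_and_derivative` below are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one input."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def objective(x):
    # Hypothetical objective written with values only: f(x) = 3x^2 + 2x + 1
    return 3 * x * x + 2 * x + 1


def value_and_derivative(f, x0):
    """Evaluate f and df/dx at x0 by seeding the input derivative with 1."""
    out = f(Dual(x0, 1.0))
    return out.val, out.der


result = value_and_derivative(objective, 2.0)
```

The user writes only `objective`; the derivative falls out of the overloaded arithmetic, which is the labor-saving point the abstract makes about hand-coded derivatives.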

arXiv.org e-Print Archive

UNT Digital Library

Evaluating annotations of an Agilent expression chip suggests that many features cannot be interpreted

Author: Difilippantonio Michael J
Gertz E Michael
Ried Thomas
Schäffer Alejandro A
Sengupta Kundan
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: While attempting to reanalyze published data from Agilent 4 × 44 human expression chips, we found that some of the 60-mer oligonucleotide features could not be interpreted as representing single human genes. For example, some of the oligonucleotides align with the transcripts of more than one gene. We decided to check the annotations for all autosomes and the X chromosome systematically using bioinformatics methods. Results: Out of 42683 reporters, we found that 25505 (60%) passed all our tests and are considered "fully valid". 9964 (23%) reporters did not have a meaningful identifier, mapped to the wrong chromosome, or did not pass basic alignment tests, preventing us from correlating the expression values of these reporters with a unique annotated human gene. The remaining 7214 (17%) reporters could be associated with either a unique gene or a unique intergenic location, but could not be mapped to a transcript in RefSeq. The 7214 reporters are further partitioned into three different levels of validity. Conclusion: Expression array studies should evaluate the annotations of reporters and remove those reporters that have suspect annotations. This evaluation can be done systematically and semi-automatically, but one must recognize that data sources are frequently updated, leading to slightly changing validation results over time.
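
The reporter counts quoted in the abstract can be checked with a few lines of arithmetic. The category labels below are shorthand chosen here, not terms from the paper.

```python
# Counts as stated in the abstract.
total_reporters = 42683
counts = {
    "fully valid": 25505,
    "no unique gene": 9964,          # shorthand label (assumption)
    "no RefSeq transcript": 7214,    # shorthand label (assumption)
}

# The three categories should partition the full set of reporters.
categories_sum = sum(counts.values())

# Percentage share of each category, rounded to whole percent.
shares = {name: round(100 * n / total_reporters) for name, n in counts.items()}
```

The three categories add up to the stated total, and the rounded shares agree with the 60%, 23% and 17% given in the abstract.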

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genome-wide changes in protein translation efficiency are associated with autism

Author: Baranov Pavel V.
Gertz E. Michael
Poliakov Eugenia
Rogozin Igor B.
Schaffer Alejandro A.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

We previously proposed that changes in the efficiency of protein translation are associated with autism spectrum disorders (ASDs). This hypothesis connects environmental factors and genetic factors because each can alter translation efficiency. For genetic factors, we previously tested our hypothesis using a small set of ASD-associated genes, a small set of ASD-associated variants, and a statistic to quantify by how much a single nucleotide variant (SNV) in a protein coding region changes translation speed. In this study, we confirm and extend our hypothesis using a published set of 1,800 autism quartets (parents, one affected child and one unaffected child) and genome-wide variants. Then, we extend the test statistic to combine translation efficiency with other possibly relevant variables: ribosome profiling data, presence/absence of CpG dinucleotides, and phylogenetic conservation. The inclusion of ribosome profiling abundances strengthens our results for male–male sibling pairs. The inclusion of CpG information strengthens our results for female–female pairs, giving an insight into the significant gender differences in autism incidence. By combining the single-variant test statistic for all variants in a gene, we obtain a single gene score to evaluate how well a gene distinguishes between affected and unaffected siblings. Using statistical methods, we compute gene sets that have some power to distinguish between affected and unaffected siblings by translation efficiency of gene variants. Pathway and enrichment analysis of those gene sets suggest the importance of Wnt signaling pathways, some other pathways related to cancer, ATP binding, and ATP-ase pathways in the etiology of ASDs

Irish Universities

Cork Open Research Archive

Promoter-distal RNA polymerase II binding discriminates active from inactive CCAAT/enhancer-binding protein beta binding sites

Author: Carleton Julia B
Cohen Barak A
Cooper Gregory M
Gertz Jason
Myers Richard M
Partridge E. Christopher
Roberts Brian S
Savic Daniel
White Michael A
Publication venue: Digital Commons@Becker
Publication date: 01/01/2015

Transcription factors (TFs) bind to thousands of DNA sequences in mammalian genomes, but most of these binding events appear to have no direct effect on gene expression. It is unclear why only a subset of TF bound sites are actively involved in transcriptional regulation. Moreover, the key genomic features that accurately discriminate between active and inactive TF binding events remain ambiguous. Recent studies have identified promoter-distal RNA polymerase II (RNAP2) binding at enhancer elements, suggesting that these interactions may serve as a marker for active regulatory sequences. Despite these correlative analyses, a thorough functional validation of these genomic co-occupancies is still lacking. To characterize the gene regulatory activity of DNA sequences underlying promoter-distal TF binding events that co-occur with RNAP2 and TF sites devoid of RNAP2 occupancy using a functional reporter assay, we performed cis-regulatory element sequencing (CRE-seq). We tested more than 1000 promoter-distal CCAAT/enhancer-binding protein beta (CEBPB)-bound sites in HepG2 and K562 cells, and found that CEBPB-bound sites co-occurring with RNAP2 were more likely to exhibit enhancer activity. CEBPB-bound sites further maintained substantial cell-type specificity, indicating that local DNA sequence can accurately convey cell-type–specific regulatory information. By comparing our CRE-seq results to a comprehensive set of genome annotations, we identified a variety of genomic features that are strong predictors of regulatory element activity and cell-type–specific activity. Collectively, our functional assay results indicate that RNAP2 occupancy can be used as a key genomic marker that can distinguish active from inactive TF bound sites

Crossref

Digital Commons@Becker

PubMed Central

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

Author: Alejandro A Schäffer
E Michael Gertz
Richa Agarwala
Stephen F Altschul
Yi-Kuo Yu
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. RESULTS: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. CONCLUSION: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms
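
The phrase "translated in all six frames" can be made concrete with a short sketch. This is not TBLASTN's implementation; it only shows what the six conceptual translations of a nucleotide sequence are, using the standard genetic code. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def six_frame_translations(dna):
    """Map frame labels +1..+3 and -1..-3 to their protein translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in ((+1, dna), (-1, reverse_complement(dna))):
        for offset in range(3):
            frames[strand * (offset + 1)] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCC")
```

A protein query is then compared against each of these translations rather than against the nucleotides directly.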

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches

Author: Alejandro A. Schäffer
E. Michael Gertz
Richa Agarwala
Stephen F. Altschul
Yi-Kuo Yu
Publication venue: Oxford University Press
Publication date: 01/01/2006

Protein sequence database search programs may be evaluated both for their retrieval accuracy—the ability to separate meaningful from chance similarities—and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set

CiteSeerX

Crossref

PubMed Central

PSI-BLAST pseudocounts and the minimum description length principle

Author: Alejandro A. Schäffer
E. Michael Gertz
Richa Agarwala
Stephen F. Altschul
Yi-Kuo Yu
Publication venue: Oxford University Press

Position specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a completely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
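
How pseudocounts blend with observed counts in one alignment column can be sketched as follows. This is a simplified illustration, not PSI-BLAST's method: here the pseudocounts are spread according to background frequencies, the alphabet is a toy three-letter one, and all numbers and names are hypothetical.

```python
def column_probabilities(observed_counts, background, pseudocount_weight):
    """Estimate residue probabilities for one alignment column.

    Each residue gets its observed count plus a share of
    `pseudocount_weight` pseudocounts, split by background frequency.
    """
    n_observed = sum(observed_counts.values())
    denom = n_observed + pseudocount_weight
    return {
        aa: (observed_counts.get(aa, 0) + pseudocount_weight * background[aa]) / denom
        for aa in background
    }


# Toy column: 10 observations, strongly favouring L (hypothetical data).
background = {"L": 0.5, "V": 0.25, "I": 0.25}
counts = {"L": 8, "V": 2}

smooth = column_probabilities(counts, background, pseudocount_weight=10)
sharp = column_probabilities(counts, background, pseudocount_weight=2)
```

With fewer pseudocounts the estimate stays closer to the observed column, which is the sense in which using fewer pseudocounts for conserved columns increases contrast between columns.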

Crossref

PubMed Central

Trees on networks: resolving statistical patterns of phylogenetic similarities among interacting proteins

Author: Michael PH Stumpf
William P Kelly
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Phylogenies capture the evolutionary ancestry linking extant species. Correlations and similarities among a set of species are mediated by and need to be understood in terms of the phylogenetic tree. In a similar way, it has been argued that biological networks also induce correlations among sets of interacting genes or their protein products. Results: We develop suitable statistical resampling schemes that can incorporate these two potential sources of correlation into a single inferential framework. To illustrate our approach we apply it to protein interaction data in yeast and investigate whether the phylogenetic trees of interacting proteins in a panel of yeast species are more similar than would be expected by chance. Conclusions: While we find only negligible evidence for such increased levels of similarities, our statistical approach allows us to resolve the previously reported contradictory results on the levels of co-evolution induced by protein-protein interactions. We conclude with a discussion as to how we may employ the statistical framework developed here in further functional and evolutionary analyses of biological networks and systems.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

The Drosophila Gap Gene Network Is Composed of Two Parallel Toggle Switches

Author: Dmitri Papatsenko
Michael Levine
Vladimir N. Uversky
Publication venue: Public Library of Science
Publication date: 01/01/2011

Drosophila “gap” genes provide the first response to maternal gradients in the early fly embryo. Gap genes are expressed in a series of broad bands across the embryo during the first hours of development. The gene network controlling the gap gene expression patterns includes inputs from maternal gradients and mutual repression between the gap genes themselves. In this study we propose a modular design for the gap gene network, involving two relatively independent network domains. The core of each network domain includes a toggle switch corresponding to a pair of mutually repressive gap genes, operated in space by maternal inputs. The toggle switches present in the gap network are evocative of the phage lambda switch, but they are operated positionally (in space) by the maternal gradients, so the synthesis rates for the competing components change along the embryo anterior-posterior axis. A dynamic model constructed on the proposed principle, with elements of fractional site occupancy, required 5–7 parameters to fit quantitative spatial expression data for gap gradients. The identified model solutions (parameter combinations) reproduced major dynamic features of the gap gradient system and explained gap expression in a variety of segmentation mutants.
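
A toggle switch operated by position-dependent synthesis rates can be sketched with a generic mutual-repression model. This is not the paper's fractional-site-occupancy model; the Hill-type repression term, the parameter values, and the function name are assumptions made for illustration.

```python
def simulate_toggle(synth_u, synth_v, hill=2.0, dt=0.01, steps=5000):
    """Integrate two mutually repressing genes u and v with forward Euler.

    du/dt = synth_u / (1 + v**hill) - u
    dv/dt = synth_v / (1 + u**hill) - v
    Both genes start at the same level, so only the synthesis rates
    decide which one wins.
    """
    u = v = 0.5
    for _ in range(steps):
        du = synth_u / (1.0 + v ** hill) - u
        dv = synth_v / (1.0 + u ** hill) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


# Hypothetical synthesis rates at two positions along the axis:
# the maternal input favours u at one end and v at the other.
u_ant, v_ant = simulate_toggle(synth_u=4.0, synth_v=1.0)
u_post, v_post = simulate_toggle(synth_u=1.0, synth_v=4.0)
```

The same pair of genes settles into opposite states at the two positions, which is the sense in which the switch is operated "in space" by changing synthesis rates.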

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Streptococcus pneumoniae Clonal Complex 199: Genetic Diversity and Tissue-Specific Virulence

Streptococcus pneumoniae is an important cause of otitis media and invasive disease. Since introduction of the heptavalent pneumococcal conjugate vaccine, there has been an increase in replacement disease due to serotype 19A clonal complex (CC)199 isolates. The goals of this study were to 1) describe genetic diversity among nineteen CC199 isolates from carriage, middle ear, blood, and cerebrospinal fluid, 2) compare CC199 19A (n = 3) and 15B/C (n = 2) isolates in the chinchilla model for pneumococcal disease, and 3) identify accessory genes associated with tissue-specific disease among a larger collection of S. pneumoniae isolates. CC199 isolates were analyzed by comparative genome hybridization. One hundred and twenty-seven genes were variably present. The CC199 phylogeny split into two main clades, one comprised predominantly of carriage isolates and another of disease isolates. Ability to colonize and cause disease did not differ by serotype in the chinchilla model. However, isolates from the disease clade were associated with faster time to bacteremia compared to carriage clade isolates. One 19A isolate exhibited hypervirulence. Twelve tissue-specific genes/regions were identified by correspondence analysis. After screening a diverse collection of 326 isolates, spr0282 was associated with carriage. Four genes/regions, SP0163, SP0463, SPN05002 and RD8a were associated with middle ear isolates. SPN05002 also associated with blood and CSF, while RD8a associated with blood isolates. The hypervirulent isolate's genome was sequenced using the Solexa paired-end sequencing platform and compared to that of a reference serotype 19A isolate, revealing the presence of a novel 20 kb region with sequence similarity to bacteriophage genes. Genetic factors other than serotype may modulate virulence potential in CC199. These studies have implications for the long-term effectiveness of conjugate vaccines. 
Ideally, future vaccines would target common proteins to effectively reduce carriage and disease in the vaccinated population

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central