36 research outputs found

Automatic categorization of diverse experimental information in the bioscience literature

Author: Brown Nick
Chen Wen
Davis Paul
Fang Ruihua
Fernandes Jolene
Gelbart William M.
Marygold Steven J.
Matthews Beverley
Millburn Gillian
Schindelman Gary
Sternberg Paul W.
Tuli Mary Ann
Van Auken Kimberly
Wang Xiaodong
Zhang Haiyan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Background: Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest, and is thus usually time-consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years, and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reduce the time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature, such as C. elegans and D. melanogaster, to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance. Results: We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genome Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction. Conclusions: Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. They are completely automatic and can thus be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.
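
The abstract above names only the classifier family (SVM); it does not give features, training procedure, or data. As a rough illustration of what "flag papers for one data type with a linear SVM" can look like, here is a minimal sketch under stated assumptions: whitespace-tokenized bag-of-words features, a Pegasos-style subgradient trainer chosen here for brevity (not the authors' implementation), and four invented toy "papers". All function names are hypothetical.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words term counts for one document (an assumed feature scheme)."""
    return Counter(text.lower().split())


def score(weights, features):
    """Linear decision value w . x for a sparse feature vector."""
    return sum(weights.get(term, 0.0) * count for term, count in features.items())


def train_linear_svm(docs, labels, lam=0.1, epochs=20):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels are +1 (paper contains the data type) or -1 (it does not).
    Examples are visited in a fixed order so the result is deterministic.
    """
    weights = {}
    step = 0
    for _ in range(epochs):
        for text, label in zip(docs, labels):
            step += 1
            eta = 1.0 / (lam * step)          # decaying learning rate
            features = featurize(text)
            violated = label * score(weights, features) < 1.0
            shrink = 1.0 - 1.0 / step         # equals 1 - eta * lam
            for term in weights:
                weights[term] *= shrink       # regularization shrinkage
            if violated:                      # hinge-loss subgradient step
                for term, count in features.items():
                    weights[term] = weights.get(term, 0.0) + eta * label * count
    return weights


def predict(weights, text):
    """Return +1 if the document is flagged for the data type, else -1."""
    return 1 if score(weights, featurize(text)) > 0 else -1


# Invented toy corpus: +1 = "contains RNAi results", -1 = "does not".
toy_docs = [
    "rnai knockdown phenotype observed",
    "protein crystal structure solved",
    "rnai screen phenotype",
    "crystal diffraction structure",
]
toy_labels = [1, -1, 1, -1]
toy_model = train_linear_svm(toy_docs, toy_labels)
```

A production system would train one such binary classifier per data type on hundreds to thousands of labeled papers, as the abstract indicates, and would normally rely on an established SVM library rather than a hand-written trainer.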

Crossref

Springer - Publisher Connector

Harvard University - DASH

Caltech Authors

Analysis of 14 BAC sequences from the Aedes aegypti genome: a benchmark for genome annotation and assembly

Author: Campbell Kathy S
Collins Frank H
deBruyn Becky
Gelbart William M
Koo Hean
Lobo Neil F
Loftus Brendan J
Severson David W
Thaner Daniel
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

In order to provide a set of manually curated and annotated sequences from the Aedes aegypti genome, mapped BAC clones encompassing 1.57 Mb were sequenced, assembled and manually annotated using computational gene finding, EST matches, and comparative protein homology.

CiteSeerX

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

The Evolution of the Anopheles 16 Genomes Project

Author: Besansky Nora J.
Christophides George K.
Collins Frank H.
Emrich Scott J.
Fontaine Michael C.
Gelbart William
Hahn Matthew W.
Howell Paul I.
Kafatos Fotis C.
Lawson Daniel
Muskavitch Marc A. T.
Neafsey Daniel E.
Waterhouse Robert M.
Williams Louise J.
Publication venue: Genetics Society of America
Publication date: 01/01/2013

We report the imminent completion of a set of reference genome assemblies for 16 species of Anopheles mosquitoes. In addition to providing a generally useful resource for comparative genomic analyses, these genome sequences will greatly facilitate exploration of the capacity exhibited by some Anopheline mosquito species to serve as vectors for malaria parasites. A community analysis project will commence soon to perform a thorough comparative genomic investigation of these newly sequenced genomes. Completion of this project via the use of short next-generation sequence reads required innovation in both the bioinformatic and laboratory realms, and the resulting knowledge gained could prove useful for genome sequencing projects targeting other unconventional genomes

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

Archive ouverte UNIGE

Crystallization of Opals from Polydisperse Nanoparticles

Author: Daniel V. Leff
James R. Heath
Pamela C. Ohara
William M. Gelbart
Publication venue: American Physical Society (APS)

Crossref

Sizes of Long RNA Molecules Are Determined by the Branching Patterns of Their Secondary Structures

Author: Alexander Borodavka
Avinoam Ben-Shaul
Peter G. Stockley
Roman Tuma
Surendra W. Singaram
William M. Gelbart
Publication venue: Elsevier BV
Publication date: 15/11/2016

Long RNA molecules are at the core of gene regulation across all kingdoms of life, whilst also serving as genomes in RNA viruses. Few studies have addressed the basic physical properties of long single-stranded RNAs. Long RNAs with non-repeating sequences usually adopt highly ramified secondary structures and are better described as branched polymers. In order to test whether a branched polymer model can estimate the overall sizes of large RNAs, we employed fluorescence correlation spectroscopy to examine the hydrodynamic radii of a broad spectrum of biologically important RNAs, ranging from viral genomes to long non-coding regulatory RNAs. The relative sizes of long RNAs measured at low ionic strength correspond well to those predicted by two theoretical approaches that treat the effective branching associated with secondary structure formation: one employing the Kramers theorem for calculating radii of gyration, and the other featuring the metric of “maximum ladder distance”. Upon addition of multivalent cations, most RNAs are found to be compacted as compared with their original, low-ionic-strength sizes. These results suggest that sizes of long RNA molecules are determined by the branching pattern of their secondary structures. They also experimentally validate the proposed computational approaches for estimating hydrodynamic radii of single-stranded RNAs, which use generic RNA structure prediction tools and thus can be universally applied to a wide range of long RNAs.
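
The abstract mentions the "maximum ladder distance" metric without defining it. A commonly used reading, assumed here rather than taken from this listing, is the largest number of base pairs crossed along a direct path between any two points of a predicted secondary structure. The sketch below computes that quantity from a dot-bracket string by treating loops as tree nodes and base pairs as edges, so the result is the tree diameter counted in base pairs. The function names are hypothetical, and pseudoknots are not handled.

```python
def _summarize(child_depths):
    """Return (longest path through this loop, deepest branch below it)."""
    top_two = sorted(child_depths, reverse=True)[:2]
    deepest = top_two[0] if top_two else 0
    return sum(top_two), deepest


def maximum_ladder_distance(dot_bracket):
    """Maximum number of base pairs crossed between any two loops.

    `dot_bracket` uses '(' and ')' for paired bases and '.' for unpaired
    bases. Each loop is a tree node; each base pair is an edge to the
    loop it encloses.
    """
    best = 0
    stack = [[]]  # stack[0] collects branch depths under the exterior loop
    for ch in dot_bracket:
        if ch == "(":
            stack.append([])  # open a new enclosed loop
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in structure")
            through, deepest = _summarize(stack.pop())
            best = max(best, through)
            # The closing pair adds one rung between this loop and its parent.
            stack[-1].append(deepest + 1)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in structure")
    through, _ = _summarize(stack[0])
    return max(best, through)
```

For example, `maximum_ladder_distance("((..))((..))")` counts two rungs down one hairpin and two up the other. In practice the input structure would come from a secondary-structure prediction tool, as the abstract notes.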

Elsevier - Publisher Connector

Crossref

PubMed Central

White Rose Research Online

VectorBase: a home for invertebrate vectors of human pathogens

Author: Arensburger Peter
Atkinson Peter
Besansky Nora J.
Birney Ewan
Bruggner Robert V.
Butler Ryan
Campbell Kathryn S.
Christley Scott
Christophides George K.
Collins Frank H.
Dialynas Emmanuel
Emmert David
Gelbart William M.
Hammond Martin
Hill Catherine A.
Kafatos Fotis C.
Kennedy Ryan C.
Lawson Daniel
Lobo Neil F.
Louis Christos
MacCallum M. Robert
Madey Greg
Megy Karine
Redmond Seth
Russo Susan
Severson David W.
Stinson Eric O.
Topalis Pantelis
Zdobnov Evgeny M.
Publication venue: Oxford University Press
Publication date: 01/12/2006

VectorBase is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes, providing an integrated resource for the research community. Currently, VectorBase contains genome information for two organisms: Anopheles gambiae, a vector for the Plasmodium protozoan agent causing malaria, and Aedes aegypti, a vector for the flaviviral agents causing yellow fever and dengue fever.

CiteSeerX

PubMed Central

Archive ouverte UNIGE

A vision for the future of genomics research

Author: A Barrington Brown
Alan E Guttmacher
Alison Hopkins
Bronya J Keats
Darryl Leja
David R Burgess
Eric D Green
Eric T Juengst
Francis S Collins
Janet D Rowley
Kim J Nickerson
Mark S
Maynard V Olson
Raju Kucherlapati
Richard P Lifton
Robert H Waterston
Robert Tepper
Ronald W Davis
Tadataka Yamada
Vickie Yates Brown
William M Gelbart
Wylie Burke
Publication date: 01/01/2003

CiteSeerX

Chromosomal Rearrangement Inferred From Comparisons of 12 Drosophila Genomes

Author: Arjun Bhutkar
Mu Xu
Stephen W. Schaeffer
Susan M. Russo
Temple F. Smith
William M. Gelbart
Publication venue: Genetics Society of America

Crossref

Role of RNA Branchedness in the Competition for Viral Capsid Proteins

Author: Avinoam Ben-Shaul
Charles M. Knobler
Rees F. Garmann
Surendra W. Singaram
William M. Gelbart

To optimize bindingand packagingby their capsid proteins (CP), single-stranded (ss) RNA viral genomes often have local secondary/tertiary structures with high CP affinity, with these “packaging signals” serving as heterogeneous nucleation sites for the formation of capsids. Under typical <i>in vitro</i> self-assembly conditions, however, and in particular for the case of many ssRNA viruses whose CP have cationic N-termini, the adsorption of CP by RNA is nonspecific because the CP concentration exceeds the largest dissociation constant for CP–RNA binding. Consequently, the RNA is saturated by bound protein before lateral interactions between CP drive the homogeneous nucleation of capsids. But, before capsids are formed, the binding of protein remains reversible and introduction of another RNA specieswith a different length and/or sequenceis found experimentally to result in significant redistribution of protein. Here we argue that, for a given RNA mass, the sequence with the highest affinity for protein is the one with the most compact secondary structure arising from self-complementarity; similarly, a long RNA steals protein from an equal mass of shorter ones. In both cases, it is the lateral attractions between bound proteins that determines the relative CP affinities of the RNA templates, even though the individual binding sites are identical. We demonstrate this with Monte Carlo simulations, generalizing the Rosenbluth method for excluded-volume polymers to include branching of the polymers and their reversible binding by protein

FigShare

Characterization of Viral Capsid Protein Self-Assembly around Short Single-Stranded RNA

Author: Avinoam Ben-Shaul
Charles M. Knobler
Mauricio Comas-Garcia
Rees F. Garmann
Surendra W. Singaram
William M. Gelbart

For many viruses, the packaging of a single-stranded RNA (ss-RNA) genome is spontaneous, driven by capsid protein–capsid protein (CP) and CP–RNA interactions. Furthermore, for some multipartite ss-RNA viruses, copackaging of two or more RNA molecules is a common strategy. Here we focus on RNA copackaging in vitro by using cowpea chlorotic mottle virus (CCMV) CP and an RNA molecule that is short (500 nucleotides (nts)) compared to the lengths (≈3000 nts) packaged in wild-type virions. We show that the degree of cooperativity of virus assembly depends not only on the relative strength of the CP–CP and CP–RNA interactions but also on the RNA being short: a 500-nt RNA molecule cannot form a capsid by itself, so its packaging requires the aggregation of multiple CP–RNA complexes. By using fluorescence correlation spectroscopy (FCS), we show that at neutral pH and sufficiently low concentrations RNA and CP form complexes that are smaller than the wild-type capsid and that four 500-nt RNAs are packaged into virus-like particles (VLPs) only upon lowering the pH. Further, a variety of bulk-solution techniques confirm that fully ordered VLPs are formed only upon acidification. On the basis of these results, we argue that the observed high degree of cooperativity involves equilibrium between multiple CP/RNA complexes.

FigShare