The Ashbya Genome Database (AGD)—a tool for the yeast community and genome biologists
The Ashbya Genome Database (AGD) is a comprehensive online source of information covering genes from the filamentous fungus Ashbya gossypii. The database content is based upon comparative genome annotation between A. gossypii and the closely related budding yeast Saccharomyces cerevisiae, taking both sequence similarity and synteny (conserved order and orientation) into account. Release 2 of AGD contains 4718 protein-encoding loci located across seven chromosomes. Information can be retrieved using systematic or standard locus names from A. gossypii as well as from budding and fission yeast. Approximately 90% of the genes in the A. gossypii genome are homologous and syntenic to loci of budding yeast. AGD is therefore useful not only to the various yeast communities but also to biologists interested in evolutionary aspects of genome research and comparative genome annotation. The database provides scientists with a convenient graphical user interface that includes various locus search and genome browsing options, data download and export functions, and numerous reciprocal links to external databases including SGD, MIPS, GeneDB, KEGG, GermOnline and Swiss-Prot/TrEMBL. AGD is accessible at http://agd.unibas.c
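The synteny criterion mentioned above (conserved order and orientation) can be illustrated with a small sketch. It is not the AGD annotation pipeline: the gene identifiers, the one-to-one homolog map and the function name syntenic_pairs are hypothetical, and the check only asks whether two adjacent genes in one genome map to adjacent homologs in the other with their relative orientation preserved, either in the same direction or as an inverted block.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int]  # (gene id, strand); strand is +1 or -1


def syntenic_pairs(genome_a: List[Gene], genome_b: List[Gene],
                   homolog: Dict[str, str]) -> List[Tuple[str, str]]:
    """Adjacent gene pairs in genome_a whose homologs are also adjacent in
    genome_b with conserved relative orientation.

    Both genomes are single chromosomes given as genes in chromosomal order;
    homolog (hypothetical input) maps a genome_a gene id to its genome_b homolog.
    """
    pos_b = {gene: (i, strand) for i, (gene, strand) in enumerate(genome_b)}
    pairs = []
    for (g1, s1), (g2, s2) in zip(genome_a, genome_a[1:]):
        h1, h2 = homolog.get(g1), homolog.get(g2)
        if h1 not in pos_b or h2 not in pos_b:
            continue  # at least one gene has no mapped homolog
        (i1, t1), (i2, t2) = pos_b[h1], pos_b[h2]
        # Same order with the same strands, or the whole two-gene block
        # inverted (reversed order with both strands flipped).
        same = i2 == i1 + 1 and t1 == s1 and t2 == s2
        inverted = i2 == i1 - 1 and t1 == -s1 and t2 == -s2
        if same or inverted:
            pairs.append((g1, g2))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy chromosomes; "ScX" has no homolog and breaks adjacency.
    a = [("AgA", 1), ("AgB", 1), ("AgC", -1)]
    b = [("ScA", 1), ("ScB", 1), ("ScX", 1), ("ScC", -1)]
    h = {"AgA": "ScA", "AgB": "ScB", "AgC": "ScC"}
    print(syntenic_pairs(a, b, h))  # [('AgA', 'AgB')]
```

A real annotation would tolerate small insertions and work across whole chromosome sets; requiring strict adjacency keeps the sketch short.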
Applicability of tandem affinity purification MudPIT to pathway proteomics in yeast
A combined multidimensional chromatography-mass spectrometry approach known as "MudPIT" enables rapid identification of proteins that interact with a tagged bait while bypassing some of the problems associated with analysis of polypeptides excised from SDS-polyacrylamide gels. However, the reproducibility, success rate, and applicability of MudPIT to the rapid characterization of dozens of proteins have not been reported. We show here that MudPIT reproducibly identified bona fide partners for budding yeast Gcn5p. Additionally, we successfully applied MudPIT to rapidly screen through a collection of tagged polypeptides to identify new protein interactions. Twenty-five proteins involved in transcription and progression through mitosis were modified with a new tandem affinity purification (TAP) tag. TAP-MudPIT analysis of 22 yeast strains that expressed these tagged proteins uncovered known or likely interacting partners for 21 of the baits, a figure that compares favorably with traditional approaches. The proteins identified here comprised 102 previously known and 279 potential physical interactions. Even for the intensively studied Swi2p/Snf2p, the catalytic subunit of the Swi/Snf chromatin remodeling complex, our analysis uncovered a new interacting protein, Rtt102p. Reciprocal tagging and TAP-MudPIT analysis of Rtt102p revealed subunits of both the Swi/Snf and RSC complexes, identifying Rtt102p as a common interactor with, and possible integral component of, these chromatin remodeling machines. Our experience indicates it is feasible for an investigator working with a single ion trap instrument in a conventional molecular/cellular biology laboratory to carry out proteomic characterization of a pathway, organelle, or process (i.e. "pathway proteomics") by systematic application of TAP-MudPIT
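Reciprocal tagging, as applied to Rtt102p above, amounts to asking whether an interaction is recovered from both sides. A minimal sketch of that bookkeeping follows, assuming each TAP-MudPIT run has already been reduced to a set of identified proteins. The function name and the protein names are hypothetical, and no scoring or filtering of mass-spectrometry identifications is modeled.

```python
from typing import Dict, FrozenSet, Set


def reciprocal_interactions(pulldowns: Dict[str, Set[str]]) -> Set[FrozenSet[str]]:
    """Return unordered protein pairs supported in both tagging directions.

    pulldowns maps each tagged bait to the set of proteins identified in its
    purification. A pair is kept only if each partner was used as a bait and
    recovered the other.
    """
    pairs: Set[FrozenSet[str]] = set()
    for bait, preys in pulldowns.items():
        for prey in preys:
            # Skip the bait itself; require the reverse pull-down to find the bait.
            if prey != bait and bait in pulldowns.get(prey, set()):
                pairs.add(frozenset((bait, prey)))
    return pairs


if __name__ == "__main__":
    # Toy input with hypothetical protein names, not data from the study.
    toy = {
        "BaitA": {"BaitA", "BaitB", "PreyC"},
        "BaitB": {"BaitB", "BaitA", "PreyD"},
        "BaitE": {"PreyC"},
    }
    found = reciprocal_interactions(toy)
    print(sorted(sorted(p) for p in found))  # [['BaitA', 'BaitB']]
```

Pairs seen from only one side (BaitA with PreyC here) are not discarded as false; they simply lack reciprocal support because the partner was never tagged.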
How long is co-operation in genomics sustainable?
Publications on the 16 yeast chromosome sequences group together over 400 different authors from Europe, Japan, Australia and the USA. When research is not organised in networks, it is carried out in large sequencing centres such as the Sanger Centre in Britain, the Helix Institute in Japan or Saint Louis University in the USA. Both cases illustrate the collective nature of knowledge creation. Other examples of co-operation between numerous researchers in various countries, more closely related to innovation, might also be mentioned, such as the development of software for comparing proteins or DNA sequences. Collective publications reveal the collective nature of research, whether it is carried out by major consortia (the case of yeast) or around large research facilities (such as the synchrotron or major genome sequencing centres). This collective nature stems from two factors: (1) the advantages of co-ordinating efforts on major projects (e.g. economies of scale and of collection) and (2) very strong interdependency in the creation and utilisation of knowledge (related to cumulativeness).
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects
$\mathbf{S}=\{A^{(1)}:B^{(1)}, A^{(2)}:B^{(2)}, \ldots, A^{(N)}:B^{(N)}\}$,
measures how well other pairs A:B fit in with the set $\mathbf{S}$. Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in $\mathbf{S}$? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if only a small set of protein pairs is
provided.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
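To make the idea of ranking pairs against a query set concrete, the sketch below scores candidate pairs with the Bayesian Sets score for binary features (Ghahramani and Heller). This is a deliberately simpler, swapped-in technique, not the authors' function-space method. By assumption, each pair A:B is represented as the concatenation of the binary feature vectors of A and B, the Beta(alpha, beta) prior values are arbitrary, and the feature data are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def pair_features(pair: Pair, feats: Dict[str, Sequence[int]]) -> List[int]:
    """Assumed pair representation: binary features of A followed by those of B."""
    a, b = pair
    return list(feats[a]) + list(feats[b])


def bayesian_sets_ranking(query: List[Pair], candidates: List[Pair],
                          feats: Dict[str, Sequence[int]],
                          alpha: float = 1.0, beta: float = 1.0
                          ) -> List[Tuple[Pair, float]]:
    """Rank candidate pairs by the Bayesian Sets log score against the query set.

    With a Beta(alpha, beta) prior on each binary feature and N query pairs,
    log score(x) = sum_j [log((alpha + beta) / (alpha + beta + N))
                          + x_j * log(alpha_j' / alpha)
                          + (1 - x_j) * log(beta_j' / beta)],
    where alpha_j' = alpha + (number of query pairs with x_j = 1) and
    beta_j' = beta + N - (that number).
    """
    xs = [pair_features(p, feats) for p in query]
    n = len(xs)
    counts = [sum(col) for col in zip(*xs)]
    a_post = [alpha + c for c in counts]
    b_post = [beta + n - c for c in counts]
    const = len(counts) * math.log((alpha + beta) / (alpha + beta + n))
    scored = []
    for p in candidates:
        x = pair_features(p, feats)
        s = const + sum(xj * math.log(aj / alpha) + (1 - xj) * math.log(bj / beta)
                        for xj, aj, bj in zip(x, a_post, b_post))
        scored.append((p, s))
    # Higher log score means the candidate fits the query set better.
    return sorted(scored, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Made-up binary features for hypothetical objects.
    feats = {"a1": [1, 0], "b1": [0, 1], "a2": [1, 0], "b2": [0, 1],
             "c": [0, 1], "d": [1, 0]}
    ranked = bayesian_sets_ranking([("a1", "b1")], [("c", "d"), ("a2", "b2")], feats)
    print([p for p, _ in ranked])  # [('a2', 'b2'), ('c', 'd')]
```

Concatenation ignores which features of A interact with which features of B; capturing that is exactly what a richer relational representation, such as the one described in the abstract, is meant to add.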
Length, Protein-Protein Interactions, and Complexity
The evolutionary reason for the increase in gene length from archaea to
bacteria to eukaryotes observed in large-scale genome sequencing efforts has
been unclear. We propose here that the increasing complexity of protein-protein
interactions has driven the selection of longer proteins, as longer proteins
are more able to distinguish among a larger number of distinct interactions due
to their greater average surface area. Annotated protein sequences available
from the SWISS-PROT database were analyzed for thirteen eukaryotes, eight
bacteria, and two archaea species. The number of subcellular locations with which
each protein is associated is used as a measure of the number of interactions
in which a protein participates. Two databases of yeast protein-protein
interactions were used as another measure of the number of interactions in
which each S. cerevisiae protein participates. Protein length is shown
to correlate both with the number of subcellular locations with which a protein
is associated and with the number of interactions as measured by yeast two-hybrid
experiments. Protein length is also shown to correlate with the probability
that the protein is encoded by an essential gene. Interestingly, average
protein length and number of subcellular locations are not significantly
different between all human proteins and protein targets of known, marketed
drugs. Increased protein length appears to be a significant mechanism by which
the increasing complexity of protein-protein interaction networks is
accommodated within the natural evolution of species. Consideration of protein
length may be a valuable tool in drug design, one that predicts different
strategies for inhibiting interactions in aberrant and normal pathways.
Comment: 13 pages, 5 figures, 2 tables, to appear in Physica
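The abstract reports that protein length correlates with the number of subcellular locations and of interactions, without naming the statistic. As a hedged illustration, the sketch below computes a Spearman rank correlation (the Pearson correlation of average ranks, so tied values are handled) in plain Python. The choice of Spearman and the toy numbers are assumptions, not the paper's analysis or data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up protein lengths (residues) and interaction counts, illustration only.
    lengths = [100, 200, 300, 400]
    interactions = [1, 3, 2, 5]
    print(spearman(lengths, interactions))  # 0.8
```

A rank-based statistic is a reasonable default for interaction counts, which are heavily skewed, but whether it matches the paper's method is not stated in the abstract.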
PROPHECY—a database for high-resolution phenomics
The rapid recent evolution of the field of phenomics (the genome-wide study of gene dispensability by quantitative analysis of phenotypes) has created an increasing demand for new data analysis and visualization tools. Following the introduction of a novel approach for precise, genome-wide quantification of gene dispensability in Saccharomyces cerevisiae, we here announce a public resource for mining, filtering and visualizing phenotypic data: the PROPHECY database. PROPHECY is designed to allow easy and flexible access to physiologically relevant quantitative data on the growth behaviour of mutant strains in the yeast deletion collection under conditions of environmental challenge. PROPHECY is publicly accessible at http://prophecy.lundberg.gu.se
Yeast Protein Interactome Topology Provides Framework for Coordinated-Functionality
The architecture of the network of protein-protein physical interactions in
Saccharomyces cerevisiae is exposed through the combination of two
complementary theoretical network measures, betweenness centrality and
`Q-modularity'. The yeast interactome is characterized by well-defined
topological modules connected via a small number of inter-module protein
interactions. Should such topological inter-module connections turn out to
constitute a form of functional coordination between the modules, we speculate
that this coordination typically occurs in a pairwise fashion, rather
than by way of high-degree hub proteins responsible for coordinating multiple
modules. The unique non-hub-centric hierarchical organization of the
interactome is not reproduced by gene duplication-and-divergence stochastic
growth models that disregard global selective pressures.
Comment: Final, revised version. 13 pages. Please see Nucleic Acids open
access article for higher resolution figure
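The two network measures named above can be computed on a small graph with their standard definitions: Newman's modularity Q for a given partition into modules, and Brandes' algorithm for unnormalized node betweenness centrality. The sketch assumes an undirected, unweighted graph without self-loops and takes the module assignment as given; it does not reproduce how the authors partitioned the yeast interactome. The toy graph of two triangles joined by one edge is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Undirected adjacency sets from an edge list (self-loops not expected)."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def modularity(adj: Graph, community: Dict[int, int]) -> float:
    """Newman's Q = sum over modules c of [L_c / m - (d_c / (2m))^2]."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    internal: Dict[int, float] = defaultdict(float)  # L_c: edges inside module c
    degree: Dict[int, float] = defaultdict(float)    # d_c: total degree of module c
    for v, nbrs in adj.items():
        c = community[v]
        degree[c] += len(nbrs)
        for w in nbrs:
            if community[w] == c:
                internal[c] += 0.5  # each internal edge is visited from both ends
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def inter_module_edges(adj: Graph, community: Dict[int, int]) -> List[Tuple[int, int]]:
    """Edges whose endpoints lie in different modules, each listed once."""
    return sorted((u, w) for u in adj for w in adj[u]
                  if u < w and community[u] != community[w])


def betweenness(adj: Graph) -> Dict[int, float]:
    """Unnormalized node betweenness centrality (Brandes, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack: List[int] = []
        preds: Dict[int, List[int]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) was counted from both ends.
    return {v: b / 2 for v, b in bc.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles joined by the single edge 2-3.
    g = build_graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
    modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(modularity(g, modules))          # about 0.357 (= 5/14)
    print(inter_module_edges(g, modules))  # [(2, 3)]
    print(betweenness(g))                  # nodes 2 and 3 score 6.0, the rest 0.0
```

In the toy graph the high-betweenness nodes are exactly the endpoints of the lone inter-module edge, a miniature version of well-defined modules connected by few inter-module interactions.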
Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks
Complex biological systems have been successfully modeled by biochemical and
genetic interaction networks, typically gathered from high-throughput (HTP)
data. These networks can be used to infer functional relationships between
genes or proteins. Using the intuition that the topological role of a gene in a
network relates to its biological function, local or diffusion-based
"guilt-by-association" and graph-theoretic methods have had success in
inferring gene functions. Here we seek to improve function prediction by
integrating diffusion-based methods with a novel dimensionality reduction
technique to overcome the incomplete and noisy nature of network data. In this
paper, we introduce diffusion component analysis (DCA), a framework that plugs
in a diffusion model and learns a low-dimensional vector representation of each
node to encode the topological properties of a network. As a proof of concept,
we demonstrate DCA's substantial improvement over state-of-the-art
diffusion-based approaches in predicting protein function from molecular
interaction networks. Moreover, our DCA framework can integrate multiple
networks from heterogeneous sources, consisting of genomic information,
biochemical experiments and other resources, to even further improve function
prediction. Yet another layer of performance gain is achieved by integrating
the DCA framework with support vector machines that take our node vector
representations as features. Overall, our DCA framework provides a novel
representation of nodes in a network that can be used as a plug-in architecture
to other machine learning algorithms to decipher topological properties of and
obtain novel insights into interactomes.
Comment: RECOMB 201
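DCA starts from a diffusion model run from every node. The sketch below shows only that ingredient: a random walk with restart computed by power iteration, whose stationary vector serves as a node's diffusion state, with cosine similarity between diffusion states as a crude stand-in for comparing nodes. DCA's actual contribution, learning low-dimensional vectors from these states, is not reproduced, and the restart probability of 0.5, the iteration limits and the toy network are assumptions.

```python
import math
from typing import Dict, List, Sequence, Set


def rwr_state(adj: Dict[int, Set[int]], nodes: List[int], start: int,
              restart: float = 0.5, max_iter: int = 1000,
              tol: float = 1e-12) -> List[float]:
    """Random walk with restart from `start`, solved by power iteration.

    Returns the stationary visiting probabilities, indexed like `nodes`.
    """
    idx = {v: i for i, v in enumerate(nodes)}
    e = [0.0] * len(nodes)
    e[idx[start]] = 1.0
    s = e[:]
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the start node.
        nxt = [restart * x for x in e]
        for v in nodes:
            mass = (1 - restart) * s[idx[v]]
            nbrs = adj.get(v, set())
            if nbrs:
                # Otherwise it moves to a uniformly chosen neighbor.
                share = mass / len(nbrs)
                for w in nbrs:
                    nxt[idx[w]] += share
            else:
                # Isolated node: return its mass to the start node.
                nxt[idx[start]] += mass
        converged = sum(abs(a - b) for a, b in zip(nxt, s)) < tol
        s = nxt
        if converged:
            break
    return s


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


if __name__ == "__main__":
    # Hypothetical toy network: two triangles joined by the edge 2-3.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    nodes = sorted(g)
    states = {v: rwr_state(g, nodes, v) for v in nodes}
    # Nodes in the same triangle have more similar diffusion states.
    print(cosine(states[0], states[1]) > cosine(states[0], states[5]))  # True
```

Each diffusion state has one entry per node, which is why a dimensionality reduction step such as DCA's becomes attractive on genome-scale networks.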