A model for Bioinformatics training : the Marine Biological Laboratory
Author Posting. © The Authors, 2010. This is the author's version of the work. It is posted here by permission of Oxford University Press for personal use, not for redistribution. The definitive version was published in Briefings in Bioinformatics 6 (2010): 610-615, doi:10.1093/bib/bbq029.
Many areas of science such as biology, medicine, and oceanography are becoming increasingly
data-rich and most programs that train scientists do not address informatics techniques or
technologies that are necessary for managing and analyzing large amounts of data. Educational
resources for scientists in informatics are scarce, yet scientists need the skills and knowledge to
work with informaticians and manage graduate students and post-docs in informatics projects.
The Marine Biological Laboratory houses a world-renowned library and is involved in a number
of informatics projects in the sciences. The MBL has been home to the National Library of
Medicine's BioMedical Informatics Course for nearly two decades and is committed to educating
scientists and other scholars in informatics. In an innovative, immersive learning experience,
Grant Yamashita, a biologist and post-doc at Arizona State University, visited the Science
Informatics Group at MBL to learn first hand how informatics is done and how informatics
teams work. Hands-on work with developers, systems administrators, librarians, and other
scientists provided an invaluable education in informatics and is a model for future science
informatics training.
This work was supported by the National Science Foundation [0926026 to G.Y., SES-0623176]; Jewett Foundation; Ellison Medical Foundation.
Does Collocation Inform the Impact of Collaboration?
Background
It has been shown that large interdisciplinary teams working across geography are more likely to be impactful. We asked whether the physical proximity of collaborators remained a strong predictor of the scientific impact of their research as measured by citations of the resulting publications.
Methodology/Principal Findings
Biomedical science articles with at least two authors, published by Harvard investigators from 1993 to 2003, were identified. Each collaboration was geocoded to the precise three-dimensional location of its authors. Physical distances between any two coauthors were calculated and associated with the corresponding citations. The relationship between coauthor distance and citations was investigated at different spatial scales for four author relationships (first-last, first-middle, last-middle, and middle-middle). At all sizes of collaboration (from two authors to dozens of authors), geographical proximity between the first and last author is highly informative of impact at the microscale (i.e. within a building) and beyond. The mean citation count for the first-last author relationship decreased as the distance between the two authors increased, both within the sub-kilometer range and across the three categorized ranges (same building, same city, or different city). No such trend was seen in the other three author relationships.
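The distance-and-category analysis described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (`building`, `city`, `xyz`, `citations`, `first`, `last`) is hypothetical, coordinates are assumed to be in meters, and building identifiers are assumed to be unique across cities.

```python
import math
from collections import defaultdict


def distance_3d(a, b):
    """Euclidean distance between two (x, y, z) points, assumed to be in meters."""
    return math.dist(a, b)


def proximity_category(author_a, author_b):
    """Assign a coauthor pair to one of the three ranges named in the abstract.

    Assumes each author is a dict with hypothetical 'building' and 'city' keys,
    and that building identifiers are unique across cities.
    """
    if author_a["building"] == author_b["building"]:
        return "same building"
    if author_a["city"] == author_b["city"]:
        return "same city"
    return "different city"


def mean_citations_by_category(papers):
    """Average citation count of papers, grouped by first-last author proximity.

    Each paper is assumed to be a dict with 'first', 'last', and 'citations' keys.
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [citation sum, paper count]
    for paper in papers:
        category = proximity_category(paper["first"], paper["last"])
        totals[category][0] += paper["citations"]
        totals[category][1] += 1
    return {cat: total / count for cat, (total, count) in totals.items()}
```

Calling `distance_3d(paper["first"]["xyz"], paper["last"]["xyz"])` on each record would give the fine-grained distances needed for the sub-kilometer analysis; the same grouping could be repeated for the first-middle, last-middle, and middle-middle pairs.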
Conclusions/Significance
Despite the positive impact of emerging communication technologies on scientific research, our results provide striking evidence for the role of physical proximity as a predictor of the impact of collaborations.
Ewing Marion Kauffman Foundation; Harvard University. Office of the Provost (1992-
Developing and applying heterogeneous phylogenetic models with XRate
Modeling sequence evolution on phylogenetic trees is a useful technique in
computational biology. Especially powerful are models which take account of the
heterogeneous nature of sequence evolution according to the "grammar" of the
encoded gene features. However, beyond a modest level of model complexity,
manual coding of models becomes prohibitively labor-intensive. We demonstrate,
via a set of case studies, the new built-in model-prototyping capabilities of
XRate (macros and Scheme extensions). These features allow rapid implementation
of phylogenetic models which would have previously been far more
labor-intensive. XRate's new capabilities for lineage-specific models,
ancestral sequence reconstruction, and improved annotation output are also
discussed. XRate's flexible model-specification capabilities and computational
efficiency make it well-suited to developing and prototyping phylogenetic
grammar models. XRate is available as part of the DART software package:
http://biowiki.org/DART .
Comment: 34 pages, 3 figures, glossary of XRate model terminology
Subgraphs in random networks
Understanding the subgraph distribution in random networks is important for
modelling complex systems. In classic Erdős networks, which exhibit a
Poissonian degree distribution, the number of appearances of a subgraph G with
n nodes and g edges scales with network size as $\langle G \rangle \sim N^{n-g}$. However,
many natural networks have a non-Poissonian degree distribution. Here we
present approximate equations for the average number of subgraphs in an
ensemble of random sparse directed networks, characterized by an arbitrary
degree sequence. We find new scaling rules for the commonly occurring case of
directed scale-free networks, in which the outgoing degree distribution scales
as $P(k) \sim k^{-\gamma}$. Considering the power exponent of the degree
distribution, $\gamma$, as a control parameter, we show that random networks
exhibit transitions between three regimes. In each regime the subgraph number
of appearances follows a different scaling law, $\langle G \rangle \sim N^{\alpha}$, where
$\alpha = n-g+s-1$ for $\gamma < 2$, $\alpha = n-g+s+1-\gamma$ for $2 < \gamma < \gamma_c$, and
$\alpha = n-g$ for $\gamma > \gamma_c$; here s is the maximal outdegree in the subgraph and
$\gamma_c = s+1$. We find that certain subgraphs appear much more frequently than
in Erdős networks. These results are in very good agreement with numerical
simulations. This has implications for detecting network motifs, subgraphs that
occur in natural networks significantly more than in their randomized
counterparts.
Comment: 8 pages, 5 figures
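The three scaling regimes stated in this abstract can be written as a small Python function. This is only a sketch of the quoted formulas: the function name is hypothetical, and because the abstract gives strict inequalities, assigning the boundary values of γ (at 2 and at γ_c) by continuity is an assumption made here.

```python
def subgraph_scaling_exponent(n, g, s, gamma):
    """Exponent alpha in <G> ~ N^alpha for a subgraph in a directed scale-free network.

    n: number of nodes in the subgraph
    g: number of edges in the subgraph
    s: maximal outdegree in the subgraph
    gamma: exponent of the outgoing degree distribution P(k) ~ k^(-gamma)

    Boundary values (gamma == 2 and gamma == gamma_c) are assigned by
    continuity, which is an assumption; the abstract states strict inequalities.
    """
    gamma_c = s + 1
    if gamma < 2:
        return n - g + s - 1
    if gamma < gamma_c:
        return n - g + s + 1 - gamma
    return n - g  # same exponent as in the Poissonian (Erdős) case
```

For a three-node feed-forward loop (n = 3, g = 3, s = 2), the function returns 1 at γ = 1.5, 0.5 at γ = 2.5, and 0 at γ = 3.5.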
CORRIE: enzyme sequence annotation with confidence estimates
Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at:
Constructive links between some morphological hierarchies on edge-weighted graphs
In edge-weighted graphs, we provide a unified presentation of a family of popular morphological hierarchies such as component trees, quasi flat zones, binary partition trees, and hierarchical watersheds. For any hierarchy of this family, we show if (and how) it can be obtained from any other element of the family. In this sense, the main contribution of this paper is the study of all constructive links between these hierarchies.
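As an illustration of one member of this family, the quasi-flat zones at a given level can be taken to be the connected components of the graph once only edges of weight at most that level are kept. The Python sketch below uses a union-find structure to compute them; it is not the construction from the paper, the function name and edge format are hypothetical, and the "at most" (rather than "strictly below") convention is an assumption.

```python
def quasi_flat_zones(num_vertices, edges, level):
    """Partition vertices 0..num_vertices-1 into quasi-flat zones at `level`.

    edges: iterable of (u, v, weight) tuples (hypothetical input format).
    Two vertices share a zone if a path of edges with weight <= level joins them.
    Returns a list of vertex sets, ordered by smallest vertex.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Find the representative of v, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, weight in edges:
        if weight <= level:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_v] = root_u  # merge the two components

    zones = {}
    for v in range(num_vertices):
        zones.setdefault(find(v), set()).add(v)
    return sorted(zones.values(), key=min)
```

Evaluating the function at increasing levels yields nested partitions, which is what makes the quasi-flat zones a hierarchy. On the path graph with edges `[(0, 1, 1), (1, 2, 3), (2, 3, 1)]`, level 0 gives four singletons, level 1 gives `{0, 1}` and `{2, 3}`, and level 3 gives a single zone.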
Rise and Demise of Bioinformatics? Promise and Progress
The field of bioinformatics and computational biology has gone through a number of transformations during the past 15 years, establishing itself as a key component of new biology. This spectacular growth has been challenged by a number of disruptive changes in science and technology. Despite the apparent fatigue of the linguistic use of the term itself, bioinformatics has grown perhaps to a point beyond recognition. We explore both historical aspects and future trends and argue that as the field expands, key questions remain unanswered and acquire new meaning while at the same time the range of applications is widening to cover an ever increasing number of biological disciplines. These trends appear to be pointing to a redefinition of certain objectives, milestones, and possibly the field itself.
On morphological hierarchical representations for image processing and spatial data clustering
Hierarchical data representations in the context of classification and data
clustering were put forward during the fifties. Recently, hierarchical image
representations have gained renewed interest for segmentation purposes. In this
paper, we briefly survey fundamental results on hierarchical clustering and
then detail recent paradigms developed for the hierarchical representation of
images in the framework of mathematical morphology: constrained connectivity
and ultrametric watersheds. Constrained connectivity can be viewed as a way to
constrain an initial hierarchy in such a way that a set of desired constraints
are satisfied. The framework of ultrametric watersheds provides a generic scheme
for computing any hierarchical connected clustering, in particular when such a
hierarchy is constrained. The suitability of this framework for solving
practical problems is illustrated with applications in remote sensing.
Brain Radiation Information Data Exchange (BRIDE): Integration of experimental data from low-dose ionising radiation research for pathway discovery
Background: The underlying molecular processes representing stress responses to low-dose ionising radiation (LDIR) in mammals are just beginning to be understood. In particular, LDIR effects on the brain and their possible association with neurodegenerative disease are currently being explored using omics technologies. Results: We describe a light-weight approach for the storage, analysis and distribution of relevant LDIR omics datasets. The data integration platform, called BRIDE, contains information from the literature as well as experimental information from transcriptomics and proteomics studies. It deploys a hybrid, distributed solution using both local storage and cloud technology. Conclusions: BRIDE can act as a knowledge broker for LDIR researchers, to facilitate molecular research on the systems biology of LDIR response in mammals. Its flexible design can capture a range of experimental information for genomics, epigenomics, transcriptomics, and proteomics. The data collection is available at: