A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer
Recently, several classifiers that combine primary tumor data, like gene
expression data, and secondary data sources, such as protein-protein
interaction networks, have been proposed for predicting outcome in breast
cancer. In these approaches, new composite features are typically constructed
by aggregating the expression levels of several genes. The secondary data
sources are employed to guide this aggregation. Although many studies claim
that these approaches improve classification performance over single gene
classifiers, the gain in performance is difficult to assess. This stems mainly
from the fact that different breast cancer data sets and validation procedures
are employed to assess the performance. Here we address these issues by
employing a large cohort of six breast cancer data sets as a benchmark set and by
performing an unbiased evaluation of the classification accuracies of the
different approaches. Contrary to previous claims, we find that composite
feature classifiers do not outperform simple single gene classifiers. We
investigate the effect of (1) the number of selected features; (2) the specific
gene set from which features are selected; (3) the size of the training set and
(4) the heterogeneity of the data set on the performance of composite feature
and single gene classifiers. Strikingly, we find that randomization of
secondary data sources, which destroys all biological information in these
sources, does not result in a deterioration in performance of composite feature
classifiers. Finally, we show that when a proper correction for gene set size
is performed, the stability of single gene sets is similar to the stability of
composite feature sets. Based on these results, there is currently no reason to
prefer prognostic classifiers based on composite features over single gene
classifiers for predicting outcome in breast cancer.
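The composite-feature idea and the randomization control described above can be illustrated with a small sketch. It assumes one common construction (averaging the z-scored expression of a seed gene and its network neighbours) and models "randomizing the secondary data source" as a random relabelling of genes in the network. Both choices, the data layout and the names composite_features and randomize_network are illustrative assumptions, not the evaluated classifiers themselves.

```python
import random
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene's expression across samples (mean 0, population SD 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_features(expr, network):
    """Build one composite feature per seed gene (hypothetical construction).

    expr: gene -> expression values, one per sample.
    network: gene -> set of neighbouring genes (e.g. from a PPI network).
    The feature is the mean z-scored expression over the seed and its neighbours.
    """
    z = {g: zscore(v) for g, v in expr.items()}
    feats = {}
    for seed, nbrs in network.items():
        if seed not in z:
            continue  # seed gene has no expression data
        members = [g for g in {seed, *nbrs} if g in z]
        n_samples = len(z[seed])
        feats[seed] = [mean(z[g][k] for g in members) for k in range(n_samples)]
    return feats


def randomize_network(network, seed=0):
    """Relabel genes by a random permutation: the graph shape is kept,
    but the biological gene-gene associations are destroyed."""
    rng = random.Random(seed)
    genes = sorted(network)
    shuffled = genes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(genes, shuffled))
    return {relabel[g]: {relabel[n] for n in nbrs if n in relabel}
            for g, nbrs in network.items()}


# Toy data (hypothetical): three genes measured in three samples.
expr = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [3.0, 2.0, 1.0]}
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
feats = composite_features(expr, network)
random_feats = composite_features(expr, randomize_network(network))
```

A classifier can then be trained on feats, on random_feats, or on the single-gene z-scores, which mirrors the kind of comparison the abstract reports.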
The role of mentorship in protege performance
The role of mentorship in protege performance is a matter of importance to
academic, business, and governmental organizations. While the benefits of
mentorship for proteges, mentors and their organizations are apparent, the
extent to which proteges mimic their mentors' career choices and acquire their
mentorship skills is unclear. Here, we investigate one aspect of mentor
emulation by studying mentorship fecundity (the number of proteges a mentor
trains) with data from the Mathematics Genealogy Project, which tracks the
mentorship record of thousands of mathematicians over several centuries. We
demonstrate that fecundity among academic mathematicians is correlated with
other measures of academic success. We also find that the average fecundity of
mentors remains stable over 60 years of recorded mentorship. We further uncover
three significant correlations in mentorship fecundity. First, mentors with
small mentorship fecundity train proteges that go on to have a 37% larger than
expected mentorship fecundity. Second, in the first third of their career,
mentors with large fecundity train proteges that go on to have a 29% larger
than expected fecundity. Finally, in the last third of their career, mentors
with large fecundity train proteges that go on to have a 31% smaller than
expected fecundity.
Comment: 23 pages double-spaced, 4 figures
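A percentage such as "37% larger than expected fecundity" can be illustrated with a simplified calculation. This sketch assumes a deliberately simple baseline: the mean fecundity over all proteges in the data. It does not reproduce the null model used in the study. The edge-list format, the cutoff for "small fecundity" and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def fecundity(edges):
    """Number of proteges trained by each mentor; absent keys count as 0."""
    return Counter(mentor for mentor, _ in edges)


def relative_excess(edges, mentor_group):
    """Mean fecundity of the proteges of `mentor_group`, relative to the mean
    fecundity of all proteges (simplified baseline), minus 1.
    A value of 0.37 would read as "37% larger than expected"."""
    f = fecundity(edges)
    baseline = mean(f[p] for _, p in edges)
    observed = mean(f[p] for m, p in edges if m in mentor_group)
    return observed / baseline - 1


# Toy genealogy (hypothetical): (mentor, protege) pairs.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E"),
         ("B", "F"), ("C", "G"), ("D", "H")]
f = fecundity(edges)
small = {m for m, n in f.items() if n <= 1}  # hypothetical "small fecundity" cutoff
print(relative_excess(edges, small), relative_excess(edges, {"A"}))
```

Restricting the edge list to the first or last third of each mentor's career before calling relative_excess would give the career-stage comparisons mentioned above, under the same simplified baseline.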
Network Kriging
Network service providers and customers are often concerned with aggregate
performance measures that span multiple network paths. Unfortunately, forming
such network-wide measures can be difficult, due to the issues of scale
involved. In particular, the number of paths grows too rapidly with the number
of endpoints to make exhaustive measurement practical. As a result, it is of
interest to explore the feasibility of methods that dramatically reduce the
number of paths measured in such situations while maintaining acceptable
accuracy.
We cast the problem as one of statistical prediction, in the spirit of the
so-called 'kriging' problem in spatial statistics, and show that end-to-end
network properties may be accurately predicted in many cases using a
surprisingly small set of carefully chosen paths. More precisely, we formulate
a general framework for the prediction problem, propose a class of linear
predictors for standard quantities of interest (e.g., averages, totals,
differences) and show that linear algebraic methods of subset selection may be
used to effectively choose which paths to measure. We characterize the
performance of the resulting methods, both analytically and numerically. The
success of our methods derives from the low effective rank of routing matrices
as encountered in practice, which appears to be a new observation in its own
right with potentially broad implications for network measurement generally.
Comment: 16 pages, 9 figures, single-spaced
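The interplay between low routing-matrix rank, subset selection and linear prediction can be shown with a small sketch. It assumes additive path metrics (path delay = G @ link delays, with G the paths-by-links routing matrix) and no measurement noise. It also uses a simple greedy rank-increasing row selection as a stand-in for the linear-algebraic subset selection methods in the abstract. The function names and the toy routing matrix are hypothetical.

```python
import numpy as np


def select_paths(G):
    """Greedily keep rows (paths) of G that increase the rank of the kept set."""
    chosen = []
    for i in range(G.shape[0]):
        candidate = chosen + [i]
        if np.linalg.matrix_rank(G[candidate]) == len(candidate):
            chosen.append(i)
    return chosen


def predict_average(G, chosen, y_measured):
    """Linear predictor of the mean metric over all paths from measured paths only."""
    Gs = G[chosen]
    # Express every path as a linear combination of the measured paths: G ~= C @ Gs.
    C = G @ np.linalg.pinv(Gs)
    a = np.full(G.shape[0], 1.0 / G.shape[0])  # averaging weights over all paths
    return a @ C @ y_measured


# Toy network (hypothetical): 6 paths over 3 links, so rank(G) <= 3.
G = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 0],
              [1, 1, 1], [0, 0, 1], [1, 0, 1]], dtype=float)
x = np.array([1.0, 2.0, 3.0])  # hypothetical link delays
y = G @ x                      # end-to-end path delays
chosen = select_paths(G)
estimate = predict_average(G, chosen, y[chosen])
print(chosen, estimate, y.mean())
```

Here only 3 of the 6 paths are measured, yet the average is recovered exactly because the chosen rows span the row space of G. With real, noisy measurements, the predictor would be approximate and the selection should rely on an effective (numerical) rank.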
Climate Dynamics: A Network-Based Approach for the Analysis of Global Precipitation
Precipitation is one of the most important meteorological variables for defining climate dynamics, but its spatial patterns have not yet been fully investigated. Complex network theory, which provides a robust tool for investigating the statistical interdependence of many interacting elements, is used here to analyze the spatial dynamics of annual precipitation over seventy years (1941-2010). The precipitation network is built by associating a node with each geographical region, which has its own temporal series of precipitation, and by identifying links among nodes through the correlation function. The network reveals significant spatial variability, with barely connected regions, such as Eastern China and Japan, and highly connected regions, such as the African Sahel, Eastern Australia and, to a lesser extent, Northern Europe. The Sahel and Eastern Australia are remarkably dry regions, where low amounts of rainfall are uniformly distributed on continental scales and small-scale extreme events are rare. As a consequence, the precipitation gradient is low, making these regions well connected on a large spatial scale. By contrast, South-East Asia is often reached by extreme events such as monsoons, tropical cyclones and heat waves, all of which can restrict the correlation to short ranges. Some patterns emerging between mid-latitude and tropical regions suggest a possible impact of planetary wave propagation on precipitation at a global scale. Other links can be qualitatively associated with atmospheric and oceanic circulation. To analyze the sensitivity of the network to the physical closeness of the nodes, short-range connections are removed. The African Sahel, Eastern Australia and Northern Europe again appear as the supernodes of the network, further confirming their long-range connection structure. Almost all North American and Asian nodes vanish, revealing that extreme events can enhance precipitation gradients, leading to a systematic absence of long-range patterns.