33,681 research outputs found
Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations
The development of molecular signatures for the prediction of time-to-event
outcomes is a methodologically challenging task in bioinformatics and
biostatistics. Although there are numerous approaches for the derivation of
marker combinations and their evaluation, the underlying methodology often
suffers from the problem that different optimization criteria are mixed during
the feature selection, estimation and evaluation steps. This might result in
marker combinations that are only suboptimal regarding the evaluation criterion
of interest. To address this issue, we propose a unified framework to derive
and evaluate biomarker combinations. Our approach is based on the concordance
index for time-to-event data, which is a non-parametric measure to quantify the
discrimatory power of a prediction rule. Specifically, we propose a
component-wise boosting algorithm that results in linear biomarker combinations
that are optimal with respect to a smoothed version of the concordance index.
We investigate the performance of our algorithm in a large-scale simulation
study and in two molecular data sets for the prediction of survival in breast
cancer patients. Our numerical results show that the new approach is not only
methodologically sound but can also lead to a higher discriminatory power than
traditional approaches for the derivation of gene signatures.Comment: revised manuscript - added simulation study, additional result
Energy management in communication networks: a journey through modelling and optimization glasses
The widespread proliferation of Internet and wireless applications has
produced a significant increase of ICT energy footprint. As a response, in the
last five years, significant efforts have been undertaken to include
energy-awareness into network management. Several green networking frameworks
have been proposed by carefully managing the network routing and the power
state of network devices.
Even though approaches proposed differ based on network technologies and
sleep modes of nodes and interfaces, they all aim at tailoring the active
network resources to the varying traffic needs in order to minimize energy
consumption. From a modeling point of view, this has several commonalities with
classical network design and routing problems, even if with different
objectives and in a dynamic context.
With most researchers focused on addressing the complex and crucial
technological aspects of green networking schemes, there has been so far little
attention on understanding the modeling similarities and differences of
proposed solutions. This paper fills the gap surveying the literature with
optimization modeling glasses, following a tutorial approach that guides
through the different components of the models with a unified symbolism. A
detailed classification of the previous work based on the modeling issues
included is also proposed
An update on statistical boosting in biomedicine
Statistical boosting algorithms have triggered a lot of research during the
last decade. They combine a powerful machine-learning approach with classical
statistical modelling, offering various practical advantages like automated
variable selection and implicit regularization of effect estimates. They are
extremely flexible, as the underlying base-learners (regression functions
defining the type of effect for the explanatory variables) can be combined with
any kind of loss function (target function to be optimized, defining the type
of regression setting). In this review article, we highlight the most recent
methodological developments on statistical boosting regarding variable
selection, functional regression and advanced time-to-event modelling.
Additionally, we provide a short overview on relevant applications of
statistical boosting in biomedicine
Scaling DBSCAN-like algorithms for event detection systems in Twitter
The increasing use of mobile social networks has lately transformed news media. Real-world events are nowadays reported in social networks much faster than in traditional channels. As a result, the autonomous detection of events from networks like Twitter has gained lot of interest in both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporarily long periods present two major shortcomings. First, detecting real-world events from the vast amount of tweets cannot be performed anymore in a single machine. Second, the tweeting activity varies a lot within these broad space-time regions limiting the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data as per its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance in a dataset composed of geo-located tweets in the Iberian peninsula during the course of several football matches. The results pointed out to the benefits of our proposal against other state-of-the-art techniques in terms of speed-up and detection accuracy.Peer ReviewedPostprint (author's final draft
- …