
4,427 research outputs found

Parallel Implementation of Efficient Search Schemes for the Inference of Cancer Progression Models

Author: Antoniotti Marco
Cazzaniga Paolo
Mauri Giancarlo
Nobile Marco S.
Ramazzotti Daniele
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

The emergence and development of cancer is a consequence of the accumulation over time of genomic mutations involving a specific set of genes, which provides the cancer clones with a functional selective advantage. In this work, we model the order of accumulation of such mutations during the progression, which eventually leads to the disease, by means of probabilistic graphical models, i.e., Bayesian Networks (BNs). We investigate how to learn the structure of such BNs from experimental evidence, adopting a global optimization meta-heuristic. In particular, we rely on Genetic Algorithms, and to strongly reduce the execution time of the inference -- which can also involve multiple repetitions to collect statistically significant assessments of the data -- we distribute the calculations using both multi-threading and a multi-node architecture. The results show that our approach is characterized by good accuracy and specificity; we also demonstrate its feasibility, thanks to an 84x reduction of the overall execution time with respect to a traditional sequential implementation.

arXiv.org e-Print Archive

Repository TU/e
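
The abstract above mentions repeating the inference several times and distributing those repetitions across workers. A minimal sketch of that idea follows, assuming a toy bit-counting fitness in place of a Bayesian Network score; `run_ga` and `run_repetitions` are hypothetical names, not the authors' code. Threads are used only to keep the sketch self-contained; for CPU-bound work in CPython, `ProcessPoolExecutor` (same `map` interface) or a multi-node scheduler would be the more realistic choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_ga(seed, n_bits=20, pop_size=30, generations=40):
    """One toy GA repetition; fitness = number of 1-bits (an assumption)."""
    rng = random.Random(seed)  # private generator, so repetitions are independent
    fitness = sum
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Binary tournament selection.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        nxt = [max(pop, key=fitness)]  # elitism: keep the current best
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit with probability 1/n_bits.
            child = [bit ^ (rng.random() < 1.0 / n_bits) for bit in child]
            nxt.append(child)
        pop = nxt
    return fitness(max(pop, key=fitness))


def run_repetitions(seeds, max_workers=4):
    """Run one independent GA repetition per seed, spread over a worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_ga, seeds))
```

Because each repetition owns its random generator, the parallel results match a sequential loop over the same seeds, which makes the speed-up easy to validate.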

Efficient computational strategies to learn the structure of probabilistic graphical models of cumulative phenomena

Author: Antoniotti Marco
Graudenzi Alex
Nobile Marco S.
Ramazzotti Daniele
Publication date: 23/10/2018

Structural learning of Bayesian Networks (BNs) is an NP-hard problem, which is further complicated by many theoretical issues, such as the I-equivalence among different structures. In this work, we focus on a specific subclass of BNs, named Suppes-Bayes Causal Networks (SBCNs), which include specific structural constraints based on Suppes' probabilistic causation to efficiently model cumulative phenomena. Here we compare the performance, via extensive simulations, of various state-of-the-art search strategies, such as local search techniques and Genetic Algorithms, as well as of distinct regularization methods. The assessment is performed on a large number of simulated datasets from topologies with distinct levels of complexity, various sample sizes and different rates of errors in the data. Among the main results, we show that the introduction of Suppes' constraints dramatically improves the inference accuracy, by reducing the solution space and providing a temporal ordering on the variables. We also report on trade-offs among different search techniques that can be efficiently employed in distinct experimental settings. This manuscript is an extended version of the paper "Structural Learning of Probabilistic Graphical Models of Cumulative Phenomena" presented at the 2018 International Conference on Computational Science.

arXiv.org e-Print Archive

Repository TU/e

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari
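
The abstract above credits Suppes' constraints with shrinking the solution space. As a rough illustration, the sketch below filters candidate edges using the two conditions usually associated with Suppes' probabilistic causation: temporal priority, approximated here by P(c) > P(e), and probability raising, P(e | c) > P(e | not c). Estimating both from plain frequencies in a binary data matrix is an assumption of this sketch, and `suppes_candidate_edges` is a hypothetical helper, not the paper's implementation.

```python
def suppes_candidate_edges(samples):
    """Return (cause, effect) index pairs that satisfy both Suppes conditions.

    samples: list of equal-length 0/1 lists, one row per observation.
    """
    n = len(samples)
    m = len(samples[0])
    counts = [sum(row[j] for row in samples) for j in range(m)]
    edges = []
    for c in range(m):
        if counts[c] == 0 or counts[c] == n:
            continue  # P(e | c) or P(e | not c) would be undefined
        for e in range(m):
            if c == e:
                continue
            both = sum(1 for row in samples if row[c] and row[e])
            e_without_c = sum(1 for row in samples if not row[c] and row[e])
            p_e_given_c = both / counts[c]
            p_e_given_not_c = e_without_c / (n - counts[c])
            temporal_priority = counts[c] > counts[e]  # P(c) > P(e)
            probability_raising = p_e_given_c > p_e_given_not_c
            if temporal_priority and probability_raising:
                edges.append((c, e))
    return edges
```

A structure search (local search or a Genetic Algorithm) would then only consider networks built from the surviving edges, which is one way the search space can shrink.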

A Profile Likelihood Analysis of the Constrained MSSM with Genetic Algorithms

Author: Yashar Akrami
Lars Bergström
Jan Conrad
Joakim Edsjö
Pat Scott
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/03/2010

The Constrained Minimal Supersymmetric Standard Model (CMSSM) is one of the simplest and most widely studied supersymmetric extensions to the standard model of particle physics. Nevertheless, current data do not sufficiently constrain the model parameters in a way completely independent of priors, statistical measures and scanning techniques. We present a new technique for scanning supersymmetric parameter spaces, optimised for frequentist profile likelihood analyses and based on Genetic Algorithms. We apply this technique to the CMSSM, taking into account existing collider and cosmological data in our global fit. We compare our method to the MultiNest algorithm, an efficient Bayesian technique, paying particular attention to the best-fit points and implications for particle masses at the LHC and dark matter searches. Our global best-fit point lies in the focus point region. We find many high-likelihood points in both the stau co-annihilation and focus point regions, including a previously neglected section of the co-annihilation region at large m_0. We show that there are many high-likelihood points in the CMSSM parameter space commonly missed by existing scanning techniques, especially at high masses. This has a significant influence on the derived confidence regions for parameters and observables, and can dramatically change the entire statistical inference of such scans.

Comment: 47 pages, 8 figures; Fig. 8, Table 7 and more discussions added to Sec. 3.4.2 in response to referee's comments; accepted for publication in JHE

arXiv.org e-Print Archive

University of Queensland eSpace
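
The abstract above relies on the profile likelihood, which for a parameter of interest keeps the best likelihood found over all other parameters. A minimal sketch of how that could be computed from scan output follows, assuming each scanned point has already been reduced to a `(theta, log_like)` pair; the binning scheme and the names `profile_likelihood` and `delta_chi2` are illustrative assumptions, not the authors' code.

```python
def profile_likelihood(points, n_bins, lo, hi):
    """Per-bin maximum log-likelihood over scanned points.

    points: iterable of (theta, log_like); the other model parameters are
    profiled out simply by keeping the best point that lands in each bin.
    Bins with no points stay None.
    """
    profile = [None] * n_bins
    width = (hi - lo) / n_bins
    for theta, log_like in points:
        if not lo <= theta < hi:
            continue  # outside the plotted range
        i = min(int((theta - lo) / width), n_bins - 1)
        if profile[i] is None or log_like > profile[i]:
            profile[i] = log_like
    return profile


def delta_chi2(profile):
    """Convert a log-likelihood profile to -2 * (ln L - ln L_best)."""
    best = max(v for v in profile if v is not None)
    return [None if v is None else -2.0 * (v - best) for v in profile]
```

Since every bin only remembers its single best point, a scanner that misses a narrow high-likelihood region changes the profile directly, which is why the quality of the search matters for this kind of analysis.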

Automating biomedical data science through tree-based pipeline optimization

Author: Andrews Peter C.
Kidd La Creis
Lavender Nicole A.
Moore Jason H.
Olson Randal S.
Urbanowicz Ryan J.
Publication date: 27/01/2016

Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators---such as synthetic feature constructors---that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.

Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceeding

arXiv.org e-Print Archive

Scipedia

The Optimisation of Stochastic Grammars to Enable Cost-Effective Probabilistic Structural Testing

Author: Alexander Rob
Clark John Andrew
Hadley Mark Jason
Poulding Simon Marcus
Publication date: 01/01/2013

The effectiveness of probabilistic structural testing depends on the characteristics of the probability distribution from which test inputs are sampled at random. Metaheuristic search has been shown to be a practical method of optimising the characteristics of such distributions. However, the applicability of the existing search-based algorithm is limited by the requirement that the software's inputs must be a fixed number of numeric values. In this paper we relax this limitation by means of a new representation for the probability distribution. The representation is based on stochastic context-free grammars but incorporates two novel extensions: conditional production weights and the aggregation of terminal symbols representing numeric values. We demonstrate that an algorithm which combines the new representation with hill-climbing search is able to efficiently derive probability distributions suitable for testing software with structurally-complex input domains.

CiteSeerX
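
The abstract above represents the input distribution as a stochastic context-free grammar whose production weights are tuned by search. The sketch below shows only the sampling side, assuming a made-up toy grammar and plain (unconditional) weights; `GRAMMAR` and `sample` are hypothetical names, and the paper's conditional weights and numeric-terminal aggregation are not modelled here.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (weight, production).
GRAMMAR = {
    "expr": [(0.6, ["num"]), (0.4, ["(", "expr", "+", "expr", ")"])],
    "num": [(0.5, ["0"]), (0.5, ["1"])],
}


def sample(symbol, rng, grammar=GRAMMAR, depth=0, max_depth=20):
    """Expand `symbol` into a list of terminal tokens by weighted random choice."""
    if symbol not in grammar:
        return [symbol]  # terminal symbol
    options = grammar[symbol]
    if depth >= max_depth:
        # Past the depth limit, force the shortest production so expansion ends.
        production = min((p for _, p in options), key=len)
    else:
        productions = [p for _, p in options]
        weights = [w for w, _ in options]
        production = rng.choices(productions, weights=weights, k=1)[0]
    tokens = []
    for child in production:
        tokens.extend(sample(child, rng, grammar, depth + 1, max_depth))
    return tokens
```

A hill-climbing search in this setting could perturb one weight at a time, draw a batch of inputs with `sample`, and keep the change only if the batch exercises the software under test better according to some chosen coverage measure.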