Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
Bayesian optimization has become a successful tool for hyperparameter
optimization of machine learning algorithms, such as support vector machines or
deep neural networks. Despite its success, for large datasets, training and
validating a single configuration often takes hours, days, or even weeks, which
limits the achievable performance. To accelerate hyperparameter optimization,
we propose a generative model for the validation error as a function of
training set size, which is learned during the optimization process and allows
exploration of preliminary configurations on small subsets, by extrapolating to
the full dataset. We construct a Bayesian optimization procedure, dubbed
Fabolas, which models loss and training time as a function of dataset size and
automatically trades off high information gain about the global optimum against
computational cost. Experiments optimizing support vector machines and deep
neural networks show that Fabolas often finds high-quality solutions 10 to 100
times faster than other state-of-the-art Bayesian optimization methods or the
recently proposed bandit strategy Hyperband.
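To make the idea of extrapolating from small subsets concrete, here is a minimal sketch. It is not the Fabolas model, which the abstract describes as a Bayesian, generative model learned during optimization; instead it swaps in a plain power-law fit, err(s) ≈ a · s^(-b), and the function names and synthetic numbers are illustrative assumptions only.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-b) by least squares in log-log space.

    Assumption: a pure power law with no irreducible-error offset.
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)

def extrapolate_error(sizes, errors, full_size):
    """Predict the validation error at full_size from small-subset runs."""
    a, b = fit_power_law(sizes, errors)
    return a * float(full_size) ** (-b)

# Synthetic validation errors measured on four small subsets (hypothetical data).
subset_sizes = np.array([100.0, 200.0, 400.0, 800.0])
subset_errors = 2.0 * subset_sizes ** -0.5

predicted = extrapolate_error(subset_sizes, subset_errors, full_size=10_000)
```

On noisy real measurements a fit like this can extrapolate poorly, which is one reason a method in this setting would want a model that also reports uncertainty.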
Auto-Sklearn 2.0: The Next Generation
Automated Machine Learning, which supports practitioners and researchers with
the tedious task of manually designing machine learning pipelines, has recently
achieved substantial success. In this paper we introduce new Automated Machine
Learning (AutoML) techniques motivated by our winning submission to the second
ChaLearn AutoML challenge, PoSH Auto-sklearn. For this, we extend Auto-sklearn
with a new, simpler meta-learning technique, improve its way of handling
iterative algorithms and enhance it with a successful bandit strategy for
budget allocation. Furthermore, we go one step further and study the design
space of AutoML itself and propose a solution towards truly hands-free AutoML.
Together, these changes give rise to the next generation of our AutoML system,
Auto-sklearn (2.0). We verify the improvements from these additions in a large
experimental study on 39 AutoML benchmark datasets and conclude the paper by
comparing to Auto-sklearn (1.0), reducing the regret by up to a factor of five.
Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning
Automated Machine Learning (AutoML) supports practitioners and researchers with the tedious task of designing machine learning pipelines and has recently achieved substantial success. In this paper, we introduce new AutoML approaches motivated by our winning submission to the second ChaLearn AutoML challenge. We develop PoSH Auto-sklearn, which enables AutoML systems to work well on large datasets under rigid time limits by using a new, simple and meta-feature-free meta-learning technique and by employing a successful bandit strategy for budget allocation. However, PoSH Auto-sklearn introduces even more ways of running AutoML and might make it harder for users to set it up correctly. Therefore, we also go one step further and study the design space of AutoML itself, proposing a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn 2.0. We verify the improvements from these additions in an extensive experimental study on 39 AutoML benchmark datasets. We conclude the paper by comparing to other popular AutoML frameworks and Auto-sklearn 1.0, reducing the relative error by up to a factor of 4.5, and yielding a performance in 10 minutes that is substantially better than what Auto-sklearn 1.0 achieves within an hour.
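The abstract does not name the bandit strategy it uses for budget allocation. As an illustration of what such a strategy can look like, the sketch below implements successive halving, a well-known representative of this family; treating it as the method in question is an assumption, and `toy_loss`, the candidate values and the `eta` default are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Return the configuration that survives repeated halving.

    Every surviving configuration is evaluated at the current budget,
    the best 1/eta fraction is kept, and the budget is multiplied by
    eta for the next round. Lower loss is better. Expects at least one
    configuration.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(survivors) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0]

def toy_loss(config, budget):
    """Hypothetical loss: distance to an optimum at 0.3, plus a term
    that shrinks as more budget is spent."""
    return abs(config - 0.3) + 1.0 / budget

candidates = [0.0, 0.1, 0.28, 0.5, 0.9, 0.7, 0.4, 0.2]
best = successive_halving(candidates, toy_loss)
```

The design choice is that cheap, low-budget evaluations discard most candidates early, so the expensive high-budget runs are spent only on the few that remain.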
Detection of Circulating Tumour Cells from Blood of Breast Cancer Patients via RT-qPCR
Breast cancer is still the most frequent cause of cancer-related death in women worldwide. Often death is caused not only by the primary tumour itself, but also by metastatic lesions. Today it is largely accepted that these remote metastases arise from cells, called Circulating Tumour Cells (CTCs), that detach from the primary tumour, enter the circulation and settle at secondary sites in the body. The occurrence of such minimal residual disease in the blood of breast cancer patients is mostly linked to a worse prognosis for therapy outcome and overall survival. Due to their very low frequency, the detection of CTCs is still a technical challenge. RT-qPCR, as a highly sensitive method, could be an approach for CTC detection from the peripheral blood of breast cancer patients. This assumption is based on the fact that CTCs are of epithelial origin and therefore express a different gene panel than surrounding blood cells. For the technical approach it is necessary to identify appropriate marker genes and to correlate their gene expression levels with the number of tumour cells within a sample in an in vitro approach. After that, samples from adjuvant and metastatic patients can be analysed. This approach may lead to new concepts in diagnosis and treatment.
Astraeus VIII: A new framework for Lyman-α emitters applied to different reionisation scenarios
We use the ASTRAEUS framework to investigate how the visibility and
spatial distribution of Lyman-α (Lyα) emitters (LAEs) during
reionisation are sensitive to a halo mass-dependent fraction of ionising
radiation escaping from the galactic environment (f_esc) and the
ionisation topology. To this end, we consider the two physically plausible
bracketing scenarios of f_esc increasing and decreasing with rising
halo mass. We derive the corresponding observed Lyα luminosities of
galaxies for three different analytic Lyα line profiles and associated
Lyα escape fraction models:
importantly, we introduce two novel analytic Lyα line profile models
that describe the surrounding interstellar medium (ISM) as dusty gas clumps.
They are based on parameterising results from radiative transfer simulations,
with one of them relating the Lyα escape fraction to f_esc
by assuming that the ISM is interspersed with low-density
tunnels. Our key findings are: (i) for dusty gas clumps, the Lyα line
profile develops from a central to a double-peak profile as a galaxy's halo mass
increases; (ii) LAEs are galaxies with located in
overdense and highly ionised regions; (iii) for this reason, the spatial
distribution of LAEs is primarily sensitive to the global ionisation fraction
and only weakly, at second order, to the ionisation topology or a halo
mass-dependent f_esc; (iv) furthermore, as the observed Lyα
luminosity functions reflect the Lyα emission from more massive
galaxies, there is a degeneracy between the f_esc-dependent
intrinsic Lyα luminosity and the Lyα attenuation by dust in the
ISM if does not exceed .
Comment: 25 pages, 9 figures; accepted for publication in MNRAS
Astraeus VII: The environmental-dependent assembly of galaxies in the Epoch of Reionization
Using the ASTRAEUS (semi-numerical rAdiative tranSfer coupling of galaxy
formaTion and Reionization in N-body dark matter simUlationS) framework, we
explore the impact of environmental density and radiative feedback on the
assembly of galaxies and their host halos during the Epoch of Reionization. The
ASTRAEUS framework allows us to study the evolution of galaxies with masses
() in a wide variety of
environments ( averaged over ). We find that: (i) there exists a mass- and redshift-dependent
"characteristic" environment (, up to ) at which
galaxies are most efficient at accreting dark matter, e.g. at a rate of
of their mass every Myr at ; (ii) the number of minor and major mergers
and their contributions to the dark matter assembly increase with halo mass at
all redshifts and are mostly independent of the environment; (iii) at
minor mergers contribute slightly more (by up to ) to the dark
matter assembly while for the stellar assembly, major mergers dominate the
contribution from minor mergers for
galaxies; (iv) radiative feedback quenches star formation more in low-mass
galaxies () in over-dense environments
(); dominated by their major branch, this yields
star formation histories biased towards older ages with a slower redshift
evolution.
Comment: 17 pages, 15 figures, submitted to MNRAS, comments welcome
Astraeus IV: Quantifying the star formation histories of galaxies in the Epoch of Reionization
We use the ASTRAEUS framework, which couples an N-body simulation
with a semi-analytic model for galaxy formation and a semi-numerical model for
reionization, to quantify the star formation histories (SFHs) of galaxies in
the first billion years. Exploring four models of radiative feedback, we fit
the SFH of each galaxy at as ; star formation is deemed stochastic if it deviates from this fit by
more than dex. Our key findings are: (i) The
fraction of stellar mass formed and time spent in the stochastic phase decrease
with increasing stellar mass and redshift . While galaxies with stellar
masses of at form of
their stellar mass in the stochastic phase, this reduces to at all
redshifts for galaxies with ; (ii) the fractional
mass assembled and lifetime spent in the stochastic phase do not significantly
change with the radiative feedback model used; (iii) at all redshifts,
increases (decreases for the strongest radiative feedback model) with stellar
mass for galaxies with and converges to
for more massive galaxies; always increases with stellar
mass. Our proposed fits can reliably recover the stellar masses and
mass-to-light ratios for galaxies with and
at . This physical model can therefore
be used to derive the SFHs for galaxies observed by a number of forthcoming
instruments.
Comment: 19 pages, 14 figures, accepted for publication in MNRAS