892 research outputs found

    Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets

    Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed Fabolas, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that Fabolas often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.

    Auto-Sklearn 2.0: The Next Generation

    Automated Machine Learning, which supports practitioners and researchers with the tedious task of manually designing machine learning pipelines, has recently achieved substantial success. In this paper we introduce new Automated Machine Learning (AutoML) techniques motivated by our winning submission to the second ChaLearn AutoML challenge, PoSH Auto-sklearn. For this, we extend Auto-sklearn with a new, simpler meta-learning technique, improve its way of handling iterative algorithms and enhance it with a successful bandit strategy for budget allocation. Furthermore, we go one step further and study the design space of AutoML itself and propose a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn (2.0). We verify the improvements from these additions in a large experimental study on 39 AutoML benchmark datasets and conclude the paper by comparing to Auto-sklearn (1.0), reducing the regret by up to a factor of five.

    Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning

    Automated Machine Learning (AutoML) supports practitioners and researchers with the tedious task of designing machine learning pipelines and has recently achieved substantial success. In this paper, we introduce new AutoML approaches motivated by our winning submission to the second ChaLearn AutoML challenge. We develop PoSH Auto-sklearn, which enables AutoML systems to work well on large datasets under rigid time limits by using a new, simple and meta-feature-free meta-learning technique and by employing a successful bandit strategy for budget allocation. However, PoSH Auto-sklearn introduces even more ways of running AutoML and might make it harder for users to set it up correctly. Therefore, we also go one step further and study the design space of AutoML itself, proposing a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn 2.0. We verify the improvements from these additions in an extensive experimental study on 39 AutoML benchmark datasets. We conclude the paper by comparing to other popular AutoML frameworks and Auto-sklearn 1.0, reducing the relative error by up to a factor of 4.5, and yielding a performance in 10 minutes that is substantially better than what Auto-sklearn 1.0 achieves within an hour.

    Detection of Circulating Tumour Cells from Blood of Breast Cancer Patients via RT-qPCR

    Breast cancer is still the most frequent cause of cancer-related death in women worldwide. Often death is caused not only by the primary tumour itself, but also by metastatic lesions. Today it is widely accepted that these remote metastases arise from cells that detach from the primary tumour, enter the circulation and settle at secondary sites in the body; these cells are called Circulating Tumour Cells (CTCs). The occurrence of such minimal residual disease in the blood of breast cancer patients is mostly linked to a worse prognosis for therapy outcome and overall survival. Due to their very low frequency, the detection of CTCs is still a technical challenge. RT-qPCR, as a highly sensitive method, could be an approach for CTC detection from the peripheral blood of breast cancer patients. This assumption is based on the fact that CTCs are of epithelial origin and therefore express a different gene panel than the surrounding blood cells. For the technical approach it is necessary to identify appropriate marker genes and to correlate their gene expression levels with the number of tumour cells within a sample in an in vitro approach. After that, samples from adjuvant and metastatic patients can be analysed. This approach may lead to new concepts in diagnosis and treatment.

    Astraeus VIII: A new framework for Lyman-$\alpha$ emitters applied to different reionisation scenarios

    We use the ASTRAEUS framework to investigate how the visibility and spatial distribution of Lyman-$\alpha$ (Ly$\alpha$) emitters (LAEs) during reionisation are sensitive to a halo mass-dependent fraction of ionising radiation escaping from the galactic environment ($f_\mathrm{esc}$) and the ionisation topology. To this end, we consider the two physically plausible bracketing scenarios of $f_\mathrm{esc}$ increasing and decreasing with rising halo mass. We derive the corresponding observed Ly$\alpha$ luminosities of galaxies for three different analytic Ly$\alpha$ line profiles and associated Ly$\alpha$ escape fraction ($f_\mathrm{esc}^\mathrm{Ly\alpha}$) models: importantly, we introduce two novel analytic Ly$\alpha$ line profile models that describe the surrounding interstellar medium (ISM) as dusty gas clumps. They are based on parameterising results from radiative transfer simulations, with one of them relating $f_\mathrm{esc}^\mathrm{Ly\alpha}$ to $f_\mathrm{esc}$ by assuming the ISM is interspersed with low-density tunnels. Our key findings are: (i) for dusty gas clumps, the Ly$\alpha$ line profile develops from a central to a double peak profile as a galaxy's halo mass increases; (ii) LAEs are galaxies with $M_h\gtrsim10^{10}M_\odot$ located in overdense and highly ionised regions; (iii) for this reason, the spatial distribution of LAEs is primarily sensitive to the global ionisation fraction and only weakly, at second order, to the ionisation topology or a halo mass-dependent $f_\mathrm{esc}$; (iv) furthermore, as the observed Ly$\alpha$ luminosity functions reflect the Ly$\alpha$ emission from more massive galaxies, there is a degeneracy between the $f_\mathrm{esc}$-dependent intrinsic Ly$\alpha$ luminosity and the Ly$\alpha$ attenuation by dust in the ISM if $f_\mathrm{esc}$ does not exceed $\sim50\%$. Comment: 25 pages, 9 figures; accepted for publication in MNRAS

    Astraeus VII: The environmental-dependent assembly of galaxies in the Epoch of Reionization

    Using the ASTRAEUS (semi-numerical rAdiative tranSfer coupling of galaxy formaTion and Reionization in N-body dark matter simUlationS) framework, we explore the impact of environmental density and radiative feedback on the assembly of galaxies and their host halos during the Epoch of Reionization. The ASTRAEUS framework allows us to study the evolution of galaxies with masses ($10^{8.2}M_\odot < M_{\rm h} < 10^{13}M_\odot$) in a wide variety of environments ($-0.5 < \log(1+\delta) < 1.3$, averaged over $(2~{\rm cMpc})^3$). We find that: (i) there exists a mass- and redshift-dependent "characteristic" environment ($\log(1+\delta_a(M_{\rm h}, z)) = 0.021\times M_{\rm h}^{0.16} + 0.07 z - 1.12$, up to $z\sim 10$) at which galaxies are most efficient at accreting dark matter, e.g. at a rate of $0.2\%$ of their mass every Myr at $z=5$; (ii) the number of minor and major mergers and their contributions to the dark matter assembly increase with halo mass at all redshifts and are mostly independent of the environment; (iii) at $z=5$ minor mergers contribute slightly more (by up to $\sim 10\%$) to the dark matter assembly, while for the stellar assembly major mergers dominate over the contribution from minor mergers for $M_{\rm h}\lesssim 10^{11.5}M_\odot$ galaxies; (iv) radiative feedback quenches star formation more in low-mass galaxies ($M_{\rm h} \lesssim 10^{9.5}M_\odot$) in over-dense environments ($\log(1+\delta) > 0.5$); dominated by their major branch, this yields star formation histories biased towards older ages with a slower redshift evolution. Comment: 17 pages, 15 figures, submitted to MNRAS, comments welcome
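
The fitting formula for the "characteristic" environment quoted in this abstract can be evaluated directly. The following is a minimal sketch, not code from the paper: it assumes that $M_{\rm h}$ is expressed in solar masses and that the logarithm is base 10, neither of which the abstract states explicitly. The function name is hypothetical.

```python
def log_characteristic_overdensity(halo_mass_msun: float, z: float) -> float:
    """Return log(1 + delta_a) from the fit quoted in the abstract.

    Assumptions (not stated in the abstract): halo_mass_msun is the halo
    mass in solar masses, and the fit is only meaningful up to z ~ 10.
    """
    if halo_mass_msun <= 0:
        raise ValueError("halo mass must be positive")
    # log(1 + delta_a) = 0.021 * M_h^0.16 + 0.07 * z - 1.12
    return 0.021 * halo_mass_msun ** 0.16 + 0.07 * z - 1.12


if __name__ == "__main__":
    # Tabulate the fit across the halo mass range covered by the abstract.
    for log_mass in (8.2, 10.0, 11.5):
        value = log_characteristic_overdensity(10.0 ** log_mass, z=5.0)
        print(f"log10(M_h/Msun) = {log_mass:4.1f}  ->  log(1 + delta_a) = {value:.3f}")
```

Under these assumptions the fitted value rises with both halo mass and redshift, since both coefficients are positive.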

    Astraeus IV: Quantifying the star formation histories of galaxies in the Epoch of Reionization

    We use the ASTRAEUS framework, which couples an N-body simulation with a semi-analytic model for galaxy formation and a semi-numerical model for reionization, to quantify the star formation histories (SFHs) of galaxies in the first billion years. Exploring four models of radiative feedback, we fit the SFH of each galaxy at $z>5$ as $\log(\mathrm{SFR}(z))=-\alpha(1+z)+\beta$; star formation is deemed stochastic if it deviates from this fit by more than $\Delta_\mathrm{SFR}=0.6$ dex. Our key findings are: (i) The fraction of stellar mass formed and time spent in the stochastic phase decrease with increasing stellar mass and redshift $z$. While galaxies with stellar masses of $M_\star\sim10^7M_\odot$ at $z\sim5~(10)$ form $\sim70\%~(20\%)$ of their stellar mass in the stochastic phase, this reduces to $<10\%$ at all redshifts for galaxies with $M_\star > 10^{10}M_\odot$; (ii) the fractional mass assembled and lifetime spent in the stochastic phase do not significantly change with the radiative feedback model used; (iii) at all redshifts, $\alpha$ increases (decreases for the strongest radiative feedback model) with stellar mass for galaxies with $M_\star\lesssim 10^{8.5}M_\odot$ and converges to $\sim0.18$ for more massive galaxies; $\beta$ always increases with stellar mass. Our proposed fits can reliably recover the stellar masses and mass-to-light ratios for galaxies with $M_\star\sim10^{8-10.5}M_\odot$ and $M_{UV}\sim-17~{\rm to}~-23$ at $z\sim 5-9$. This physical model can therefore be used to derive the SFHs for galaxies observed by a number of forthcoming instruments. Comment: 19 pages, 14 figures, accepted for publication in MNRAS
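
The fit and the stochasticity criterion described in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: ordinary least squares via `numpy.polyfit`, a base-10 logarithm (implied by the use of dex), and the synthetic star formation history are all assumptions made for illustration. The function names are hypothetical.

```python
import numpy as np

DELTA_SFR_DEX = 0.6  # deviation threshold quoted in the abstract


def fit_sfh(z, sfr):
    """Least-squares fit of log10(SFR) = -alpha * (1 + z) + beta.

    Returns (alpha, beta). The fitting method is an assumption; the
    abstract only gives the functional form.
    """
    z = np.asarray(z, dtype=float)
    log_sfr = np.log10(np.asarray(sfr, dtype=float))
    slope, intercept = np.polyfit(1.0 + z, log_sfr, deg=1)
    return -slope, intercept


def stochastic_mask(z, sfr, alpha, beta, threshold=DELTA_SFR_DEX):
    """Flag points whose log10(SFR) deviates from the fit by more than threshold dex."""
    z = np.asarray(z, dtype=float)
    log_sfr = np.log10(np.asarray(sfr, dtype=float))
    model = -alpha * (1.0 + z) + beta
    return np.abs(log_sfr - model) > threshold


# Synthetic SFH (illustrative values only, not taken from the paper).
z = np.linspace(5.0, 12.0, 15)
sfr = 10.0 ** (-0.18 * (1.0 + z) + 1.5)
alpha, beta = fit_sfh(z, sfr)

# Inject a 1 dex burst at one snapshot and flag it against the smooth fit.
sfr_burst = sfr.copy()
sfr_burst[7] *= 10.0
mask = stochastic_mask(z, sfr_burst, alpha, beta)
```

Here the fit is made to the smooth history and then used to flag the injected burst. Fitting a history that already contains bursts would bias $\alpha$ and $\beta$, so a real analysis would need a more careful treatment.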