46 research outputs found

    Generalizability in Document Layout Analysis for Scientific Article Figure & Caption Extraction

    The lack of generalizability -- in which a model trained on one dataset cannot provide accurate results for a different dataset -- is a known problem in the field of document layout analysis. Thus, when a model is used to locate important page objects in scientific literature such as figures, tables, captions, and math formulas, the model often cannot be applied successfully to new domains. While several solutions have been proposed, including newer and updated deep learning models, larger hand-annotated datasets, and the generation of large synthetic datasets, so far there is no "magic bullet" for translating a model trained on a particular domain or historical time period to a new field. Here we present our ongoing work in translating our document layout analysis model from the historical astrophysical literature to the larger corpus of scientific documents within the HathiTrust U.S. Federal Documents collection. We use this example as an avenue to highlight some of the problems with generalizability in the document layout analysis community and discuss several challenges and possible solutions to address these issues. All code for this work is available on The Reading Time Machine GitHub repository (https://github.com/ReadingTimeMachine/htrc_short_conf). Comment: 9 pages, 3 figures, submitted as part of AEOLIAN Workshop 5: Making More Sense With Machines: AI/ML Methods for Interrogating and Understanding Our Textual Heritage in the Humanities, Natural Sciences, and Social Science

    First results from the IllustrisTNG simulations: the stellar mass content of groups and clusters of galaxies

    The IllustrisTNG project is a new suite of cosmological magneto-hydrodynamical simulations of galaxy formation performed with the Arepo code and updated models for feedback physics. Here we introduce the first two simulations of the series, TNG100 and TNG300, and quantify the stellar mass content of about 4000 massive galaxy groups and clusters ($10^{13} \leq M_{\rm 200c}/M_{\rm sun} \leq 10^{15}$) at recent times ($z \leq 1$). The richest clusters have half of their total stellar mass bound to satellite galaxies, with the other half being associated with the central galaxy and the diffuse intra-cluster light (ICL). The exact ICL fraction depends sensitively on the definition of a central galaxy's mass and varies in our most massive clusters between 20 and 40% of the total stellar mass. Haloes of $5\times 10^{14} M_{\rm sun}$ and above have more diffuse stellar mass outside 100 kpc than within 100 kpc, with power-law slopes of the radial mass density distribution as shallow as the dark matter's ($-3.5 < \alpha_{\rm 3D} < -3$). Total halo mass is a very good predictor of stellar mass, and vice versa: at $z=0$, the 3D stellar mass measured within 30 kpc scales as $\propto (M_{\rm 500c})^{0.49}$ with a $\sim 0.12$ dex scatter. This is possibly too steep in comparison to the available observational constraints, even though the abundance of less massive TNG galaxies ($< 10^{11} M_{\rm sun}$ in stars) is in good agreement with the measured galaxy stellar mass functions at recent epochs. The 3D sizes of massive galaxies also fall on a tight ($\sim 0.16$ dex scatter) power-law relation with halo mass, with $r^{\rm stars}_{\rm 0.5} \propto (M_{\rm 500c})^{0.53}$. Even more fundamentally, halo mass alone is a good predictor for the whole stellar mass profiles beyond the inner few kpc, and we show how on average these can be precisely recovered given a single mass measurement of the galaxy or its halo. Comment: Accepted by MNRAS, updated to match published version. Highlights: Figures 5, 9, 11. The IllustrisTNG website can be found at http://www.tng-project.org
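
    As a rough illustration of what the scaling relations quoted in this abstract imply, the sketch below converts the stated power-law slopes (0.49 for stellar mass within 30 kpc, 0.53 for the 3D size) and scatters (0.12 and 0.16 dex) into multiplicative factors. Only those four numbers come from the abstract; the function names and the tenfold halo-mass ratio are hypothetical choices made for this example, and no normalization of the relations is assumed.

```python
# Illustrative only: slopes and scatters are taken from the abstract above;
# the helper names and the example mass ratio are assumptions for this sketch.

def power_law_ratio(x_ratio: float, slope: float) -> float:
    """Factor by which y changes when x changes by x_ratio, for y proportional to x**slope."""
    return x_ratio ** slope


def dex_to_factor(dex: float) -> float:
    """Convert a logarithmic scatter in dex to a multiplicative factor."""
    return 10.0 ** dex


halo_mass_ratio = 10.0  # hypothetical: compare two haloes differing by 10x in M_500c

stellar_mass_factor = power_law_ratio(halo_mass_ratio, 0.49)  # M_star(<30 kpc) slope
size_factor = power_law_ratio(halo_mass_ratio, 0.53)          # r_0.5^stars slope
mass_scatter_factor = dex_to_factor(0.12)                     # scatter in stellar mass
size_scatter_factor = dex_to_factor(0.16)                     # scatter in size

print(f"10x halo mass -> {stellar_mass_factor:.2f}x stellar mass within 30 kpc")
print(f"10x halo mass -> {size_factor:.2f}x stellar half-mass radius")
print(f"0.12 dex scatter = factor of {mass_scatter_factor:.2f}")
print(f"0.16 dex scatter = factor of {size_scatter_factor:.2f}")
```

    Under these assumptions, a tenfold increase in halo mass corresponds to roughly a threefold increase in both quantities, while the quoted scatters amount to factors of about 1.3 and 1.4.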

    Supermassive black holes and their feedback effects in the IllustrisTNG simulation

    We study the population of supermassive black holes (SMBHs) and their effects on massive central galaxies in the IllustrisTNG cosmological hydrodynamical simulations of galaxy formation. The employed model for SMBH growth and feedback assumes a two-mode scenario in which the feedback from active galactic nuclei occurs through a kinetic, comparatively efficient mode at low accretion rates relative to the Eddington limit, and in the form of a thermal, less efficient mode at high accretion rates. We show that the quenching of massive central galaxies happens coincidently with kinetic-mode feedback, consistent with the notion that active supermassive black holes cause the low specific star formation rates observed in massive galaxies. However, major galaxy mergers are not responsible for initiating most of the quenching events in our model. Up to black hole masses of about $10^{8.5}\,{\rm M}_\odot$, the dominant growth channel for SMBHs is in the thermal mode. Higher mass black holes stay mainly in the kinetic mode and gas accretion is self-regulated via their feedback, which causes their Eddington ratios to drop, with SMBH mergers becoming the main channel for residual mass growth. As a consequence, the quasar luminosity function is dominated by rapidly accreting, moderately massive black holes in the thermal mode. We show that the associated growth history of SMBHs produces a low-redshift quasar luminosity function and a redshift zero black hole mass-stellar bulge mass relation in good agreement with observations, whereas the simulation tends to over-predict the high-redshift quasar luminosity function. Comment: 16 pages, 11 figures, submitted to MNRAS, the IllustrisTNG project website: www.tng-project.org, comments welcome

    First results from the IllustrisTNG simulations: radio haloes and magnetic fields

    We introduce the IllustrisTNG project, a new suite of cosmological magnetohydrodynamical simulations performed with the moving-mesh code AREPO employing an updated Illustris galaxy formation model. Here we focus on the general properties of magnetic fields and the diffuse radio emission in galaxy clusters. Magnetic fields are prevalent in galaxies, and their build-up is closely linked to structure formation. We find that structure formation amplifies the initial seed fields ($10^{-14}$ comoving Gauss) to the values observed in low-redshift galaxies ($1-10\,\mu{\rm G}$). The magnetic field topology is closely connected to galaxy morphology such that irregular fields are hosted by early-type galaxies, while large-scale, ordered fields are present in disc galaxies. Using two simple models for the energy distribution of relativistic electrons we predict the diffuse radio emission of $280$ clusters with a baryonic mass resolution of $1.1\times 10^{7}\,{\rm M_{\odot}}$, and generate mock observations for VLA, LOFAR, ASKAP and SKA. Our simulated clusters show extended radio emission, whose detectability correlates with their virial mass. We reproduce the observed scaling relations between total radio power and X-ray emission, $M_{500}$, and the Sunyaev-Zel'dovich $Y_{\rm 500}$ parameter. The radio emission surface brightness profiles of our most massive clusters are in reasonable agreement with VLA measurements of Coma and Perseus. Finally, we discuss the fraction of detected extended radio haloes as a function of virial mass and source count functions for different instruments. Overall our results agree encouragingly well with observations, but a refined analysis requires a more sophisticated treatment of relativistic particles in large-scale galaxy formation simulations. Comment: 28 pages, 18 figures, 2 tables, 3 appendices. Added a new relativistic electron energy parametrization and text modifications to match the accepted version for publication in MNRAS. More information, images and movies of the IllustrisTNG project can be found at http://www.tng-project.or