213 research outputs found

    A Formal Framework for Probabilistic Unclean Databases

    Most theoretical frameworks that focus on data errors and inconsistencies follow logic-based reasoning. Yet, practical data cleaning tools need to incorporate statistical reasoning to be effective in real-world data cleaning tasks. Motivated by empirical successes, we propose a formal framework for unclean databases, where two types of statistical knowledge are incorporated: the first represents a belief of how intended (clean) data is generated, and the second represents a belief of how noise is introduced in the actual observed database. To capture this noisy channel model, we introduce the concept of a Probabilistic Unclean Database (PUD), a triple that consists of a probabilistic database that we call the intention, a probabilistic data transformator that we call the realization, which captures how noise is introduced, and an observed unclean database that we call the observation. We define three computational problems in the PUD framework: cleaning (infer the most probable intended database, given a PUD), probabilistic query answering (compute the probability of an answer tuple over the unclean observed database), and learning (estimate the most likely intention and realization models of a PUD, given examples as training data). We illustrate the PUD framework on concrete representations of the intention and realization, show that they generalize traditional concepts of repairs such as cardinality and value repairs, draw connections to consistent query answering, and prove tractability results. We further show that parameters can be learned in some practical instantiations and, in fact, prove that under certain conditions we can learn a PUD directly from a single dirty database without any need for clean examples.
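    The cleaning problem described in this abstract (infer the most probable intended database given the observation) can be illustrated with a toy noisy-channel sketch. Everything below is an assumption made for illustration and is not taken from the paper: a single column whose cells are cleaned independently, a categorical prior standing in for the intention, and a hypothetical uniform-substitution channel with error rate `eps` standing in for the realization.

```python
from typing import Dict, List


def channel_prob(observed: str, intended: str, domain_size: int, eps: float) -> float:
    """P(observed | intended) under a hypothetical uniform-substitution channel.

    With probability 1 - eps the intended value is kept; otherwise it is
    replaced by one of the other domain values, chosen uniformly.
    """
    if observed == intended:
        return 1.0 - eps
    return eps / (domain_size - 1)


def clean_cell(observed: str, prior: Dict[str, float], eps: float) -> str:
    """Return the intended value that maximizes prior(v) * P(observed | v)."""
    domain_size = len(prior)
    return max(
        prior,
        key=lambda v: prior[v] * channel_prob(observed, v, domain_size, eps),
    )


def clean_column(observations: List[str], prior: Dict[str, float], eps: float) -> List[str]:
    """Clean each cell independently (an assumption of this sketch)."""
    return [clean_cell(o, prior, eps) for o in observations]


# Assumed intention model: a prior over clean values for one attribute.
prior = {"Paris": 0.6, "London": 0.39, "Pariss": 0.01}
cleaned = clean_column(["Pariss", "London", "Paris"], prior, eps=0.1)
print(cleaned)
```

    In this sketch a rare observed value is replaced only when the prior outweighs the channel's preference for leaving cells unchanged; the framework in the abstract covers far more general intention and realization models than this per-cell toy.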

    The Lyman-continuum-leaking super star cluster in the Sunburst Arc and its surrounding nebula

    Strong lensing offers a precious opportunity for studying the formation and early evolution of super star clusters that are rare in our cosmic backyard. The Sunburst Arc, a lensed Cosmic Noon galaxy, hosts a young super star cluster with escaping Lyman continuum radiation. Analyzing archival HST images and emission line data from VLT/MUSE and X-shooter, we construct a physical model for the cluster and its surrounding photoionized nebula. We confirm that the cluster is $\sim 3\mbox{--}4$ Myr old, is extremely massive, $M_\star \sim 10^7\,M_\odot$, and yet has a central component as compact as several parsecs, and we find a metallicity $Z=(0.26\pm0.03)\,Z_\odot$. The cluster is surrounded by $\gtrsim 10^5\,M_\odot$ of dense clouds that have been pressurized to $P\sim 10^9\,{\rm K}\,{\rm cm}^{-3}$, perhaps by stellar radiation, within ten parsecs. These should have large neutral columns $N_{\rm HI} > 10^{22.5}\,{\rm cm}^{-2}$ to survive rapid ejection by radiation pressure. The clouds are likely dusty as they show gas-phase depletion of silicon, and may be conducive to secondary star formation if $N_{\rm HI} > 10^{24}\,{\rm cm}^{-2}$ or if they sink further toward the cluster center. Detecting strong N III] $\lambda\lambda$1750,1752, we infer heavy nitrogen enrichment $\log({\rm N/O})=-0.23^{+0.08}_{-0.11}$. This requires efficiently retaining $\gtrsim 500\,M_\odot$ of nitrogen in the high-pressure clouds from massive stars heavier than $60\,M_\odot$ up to 4 Myr. We suggest a physical origin of the high-pressure clouds from partial or complete condensation of slow massive star ejecta, which may have important implications for the puzzle of multiple stellar populations in globular clusters. Comment: 25 pages, 10 figures

    Immune tolerance maintained by cooperative interactions between T cells and antigen presenting cells shapes a diverse TCR repertoire

    The T cell population in an individual needs to avoid harmful activation by self-peptides while maintaining the ability to respond to an unknown set of foreign peptides. This property is acquired by a combination of thymic and extra-thymic mechanisms. We extend current models for the development of self/non-self discrimination to consider the acquisition of self-tolerance as an emergent system-level property of the overall T cell receptor repertoire. We propose that tolerance is established at the level of the antigen presenting cell/T cell cluster, which facilitates and integrates co-operative interactions between T cells of different specificity. The threshold for self-reactivity is therefore imposed at a population level, and not at the level of the individual T cell/antigen encounter. Mathematically, the model can be formulated as a linear programming optimisation problem and implemented as a multiplicative update algorithm that converges rapidly to a stable state. The model constrains self-reactivity within a predefined threshold, but maintains the diversity and cross-reactivity which are key characteristics of human T cell immunity. We show further that the size of individual clones in the model repertoire remains heterogeneous, and that new clones can establish themselves even when the repertoire is stable. Our study combines the salient features of the danger model of self/non-self discrimination with the concepts of quorum sensing, and extends repertoire generation models to encompass the establishment of tolerance. Furthermore, the dynamic and continuous repertoire reshaping which underlies tolerance in this model suggests opportunities for therapeutic intervention to achieve long-term tolerance following transplantation.
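    The idea of a population-level threshold enforced by multiplicative updates can be sketched with a toy example. This is not the authors' algorithm and does not solve their linear program; the reactivity matrix, the growth factor, the threshold value, and the function names are all hypothetical choices made only to show the shape of such an update: every clone grows by a common factor, and the clones contributing to any antigen presenting cell (APC) whose summed self-reactivity exceeds the threshold are scaled back together.

```python
from typing import List


def apc_loads(x: List[float], reactivity: List[List[float]]) -> List[float]:
    """Summed self-reactivity seen at each APC: load_a = sum_i R[a][i] * x[i]."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in reactivity]


def multiplicative_step(
    x: List[float],
    reactivity: List[List[float]],
    threshold: float,
    growth: float = 1.05,
) -> List[float]:
    """One toy update: uniform growth, then per-APC multiplicative scaling."""
    x = [xi * growth for xi in x]
    for row in reactivity:
        load = sum(r * xi for r, xi in zip(row, x))
        if load > threshold:
            # Scale only the clones that react at this APC, so the APC's
            # load is brought back to the threshold.
            scale = threshold / load
            x = [xi * scale if r > 0 else xi for r, xi in zip(row, x)]
    return x


def run_updates(x, reactivity, threshold, steps=50, growth=1.05):
    """Apply the toy update a fixed number of times."""
    for _ in range(steps):
        x = multiplicative_step(x, reactivity, threshold, growth)
    return x


# Hypothetical repertoire: 3 clones, 2 APCs; every clone reacts somewhere.
reactivity = [[0.5, 0.2, 0.0],
              [0.0, 0.3, 0.4]]
final = run_updates([1.0, 1.0, 1.0], reactivity, threshold=1.0)
print(final, apc_loads(final, reactivity))
```

    Because scaling only ever shrinks clone sizes, an APC brought back to the threshold stays at or below it for the rest of that step, and no clone is driven to zero, which loosely mirrors the abstract's point that self-reactivity is bounded while diversity is retained.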

    Computed Pre-reactive Complex Association Lifetimes Explain Trends in Experimental Reaction Rates for Peroxy Radical Recombinations

    The lifetimes of pre-reactive complexes, although implicitly part of the equations used to model many gas-phase bimolecular reactions, have seldom been included in quantitative calculations of rate coefficients. Here, we demonstrate the application of empirical molecular dynamics simulations of collisions between peroxy radicals to model association lifetimes. With the exception of the methyl peroxy−acetyl peroxy system, measurements of the lifetimes based on a phenomenological model are shown to correlate well with available experimental data for recombination reactions of peroxy radicals in cases where the rate-limiting transition state lies below the reactants in energy. Further, we predict reaction rates for larger α-pinene-derived peroxy radicals, and we interpret our results in tandem with available experimental data on these systems, which are highly relevant to improving our understanding of atmospheric aerosol formation. Peer reviewed

    Two-dimensional NMR spectral studies of some 2,6-diarylpiperidin-4-ones

    662-66

    Editorial: Safeguarding youth from agricultural injury and illness: international experiences

    [Extract] Worldwide, agriculture is among the most dangerous industries and one of the few that involves children (<18 years of age) in the worksite as laborers or bystanders. Children are exposed to an array of agriculture-related hazards whether working or merely being present in the farm environment. From a public health and child advocacy perspective, safeguarding these young people from preventable disease and injury is important for many reasons. The negative impacts of a childhood agricultural disease or injury include permanent disabilities, death, family disruptions, and economic hardships, including the potential loss of a sustainable family farm enterprise. At the same time, growing up in an agricultural setting can lead to independent, hardworking, successful adults, who gain a range of benefits, including skill development, family time together, improved immune response, and other protective health factors.

    Towards a barnacle tree of life: integrating diverse phylogenetic efforts into a comprehensive hypothesis of thecostracan evolution

    Barnacles and their allies (Thecostraca) are a biologically diverse, monophyletic crustacean group, which includes both intensely studied taxa, such as the acorn and stalked barnacles, and cryptic taxa, for example, Facetotecta. Recent efforts have clarified phylogenetic relationships in many different parts of the barnacle tree, but the outcomes of these phylogenetic studies have not yet been combined into a single hypothesis for all barnacles. In the present study, we applied a new “synthesis” tree approach to estimate the first working Barnacle Tree of Life. Using this approach, we integrated phylogenetic hypotheses from 27 studies, which did not necessarily include the same taxa or use the same characters, with hierarchical taxonomic information for all recognized species. This first synthesis tree contains 2,070 barnacle species and subspecies, including 239 barnacle species with phylogenetic information and 198 undescribed or unidentified species. The tree had 442 bifurcating nodes, indicating that 79.3% of all nodes are still unresolved. We found that the acorn and stalked barnacles, the Thoracica, and the parasitic Rhizocephala have the largest amount of published phylogenetic information. About half of the thecostracan families for which phylogenetic information was available were polyphyletic. We queried publicly available geographic occurrence databases for the group, gaining a sense of geographic gaps and hotspots in our phylogenetic knowledge. Phylogenetic information is especially lacking for deep sea and Arctic taxa, but even coastal species are not fully incorporated into phylogenetic studies.