309 research outputs found

    Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework

    Even though many machine algorithms have been proposed for entity resolution, it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel HUman and Machine cOoperation (HUMO) framework for entity resolution (ER), which divides an ER workload between the machine and the human. HUMO enables a mechanism for quality control that can flexibly enforce both precision and recall levels. We introduce the optimization problem of HUMO, minimizing human cost given a quality requirement, and then present three optimization approaches: a conservative baseline based purely on the monotonicity assumption of precision, a more aggressive one based on sampling, and a hybrid one that takes advantage of the strengths of both previous approaches. Finally, we demonstrate by extensive experiments on real and synthetic datasets that HUMO can achieve high-quality results with reasonable return on investment (ROI) in terms of human cost, and that it performs considerably better than the state-of-the-art alternatives in quality control.
    Comment: 12 pages, 11 figures. Camera-ready version of the paper submitted to ICDE 2018, in Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE 2018)

    Thermoplasmatales and Methanogens: Potential Association with the Crenarchaeol Production in Chinese Soils

    Crenarchaeol is a unique isoprenoid glycerol dibiphytanyl glycerol tetraether (iGDGT) lipid, which has been identified only in cultures of ammonia-oxidizing Thaumarchaeota. However, the taxonomic origins of crenarchaeol have been debated recently. Archaeal populations other than Thaumarchaeota may be associated with the production of crenarchaeol in ecosystems characterized by non-thaumarchaeotal microorganisms. To this end, we investigated 47 surface soils from upland and wetland soils and rice fields and another three surface sediments from river banks. The goal was to examine the archaeal community compositions in comparison with patterns of iGDGTs in four fractional forms (intact polar-, core-, monoglycosidic- and diglycosidic-lipid fractions) along environmental gradients. The DistLM analysis identified Group I.1b Thaumarchaeota as mainly responsible for changes in crenarchaeol in the overall soil samples; however, Thermoplasmatales may also contribute to it. This is further supported by the comparison of crenarchaeol between samples characterized by methanogens, Thermoplasmatales or Group I.1b Thaumarchaeota, which suggests that the former two may contribute to the crenarchaeol pool. Last, when samples containing enhanced abundance of Thermoplasmatales and methanogens were considered, crenarchaeol was observed to correlate positively with Thermoplasmatales and archaeol, respectively. Collectively, our data suggest that crenarchaeol production is mainly derived from Thaumarchaeota and partly associated with uncultured representatives of Thermoplasmatales and archaeol-producing methanogens in soil environments that may favor their growth. Our finding supports the notion that Thaumarchaeota may not be the sole source of crenarchaeol in the natural environment, which may have implications for the evolution of lipid synthesis among different types of archaea.

    Orbit- and Atom-Resolved Spin Textures of Intrinsic, Extrinsic and Hybridized Dirac Cone States

    Combining first-principles calculations and spin- and angle-resolved photoemission spectroscopy measurements, we identify the helical spin textures for three different Dirac cone states in the interfaced systems of a 2D topological insulator (TI) of Bi(111) bilayer and a 3D TI Bi2Se3 or Bi2Te3. The spin texture is found to be the same for the intrinsic Dirac cone of the Bi2Se3 or Bi2Te3 surface state, the extrinsic Dirac cone of the Bi bilayer state induced by the Rashba effect, and the hybridized Dirac cone between the former two states. Further orbit- and atom-resolved analysis shows that s and pz orbits have a clockwise (counterclockwise) spin rotation tangent to the iso-energy contour of the upper (lower) Dirac cone, while px and py orbits have an additional radial spin component. The Dirac cone states may reside on different atomic layers, but have the same spin texture. Our results suggest that the unique spin texture of Dirac cone states is a signature property of spin-orbit coupling, independent of topology.

    Age Is Important for the Early-Stage Detection of Breast Cancer on Both Transcriptomic and Methylomic Biomarkers

    Patients of different ages have different rates of cell development and metabolism. As a result, age should be an essential part of how a disease diagnosis model is trained and optimized. Unfortunately, most existing studies have not taken age into account. This study demonstrated that disease diagnosis models could be improved merely by applying individual models to patients of different age groups. Both transcriptomes and methylomes of the TCGA breast cancer dataset (TCGA-BRCA) were used for feature selection and classification. Our experimental data strongly suggest that disease diagnosis modeling should integrate patient age into the whole experimental design.