Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework
Even though many machine algorithms have been proposed for entity resolution (ER), it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel HUman and Machine cOoperation (HUMO) framework for ER, which divides an ER workload between the machine and the human. HUMO provides a quality-control mechanism that can flexibly enforce both precision and recall levels. We formulate the optimization problem of HUMO, minimizing human cost subject to a quality requirement, and then present three optimization approaches: a conservative baseline based purely on the monotonicity assumption of precision, a more aggressive approach based on sampling, and a hybrid approach that combines the strengths of the previous two. Finally, we demonstrate through extensive experiments on real and synthetic datasets that HUMO can achieve high-quality results with a reasonable return on investment (ROI) in terms of human cost, and that it performs considerably better than state-of-the-art alternatives in quality control.
Comment: 12 pages, 11 figures. Camera-ready version of the paper submitted to ICDE 2018, in Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE 2018)
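The abstract does not spell out HUMO's algorithms, so the following is only a minimal Python sketch of the general human and machine split, under stated assumptions: each candidate pair carries a machine similarity score, pairs are sorted by that score, the lowest-ranked pairs are labeled non-matches by the machine, the highest-ranked pairs are labeled matches, and the middle range goes to humans, who are assumed to always answer correctly. Trusting the score ordering this way loosely mirrors the monotonicity assumption mentioned in the abstract. The function names, the rank-based split `(i, j)` and the brute-force search are hypothetical; unlike HUMO, the search below reads ground-truth labels directly, purely to show how human cost trades off against precision and recall targets.

```python
from typing import List, Optional, Sequence, Tuple


def label_with_split(scores: Sequence[float], truth: Sequence[bool], i: int, j: int) -> List[bool]:
    """Label pairs after sorting them by machine score (hypothetical split rule).

    Ranks [0, i) -> machine says non-match, ranks [i, j) -> sent to humans,
    ranks [j, n) -> machine says match. Humans are assumed to be always correct.
    """
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    labels = [False] * len(scores)
    for rank, k in enumerate(order):
        if rank >= j:
            labels[k] = True          # machine labels high-score pairs as matches
        elif rank >= i:
            labels[k] = truth[k]      # human answer, assumed correct
    return labels


def precision_recall(labels: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of predicted matches; empty denominators count as 1.0."""
    tp = sum(1 for pred, t in zip(labels, truth) if pred and t)
    fp = sum(1 for pred, t in zip(labels, truth) if pred and not t)
    fn = sum(1 for pred, t in zip(labels, truth) if not pred and t)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def cheapest_split(scores: Sequence[float], truth: Sequence[bool],
                   min_precision: float, min_recall: float) -> Optional[Tuple[int, int, int]]:
    """Return (human_cost, i, j) with the fewest human-labeled pairs meeting both targets."""
    n = len(scores)
    best: Optional[Tuple[int, int, int]] = None
    for i in range(n + 1):
        for j in range(i, n + 1):
            p, r = precision_recall(label_with_split(scores, truth, i, j), truth)
            if p >= min_precision and r >= min_recall and (best is None or j - i < best[0]):
                best = (j - i, i, j)
    return best
```

For example, with scores `[0.05, 0.2, 0.45, 0.55, 0.7, 0.9, 0.95]` and ground truth `[False, False, False, True, False, True, True]`, requiring precision and recall of at least 0.9 sends two pairs to humans, while relaxing the targets to precision at least 0.7 and recall 1.0 lets the machine label every pair. The exhaustive search is roughly cubic in the number of pairs and is only meant for toy inputs.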
Thermoplasmatales and Methanogens: Potential Association with the Crenarchaeol Production in Chinese Soils
Crenarchaeol is a unique isoprenoid glycerol dibiphytanyl glycerol tetraether (iGDGT) lipid that has only been identified in cultures of ammonia-oxidizing Thaumarchaeota. However, the taxonomic origins of crenarchaeol have recently been debated. Archaeal populations other than Thaumarchaeota may be associated with crenarchaeol production in ecosystems characterized by non-thaumarchaeotal microorganisms. To address this, we investigated 47 surface soils from uplands, wetlands and rice fields, plus three surface sediments from river banks. The goal was to compare archaeal community compositions with patterns of iGDGTs in four fractional forms (intact polar-, core-, monoglycosidic- and diglycosidic-lipid fractions) along environmental gradients. The DistLM analysis showed that Group I.1b Thaumarchaeota were mainly responsible for changes in crenarchaeol across all soil samples; however, Thermoplasmatales may also contribute to it. This is further supported by comparing crenarchaeol among samples characterized by methanogens, Thermoplasmatales or Group I.1b Thaumarchaeota, which suggests that the former two may contribute to the crenarchaeol pool. Finally, in samples with enhanced abundances of Thermoplasmatales and methanogens, crenarchaeol correlated positively with Thermoplasmatales and archaeol, respectively. Collectively, our data suggest that crenarchaeol is mainly derived from Thaumarchaeota and is partly associated with uncultured representatives of Thermoplasmatales and archaeol-producing methanogens in soil environments that may favor their growth. Our findings support the notion that Thaumarchaeota may not be the sole source of crenarchaeol in the natural environment, which may have implications for the evolution of lipid synthesis among different types of archaea.
Orbit- and Atom-Resolved Spin Textures of Intrinsic, Extrinsic and Hybridized Dirac Cone States
Combining first-principles calculations with spin- and angle-resolved photoemission spectroscopy measurements, we identify the helical spin textures of three different Dirac cone states in interfaced systems of a 2D topological insulator (TI), the Bi(111) bilayer, and a 3D TI, Bi2Se3 or Bi2Te3. The spin texture is found to be the same for the intrinsic Dirac cone of the Bi2Se3 or Bi2Te3 surface state, the extrinsic Dirac cone of the Bi bilayer state induced by the Rashba effect, and the hybridized Dirac cone between the former two states. Further orbit- and atom-resolved analysis shows that the s and pz orbitals have a clockwise (counterclockwise) spin rotation tangent to the iso-energy contour of the upper (lower) Dirac cone, while the px and py orbitals have an additional radial spin component. The Dirac cone states may reside on different atomic layers but have the same spin texture. Our results suggest that the unique spin texture of Dirac cone states is a signature property of spin-orbit coupling, independent of topology.
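As a purely geometric illustration of the spin textures described above (not a first-principles or photoemission calculation), the sketch below returns the in-plane spin direction at momentum (kx, ky) on a circular constant-energy contour: a tangent vector winding clockwise on the upper cone and counterclockwise on the lower cone, as viewed from +z, optionally tilted by a radial component to mimic the px/py behavior. The circular contour, the viewing direction, the function names and the `radial_weight` parameter are all hypothetical assumptions; the abstract does not quantify the radial component or its sign.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def spin_direction(kx: float, ky: float, upper: bool = True, radial_weight: float = 0.0) -> Vec:
    """Unit in-plane spin vector at momentum (kx, ky) on a circular iso-energy contour.

    Toy model (assumption): pure tangential winding, clockwise for the upper cone and
    counterclockwise for the lower cone, plus an optional hypothetical radial tilt.
    """
    k = math.hypot(kx, ky)
    if k == 0.0:
        raise ValueError("spin direction is undefined at the Dirac point")
    ux, uy = kx / k, ky / k                      # radial unit vector
    tx, ty = (uy, -ux) if upper else (-uy, ux)   # clockwise vs counterclockwise tangent
    sx, sy = tx + radial_weight * ux, ty + radial_weight * uy
    norm = math.hypot(sx, sy)
    return sx / norm, sy / norm


def winding_sign(kx: float, ky: float, s: Vec) -> int:
    """Sign of the z-component of k x s: +1 counterclockwise, -1 clockwise (viewed from +z)."""
    cross = kx * s[1] - ky * s[0]
    return (cross > 0) - (cross < 0)
```

For example, `spin_direction(1.0, 0.0)` gives `(0.0, -1.0)`, a clockwise tangent at that point, and `winding_sign` reports `-1` for it. Adding a radial weight changes the direction but not the winding sign, because the radial part is parallel to k and contributes nothing to the cross product.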
Age Is Important for the Early-Stage Detection of Breast Cancer on Both Transcriptomic and Methylomic Biomarkers
Patients of different ages have different rates of cell development and metabolism. As a result, age should be an essential part of how a disease diagnosis model is trained and optimized. Unfortunately, most existing studies have not taken age into account. This study demonstrated that disease diagnosis models could be improved simply by training separate models for patients in different age groups. Both the transcriptomes and the methylomes of the TCGA breast cancer dataset (TCGA-BRCA) were used for feature selection and classification. Our experimental data strongly suggest that disease diagnosis modeling should integrate patient age into the whole experimental design.
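The abstract does not name the classifier used, so the sketch below only illustrates the age-stratification idea on invented toy data: a nearest-centroid classifier (a stand-in chosen for brevity, not the study's method) is fit either once on all patients or separately for each age group. The age cutoff of 50, the one-dimensional feature and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (age, feature vector, label) with label 1 = tumor, 0 = normal
Sample = Tuple[int, List[float], int]
Centroids = Dict[int, List[float]]


def age_group(age: int, cutoffs: Sequence[int] = (50,)) -> int:
    """Map an age to a group index; the default cutoff of 50 is a hypothetical choice."""
    for g, cutoff in enumerate(cutoffs):
        if age < cutoff:
            return g
    return len(cutoffs)


def fit_centroids(samples: Sequence[Sample]) -> Centroids:
    """Nearest-centroid training: mean feature vector per class label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for _, x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def predict(centroids: Centroids, x: Sequence[float]) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def fit_per_age_group(samples: Sequence[Sample], cutoffs: Sequence[int] = (50,)) -> Dict[int, Centroids]:
    """Train one independent model per age group (each group needs both classes)."""
    groups: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        groups[age_group(sample[0], cutoffs)].append(sample)
    return {g: fit_centroids(members) for g, members in groups.items()}


def accuracy(predict_fn, samples: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict_fn(age, x) == y for age, x, y in samples) / len(samples)
```

With four toy samples, (age 35, 1.0, normal), (40, 3.0, tumor), (65, 3.2, normal) and (70, 5.2, tumor), constructed so that the feature shifts with age, the pooled model misclassifies half of them while the per-group models classify all of them correctly. This only shows the mechanism; accuracy on training data says nothing about real transcriptomic or methylomic performance.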
- …