144 research outputs found

    Annotating Web Tables with the Crowd

    The Web contains a large number of structured tables, most of which lack header rows. Algorithmic approaches have been proposed to recover the semantics of web tables by annotating column labels and identifying subject columns. However, state-of-the-art techniques do not yet provide satisfactory accuracy and recall. In this paper, we present a hybrid machine-crowdsourcing framework that leverages human intelligence to improve the performance of web table annotation. In this framework, machine-based algorithms prompt human workers with candidate lists of concepts, while an improved K-means algorithm based on a novel integrative distance minimizes the number of tuples posed to the crowd. To recommend the most relevant tasks to human workers and determine the final answers more accurately, an evaluation mechanism is implemented based on Answer Credibility, which measures the probability of a worker's intuitive answer being the final answer for a task. The results of extensive experiments on real-world datasets show that our framework can significantly improve annotation accuracy and time efficiency for web tables, and that our task reduction and answer evaluation mechanisms are effective and efficient in improving answer quality.
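The abstract above describes choosing a final answer using a per-worker credibility score but does not give the formula. As a minimal sketch, assuming credibility is simply a number in [0, 1] per worker and that answers are combined by a credibility-weighted vote (both are assumptions for illustration, as are the names `aggregate_answers`, `answers`, and `credibility`), the idea could look like this:

```python
from collections import defaultdict

def aggregate_answers(answers, credibility, default_credibility=0.5):
    """Return the label with the highest credibility-weighted vote.

    answers:      dict mapping worker id -> label chosen by that worker
    credibility:  dict mapping worker id -> assumed score in [0, 1]
    Workers without a score get default_credibility (an assumption).
    """
    scores = defaultdict(float)
    for worker, label in answers.items():
        scores[label] += credibility.get(worker, default_credibility)
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(scores), key=lambda label: scores[label])

# Hypothetical column-annotation task: three workers, two candidate concepts.
answers = {"w1": "City", "w2": "Country", "w3": "Country"}
credibility = {"w1": 0.9, "w2": 0.4, "w3": 0.3}
final_label = aggregate_answers(answers, credibility)
```

In this toy input the single high-credibility worker outweighs two low-credibility workers, which is the behavior a plain majority vote would not give.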

    Development of a Chemically Defined Medium for Better Yield and Purification of Enterocin Y31 from Enterococcus faecium

    The macro- and micronutrients in traditional media such as MRS, which are used for cultivating lactic acid bacteria and especially for bacteriocin production, have not been defined, preventing the quantitative monitoring of metabolic flux during bacteriocin biosynthesis. To enhance Enterocin Y31 production and simplify the separation and purification steps, we developed a simplified chemically defined medium (SDM) for the growth of Enterococcus faecium Y31 and the production of its bacteriocin, Enterocin Y31. We found that bacterial growth was unrelated to Enterocin Y31 production in MRS; therefore, both the growth rate and Enterocin Y31 production were used as indices in the investigation. Single omission experiments revealed that 5 g/L NaCl, five vitamins, two nucleic acid bases, MgSO4·7H2O, MnSO4·4H2O, KH2PO4, K2HPO4, CH3COONa, fourteen amino acids, and glucose were essential for the strain’s growth and Enterocin Y31 production. Thus, a novel simplified and defined medium (SDM) was formulated with 30 components in total. Enterocin Y31 production yield was higher in SDM than in either MRS or CDM. SDM improved Enterocin Y31 production and reduced purification to only two steps, which gives it broad potential applications.

    Channel-Wise Contrastive Learning for Learning with Noisy Labels

    In real-world datasets, noisy labels are pervasive. The challenge of learning with noisy labels (LNL) is to train a classifier that discerns the actual classes from given instances. For this, the model must identify features indicative of the authentic labels. While research indicates that genuine label information is embedded in the learned features of even inaccurately labeled data, it's often intertwined with noise, complicating its direct application. Addressing this, we introduce channel-wise contrastive learning (CWCL). This method distinguishes authentic label information from noise by undertaking contrastive learning across diverse channels. Unlike conventional instance-wise contrastive learning (IWCL), CWCL tends to yield more nuanced and resilient features aligned with the authentic labels. Our strategy is twofold: firstly, using CWCL to extract pertinent features to identify cleanly labeled samples, and secondly, progressively fine-tuning using these samples. Evaluations on several benchmark datasets validate our method's superiority over existing approaches
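The abstract contrasts instance-wise and channel-wise contrastive learning without stating the loss. The sketch below is one plausible reading, not the authors' method: it assumes an InfoNCE-style objective and assumes that "channel-wise" means treating each feature channel (a column of the batch-by-channel feature matrix) as the unit being contrasted across two augmented views. The function names and the toy feature values are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss; anchors[i] and positives[i] are the positive pair,
    every other row of positives acts as a negative for anchors[i]."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)

def transpose(matrix):
    """Turn an N x D list of rows into a D x N list of rows."""
    return [list(col) for col in zip(*matrix)]

# Toy features: 3 samples x 4 channels, from two augmented views.
view1 = [[1.0, 0.2, 0.0, 0.5], [0.1, 1.0, 0.3, 0.0], [0.0, 0.4, 1.0, 0.2]]
view2 = [[0.9, 0.3, 0.1, 0.4], [0.2, 0.9, 0.2, 0.1], [0.1, 0.3, 0.9, 0.3]]

instance_loss = info_nce(view1, view2)                       # rows = samples
channel_loss = info_nce(transpose(view1), transpose(view2))  # rows = channels
```

The only difference between the two losses here is the transpose: the instance-wise loss pulls together two views of the same sample, while the channel-wise variant pulls together the same channel as seen across the batch in both views.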

    Discovering Foreign Keys on Web Tables with the Crowd

    The foreign-key relationship is one of the most important constraints between two tables. Previous work has focused on detecting inclusion dependencies (INDs) or foreign keys in relational databases. Discovering foreign-key relationships is also helpful for analyzing and integrating data in web tables. However, because web tables are of poor quality, it is difficult to discover foreign keys with existing techniques that check basic integrity constraints. In this paper, we propose a hybrid human-machine framework to detect foreign keys on web tables. After a machine algorithm discovers candidates and evaluates their confidence of being true foreign keys, we verify those candidates by leveraging the power of the crowd. To reduce the monetary cost, we propose a dynamic task selection technique based on conflict detection and inclusion dependency, which eliminates redundant tasks and assigns the most valuable tasks to workers. Additionally, to help workers complete tasks more effectively and efficiently, a sampling strategy is applied to minimize the number of tuples posed to the crowd. We conducted extensive experiments on real-world datasets, and the results show that our framework improves foreign key detection accuracy on web tables at lower monetary and time cost.
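The machine step described above starts from inclusion dependencies, which dirty web tables rarely satisfy exactly. As a minimal sketch under stated assumptions, the check below scores a candidate by the fraction of distinct child values found in the parent column, after trimming whitespace and ignoring case. The normalization, the 0.9 threshold, and the function names are illustrative assumptions, not details taken from the paper.

```python
def _clean(values):
    """Normalize cell values: trim, lowercase, and drop empty cells."""
    return [v.strip().lower() for v in values if v and v.strip()]

def inclusion_ratio(child_values, parent_values):
    """Fraction of distinct child values that also occur in the parent column."""
    child = set(_clean(child_values))
    parent = set(_clean(parent_values))
    if not child:
        return 0.0
    return len(child & parent) / len(child)

def is_candidate_foreign_key(child_values, parent_values, threshold=0.9):
    """A column pair is a candidate if the parent column is unique (key-like)
    and the child column is approximately included in it."""
    parent = _clean(parent_values)
    parent_is_unique = len(parent) == len(set(parent))
    return parent_is_unique and inclusion_ratio(child_values, parent_values) >= threshold

# Hypothetical web tables: a country list and a city table referencing it.
countries = ["France", "Japan", "Brazil"]
city_country_column = ["france", "Japan ", "Japan", "Brazil", ""]

ratio = inclusion_ratio(city_country_column, countries)
candidate = is_candidate_foreign_key(city_country_column, countries)
```

Candidates that pass a tolerant check like this one would still need verification, which is the role the abstract assigns to the crowd.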

    The SITE-100 Project: Site-Based Biodiversity Genomics for Species Discovery, Community Ecology, and a Global Tree-of-Life

    Most insect communities are composed of evolutionarily diverse lineages, but detailed phylogenetic analyses of whole communities are lacking, in particular in species-rich tropical faunas. Likewise, our knowledge of the Tree-of-Life to document evolutionary diversity of organisms remains highly incomplete and especially requires the inclusion of unstudied lineages from species-rich ecosystems. Here we present the SITE-100 program, which is an attempt at building the Tree-of-Life from whole-community sampling of high-biodiversity sites around the globe. Combining the local site-based sets into a global tree produces an increasingly comprehensive estimate of organismal phylogeny, while also re-tracing evolutionary history of lineages constituting the local community. Local sets are collected in bulk in standardized passive traps and imaged with large-scale high-resolution cameras, which is followed by a parataxonomy step for the preliminary separation of morphospecies and selection of specimens for phylogenetic analysis. Selected specimens are used for individual DNA extraction and sequencing, usually to sequence mitochondrial genomes. All remaining specimens are bulk extracted and subjected to metabarcoding. Phylogenetic analysis on the mitogenomes produces a reference tree to which short barcode sequences are added in a secondary analysis using phylogenetic placement methods or backbone constrained tree searches. However, the approach may be hampered because (1) mitogenomes are limited in phylogenetic informativeness, and (2) site-based sampling may produce poor taxon coverage which causes challenges for phylogenetic inference. To mitigate these problems, we first assemble nuclear shotgun data from taxonomically chosen lineages to resolve the base of the tree, and add site-based mitogenome and DNA barcode data in three hierarchical steps. 
    We posit that site-based sampling, though not meeting the criterion of “taxon-completeness,” has great merits given preliminary studies showing representativeness and evenness of taxa sampled. We therefore argue in favor of site-based sampling as an unorthodox but logistically efficient way to construct large phylogenetic trees. Copyright © 2022 Bian, Garner, Liu and Vogler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. The attached file is the published version of the article.

    Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels

    In recent years, research on learning with noisy labels has focused on devising novel algorithms that can achieve robustness to noisy training labels while generalizing to clean data. These algorithms often incorporate sophisticated techniques, such as noise modeling, label correction, and co-training. In this study, we demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies such as learning rate decay, model weight averaging, and data augmentation, can outperform state-of-the-art methods. Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels. While some of these regularization strategies have been used in previous noisy label learning research, their full potential has not been thoroughly explored. Our results encourage a reevaluation of benchmarks for learning with noisy labels and prompt reconsideration of the role of specialized learning algorithms designed for training with noisy labels.
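Two of the regularization strategies named above, learning rate decay and weight averaging, are easy to show on a toy problem. The sketch below is illustrative only: the cosine schedule, the exponential moving average with decay 0.99, the 1-D logistic model, and the toy data are all assumptions rather than the paper's configuration, and data augmentation is omitted.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.5, min_lr=0.0):
    """Cosine learning-rate decay from base_lr down to min_lr."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def ema_update(avg_params, params, decay=0.99):
    """Exponential moving average of model weights."""
    return [decay * a + (1.0 - decay) * p for a, p in zip(avg_params, params)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, total_steps=200):
    """Gradient descent on binary cross-entropy for a 1-D logistic model."""
    params = [0.0, 0.0]  # [weight, bias]
    avg_params = list(params)
    for step in range(total_steps):
        lr = cosine_lr(step, total_steps)
        grad_w = grad_b = 0.0
        for x, y in data:
            # Derivative of cross-entropy with respect to the logit.
            error = sigmoid(params[0] * x + params[1]) - y
            grad_w += error * x / len(data)
            grad_b += error / len(data)
        params = [params[0] - lr * grad_w, params[1] - lr * grad_b]
        avg_params = ema_update(avg_params, params)
    return params, avg_params

# Toy data: negative x -> class 0, positive x -> class 1; the last label is noisy.
noisy_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (1.5, 0)]
final_params, averaged_params = train(noisy_data)
```

At evaluation time one would typically use `averaged_params` rather than `final_params`, since the averaged weights smooth out the fluctuations of individual update steps.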

    Enhanced HMGB1 Expression May Contribute to Th17 Cells Activation in Rheumatoid Arthritis

    Rheumatoid arthritis (RA) is a common autoimmune disease associated with Th17 cells, but the effect of high-mobility group box chromosomal protein 1 (HMGB1) and the relationship between Th17-associated factors and HMGB1 in RA remain unknown. In the present study, we measured the mRNA levels of HMGB1, RORγt, and IL-17 in peripheral blood mononuclear cells (PBMCs) from patients with RA by quantitative real-time PCR (RT-qPCR), and the plasma concentrations of HMGB1, IL-17, and IL-23 by ELISA. The effect of HMGB1 on Th17 cell differentiation was then analyzed in vitro. Our clinical studies showed that the mRNA levels of HMGB1, RORγt, and IL-17 were higher in patients than in healthy controls (P < 0.05), especially in patients with active RA (P < 0.05). Plasma HMGB1, IL-17, and IL-23 levels were also higher in RA patients than in healthy controls (P < 0.05), and HMGB1 expression levels correlated positively with plasma CRP, ERS, and RF. In vitro, IL-17-producing CD4+ T cells were increased by treatment with 100 ng/mL rHMGB1 for 12 h, indicating that increased HMGB1 might contribute to Th17 cell activation in RA patients.

    Identification and characterization of class 1 integrons among Pseudomonas aeruginosa isolates from patients in Zhenjiang, China

    Objectives: The role of integrons in the spread of antibiotic resistance has been well established. The aim of this study was to investigate the resistance profiles of Pseudomonas aeruginosa isolated from patients in Zhenjiang to 13 antibiotics, and to identify the structure and dissemination of class 1 integrons. Methods: The Kirby–Bauer disk diffusion assay was used to determine the rate of P. aeruginosa resistance. Class 1 integrons from multidrug-resistant isolates were amplified by PCR, and their PCR products were sequenced. We also analyzed the integron structures containing the same gene cassettes by restriction fragment length polymorphism (RFLP). Isolates were genotyped by pulsed-field gel electrophoresis (PFGE). Results: The resistance rates were between 29.6% and 90.1%. The prevalence of class 1 integrons was 38.0%. These integrons included five gene cassettes (aadB, aac6-II, blaPSE-1, dfrA17, and aadA5). The dfrA17 and aadA5 gene cassettes were found most often. Conclusions: Class 1 integrons were found to be widespread in P. aeruginosa isolated from clinical samples in the Zhenjiang area of China. The antibiotic resistance rates in class 1 integron-positive strains of P. aeruginosa were noticeably higher than those in class 1 integron-negative strains. PFGE showed that particular clones were circulating among patients.