15 research outputs found

    Threshold-free Selection of Taxonomic Multilabels

    In online search or content selection systems, significant computational resources are expended to classify or categorize electronic documents into topics, concepts, or entities. A classifier can process, parse or otherwise analyze the document to assign one or more labels to the document based on the taxonomy. The classifier can generate a score for each of the labels, and provide the labels and the scores to other components or modules for further downstream processing. To keep downstream processing efficient, the classifier may return only a subset of the labels, chosen by comparing each label’s score with a threshold. However, threshold-based filtering may not account for the tree structure of the taxonomy, and it may also fail to take into account the likelihood dependencies between parent nodes and child nodes. The proposed technique addresses this by (1) selecting a set of labels returned by the classifier that optimizes certain metrics, such as precision and recall; and (2) using a greedy multi-label selection algorithm to perform the optimization in step (1). Using these techniques, the system can select a subset of labels to return or provide for further processing.
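The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of one way a threshold-free, greedy selection over a label tree could look. It assumes (none of this is from the source) that classifier scores can be read as probabilities of a label being correct, that the metric being optimized is an approximate expected F1, and that a child label may only be selected after its parent. The function name `greedy_select` and the example labels are hypothetical.

```python
from typing import Dict, List, Optional


def greedy_select(scores: Dict[str, float],
                  parent: Dict[str, Optional[str]]) -> List[str]:
    """Greedily pick labels while an approximate expected F1 keeps improving.

    Assumptions (illustrative, not taken from the source):
      - scores[label] is the probability that the label is correct;
      - parent[label] is the parent label in the taxonomy, or None for a
        top-level label;
      - expected F1 is approximated as
        2 * sum(selected scores) / (number selected + sum of all scores).
    """
    total = sum(scores.values())   # expected number of correct labels overall
    selected: List[str] = []
    chosen = set()
    mass = 0.0                     # sum of scores of the selected labels
    best = 0.0                     # best approximate F1 seen so far

    while True:
        # A label is eligible once its parent has been selected, so the
        # result always forms a connected top part of the taxonomy tree.
        candidates = [
            label for label in scores
            if label not in chosen
            and (parent.get(label) is None or parent[label] in chosen)
        ]
        if not candidates:
            break
        cand = max(candidates, key=lambda label: scores[label])
        new_f1 = 2 * (mass + scores[cand]) / (len(selected) + 1 + total)
        if new_f1 <= best:
            break                  # adding the best candidate no longer helps
        selected.append(cand)
        chosen.add(cand)
        mass += scores[cand]
        best = new_f1

    return selected


example_scores = {"sports": 0.9, "sports/tennis": 0.8,
                  "sports/golf": 0.1, "news": 0.2}
example_parent = {"sports": None, "sports/tennis": "sports",
                  "sports/golf": "sports", "news": None}
print(greedy_select(example_scores, example_parent))
```

No fixed score cutoff appears anywhere in the sketch: the stopping point is decided by the metric itself, which is the property the abstract emphasizes. A real system would likely use a different objective and a more careful treatment of parent-child dependencies.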

    Inheritance patterns in citation networks reveal scientific memes

    Memes are the cultural equivalent of genes that spread across human culture by means of imitation. What makes a meme and what distinguishes it from other forms of information, however, is still poorly understood. Our analysis of memes in the scientific literature reveals that they are governed by a surprisingly simple relationship between frequency of occurrence and the degree to which they propagate along the citation graph. We propose a simple formalization of this pattern and we validate it with data from close to 50 million publication records from the Web of Science, PubMed Central, and the American Physical Society. Evaluations relying on human annotators, citation network randomizations, and comparisons with several alternative approaches confirm that our formula is accurate and effective, without a dependence on linguistic or ontological knowledge and without the application of arbitrary thresholds or filters.
    Comment: 8 two-column pages, 5 figures; accepted for publication in Physical Review

    External quality assessment of SARS-CoV-2-sequencing: An ESGMD-SSM pilot trial across 15 European laboratories

    Objective: This first pilot on external quality assessment (EQA) of SARS-CoV-2 whole genome sequencing, initiated by the ESCMID Study Group for Genomic and Molecular Diagnostics (ESGMD) and Swiss Society for Microbiology (SSM), aims to build a framework between laboratories in order to improve pathogen surveillance sequencing. Methods: Ten samples with varying viral loads were sent out to 15 clinical laboratories who had free choice of sequencing methods and bioinformatic analyses. The key aspects on which the individual centres were compared were identification of 1) SNPs and indels, 2) Pango lineages, and 3) clusters between samples. Results: The participating laboratories used a wide array of methods and analysis pipelines. Most were able to generate whole genomes for all samples. Genomes were sequenced to varying depth (up to 100-fold difference across centres). There was a very good consensus regarding the majority of reporting criteria, but there were a few discrepancies in lineage and cluster assignment. Additionally, there were inconsistencies in variant calling. The main reasons for discrepancies were missing data, bioinformatic choices, and interpretation of data. Conclusions: The pilot EQA was an overall success. It was able to show the high quality of participating labs and provide valuable feedback in cases where problems occurred, thereby improving the sequencing setup of laboratories. A larger follow-up EQA should, however, improve on defining the variables and format of the report. Additionally, contamination and/or minority variants should be a further aspect of assessment.

    External quality assessment of SARS-CoV-2-sequencing: An ESGMD-SSM pilot trial across 15 European laboratories.

    OBJECTIVE: This first pilot on external quality assessment (EQA) of SARS-CoV-2 whole genome sequencing, initiated by the ESCMID Study Group for Genomic and Molecular Diagnostics (ESGMD) and Swiss Society for Microbiology (SSM), aims to build a framework between laboratories in order to improve pathogen surveillance sequencing. METHODS: Ten samples with varying viral loads were sent out to 15 clinical laboratories who had free choice of sequencing methods and bioinformatic analyses. The key aspects on which the individual centres were compared were identification of 1) SNPs and indels, 2) Pango lineages, and 3) clusters between samples. RESULTS: The participating laboratories used a wide array of methods and analysis pipelines. Most were able to generate whole genomes for all samples. Genomes were sequenced to varying depth (up to 100-fold difference across centres). There was a very good consensus regarding the majority of reporting criteria, but there were a few discrepancies in lineage and cluster assignment. Additionally, there were inconsistencies in variant calling. The main reasons for discrepancies were missing data, bioinformatic choices, and interpretation of data. CONCLUSIONS: The pilot EQA was an overall success. It was able to show the high quality of participating labs and provide valuable feedback in cases where problems occurred, thereby improving the sequencing setup of laboratories. A larger follow-up EQA should, however, improve on defining the variables and format of the report. Additionally, contamination and/or minority variants should be a further aspect of assessment.

    The Impact of Network Size and Financial Incentives on Adoption and Participation in New Online Communities

    The success of online communities depends heavily on the providers' abilities to motivate potential users to adopt the service and to actively participate. Because research in this field of media economics is rare, especially with regard to newly established communities, this study analyzes what drives community adoption and how direct and indirect financial incentives influence user participation. Extending Ajzen's (1991) Theory of Planned Behavior, this article shows, in 2 empirical studies, that network size significantly affects adoption in newly established communities. The results of the first study indicate a strong effect of indirect financial incentives (saving money) on the intention to adopt. The second study indicates that direct financial incentives (earning money) may well help increase the network's size without altering user motivation through crowding-out effects. It is interesting to note that the presence of direct financial incentives attracts new users, but it does not increase usage.

    External Quality Assessment of SARS-CoV-2 Sequencing: an ESGMD-SSM Pilot Trial across 15 European Laboratories.

    This first pilot trial on external quality assessment (EQA) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) whole-genome sequencing, initiated by the European Society of Clinical Microbiology and Infectious Diseases (ESCMID) Study Group for Genomic and Molecular Diagnostics (ESGMD) and the Swiss Society for Microbiology (SSM), aims to build a framework between laboratories in order to improve pathogen surveillance sequencing. Ten samples with various viral loads were sent out to 15 clinical laboratories that had free choice of sequencing methods and bioinformatic analyses. The key aspects on which the individual centers were compared were the identification of (i) single nucleotide polymorphisms (SNPs) and indels, (ii) Pango lineages, and (iii) clusters between samples. The participating laboratories used a wide array of methods and analysis pipelines. Most were able to generate whole genomes for all samples. Genomes were sequenced to various depths (up to a 100-fold difference across centers). There was a very good consensus regarding the majority of reporting criteria, but there were a few discrepancies in lineage and cluster assignments. Additionally, there were inconsistencies in variant calling. The main reasons for discrepancies were missing data, bioinformatic choices, and interpretation of data. The pilot EQA was overall a success. It was able to show the high quality of participating laboratories and provide valuable feedback in cases where problems occurred, thereby improving the sequencing setup of laboratories. A larger follow-up EQA should, however, improve on defining the variables and format of the report. Additionally, contamination and/or minority variants should be a further aspect of assessment.