    Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences

    SUMMARY: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. Multiple tools are currently available for the annotation of antibody sequences. All downstream analyses, such as choosing lead drug candidates, depend on correct annotation of these sequences; however, a thorough comparison of the performance of these tools has not been conducted. Here, we benchmark commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed, using simulated and experimental high-throughput sequencing datasets. We analyzed changes in the IMGT reference germline database over the last 10 years in order to assess the reproducibility of the annotation output, and found that only 73 of 183 (40%) human V, D and J genes were shared between the reference germline sets used by the tools. Annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average gene mishit frequency (0.02) and IgBLAST the lowest (0.004). Reproducibility of the reported complementarity-determining region 3 (CDR3) amino acid sequences ranged from 4.3% to 77.6% with preprocessed data. We also assessed run time: MiXCR processed the most sequences per unit of time. These results indicate that immunoinformatic analyses depend greatly on the choice of bioinformatics tool. Our results support informed tool selection by immunoinformaticians based on repertoire composition and sequencing platform.

    AVAILABILITY AND IMPLEMENTATION: All tools utilized in the paper are free for academic use.

    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
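
    The two headline metrics above, gene mishit frequency and CDR3 amino acid reproducibility, can both be phrased as simple agreement rates over paired per-read annotations. The sketch below is illustrative only: it is not the paper's code, and the field names v_call and cdr3_aa are assumptions standing in for whatever fields each tool actually reports.

        # Minimal sketch (not the paper's code): gene "mishit" frequency and
        # CDR3 exact-match agreement from two per-read annotation tables.
        # Assumes each table maps read IDs to {"v_call": ..., "cdr3_aa": ...};
        # field names are illustrative, not taken from any specific tool.

        def mishit_frequency(truth, calls, gene="v_call"):
            """Fraction of shared reads where the annotated gene differs from truth."""
            shared = truth.keys() & calls.keys()
            if not shared:
                return 0.0
            misses = sum(1 for rid in shared if truth[rid][gene] != calls[rid][gene])
            return misses / len(shared)

        def cdr3_agreement(a, b):
            """Fraction of shared reads where both tools report the same CDR3 amino acids."""
            shared = a.keys() & b.keys()
            if not shared:
                return 0.0
            same = sum(1 for rid in shared if a[rid]["cdr3_aa"] == b[rid]["cdr3_aa"])
            return same / len(shared)

        # Toy example: two reads, one V-gene mismatch.
        truth = {"r1": {"v_call": "IGHV1-69*01", "cdr3_aa": "ARDYW"},
                 "r2": {"v_call": "IGHV3-23*01", "cdr3_aa": "AKGGF"}}
        calls = {"r1": {"v_call": "IGHV1-69*01", "cdr3_aa": "ARDYW"},
                 "r2": {"v_call": "IGHV3-30*01", "cdr3_aa": "AKGGF"}}
        print(mishit_frequency(truth, calls))  # 0.5
        print(cdr3_agreement(truth, calls))    # 1.0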

    RWD-Cockpit: Application for Quality Assessment of Real-world Data

    Background: Digital technologies are transforming the health care system. A large part of the resulting information is generated as real-world data (RWD). Data from electronic health records and digital biomarkers have the potential to reveal associations between the benefits and adverse events of medicines, establish new patient-stratification principles, expose unknown disease correlations, and inform preventive measures. The impact on health care payers and providers, the biopharmaceutical industry, and governments is massive in terms of health outcomes, quality of care, and cost. However, a framework for assessing the preliminary quality of RWD is missing, hindering population-based observational studies meant to support regulatory decision-making and real-world evidence.

    Objective: To address the need to qualify RWD, we aimed to build a web application that translates the characterization of selected quality parameters of RWD into a metric and proposes a standard framework for evaluating RWD quality.

    Methods: The RWD-Cockpit systematically scores data sets based on proposed quality metrics and customizable variables chosen by the user. Sleep RWD generated de novo and publicly available data sets were used to validate the usability and applicability of the web application. The RWD quality score is based on the evaluation of 7 variables: manageability specifies access and publication status; complexity defines univariate, multivariate, and longitudinal data; sample size indicates the size of the sample or samples; privacy and liability stipulates privacy rules; accessibility specifies how the data set can be accessed and to what granularity; periodicity specifies how often the data set is updated; and standardization specifies whether the data set adheres to any specific technical or metadata standard. Each variable is associated with several descriptors that define specific characteristics of the data set.

    Results: We built the RWD-Cockpit web application, which applies a common standard for a preliminary evaluation of RWD quality across molecular, phenotypical, and social data sets, and which the community can further personalize while retaining an internal standard. Applied to two different case studies, de novo-generated sleep data and publicly available data sets, the RWD-Cockpit identified variables that might increase quality and reported them to researchers.

    Conclusions: The application of the RWD metrics implemented in RWD-Cockpit suggests that multiple data sets can be preliminarily evaluated for quality using the proposed metrics. The output scores, termed quality identifiers, provide a first quality assessment for the use of RWD. Although extensive challenges remain in setting RWD quality standards, our proposal can serve as an initial blueprint for community efforts to characterize RWD quality for regulated settings.
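
    As a rough illustration of how a descriptor-based score like the one described in the Methods could be computed, the sketch below maps each of the seven variables to a chosen descriptor and averages the per-variable points. The descriptor names, point values, and averaging rule are all assumptions for illustration; the abstract does not specify RWD-Cockpit's actual scoring rules.

        # Illustrative sketch only: the abstract does not disclose RWD-Cockpit's
        # real descriptors, weights, or aggregation. Each of the seven quality
        # variables gets points from an assumed descriptor table; the composite
        # score is the mean of the per-variable points.

        DESCRIPTOR_SCORES = {
            "manageability":     {"open access": 3, "restricted": 2, "unpublished": 1},
            "complexity":        {"longitudinal": 3, "multivariate": 2, "univariate": 1},
            "sample_size":       {"large": 3, "medium": 2, "small": 1},
            "privacy_liability": {"anonymized": 3, "pseudonymized": 2, "identifiable": 1},
            "accessibility":     {"full records": 3, "aggregates only": 2, "on request": 1},
            "periodicity":       {"continuous": 3, "periodic": 2, "static": 1},
            "standardization":   {"standard-compliant": 3, "partial": 2, "none": 1},
        }

        def quality_score(dataset):
            """Average the per-variable points for a dataset's chosen descriptors."""
            points = [DESCRIPTOR_SCORES[var][desc] for var, desc in dataset.items()]
            return sum(points) / len(points)

        # Hypothetical characterization of a de novo sleep-RWD data set.
        sleep_rwd = {
            "manageability": "restricted",
            "complexity": "longitudinal",
            "sample_size": "medium",
            "privacy_liability": "pseudonymized",
            "accessibility": "on request",
            "periodicity": "continuous",
            "standardization": "partial",
        }
        print(f"quality score: {quality_score(sleep_rwd):.2f}")  # 2.14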