    Probabilistic techniques for obtaining accurate patient counts in Clinical Data Warehouses

    Proposal and execution of clinical trials, computation of quality measures, and discovery of correlations between medical phenomena are all applications where an accurate count of patients is needed. However, existing sources of this type of patient information, including Clinical Data Warehouses (CDWs), may be incomplete or inaccurate. This research explores applying probabilistic techniques, supported by the MayBMS probabilistic database, to obtain accurate patient counts from a Clinical Data Warehouse containing synthetic patient data. We present a synthetic Clinical Data Warehouse and populate it with simulated data using a custom patient data generation engine. We then implement, evaluate, and compare different techniques for obtaining patient counts. We model billing as a test for the presence of a condition. We compute billing’s sensitivity and specificity both by conducting a “Simulated Expert Review”, where a representative sample of records is reviewed and labeled by experts, and by obtaining the ground truth for every record. The first method computes the posterior probability of a patient having a condition through a “Bayesian Chain”, using Bayes’ Theorem to update the probability after each visit. The second method is a “one-shot” approach that computes the probability of a patient having a condition based on whether the patient is ever billed for the condition. Our results demonstrate the utility of probabilistic approaches, which improve on the accuracy of raw counts. In particular, the simulated review paired with a single application of Bayes’ Theorem produces the best results, with an average error rate of 2.1% compared to 43.7% for the straightforward billing counts. Overall, this research demonstrates that Bayesian probabilistic approaches improve patient counts on simulated patient populations. We believe that total patient counts based on billing data are one of the many possible applications of our Bayesian framework. Use of these probabilistic techniques will enable more accurate patient counts and better results for applications requiring this metric.
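
    The two approaches named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (the abstract says their work runs on the MayBMS probabilistic database); the function names, the prior, and the idea of summing posteriors into an expected patient count are assumptions made here for illustration. Billing is treated as a binary test with a given sensitivity and specificity, as the abstract describes.

```python
from typing import Iterable, Sequence


def posterior(prior: float, sensitivity: float, specificity: float, billed: bool) -> float:
    """One application of Bayes' Theorem: P(condition | billing outcome)."""
    if billed:
        p_obs_given_cond = sensitivity           # true positive rate
        p_obs_given_healthy = 1.0 - specificity  # false positive rate
    else:
        p_obs_given_cond = 1.0 - sensitivity     # false negative rate
        p_obs_given_healthy = specificity        # true negative rate
    numerator = p_obs_given_cond * prior
    evidence = numerator + p_obs_given_healthy * (1.0 - prior)
    # If the observation is impossible under both hypotheses, keep the prior.
    return numerator / evidence if evidence > 0 else prior


def bayesian_chain(prior: float, sensitivity: float, specificity: float,
                   visits: Sequence[bool]) -> float:
    """Update the probability after each visit; each posterior is the next prior."""
    p = prior
    for billed in visits:
        p = posterior(p, sensitivity, specificity, billed)
    return p


def one_shot(prior: float, ever_sensitivity: float, ever_specificity: float,
             visits: Sequence[bool]) -> float:
    """Single update based only on whether the patient was ever billed.

    Assumption: the sensitivity and specificity passed here describe the
    "ever billed" test, not a single visit.
    """
    return posterior(prior, ever_sensitivity, ever_specificity, any(visits))


def expected_count(probabilities: Iterable[float]) -> float:
    """Expected number of patients with the condition (sum of posteriors)."""
    return sum(probabilities)


if __name__ == "__main__":
    # Hypothetical billing histories: True means the visit was billed for the condition.
    histories = [[True, True], [False, False, False], [False, True]]
    chained = [bayesian_chain(0.1, 0.8, 0.9, h) for h in histories]
    single = [one_shot(0.1, 0.8, 0.9, h) for h in histories]
    print("chain estimate:", expected_count(chained))
    print("one-shot estimate:", expected_count(single))
```

    The chain treats visits as conditionally independent given the patient's true status, which is a simplifying assumption of this sketch and may not match the model evaluated in the paper.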

    Near-optimal RNA-Seq quantification

    We present a novel approach to RNA-Seq quantification that is near-optimal in speed and accuracy. Software implementing the approach, called kallisto, can be used to analyze 30 million unaligned paired-end RNA-Seq reads in less than 5 minutes on a standard laptop computer while providing results as accurate as those of the best existing tools. This removes a major computational bottleneck in RNA-Seq analysis. Comment: added some results (paralog analysis, allele-specific expression analysis, alignment comparison, accuracy analysis with TPMs); switched bootstrap analysis to a human sample from SEQC-MAQCIII; provided a link to a snakefile that allows for reproducibility of all results and figures in the paper.

    Comparison of otolith readability and reproducibility of counts of translucent zones using different otolith preparation methods for four endemic Labeobarbus species in Lake Tana, Ethiopia

    The analysis of fish age data is vital for the successful conservation of fish. Attempts to develop optimal management strategies for effective conservation of the endemic Labeobarbus species are strongly affected by the lack of accurate age estimates. Although methodological studies are key to acquiring good insight into the age of fishes, no studies have so far compared different methods for these species. Thus, this study aimed to determine the best method for the endemic Labeobarbus species. Samples were collected from May 2016 to April 2017. Asteriscus otoliths from 150 specimens each of L. intermedius, L. tsanensis, L. platydorsus, and L. megastoma were examined. Six methods were evaluated; however, only three methods resulted in readable images. The procedure in which whole otoliths were first submerged in water and subsequently placed in glycerol to take the image (MO1) was generally best. Except for L. megastoma, this method produced the clearest image, as both the coefficient of variation and the average percentage error between readers were lowest. Furthermore, except for L. megastoma, MO1 had high otolith readability and no systematic bias. Therefore, we suggest that MO1 should be used as the standard otolith preparation technique for the first three species, while for L. megastoma, other preparation techniques should be evaluated. This study provides a reference for researchers from Africa, particularly Ethiopia, to develop a suitable otolith preparation method for the different tropical fish species.
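
    The abstract compares preparation methods by the coefficient of variation and the average percentage error between readers but does not give formulas. The Python sketch below assumes the definitions commonly used in age-reading precision studies (per-fish deviation of each reader's count from the mean count, averaged over fish); the function names and the sample counts are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev
from typing import Sequence


def cv_percent(counts: Sequence[float]) -> float:
    """Coefficient of variation (%) of the readers' counts for one fish."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; CV is undefined")
    return 100.0 * stdev(counts) / m


def ape_percent(counts: Sequence[float]) -> float:
    """Average percentage error (%) of the readers' counts for one fish."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; APE is undefined")
    return 100.0 * mean([abs(c - m) / m for c in counts])


def method_precision(readings: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Mean CV and mean APE over all fish for one preparation method.

    `readings` holds one inner sequence per fish, with one count per reader.
    """
    return (mean([cv_percent(r) for r in readings]),
            mean([ape_percent(r) for r in readings]))


if __name__ == "__main__":
    # Hypothetical translucent-zone counts by two readers for four fish.
    example = [[4, 5], [6, 6], [3, 4], [7, 7]]
    cv, ape = method_precision(example)
    print(f"mean CV = {cv:.2f}%, mean APE = {ape:.2f}%")
```

    Lower values of both indices indicate better agreement between readers, which is the sense in which the abstract reports them as lowest for MO1.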

    Do News and Sentiment play a role in Stock Price Prediction?


    Archon Genomics X PRIZE Validation Protocol

    This document is a collective assembly of techniques designed to test the quality and accuracy of 100 whole human genome sequences resulting from the $10 Million Archon Genomics X PRIZE (AGXP) competition. The purpose of this article is to solicit constructive criticism from the genomic and genetic community on the outlined approaches. The final version of this Validation Protocol is intended to become a useful standard by which to gauge the capabilities of whole genome sequencing technologies that emerge even after 2012.

    Improving the value of public RNA-seq expression data by phenotype prediction.

    Publicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions. We develop an in silico phenotyping approach for predicting critical missing annotation directly from genomic measurements, using well-annotated genomic and phenotypic data produced by consortia like TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples we recently processed on a common pipeline as part of the recount2 project. We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods to perform phenotype prediction are available in the phenopredict R package, and the predictions for recount2 are available from the recount R package. With data and phenotype information available for 70,000 human samples, expression data are available for use on a scale that was not previously feasible.

    Gut microbiota in HIV-pneumonia patients is related to peripheral CD4 counts, lung microbiota, and in vitro macrophage dysfunction.

    Pneumonia is common and frequently fatal in HIV-infected patients, due to rampant, systemic inflammation and failure to control microbial infection. While airway microbiota composition is related to local inflammatory response, gut microbiota has been shown to correlate with the degree of peripheral immune activation (IL6 and IP10 expression) in HIV-infected patients. We thus hypothesized that both airway and gut microbiota are perturbed in HIV-infected pneumonia patients, that the gut microbiota is related to peripheral CD4+ cell counts, and that its associated products differentially program immune cell populations necessary for controlling microbial infection in CD4-high and CD4-low patients. To assess these relationships, paired bronchoalveolar lavage and stool microbiota (bacterial and fungal) from a large cohort of Ugandan HIV-infected patients with pneumonia were examined, and in vitro tests of the effect of gut microbiome products on macrophage effector phenotypes were performed. While lower airway microbiota stratified into three compositionally distinct microbiota as previously described, these were not related to peripheral CD4 cell count. In contrast, variation in gut microbiota composition was significantly related to CD4 cell count, lung microbiota composition, and patient mortality. Compared with patients with high CD4+ cell counts, those with low counts possessed more compositionally similar airway and gut microbiota and evidence of microbial translocation, and their associated gut microbiome products reduced macrophage activation and IL-10 expression and increased IL-1β expression in vitro. These findings suggest that the gut microbiome is related to CD4 status and plays a key role in modulating macrophage function, critical to microbial control in HIV-infected patients with pneumonia.

    InPhaDel: integrative shotgun and proximity-ligation sequencing to phase deletions with single nucleotide polymorphisms.

    Phasing of single nucleotide variants (SNVs) and structural variants (SVs) into chromosome-wide haplotypes in humans has been challenging, and has required either trio sequencing or restricting phasing to population-based haplotypes. Selvaraj et al. demonstrated that single-individual SNV phasing is possible with proximity-ligated (HiC) sequencing. Here, we demonstrate that HiC can phase structural variants into phased scaffolds of SNVs. Since HiC data is noisy and SV calling is challenging, we applied a range of supervised classification techniques, including Support Vector Machines and Random Forest, to phase deletions. Our approach was demonstrated on deletion calls and phasings on the NA12878 human genome. We used three NA12878 chromosomes and simulated chromosomes to train model parameters. The remaining NA12878 chromosomes, withheld from training, were used to evaluate phasing accuracy. Random Forest had the highest accuracy and correctly phased 86% of the deletions with allele-specific read evidence. Allele-specific read evidence was found for 76% of the deletions. HiC provides significant read evidence for accurately phasing 33% of the deletions. Also, eight of eight top-ranked deletions phased by only HiC were validated using long-range polymerase chain reaction and Sanger sequencing. Thus, deletions from a single individual can be accurately phased using a combination of shotgun and proximity-ligation sequencing. InPhaDel software is available at: http://l337x911.github.io/inphadel/