17 research outputs found

    RNA-seq: technical variability and sampling

    Abstract Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage and splicing between samples, as well as quantitative differences among samples, are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on its magnitude. The size of the technical variance and the role of sampling are examined in this manuscript. Results: In this study three independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide, and estimates of gene expression are more likely to disagree when coverage is low, although large disagreements in the estimates of expression are observed at all levels of coverage. Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and, if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability, without dramatic cost increases.
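
The sampling argument in this abstract can be illustrated with a toy simulation. The sketch below assumes that each technical replicate draws the read count for a feature from the same Poisson distribution (pure random sampling and no other noise source). The function names, the number of features, and the mean counts are illustrative assumptions, not the authors' analysis or data.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small means used here (lam below a few hundred).
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def replicate_disagreement(mean_reads, n_features=2000, seed=0):
    """Simulate two technical replicates per feature under pure Poisson sampling.

    Returns a pair:
      - fraction of features detected (count > 0) in exactly one replicate
      - mean relative difference |a - b| / mean(a, b) over features seen at least once
    """
    rng = random.Random(seed)
    detected_in_one_only = 0
    rel_diffs = []
    for _ in range(n_features):
        a = poisson(mean_reads, rng)
        b = poisson(mean_reads, rng)
        if (a == 0) != (b == 0):
            detected_in_one_only += 1
        if a + b > 0:
            rel_diffs.append(abs(a - b) / ((a + b) / 2))
    return detected_in_one_only / n_features, sum(rel_diffs) / len(rel_diffs)


if __name__ == "__main__":
    for mean_reads in (1, 5, 20, 100):  # hypothetical expected counts per feature
        frac, rel = replicate_disagreement(mean_reads)
        print(mean_reads, round(frac, 3), round(rel, 3))
```

Under these assumptions, both the share of features detected in only one replicate and the mean relative difference between replicates shrink as the expected count grows, which is the qualitative pattern the abstract describes for low versus high coverage.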

    Predictive Analytics in Practice: A Novel Simulation Application for Addressing Patient Flow Challenges in Today's Emergency Departments

    Abstract Objectives: To develop a flexible software application that uses predictive analytics to enable emergency department (ED) decision-makers in virtually any environment to predict the effects of operational interventions and enhance continual process improvement efforts. To demonstrate the ability of the application's core simulation model to recreate and predict site-specific patient flow in two very different EDs: a large academic center and a freestanding ED. To describe how the application was used by a freestanding ED medical director to match ED resources to patient demand. Methods: The application was developed through a public-private partnership between University of Florida Health and Roundtable Analytics, Inc., supported by a National Science Foundation Small Business Technology Transfer (STTR) grant. The core simulation technology was designed to be quickly adaptable to any ED using data routinely collected by most electronic health record systems. To demonstrate model accuracy, Monte Carlo studies were performed to predict the effects of management interventions in two distinct ED settings. At one ED, the medical director conducted simulation studies to evaluate the sustainability of the current staffing strategy and inform his decision to implement specific interventions that better match ED resources to patient demand. After implementation of one intervention, the fidelity of the model's predictions was evaluated. Results: A flexible, cloud-based software application enabling ED decision-makers to predict the effects of operational decisions was developed and deployed at two qualitatively distinct EDs. The application accurately recreated each ED's throughput and faithfully predicted the effects of specific management interventions. At one site, the application was used to identify the point at which increasing arrivals would make the current staffing strategy less effective than an alternative strategy. As actual arrivals approached this point, decision-makers used the application to simulate a variety of different interventions; this directly informed their decision to implement a new strategy. The observed outcomes resulting from this intervention fell within the range of predictions from the model. Conclusion: This application overcomes technical barriers that have made simulation modeling inaccessible to key decision-makers in emergency departments. Using this technology, ED managers with no programming experience can conduct customized simulation studies regardless of their ED's volume and complexity. In two very different case studies, the fidelity of the application was established and the application was shown to have a direct positive effect on patient flow. The effective use of simulation modeling promises to replace inefficient trial-and-error approaches and become a useful and accessible tool for healthcare managers challenged to make operational decisions in environments of increasingly scarce resources.
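
The kind of Monte Carlo staffing study described in this abstract can be sketched with a very small queueing model. The code below is not the application's simulation model, which the abstract does not specify. It assumes a first-come-first-served queue with exponential inter-arrival and service times, and every name and parameter value is hypothetical.

```python
import heapq
import random


def simulate_mean_wait(arrival_rate, service_rate, n_providers,
                       n_patients=5000, seed=0):
    """Mean wait (hours) before a patient is seen, for one simulated run.

    arrival_rate : patients per hour (assumed exponential inter-arrival times)
    service_rate : patients per hour each provider can treat (assumed exponential)
    n_providers  : number of providers working in parallel
    """
    rng = random.Random(seed)
    free_at = [0.0] * n_providers  # time at which each provider next becomes free
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(n_patients):
        clock += rng.expovariate(arrival_rate)   # next patient arrives
        earliest = heapq.heappop(free_at)        # provider who frees up first
        start = max(clock, earliest)             # patient waits if all are busy
        total_wait += start - clock
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return total_wait / n_patients


def prediction_range(arrival_rate, service_rate, n_providers, n_runs=20):
    """Repeat the simulation with different seeds and return (min, max) mean wait."""
    waits = [simulate_mean_wait(arrival_rate, service_rate, n_providers, seed=s)
             for s in range(n_runs)]
    return min(waits), max(waits)


if __name__ == "__main__":
    # Hypothetical scenario: 5 arrivals/hour, each provider treats 2 patients/hour.
    for providers in (3, 4):
        print(providers, prediction_range(5.0, 2.0, providers))
```

Comparing the ranges for three and four providers, or for rising arrival rates, mimics in miniature how a decision-maker might check when a staffing strategy stops keeping up with demand and whether an observed outcome falls inside the simulated range.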

    Adjusting the June Area Survey Estimate of the Number of U.S. Farms for Misclassification and Non-response

    Each year, the National Agricultural Statistics Service (NASS) conducts the June Area Survey (JAS), which is based on an area frame. The JAS provides information on U.S. agriculture, including an estimate of the number of farms in the U.S. NASS also conducts the Census of Agriculture every five years, in years ending in 2 and 7. The census, which uses both a list and the JAS area frame, also produces an estimate of the number of U.S. farms. In 2007, the two estimates were further apart than could be attributed to sampling error alone. Previous studies of the JAS identified misclassification of JAS sampled units as a source of an undercount in the number of farms in the U.S. Using data from the 2007 JAS and the 2007 Census, misclassification of tracts as agricultural or non-agricultural was identified. Research has also identified the estimation of agricultural activities for sampled tracts as another factor that contributes to the discrepancy in the JAS number of farms estimate. This research report presents methodology that adjusts for two known sources of error on the JAS: misclassification and estimation (which later will be addressed as non-response).

    On the Feasibility of Using NASS’s Sampling List Frame to Evaluate Misclassification Errors of the June Area Survey

    During the past three years, the National Agricultural Statistics Service (NASS) has made an effort to address, quantify, and adjust for an undercount in the number of farms indication from its annual June Area Survey (JAS), which is based on an area frame. This undercount is a direct result of the misclassification of agricultural tracts as non-agricultural. The 2007 Census of Agriculture mailing list (CML) was evaluated as a potential source to assess misclassification on the 2007 JAS. The CML was found to be a rich source from which to quantify the undercount of farms on the JAS. However, the CML is only available every five years, and misclassification on the JAS should be assessed each year. Independently of the area frame, NASS maintains a list of agricultural operators, referred to as the list frame. Yearly list-based samples are selected from the list frame. In addition, the list frame serves as the foundation for building the CML. The list frame is updated on an on-going basis and operators are categorized as either active or inactive. Although the CML includes all active records, some of these do not qualify as farming operations. This research report explores the potential of using the list frame on a yearly basis to assess the misclassification of farms on the JAS

    Annual Land Utilization Survey (ALUS): Design and Methodology

    Each year, the National Agricultural Statistics Service (NASS) publishes an estimate of the number of farms in the United States based on the June Area Survey (JAS). Independent studies showed that the JAS number of farms indications have a significant undercount due to misclassification. To adjust for this undercount, a follow-on survey to the JAS called the Annual Land Utilization Survey (ALUS) has been proposed. ALUS is designed and developed based on the Farm Numbers Research Project (FNRP). NASS conducted the FNRP in the fall of 2009 (Abreu, McCarthy and Colburn, 2010). ALUS samples from all JAS segments containing any estimated or non-agricultural JAS tracts. For a selected segment, all estimated and non-agricultural JAS tracts will be re-evaluated. The collection of eligible segments in a particular year will be called the ALUS population. The sample allocation of ALUS segments to each state-stratum combination considers two factors: the proportion of the ALUS population in the stratum and the proportion of the FNRP adjustment from non-agricultural tracts in the stratum. ALUS can be treated as a second phase to the JAS. The two-phase stratified design, JAS-ALUS, can be applied to any estimate produced by the JAS. However, ALUS has non-response. In this paper, methodology for a three-phase sampling design is developed by extending the two-phase sampling design methodology proposed by Särndal and Swensson (1987). A general sampling design is allowed in each phase; that is, the inclusion probabilities in each phase are arbitrary. The estimator is unbiased, and an unbiased estimator for the variance is provided. Here, this method is applied to the two-phase JAS-ALUS with the third phase being response/non-response.
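
The abstract does not state the estimator itself. A natural reading, assumed here, is that a total is estimated by weighting each unit in the final phase by the inverse of the product of its three conditional inclusion probabilities, which extends the two-phase weighting of Särndal and Swensson by one factor. The function name and the example values below are hypothetical, and in practice the third-phase (response) probabilities would have to be known or modeled.

```python
def three_phase_total(y, pi1, pi2, pi3):
    """Inverse-probability-weighted estimate of a population total.

    All sequences describe the units that reached the final phase:
      y[k]   : value observed for unit k (e.g. number of farms on a tract)
      pi1[k] : phase-1 inclusion probability
      pi2[k] : phase-2 inclusion probability, conditional on the phase-1 sample
      pi3[k] : phase-3 (response) probability, conditional on the phase-2 sample
    """
    if not (len(y) == len(pi1) == len(pi2) == len(pi3)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for yk, p1, p2, p3 in zip(y, pi1, pi2, pi3):
        if not all(0.0 < p <= 1.0 for p in (p1, p2, p3)):
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += yk / (p1 * p2 * p3)  # each unit represents 1/(p1*p2*p3) units
    return total


if __name__ == "__main__":
    # Hypothetical responding tracts: farm counts and their three probabilities.
    y = [2, 0, 1]
    pi1 = [0.1, 0.1, 0.2]
    pi2 = [0.5, 0.5, 0.5]
    pi3 = [0.8, 0.8, 1.0]
    print(three_phase_total(y, pi1, pi2, pi3))  # approximately 60
```

In the example, the first tract carries a weight of 1/(0.1 × 0.5 × 0.8) = 25 and the third a weight of 1/(0.2 × 0.5 × 1.0) = 10, so the estimated total is about 2 × 25 + 1 × 10 = 60.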

    Place-based attributes predict community membership in a mobile phone communication network.

    Social networks can be organized into communities of closely connected nodes, a property known as modularity. Because diseases, information, and behaviors spread faster within communities than between communities, understanding modularity has broad implications for public policy, epidemiology and the social sciences. Explanations for community formation in social networks often incorporate the attributes of individual people, such as gender, ethnicity or shared activities. High modularity is also a property of large-scale social networks, where each node represents a population of individuals at a location, such as call flow between mobile phone towers. However, whether or not place-based attributes, including land cover and economic activity, can predict community membership for network nodes in large-scale networks remains unknown. We describe the pattern of modularity in a mobile phone communication network in the Dominican Republic, and use a linear discriminant analysis (LDA) to determine whether geographic context can explain community membership. Our results demonstrate that place-based attributes, including sugar cane production, urbanization, distance to the nearest airport, and wealth, correctly predicted community membership for over 70% of mobile phone towers. We observed a strongly positive correlation (r = 0.97) between the modularity score and the predictive ability of the LDA, suggesting that place-based attributes can accurately represent the processes driving modularity. In the absence of social network data, the methods we present can be used to predict community membership over large scales using solely place-based attributes.
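
The abstract does not say how the modularity score was computed. As an illustration only, the sketch below assumes the standard Newman-Girvan modularity for an unweighted, undirected graph, whereas a real call-flow network would normally be weighted. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges     : list of (u, v) pairs, each undirected edge listed once
    community : dict mapping every node to its community label

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


if __name__ == "__main__":
    # Toy network: two triangles of "towers" joined by a single link.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(modularity(edges, split))  # 5/14, about 0.357
```

Placing all six nodes in one community gives Q = 0, so the positive score for the two-triangle split reflects the denser connections within each group than between them.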