65 research outputs found

    The Shivplot: a graphical display for trend elucidation and exploratory analysis of microarray data

    BACKGROUND: High-throughput systems are powerful tools for the life science research community. The complexity and volume of data from these systems, however, demand special treatment. Graphical tools are needed to evaluate many aspects of the data throughout the analysis process because plots can provide quality assessments for thousands of values simultaneously. The utility of a plot, in turn, is contingent on both its interpretability and its efficiency. RESULTS: The shivplot, a graphical technique motivated by microarrays but applicable to any replicated high-throughput data set, is described. The plot capitalizes on the strengths of three well-established plotting graphics – a boxplot, a distribution density plot, and a variability vs intensity plot – by effectively combining them into a single representation. CONCLUSION: The utility of the new display is illustrated with microarray data sets. The proposed graph, retaining all the information of its precursors, conserves space and minimizes redundancy, but also highlights features of the data that would be difficult to appreciate from the individual display components. We recommend the use of the shivplot both for exploratory data analysis and for the communication of experimental data in publications.
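    The abstract does not give the shivplot's construction details, so the sketch below only juxtaposes the three precursor displays it names (boxplot, density plot, variability vs intensity plot) for one replicated dataset; the actual shivplot merges them into a single graphic. Function and variable names here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative, simplified composite of the three displays named in the abstract.
import numpy as np
import matplotlib.pyplot as plt

def shivplot_like(replicates):
    """replicates: 2-D array, rows = features, columns = replicate arrays."""
    mean_intensity = replicates.mean(axis=1)
    spread = replicates.std(axis=1)

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))

    # Boxplot of each replicate's intensity distribution.
    axes[0].boxplot(replicates)
    axes[0].set_title("Boxplot per replicate")

    # Distribution density of the pooled intensities.
    axes[1].hist(replicates.ravel(), bins=50, density=True)
    axes[1].set_title("Pooled density")

    # Variability vs intensity: replicate spread against mean intensity.
    axes[2].scatter(mean_intensity, spread, s=4, alpha=0.4)
    axes[2].set_xlabel("mean intensity")
    axes[2].set_ylabel("replicate SD")
    axes[2].set_title("Variability vs intensity")

    fig.tight_layout()
    return fig

# Example with simulated log-intensities: 1,000 features, 3 replicates.
rng = np.random.default_rng(0)
shivplot_like(rng.normal(loc=8, scale=1, size=(1000, 3)))
plt.show()
```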

    Systematic error detection in experimental high-throughput screening

    <p>Abstract</p> <p>Background</p> <p>High-throughput screening (HTS) is a key part of the drug discovery process during which thousands of chemical compounds are screened and their activity levels measured in order to identify potential drug candidates (i.e., hits). Many technical, procedural or environmental factors can cause systematic measurement error or inequalities in the conditions in which the measurements are taken. Such systematic error has the potential to critically affect the hit selection process. Several error correction methods and software have been developed to address this issue in the context of experimental HTS <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Despite their power to reduce the impact of systematic error when applied to error perturbed datasets, those methods also have one disadvantage - they introduce a bias when applied to data not containing any systematic error <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Hence, we need first to assess the presence of systematic error in a given HTS assay and then carry out systematic error correction method if and only if the presence of systematic error has been confirmed by statistical tests.</p> <p>Results</p> <p>We tested three statistical procedures to assess the presence of systematic error in experimental HTS data, including the χ<sup>2 </sup>goodness-of-fit test, Student's t-test and Kolmogorov-Smirnov test <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> preceded by the Discrete Fourier Transform (DFT) method <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. We applied these procedures to raw HTS measurements, first, and to estimated hit distribution surfaces, second. The three competing tests were applied to analyse simulated datasets containing different types of systematic error, and to a real HTS dataset. Their accuracy was compared under various error conditions.</p> <p>Conclusions</p> <p>A successful assessment of the presence of systematic error in experimental HTS assays is possible when the appropriate statistical methodology is used. Namely, the t-test should be carried out by researchers to determine whether systematic error is present in their HTS data prior to applying any error correction method. This important step can significantly improve the quality of selected hits.</p

    The skills of hypnosis


    A methodology for global validation of microarray experiments

    BACKGROUND: DNA microarrays are popular tools for measuring gene expression of biological samples. This ever-increasing popularity is ensuring that a large number of microarray studies are conducted, many of which make their data publicly available for mining by other investigators. Under most circumstances, validation of differential expression of genes is performed on a gene-by-gene basis. Thus, it is not possible to generalize validation results to the remaining majority of non-validated genes or to evaluate the overall quality of these studies. RESULTS: We present an approach for the global validation of DNA microarray experiments that will allow researchers to evaluate the general quality of their experiment and to extrapolate validation results of a subset of genes to the remaining non-validated genes. We illustrate why the popular strategy of selecting only the most differentially expressed genes for validation generally fails as a global validation strategy and propose random-stratified sampling as a better gene selection method. We also illustrate shortcomings of often-used validation indices such as overlap of significant effects and the correlation coefficient and recommend the concordance correlation coefficient (CCC) as an alternative. CONCLUSION: We provide recommendations that will enhance validity checks of microarray experiments while minimizing the need to run a large number of labour-intensive individual validation assays.
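    The following is a minimal sketch of the two ideas the abstract names: random-stratified selection of genes for validation (stratifying on the microarray effect size rather than taking only the most differential genes) and Lin's concordance correlation coefficient between microarray and validation estimates. Function names, stratum counts and the simulated data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def stratified_sample(effects, n_strata=5, per_stratum=4, seed=0):
    """Pick genes for validation across strata of the effect-size distribution."""
    rng = np.random.default_rng(seed)
    order = np.argsort(effects)
    strata = np.array_split(order, n_strata)
    return np.concatenate([rng.choice(s, size=min(per_stratum, len(s)), replace=False)
                           for s in strata])

def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between two sets of estimates."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population variances, as in Lin (1989)
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Example: microarray log-fold-changes vs (noisier) validation estimates
# for a random-stratified subset of genes.
rng = np.random.default_rng(2)
array_lfc = rng.normal(0, 1, 2000)
chosen = stratified_sample(array_lfc)
validation_lfc = array_lfc[chosen] + rng.normal(0, 0.3, len(chosen))
print(round(concordance_cc(array_lfc[chosen], validation_lfc), 3))
```

    Unlike the Pearson correlation, the CCC penalizes location and scale shifts between the two platforms, which is why it is better suited as a global validation index.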

    Proximity-graph-based tools for DNA clustering

    There are more than one billion documents on the Web, with the count continually rising at a pace of over one million new documents per day. As information increases, organizational motivation and interest in data warehousing and mining research and practice remain high. The Encyclopedia of Data Warehousing and Mining, Second Edition, offers thorough exposure to the issues of importance in the rapidly changing field of data warehousing and mining. This essential reference source informs decision makers, problem solvers, and data mining specialists in business, academia, government, and other settings with over 300 entries on theories, methodologies, functionalities, and applications.

    Populated and Remote Reefs Spanning Multiple Archipelagos Across the Central and Western Pacific

    Comparable information on the status of natural resources across large geographic and human-impact scales provides invaluable context to ecosystem-based management and insights into processes driving differences among areas. Data on fish assemblages at 39 US-flag coral reef areas distributed across the Pacific are presented. Total reef fish biomass varied by more than an order of magnitude: lowest at densely populated islands and highest on reefs distant from human populations. Remote reefs (<50 people within 100 km) averaged ∼4 times the biomass of "all fishes" and 15 times the biomass of piscivores compared to reefs near populated areas. The greatest within-archipelagic differences were found in the Hawaiian and Mariana Archipelagos, where differences were consistent with, but likely not exclusively driven by, higher fishing pressure around populated areas. Results highlight the importance of the extremely remote reefs now contained within the system of Pacific Marine National Monuments as ecological reference areas.

    Development of a framework for genotyping bovine-derived Cryptosporidium parvum, using a multilocus fragment typing tool

    Background: There is a need for an integrated genotyping approach for C. parvum; no sufficiently discriminatory scheme to date has been fully validated or widely adopted by veterinary or public health researchers. Multilocus fragment typing (MLFT) can provide good differentiation and is relatively quick and cheap to perform. An MLFT tool was assessed in terms of its typeability, specificity, precision (repeatability and reproducibility), accuracy and ability to genotypically discriminate bovine-derived Cryptosporidium parvum. Methods: With the aim of working towards a consensus, six markers were selected for inclusion based on their successful application in previous studies: MM5, MM18, MM19, TP14, MS1 and MS9. Alleles were assigned according to the fragment sizes of repeat regions amplified, as determined by capillary electrophoresis. In addition, a region of the GP60 gene was amplified and sequenced to determine gp60 subtype, and this was added to the allelic profiles of the 6 markers to determine the multilocus genotype (MLG). The MLFT tool was applied to 140 C. parvum samples collected in two cross-sectional studies of UK calves, conducted in Cheshire in 2004 (principally dairy animals) and Aberdeenshire/Caithness in 2011 (beef animals). Results: Typeability was 84%. The primers did not amplify tested non-parvum species frequently detected in cattle. In terms of repeatability, within- and between-run fragment sizes showed little variability. Between laboratories, fragment sizes differed but allele calling was reproducible. The MLFT had good discriminatory ability (Simpson's Index of Diversity, SID, of 0.92) compared to gp60 sequencing alone (SID 0.44). Some markers were more informative than others, with MS1 and MS9 proving monoallelic in tested samples. Conclusions: Further inter-laboratory trials are now warranted with the inclusion of human-derived C. parvum samples, allowing progress towards an integrated, standardised typing scheme to enable source attribution and to determine the role of livestock in future outbreaks of human C. parvum infection.
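    The sketch below shows, under illustrative assumptions, how a multilocus genotype (MLG) can be built from the six marker allele calls plus the gp60 subtype, and how Simpson's Index of Diversity is computed over the typed samples (SID = 1 − Σ nᵢ(nᵢ−1) / N(N−1), where nᵢ is the number of samples with genotype i). Marker names follow the abstract; the sample data and gp60 subtype strings are invented for the example.

```python
from collections import Counter

MARKERS = ["MM5", "MM18", "MM19", "TP14", "MS1", "MS9"]

def multilocus_genotype(allele_calls, gp60_subtype):
    """allele_calls: dict marker -> fragment-size-based allele; returns the MLG tuple."""
    return tuple(allele_calls[m] for m in MARKERS) + (gp60_subtype,)

def simpsons_index_of_diversity(genotypes):
    """SID = 1 - sum n_i(n_i - 1) / (N(N - 1)) over genotype counts n_i."""
    counts = Counter(genotypes)
    n = sum(counts.values())
    return 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Toy example: three samples, two distinct MLGs (allele numbers and subtypes invented).
samples = [
    multilocus_genotype({"MM5": 1, "MM18": 2, "MM19": 1, "TP14": 3, "MS1": 1, "MS9": 1}, "IIaA15G2R1"),
    multilocus_genotype({"MM5": 1, "MM18": 2, "MM19": 1, "TP14": 3, "MS1": 1, "MS9": 1}, "IIaA15G2R1"),
    multilocus_genotype({"MM5": 2, "MM18": 2, "MM19": 1, "TP14": 4, "MS1": 1, "MS9": 1}, "IIaA17G1R1"),
]
print(round(simpsons_index_of_diversity(samples), 3))
```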