65 research outputs found

    Using surveys of Affymetrix GeneChips to study antisense expression.

    We have used large surveys of Affymetrix GeneChip data in the public domain to study antisense expression across diverse conditions. We compute correlations between groups of probes that map uniquely to the same exon in the antisense direction. When no probes are assigned to an exon in the sense direction, many of these antisense groups fail to detect a coherent block of transcription; only a minority contain coherent blocks of antisense expression suggestive of transcription. We also compute correlations between groups of probes that map uniquely to the same exon in both the sense and antisense directions. In some of these cases the locations of the sense probes overlap with those of the antisense probes, and the sense and antisense probe intensities are correlated with each other. This configuration suggests the existence of a Natural Antisense Transcript (NAT) pair. We find that the majority of such NAT pairs detected by GeneChips are formed by a transcript of an established gene and either an EST or an mRNA. Determining the exact antisense regulatory mechanism indicated by a correlation between sense and antisense probes requires further investigation in each case of interest. However, the analysis of microarray data has proved to be a good method for reconfirming known NATs, discovering new ones, and flagging possible problems in the annotation of antisense transcripts.
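
A minimal sketch of the two correlation measures described in this abstract. It assumes log-scale probe intensities arranged as a probes x arrays matrix for each probe group. The function names are hypothetical, and this is not the authors' pipeline. The mean pairwise Pearson correlation within a group is one simple way to score whether its probes detect a coherent block of transcription. Correlating the per-array mean intensities of a sense group with those of an overlapping antisense group gives the kind of sense/antisense signal that suggests a NAT pair.

```python
import numpy as np

def mean_pairwise_correlation(intensities: np.ndarray) -> float:
    """Mean Pearson correlation over all distinct pairs of probes in one group.

    intensities: shape (n_probes, n_arrays); each row is one probe's
    (assumed log-scale) intensity across many arrays.
    """
    corr = np.corrcoef(intensities)           # rows are treated as variables
    upper = np.triu_indices_from(corr, k=1)   # each distinct probe pair once
    return float(corr[upper].mean())

def sense_antisense_correlation(sense: np.ndarray, antisense: np.ndarray) -> float:
    """Pearson correlation between the per-array mean intensity of a sense
    probe group and that of an antisense probe group on the same exon."""
    return float(np.corrcoef(sense.mean(axis=0), antisense.mean(axis=0))[0, 1])
```

Any cut-off for calling a group "coherent" (for example, a mean correlation above some threshold) would be a further assumption and is left to the analyst.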

    Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.

    A tetramer quadruplex structure is formed by four parallel strands of DNA/RNA containing runs of guanine. These quadruplexes can form because guanine can form Hoogsteen hydrogen bonds with other guanines, and a tetrad of guanines can form a stable arrangement. We recently discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes form on the surface of GeneChips. To cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays and examine the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect cloud computing to become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to pay a cloud computing provider only for what is used. As well as being financially efficient, cloud computing is an ecologically friendly technology; it enables efficient data sharing, and we expect it to be faster for development purposes. Here we propose using cloud computing to perform a large data-mining analysis of public domain 3' arrays.
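
A minimal sketch of how G-spots and their location within a probe sequence might be located. It assumes that a G-spot means four or more consecutive guanines; that threshold is an assumption, not a definition taken from the abstract. The helper names are hypothetical. Offsets are 0-based positions in the probe string as given.

```python
from __future__ import annotations

import re

# Assumed definition: a G-spot is a run of four or more consecutive Gs.
G_RUN = re.compile(r"G{4,}")

def g_spots(probe: str) -> list[tuple[int, int]]:
    """Return (start, length) for every G-run in a probe sequence."""
    return [(m.start(), m.end() - m.start()) for m in G_RUN.finditer(probe.upper())]

def first_g_spot_offset(probe: str) -> int | None:
    """0-based offset of the first G-spot, or None if the probe has none."""
    spots = g_spots(probe)
    return spots[0][0] if spots else None
```

Grouping probes by `first_g_spot_offset` (for example, near either end of the probe versus the middle) is one way to compare the effect of G-spot location across arrays.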

    A simulation study for comparing testing statistics in response-adaptive randomization

    Background: Response-adaptive randomization can assign more patients in a comparative clinical trial to the tentatively better treatment. However, because patient allocation adapts as the trial proceeds, the samples to be compared are no longer independent. At large sample sizes, many asymptotic properties of test statistics derived for independent-sample comparisons still apply under adaptive randomization, provided that the patient allocation ratio converges asymptotically to an appropriate target. However, the small-sample properties of commonly used test statistics in response-adaptive randomization have not been fully studied.
    Methods: Simulations are systematically conducted to characterize the statistical properties of eight test statistics in six response-adaptive randomization methods at six allocation targets, with sample sizes ranging from 20 to 200. Since adaptive randomization is usually not recommended for sample sizes of less than 30, the present paper focuses on a sample size of 30 to give general recommendations on test statistics for contingency tables in response-adaptive randomization at small sample sizes.
    Results: Among all asymptotic test statistics, Cook's correction to the chi-square test (T_MC) is the best at attaining the nominal size of the hypothesis test. Williams' correction to the log-likelihood-ratio test (T_ML) gives a slightly inflated type I error and higher power compared with T_MC, but it is more robust against imbalance in patient allocation. T_MC and T_ML are usually the two test statistics with the highest power across the simulation scenarios. When focusing on T_MC and T_ML, the generalized drop-the-loser urn (GDL) and the sequential estimation-adjusted urn (SEU), respectively, have the best ability to attain the correct size of the hypothesis test. Among all sequential methods that can target different allocation ratios, GDL has the lowest variation and the highest overall power at all allocation ratios. The performance of the different adaptive randomization methods and test statistics also depends on the allocation target. At the limiting allocation ratio of the drop-the-loser (DL) and randomized play-the-winner (RPW) urns, DL outperforms all other methods, including GDL. When comparing the power of test statistics within the same randomization method but at different allocation targets, the powers of the log-likelihood-ratio, log-relative-risk, log-odds-ratio, Wald-type Z, and chi-square test statistics are maximized at their corresponding optimal allocation ratios for power. Except for the optimal allocation target for log-relative-risk, the other four optimal targets could assign more patients to the worse arm in some simulation scenarios. Another optimal allocation target, R_RSIHR, proposed by Rosenberger and Sriram (Journal of Statistical Planning and Inference, 1997), aims to minimize the number of failures at fixed power using Wald-type Z test statistics. Among allocation ratios that always assign more patients to the better treatment, R_RSIHR usually has less variation in patient allocation, and the values of this variation are consistent across all simulation scenarios. Additionally, the patient allocation at R_RSIHR is not too extreme. Therefore, R_RSIHR provides a good balance between assigning more patients to the better treatment and maintaining overall power.
    Conclusion: Cook's correction to the chi-square test and Williams' correction to the log-likelihood-ratio test are generally recommended for hypothesis testing in response-adaptive randomization, especially when sample sizes are small. The generalized drop-the-loser urn design is recommended for its good overall properties. Use of the R_RSIHR allocation target is also recommended.
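
A minimal sketch of three allocation targets and of Williams' correction to the log-likelihood-ratio (G) statistic for an r x c contingency table. It assumes binary responses with success probabilities p_a and p_b. Each target returns the proportion of patients assigned to arm A. R_RSIHR is written in its commonly cited form sqrt(p_a) / (sqrt(p_a) + sqrt(p_b)). The RPW/DL limit is q_b / (q_a + q_b) with q = 1 - p. Neyman allocation is included to show how a power-oriented target can favour the worse arm. Function names are hypothetical. Cook's correction (T_MC) is not reproduced here.

```python
import math

def rsihr(p_a: float, p_b: float) -> float:
    """R_RSIHR target: proportion of patients allocated to arm A."""
    return math.sqrt(p_a) / (math.sqrt(p_a) + math.sqrt(p_b))

def neyman(p_a: float, p_b: float) -> float:
    """Neyman allocation: proportional to each arm's response standard deviation."""
    s_a = math.sqrt(p_a * (1 - p_a))
    s_b = math.sqrt(p_b * (1 - p_b))
    return s_a / (s_a + s_b)

def rpw_limit(p_a: float, p_b: float) -> float:
    """Limiting allocation to arm A of the RPW and drop-the-loser urns."""
    q_a, q_b = 1 - p_a, 1 - p_b
    return q_b / (q_a + q_b)

def williams_g(table: list[list[float]]) -> float:
    """G statistic with Williams' correction, G / q, for an r x c table.

    Assumes every row and column total is positive.
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # a zero cell contributes 0 (limit of o*ln(o))
                expected = rows[i] * cols[j] / n
                g += obs * math.log(obs / expected)
    g *= 2.0
    r, c = len(rows), len(cols)
    q = 1 + (n * sum(1 / x for x in rows) - 1) * (n * sum(1 / x for x in cols) - 1) / (
        6 * n * (r - 1) * (c - 1)
    )
    return g / q
```

For example, `neyman(0.9, 0.5)` gives 0.375, assigning most patients to the worse arm. `rsihr(0.9, 0.5)` stays above 0.5 whenever arm A is better.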

    Spatial autocorrelation analysis of health care hotspots in Taiwan in 2006

    Background: Spatial analytical techniques and models are often used in epidemiology to identify spatial anomalies (hotspots) in disease regions. These approaches can identify not only the location of such hotspots but also their spatial patterns.
    Methods: In this study, we use spatial autocorrelation methods, including the Global Moran's I and Local Getis-Ord statistics, to describe and map spatial clusters, and the areas in which they are situated, for the 20 leading causes of death in Taiwan. In addition, we fit a logistic regression model to test for similarity and dissimilarity between genders.
    Results: The genders are compared in order to formulate a common spatial risk. The mean obtained from the local spatial autocorrelation analysis is used to identify spatial cluster patterns. There is naturally great interest in discovering the relationship between the leading causes of death and well-documented spatial risk factors. For example, in Taiwan, we found that the geographical distribution of tuberculosis clusters closely corresponds to the location of aboriginal townships.
    Conclusions: Cluster mapping helps to clarify issues such as the spatial aspects of both internal and external correlations for leading health care events. This greatly aids the assessment of spatial risk factors, which in turn facilitates planning the most advantageous types of health care policy and implementing effective health care services.
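
A minimal sketch of the two statistics named in the Methods, in their standard textbook forms. It assumes a spatial weights matrix `w` with a zero diagonal (for example, binary contiguity between townships). Global Moran's I is (n / sum(w)) * (z' W z) / (z' z), where z holds the deviations from the mean. The local Getis-Ord Gi* z-score adds each region to its own neighbourhood. Function names are hypothetical, and no significance testing is included.

```python
import numpy as np

def morans_i(x, w) -> float:
    """Global Moran's I; w is an n x n weights matrix with a zero diagonal."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return float(x.size / w.sum() * (z @ w @ z) / (z @ z))

def getis_ord_gi_star(x, w) -> np.ndarray:
    """Local Getis-Ord Gi* z-scores (each region included in its own neighbourhood)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    w_star = np.asarray(w, dtype=float) + np.eye(n)  # add self-weights for Gi*
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    w_sum = w_star.sum(axis=1)
    w_sq = (w_star ** 2).sum(axis=1)
    num = w_star @ x - x_bar * w_sum
    den = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
    return num / den
```

Positive Gi* scores mark candidate hotspots and negative scores mark coldspots. A positive Moran's I indicates that similar values cluster in space overall.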

    No evidence for association with APOL1 kidney disease risk alleles and Human African Trypanosomiasis in two Ugandan populations:

    Human African trypanosomiasis (HAT) manifests as an acute form caused by Trypanosoma brucei rhodesiense (Tbr) and a chronic form caused by Trypanosoma brucei gambiense (Tbg). Previous studies have suggested a host genetic role in infection outcomes, particularly for APOL1. We have undertaken candidate gene association studies (CGAS) in a Tbr and a Tbg HAT-endemic area of Uganda to determine whether polymorphisms in IL10, IL8, IL4, HLAG, TNFA, TNX4LB, IL6, IFNG, MIF, APOL1, HLAA, IL1B, IL4R, IL12B, IL12R, HP, HPR, and CFH have a role in HAT.

    Quantitative measurements of inequality in geographic accessibility to pediatric care in Oita Prefecture, Japan: Standardization with complete spatial randomness

    Background: Quantitative measurement of inequality in geographic accessibility to pediatric care, as well as of mean distance or travel time, is very important for priority setting to ensure fair access to pediatric facilities. However, conventional techniques for measuring inequality are inappropriate in geographic settings. Because inequality measures of access distance or travel time are strongly influenced by the background geographic distribution pattern, they cannot be used directly for regional comparisons of geographic accessibility. The objective of this study is to resolve this issue using a standardization approach.
    Methods: Travel times to the nearest pediatric care facility were calculated for all children in Oita Prefecture, Japan. The relative mean difference was used as the inequality measure for each secondary medical service area and was standardized with an expected value estimated from a Monte Carlo simulation based on complete spatial randomness.
    Results: The observed mean travel times across the areas considered averaged 4.50 minutes, ranging from 1.83 to 7.02 minutes. The mean of the observed inequality measure was 1.1, ranging from 0.9 to 1.3. The expected values of the inequality measure varied with the background geographic distribution pattern of children, ranging from 0.3 to 0.7. After standardizing the observed inequality measure with the expected one, we found that the ranks of the inequality measure were reversed for the observed areas.
    Conclusions: Using the indicator proposed in this paper, it is possible to compare inequality in geographic accessibility among regions. Such a comparison may facilitate priority setting in health policy and planning.
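
A minimal sketch of the standardization idea, under several simplifying assumptions. Straight-line distance in a unit square stands in for road travel time. Complete spatial randomness is simulated by placing children uniformly at random. The observed measure is standardized as the ratio observed / expected; the abstract does not say whether a ratio or a difference is used. The relative mean difference is the mean absolute difference over all n^2 pairs divided by the mean. All function names are hypothetical.

```python
import numpy as np

def relative_mean_difference(x) -> float:
    """Mean |x_i - x_j| over all n^2 pairs, divided by the mean of x."""
    x = np.asarray(x, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])
    return float(diffs.mean() / x.mean())

def nearest_distances(points: np.ndarray, facilities: np.ndarray) -> np.ndarray:
    """Euclidean distance from each point to its nearest facility."""
    d = np.linalg.norm(points[:, None, :] - facilities[None, :, :], axis=2)
    return d.min(axis=1)

def expected_rmd_csr(facilities, n_children, n_sim=999, rng=None) -> float:
    """Monte Carlo expected RMD when children are placed uniformly at random
    in the unit square (complete spatial randomness)."""
    rng = np.random.default_rng(rng)
    facilities = np.asarray(facilities, dtype=float)
    sims = [
        relative_mean_difference(
            nearest_distances(rng.random((n_children, 2)), facilities)
        )
        for _ in range(n_sim)
    ]
    return float(np.mean(sims))

def standardized_inequality(observed: float, expected: float) -> float:
    """Assumed standardization: observed inequality relative to its CSR expectation."""
    return observed / expected
```

Because the expectation depends on each area's facility layout, two areas with the same raw RMD can rank differently after standardization.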

    Using microwave links to adjust the radar rainfall field

    The final stage in processing radar data to arrive at an estimated rain field typically involves comparing the preliminary radar-derived estimates of hourly rainfall with those observed by ground-based gauges. A mean field bias adjustment is then often applied, using an age-weighted average of the individual gauge–radar comparisons. In this paper, a mean field bias adjustment is presented that uses the path-integrated rainfall estimates provided by microwave links together with information from gauges. It is shown to be at least as efficient as the current gauge-based procedure used by the UK Met Office to improve the accuracy of radar-based estimates of rainfall at the ground.
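
A minimal sketch of an age-weighted mean field bias that pools gauge and microwave-link comparisons. The bias is taken as a weighted ratio of sums, sum(w * observed) / sum(w * radar). The weights decay exponentially with age, using a hypothetical half-life parameter. This weighting scheme is an illustrative assumption and is neither the Met Office procedure nor the paper's method. For a link, the observed value is its path-integrated rainfall, and the radar value is assumed to be averaged along the same path.

```python
from dataclasses import dataclass

@dataclass
class Comparison:
    observed_mm: float  # gauge reading, or link path-integrated rainfall
    radar_mm: float     # radar estimate at the gauge, or averaged along the link path
    age_h: float        # age of the comparison in hours

def mean_field_bias(comparisons, half_life_h: float = 3.0) -> float:
    """Age-weighted ratio-of-sums bias; 1.0 (no adjustment) if radar sums to zero."""
    num = den = 0.0
    for c in comparisons:
        w = 0.5 ** (c.age_h / half_life_h)  # hypothetical exponential age weight
        num += w * c.observed_mm
        den += w * c.radar_mm
    return num / den if den > 0 else 1.0

def adjust(radar_field, bias: float):
    """Apply a single multiplicative bias to every radar pixel."""
    return [bias * v for v in radar_field]
```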

    A Survey of Spatial Defects in Homo Sapiens Affymetrix GeneChips

    Modern biology has moved from a science of individual measurements to a science where data are collected on an industrial scale. Foremost amongst the new tools for biochemistry are chip arrays which, in one operation, measure hundreds of thousands or even millions of DNA sequences or RNA transcripts. Whilst this is impressive, increasingly sophisticated analysis tools have been required to convert gene array data into gene expression levels. Although noise levels are assumed to be low, the number of measurements for an individual gene is small, so identifying which signals are affected by noise is a priority. Analysis of high-density oligonucleotide arrays (HDONAs) from NCBI GEO shows that, even in the best Human GeneChips, 1/4 percent of data are affected by spatial noise. Earlier designs are noisier, and spatial defects may affect more than 25 percent of probes. BioConductor R code is available as supplementary material and via http://bioinformatics.essex.ac.uk/users/wlangdon/TCBB-2007-11-0161.tar.gz
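
A minimal sketch of one way to flag spatial defects on a single chip. It is not the authors' BioConductor R code. The sketch assumes raw intensities are stacked as a chips x rows x cols array. A cell is marked as an outlier when its log2 intensity differs from the across-chip median by more than a threshold. A region is then reported as a spatial defect when most cells in a k x k window are outliers. All thresholds and names are hypothetical.

```python
import numpy as np

def window_fraction(mask: np.ndarray, k: int) -> np.ndarray:
    """Fraction of True cells in the k x k window centred on each cell (k odd)."""
    if k % 2 != 1:
        raise ValueError("k must be odd")
    pad = k // 2
    m = np.pad(mask.astype(float), pad, mode="constant")
    c = np.cumsum(np.cumsum(m, axis=0), axis=1)        # 2-D prefix sums
    c = np.pad(c, ((1, 0), (1, 0)), mode="constant")    # leading zero row/column
    rows, cols = mask.shape
    s = (c[k:k + rows, k:k + cols] - c[0:rows, k:k + cols]
         - c[k:k + rows, 0:cols] + c[0:rows, 0:cols])
    return s / (k * k)

def spatial_defects(chips: np.ndarray, target: int, dev_threshold: float = 1.0,
                    k: int = 5, frac_threshold: float = 0.5) -> np.ndarray:
    """Boolean map of cells on chips[target] that lie inside a dense outlier patch."""
    log_i = np.log2(chips)
    median = np.median(log_i, axis=0)                   # per-cell median over chips
    outlier = np.abs(log_i[target] - median) > dev_threshold
    return window_fraction(outlier, k) > frac_threshold
```

Requiring a dense patch of outliers, rather than isolated ones, separates spatially clustered defects (scratches, bubbles, smudges) from ordinary probe-level noise.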