149 research outputs found

    Effect of various normalization methods on Applied Biosystems expression array system data

    BACKGROUND: DNA microarray technology provides a powerful tool for characterizing gene expression on a genome scale. While the technology has been widely used in discovery-based medical and basic biological research, its direct application in clinical practice and regulatory decision-making has been questioned. A few key issues, including the reproducibility, reliability, compatibility and standardization of microarray analysis and results, must be critically addressed before any routine usage of microarrays in clinical laboratory and regulated areas can occur. In this study we investigate some of these issues for the Applied Biosystems Human Genome Survey Microarrays. RESULTS: We analyzed the gene expression profiles of two samples, brain and universal human reference (UHR), a mixture of RNAs from 10 cancer cell lines, using the Applied Biosystems Human Genome Survey Microarrays. Five technical replicates at three different sites were performed on the same total RNA samples according to the manufacturer's standard protocols. Five different methods (quantile, median, scale, VSN and cyclic loess) were used to normalize AB microarray data within each site. 1,000 genes spanning a wide dynamic range in gene expression levels were selected for real-time PCR validation. Using the TaqMan® assays data set as the reference set, the performance of the five normalization methods was evaluated focusing on the following criteria: (1) sensitivity and reproducibility in detection of expression; (2) fold change correlation with real-time PCR data; (3) sensitivity and specificity in detection of differential expression; (4) reproducibility of differentially expressed gene lists. CONCLUSION: Our results showed a high level of concordance between these normalization methods. This is true regardless of whether signal, detection, variation, fold change measurements or reproducibility were interrogated. Furthermore, we used TaqMan® assays as a reference to generate TPR and FDR plots for the various normalization methods across the assay range. Little impact was observed on the TP and FP rates in detection of differentially expressed genes. Additionally, the various normalization methods had little effect on the statistical approaches analyzed, which indicates a certain robustness of the analysis methods currently in use in the field, particularly when used in conjunction with the Applied Biosystems Gene Expression System.
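
Of the five normalization methods named in the abstract, quantile normalization is the easiest to illustrate. The following is a minimal NumPy sketch under stated assumptions, not the pipeline used in the study: the function name `quantile_normalize` and the genes-by-arrays matrix layout are hypothetical, and tied values are ranked by sort order rather than averaged.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns of a genes x arrays matrix.

    Assumption: rows are genes (probes) and columns are arrays.
    Ties within a column are broken by sort order, not averaged.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                    # per-column sort order
    sorted_x = np.take_along_axis(x, order, axis=0)  # each column sorted ascending
    reference = sorted_x.mean(axis=1)                # mean value at each rank
    ranks = np.argsort(order, axis=0)                # rank of each original entry
    return reference[ranks]                          # replace values by rank means

# Toy data only; these numbers are not from the study.
toy = np.array([[5, 4, 3],
                [2, 1, 4],
                [3, 4, 6],
                [4, 2, 8]])
normalized = quantile_normalize(toy)
```

After this step every column shares the same empirical distribution, which is the defining property of the method; the within-column ordering of genes is preserved.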

    Constructing non-stationary Dynamic Bayesian Networks with a flexible lag choosing mechanism

    BACKGROUND: Dynamic Bayesian Networks (DBNs) are widely used in regulatory network structure inference with gene expression data. Current methods assume that the underlying stochastic processes that generate the gene expression data are stationary. This assumption is not realistic in certain applications where the intrinsic regulatory networks change in order to adapt to internal or external stimuli. RESULTS: In this paper we investigate a novel non-stationary DBN method with a potential regulator detection technique and a flexible lag choosing mechanism. We apply the approach to gene regulatory network inference on three non-stationary time series data sets. For the Macrophages and Arabidopsis data sets, which have reference networks, our method shows better network structure prediction accuracy. For the Drosophila data set, our approach converges faster and shows better prediction accuracy on transition times. In addition, our reconstructed regulatory networks on the Drosophila data not only share many similarities with the predictions of other researchers but also provide much new structural information for further investigation. CONCLUSIONS: Compared with recently proposed non-stationary DBN methods, our approach has better structure prediction accuracy. By detecting potential regulators, our method reduces the size of the search space and hence may speed up the convergence of MCMC sampling.

    A classification-based framework for predicting and analyzing gene regulatory response

    BACKGROUND: We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular microarray experiment based on the presence of binding site subsequences ("motifs") in the gene's regulatory region and the expression levels of regulators such as transcription factors in the experiment ("parents"). GeneClass formulates the learning task as a classification problem — predicting +1 and -1 labels corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. Using the Adaboost algorithm, GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. METHODS: In the current work, we introduce a new, robust version of the GeneClass algorithm that increases stability and computational efficiency, yielding a more scalable and reliable predictive model. The improved stability of the prediction tree enables us to introduce a detailed post-processing framework for biological interpretation, including individual and group target gene analysis to reveal condition-specific regulation programs and to suggest signaling pathways. Robust GeneClass uses a novel stabilized variant of boosting that allows a set of correlated features, rather than single features, to be included at nodes of the tree; in this way, biologically important features that are correlated with the single best feature are retained rather than decorrelated and lost in the next round of boosting. 
Other computational developments include fast matrix computation of the loss function for all features, allowing scalability to large datasets, and the use of abstaining weak rules, which results in a shallower and more interpretable tree. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the GeneClass algorithm, and we use an improved noise model for gene expression data. RESULTS: Using the improved scalability of Robust GeneClass, we present larger scale experiments on a yeast environmental stress dataset, training and testing on all genes and using a comprehensive set of potential regulators. We demonstrate the improved stability of the features in the learned prediction tree, and we show the utility of the post-processing framework by analyzing two groups of genes in yeast (the protein chaperones and a set of putative targets of the Nrg1 and Nrg2 transcription factors) and suggesting novel hypotheses about their transcriptional and post-transcriptional regulation. Detailed results and Robust GeneClass source code are available for download from
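
The abstract mentions boosting with abstaining weak rules. As a rough illustration of that general idea only, the sketch below shows one boosting round for a rule that outputs +1, -1 or 0 (abstain), using the standard confidence-rated AdaBoost weight of half the log ratio of correctly to incorrectly classified weight. It is not the Robust GeneClass code, whose stabilized variant is not specified here; the function names, the `eps` smoothing term and the toy data are all assumptions.

```python
import math

def abstaining_rule_weight(predictions, labels, weights, eps=1e-12):
    """Vote weight (alpha) for a weak rule with outputs in {-1, 0, +1}.

    A prediction of 0 means the rule abstains; abstentions count toward
    neither the correct nor the incorrect weight. eps (an assumption)
    avoids division by zero when the rule makes no mistakes.
    """
    w_correct = sum(w for p, y, w in zip(predictions, labels, weights)
                    if p != 0 and p == y)
    w_wrong = sum(w for p, y, w in zip(predictions, labels, weights)
                  if p != 0 and p != y)
    return 0.5 * math.log((w_correct + eps) / (w_wrong + eps))

def reweight(predictions, labels, weights, alpha):
    """Exponential reweighting; abstained examples keep their relative weight."""
    updated = [w * math.exp(-alpha * p * y)
               for p, y, w in zip(predictions, labels, weights)]
    total = sum(updated)
    return [w / total for w in updated]

# Toy round: the rule is right on examples 0 and 3, wrong on 1, abstains on 2.
preds = [1, 1, 0, -1]
labels = [1, -1, 1, -1]
weights = [0.25, 0.25, 0.25, 0.25]
alpha = abstaining_rule_weight(preds, labels, weights)
new_weights = reweight(preds, labels, weights, alpha)
```

Because an abstaining rule contributes nothing where its condition does not apply, its vote affects only the examples it actually covers.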

    High prevalence of hyperglycaemia and the impact of high household income in transforming Rural China

    BACKGROUND: The prevalence of hyperglycaemia and its association with socioeconomic factors have been well studied in developed countries; however, little is known about them in transforming rural China. METHODS: A cross-sectional study was carried out in 4 rural communities of Deqing County, located in East China, in 2006-07, including 4,506 subjects aged 18 to 64 years. Fasting plasma glucose (FPG) was measured. Subjects were considered to have impaired fasting glucose (IFG) if FPG was in the range from 5.6 to 6.9 mmol/L and to have diabetes mellitus (DM) if FPG was 7.0 mmol/L or above. RESULTS: The crude prevalences of IFG and DM were 5.4% and 2.2%, respectively. The average ratio of IFG/DM was 2.5, and tended to be higher for those under the age of 35 years than for older subjects. After adjustment for covariates including age (continuous), sex, BMI (continuous), smoking, alcohol drinking, and regular leisure physical activity, subjects in the high household income group had a significantly higher risk of IFG compared with the medium household income group (OR: 1.74, 95% CI: 1.11-2.72), and no significant difference in IFG was observed between the low and medium household income groups. Education and farmer occupation were not significantly associated with IFG. CONCLUSIONS: High household income was significantly associated with an increased risk of IFG. A high ratio of IFG/DM suggests a high risk of diabetes in the foreseeable future in the transforming rural communities of China.
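
The case definitions in this abstract can be written as a small classification rule. The sketch below is illustrative only: the function name `classify_fpg` and the label strings are hypothetical, and it assumes FPG is reported to one decimal place, so that every value is either at most 6.9 or at least 7.0 mmol/L.

```python
def classify_fpg(fpg_mmol_per_l):
    """Classify a fasting plasma glucose value (mmol/L).

    Thresholds follow the abstract: IFG for 5.6 to 6.9, DM for 7.0 or above.
    Assumption: values are reported to one decimal place.
    """
    if fpg_mmol_per_l >= 7.0:
        return "DM"
    if fpg_mmol_per_l >= 5.6:
        return "IFG"
    return "normal"

# Toy values only; these are not measurements from the study.
examples = {value: classify_fpg(value) for value in (5.5, 5.6, 6.9, 7.0)}
```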

    Factors affecting patterns of tick parasitism on forest rodents in tick-borne encephalitis risk areas, Germany

    Identifying factors affecting individual vector burdens is essential for understanding infectious disease systems. Drawing upon data from a rodent monitoring programme conducted in nine different forest patches in southern Hesse, Germany, we developed models which predict tick (Ixodes spp. and Dermacentor spp.) burdens on two rodent species, Apodemus flavicollis and Myodes glareolus. Models for the two rodent species were broadly similar but differed in some aspects. Patterns of Ixodes spp. burdens were influenced by extrinsic factors such as season, unexplained spatial variation (both species), relative humidity and vegetation cover (A. flavicollis). We found support for the ‘body mass’ hypothesis (tick burdens increase with body mass/age) and for the ‘dilution’ hypothesis (tick burdens decline with increasing rodent densities), and little support for the ‘sex-bias’ hypothesis (both species). Surprisingly, roe deer densities were not correlated with larvae counts on rodents. Factors influencing the mean burden did not significantly explain the observed dispersion of tick counts. Co-feeding aggregations, which are essential for tick-borne disease transmission, were mainly found in A. flavicollis of high body mass trapped in areas with a fast increase in spring temperatures. Locally, Dermacentor spp. appears to be an important parasite on A. flavicollis and M. glareolus. Dermacentor spp. was largely confined to areas with higher average temperatures during the vegetation period. Nymphs of Dermacentor spp. mainly fed on M. glareolus and were seldom found on A. flavicollis. Whereas Ixodes spp. is the dominant tick genus in woodlands of our study area, the distribution and epidemiological role of Dermacentor spp. should be monitored closely.

    Towards an integrated approach in surveillance of vector-borne diseases in Europe

    Vector-borne disease (VBD) emergence is a complex and dynamic process. Interactions between multiple disciplines and responsible health and environmental authorities are often needed for effective early warning, surveillance and control of vectors and the diseases they transmit. To fully appreciate this complexity, integrated knowledge about the human and the vector population is desirable. In the current paper, important parameters and terms of both public health and medical entomology are defined in order to establish a common language that facilitates collaboration between the two disciplines. Special focus is put on the different VBD contexts with respect to the current presence or absence of the disease, the pathogen and the vector in a given location. Depending on the context, that is, whether a VBD is endemic or not, surveillance activities are required to assess disease burden or threat, respectively. Following a decision for action, surveillance activities continue to assess trends.