
    Analysis with respect to instrumental variables for the exploration of microarray data structures

    BACKGROUND: Evaluating the importance of the different sources of variation is essential in microarray experiments. Complex experimental designs generally include several factors that structure the data and should be taken into account. The objective of these experiments is to explore some given factors while controlling for others. RESULTS: We present here a family of methods, the analyses with respect to instrumental variables, which can be easily applied to the particular case of microarray data. An illustrative example of analysis with instrumental variables is given for microarray data investigating the effect of beverage intake on peripheral blood gene expression. This approach is compared with an ANOVA-based gene-by-gene statistical method. CONCLUSION: Instrumental variables analyses provide a simple way to control several sources of variation in a multivariate analysis of microarray data. Owing to their flexibility, these methods can be combined with a wide range of ordination techniques and with one or several qualitative and/or quantitative descriptive variables.
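A minimal sketch of the idea for the simplest case, assuming a single qualitative instrumental variable (for example, a hypothetical donor or batch factor). Projecting each centred gene onto the factor's indicator variables amounts to replacing every value by its group mean; the orthogonal residual is the within-group deviation, i.e. the data with that factor's effect removed. An ordination such as PCA can then be run on `between` to explore the factor, or on `within` to control for it. The function name `split_by_factor` and the toy data are illustrative, not the authors' code; quantitative instrumental variables would need a general least-squares projection instead.

```python
def split_by_factor(rows, groups):
    """Split centred data into the part explained by a qualitative factor
    (`between`: group mean minus grand mean) and the orthogonal residual
    (`within`: deviation from the group mean). between + within equals
    the column-centred data."""
    p = len(rows[0])
    totals, counts = {}, {}
    for row, g in zip(rows, groups):
        acc = totals.setdefault(g, [0.0] * p)
        for j, value in enumerate(row):
            acc[j] += value
        counts[g] = counts.get(g, 0) + 1
    means = {g: [s / counts[g] for s in acc] for g, acc in totals.items()}
    grand = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    between = [[means[g][j] - grand[j] for j in range(p)] for g in groups]
    within = [[row[j] - means[g][j] for j in range(p)]
              for row, g in zip(rows, groups)]
    return between, within


# Four samples (rows) by two genes (columns), two levels of the factor.
rows = [[1.0, 10.0], [3.0, 14.0], [5.0, 0.0], [7.0, 4.0]]
groups = ["A", "A", "B", "B"]
between, within = split_by_factor(rows, groups)
print(between)  # [[-2.0, 5.0], [-2.0, 5.0], [2.0, -5.0], [2.0, -5.0]]
print(within)   # [[-1.0, -2.0], [1.0, 2.0], [-1.0, -2.0], [1.0, 2.0]]
```

Several factors could be controlled by applying the same split repeatedly only when they are orthogonal (balanced design); otherwise a joint least-squares projection is the safer choice.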

    Stability of gene contributions and identification of outliers in multivariate analysis of microarray data

    BACKGROUND: Multivariate ordination methods are powerful tools for exploring the complex data structures present in microarray data. These methods have several advantages over common gene-by-gene approaches. However, because of their exploratory nature, multivariate ordination methods do not allow direct statistical testing of the stability of genes. RESULTS: In this study, we developed a computationally efficient algorithm for i) assessing the significance of gene contributions and ii) identifying sample outliers in multivariate analysis of microarray data. The approach is based on resampling methods, including bootstrapping and jackknifing. A package of R functions was developed; it includes tools both for inferring the statistical significance of gene contributions and for identifying outliers among samples. CONCLUSION: The methodology was successfully applied to three published data sets with varying levels of signal intensity, and its relevance was compared with that of alternative methods. Overall, it proved particularly effective for evaluating the stability of microarray data.
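One way to sketch the jackknife side of such an approach, under stated assumptions: recompute the first principal-component loading with each sample left out and score how far the loading moves, using 1 minus the absolute cosine with the full-data loading (the absolute value ignores the arbitrary sign of a component). Samples with unusually large scores are outlier candidates. This is a plain-Python illustration with hypothetical function names, not the R package described above; power iteration on a gene-by-gene covariance matrix is used only to stay dependency-free, and real microarray data would call for an SVD-based routine.

```python
import math


def leading_loading(rows, iterations=500):
    """Unit-length first principal-component loading (one weight per gene),
    via power iteration on the gene-by-gene covariance matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]  # column-centred
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v


def jackknife_influence(rows):
    """For each sample, 1 - |cos| between the full-data loading and the
    loading recomputed without that sample."""
    full = leading_loading(rows)
    influence = []
    for i in range(len(rows)):
        loo = leading_loading(rows[:i] + rows[i + 1:])
        cos = abs(sum(a * b for a, b in zip(full, loo)))
        influence.append(1.0 - cos)
    return influence


# Toy data: six samples by three genes; the last sample is extreme on gene 3.
data = [
    [1.0, 1.0, 0.0],
    [2.0, 2.1, 0.1],
    [3.0, 2.9, -0.1],
    [4.0, 4.2, 0.0],
    [5.0, 5.0, 0.1],
    [3.0, 3.0, 20.0],
]
influence = jackknife_influence(data)
print(max(range(len(data)), key=influence.__getitem__))  # 5
```

The same leave-one-out loop could wrap a bootstrap instead (resampling samples with replacement and collecting each gene's loading) to build intervals for gene contributions; that variant is not shown here.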

    How accurate and statistically robust are catalytic site predictions based on closeness centrality?

    BACKGROUND: We examine the accuracy of enzyme catalytic residue predictions from a network representation of protein structure. In this model, amino acid α-carbons specify vertices within a graph, and edges connect vertices that are proximal in structure. Closeness centrality, which has shown promise in previous investigations, is used to identify important positions within the network. Closeness centrality, a global measure of network centrality, is calculated as the reciprocal of the average distance between vertex i and all other vertices. RESULTS: We benchmark the approach against 283 structurally unique proteins within the Catalytic Site Atlas. Our results, which are in line with previous investigations of smaller datasets, indicate that closeness centrality predictions are statistically significant. However, unlike previous approaches, we specifically focus on residues with the very best scores. Over the top five closeness centrality scores, we observe an average true to false positive rate ratio of 6.8 to 1. As demonstrated previously, adding a solvent accessibility filter significantly improves predictive power; the average ratio increases to 15.3 to 1. We also demonstrate (for the first time) that filtering the predictions by residue identity improves the results even more than accessibility filtering. Here, we simply eliminate from consideration residues whose physicochemical properties are unlikely to be compatible with catalytic requirements. Residue identity filtering improves the average true to false positive rate ratio to 26.3 to 1. Combining the two filters has little effect on the results. Calculated p-values for the three prediction schemes range from 2.7E-9 to less than 8.8E-134. Finally, the sensitivity of the predictions to structure choice and slight perturbations is examined. CONCLUSION: Our results confirm that closeness centrality is a viable prediction scheme whose predictions are statistically significant. Simple filtering schemes substantially improve the method's predictive power. Moreover, no clear effect on performance is observed when comparing ligated and unligated structures. Similarly, the closeness centrality predictions are robust to slight structural perturbations from molecular dynamics simulation.
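A minimal sketch of the scoring step, assuming residues are linked when their α-carbons lie within a hypothetical distance `cutoff` (the threshold used in the study is not given in this abstract) and that distance means shortest-path hop count in the contact graph. Closeness is computed as (n - 1) divided by the sum of distances, which is the reciprocal of the average distance from vertex i to the other n - 1 vertices. The function names are illustrative, and the solvent accessibility and residue identity filters are not modelled.

```python
import math
from collections import deque


def contact_graph(coords, cutoff):
    """Link residues i and j when their alpha-carbon coordinates are within
    `cutoff` of each other (same units as the coordinates, e.g. angstroms)."""
    n = len(coords)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness_centrality(adj):
    """Reciprocal of the average shortest-path (hop) distance from each
    vertex to all other vertices, found by breadth-first search."""
    n = len(adj)
    scores = []
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if min(dist) < 0:
            raise ValueError("contact graph is disconnected")
        total = sum(dist)
        scores.append((n - 1) / total if total > 0 else 0.0)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring residues (candidate catalytic sites)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Three residues on a line, 3.8 apart: only neighbours are in contact.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
scores = closeness_centrality(contact_graph(coords, cutoff=5.0))
print(top_k(scores, 1))  # [1]
```

The middle residue reaches both others in one hop, so its score is 2 / 2 = 1.0, while each end residue scores 2 / 3.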

    Ocean and land forcing of the record-breaking Dust Bowl heat waves across central United States

    The severe drought of the 1930s Dust Bowl decade coincided with record-breaking summer heatwaves that contributed to the socioeconomic and ecological disaster over North America's Great Plains. It remains unresolved to what extent these exceptional heatwaves, hotter than those in historically forced coupled climate model simulations, were forced by sea surface temperatures (SSTs) and exacerbated by human-induced deterioration of land cover. Here we show, using an atmosphere-only model, that anomalously warm North Atlantic SSTs enhance heatwave activity through an association with drier spring conditions resulting from weaker moisture transport. Model devegetation simulations, which represent the widespread exposure of bare soil in the 1930s, suggest that human activity fueled stronger and more frequent heatwaves through greater evaporative drying in the warmer months. This study highlights the potential for vegetation feedbacks to amplify naturally occurring extreme events such as droughts, creating more extreme heatwaves in a warmer world.

    A flexible framework for sparse simultaneous component based data integration

    BACKGROUND: High-throughput data are complex, and methods that reveal the structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays the challenge is often to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, the interpretation of the principal and simultaneous components is often daunting because the contributions of each of the biomolecules (transcripts, proteins) have to be taken into account. RESULTS: We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known sparse approaches such as the lasso, the ridge penalty, the elastic net, the group lasso, the sparse group lasso, and the elitist lasso. In addition, the algorithmic results can be easily transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks. CONCLUSION: Sparse simultaneous component analysis is a useful method for data integration: first, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses, and second, interpretation of the results is greatly facilitated by their sparseness. The approach is flexible and allows the block structure to be taken into account in different ways. As such, structures can be found that are exclusively tied to one data platform (group lasso approach) as well as structures that involve all data platforms (elitist lasso approach). AVAILABILITY: The additional file contains a MATLAB implementation of the sparse simultaneous component method.
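The penalties listed above act through shrinkage. A minimal sketch, not the authors' MATLAB implementation: the standard proximal (shrinkage) operators for the lasso, the elastic net and the group lasso, which are the steps that set individual weights, or a whole block of weights, exactly to zero. Zeroing a whole block is what lets a group lasso component be tied to a single data platform. Function names are hypothetical, and the sparse group lasso and elitist lasso operators are not shown.

```python
import math


def soft_threshold(x, lam):
    """Proximal operator of lam * |x| (lasso): shrinks x toward zero and
    sets it exactly to zero when |x| <= lam."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def elastic_net_prox(x, lam1, lam2):
    """Proximal operator of lam1 * |x| + (lam2 / 2) * x**2: lasso
    shrinkage followed by ridge-style scaling."""
    return soft_threshold(x, lam1) / (1.0 + lam2)


def group_soft_threshold(block, lam):
    """Proximal operator of lam * ||block||_2 (group lasso): the whole block
    is zeroed when its Euclidean norm is at most lam, otherwise shrunk."""
    norm = math.sqrt(sum(v * v for v in block))
    if norm <= lam:
        return [0.0] * len(block)
    scale = 1.0 - lam / norm
    return [scale * v for v in block]


print(soft_threshold(3.0, 1.0))               # 2.0
print(elastic_net_prox(3.0, 1.0, 1.0))        # 1.0
print(group_soft_threshold([0.3, 0.4], 1.0))  # [0.0, 0.0]
```

In a proximal-gradient scheme, operators like these are applied to the loadings after each gradient step on the least-squares loss; the block passed to `group_soft_threshold` would be the loadings belonging to one data platform.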

    Frequency and genotypic distribution of GB virus C (GBV-C) among Colombian population with Hepatitis B (HBV) or Hepatitis C (HCV) infection

    BACKGROUND: GB virus C (GBV-C) is an enveloped, positive-sense ssRNA virus belonging to the Flaviviridae family. Studies of the genetic variability of GBV-C reveal the existence of six genotypes: genotype 1 predominates in West Africa, genotype 2 in Europe and America, genotype 3 in Asia, genotype 4 in Southwest Asia, genotype 5 in South Africa and genotype 6 in Indonesia. The aim of this study was to determine the frequency and genotypic distribution of GBV-C in the Colombian population. METHODS: Two groups were analyzed: i) 408 Colombian blood donors from Bogotá infected with HCV (n = 250) or HBV (n = 158) and ii) 99 indigenous people with HBV infection from Leticia, Amazonas. A 344 bp fragment of the 5' untranslated region (5' UTR) was amplified by nested RT-PCR. Viral sequences were genotyped by phylogenetic analysis using reference sequences for each genotype obtained from GenBank (n = 160). Bayesian phylogenetic analyses were conducted with a Markov chain Monte Carlo (MCMC) approach to obtain the maximum clade credibility (MCC) tree using BEAST v.1.5.3. RESULTS: Among blood donors, 5.06% (n = 8) of the 158 HBsAg-positive samples and 3.2% (n = 8) of the 250 anti-HCV-positive samples were positive for GBV-C. In addition, 7.7% (n = 7) GBV-C-positive samples were found among indigenous people from Leticia. Phylogenetic analysis revealed the following GBV-C genotypes among blood donors: 2a (41.6%), 1 (33.3%), 3 (16.6%) and 2b (8.3%). All genotype 1 sequences were found in co-infection with HBV, and 4 of the 5 genotype 2a sequences were found in co-infection with HCV. All sequences from indigenous people from Leticia were classified as genotype 3. GBV-C infection was not correlated with sex (p = 0.43), age (p = 0.38) or origin (p = 0.17). CONCLUSIONS: A high frequency of GBV-C genotypes 1 and 2 was found in blood donors. The presence of genotype 3 in indigenous populations was previously reported in the Santa Marta region of Colombia and in native people from Venezuela and Bolivia. This may be related to ancient migrations of Asian peoples to South America.

    Innate immunity against HIV: a priority target for HIV prevention research

    This review summarizes recent advances and current gaps in the understanding of innate immunity to human immunodeficiency virus (HIV) infection, and identifies key scientific priorities for applying this knowledge to the development of novel prevention strategies (vaccines and microbicides). It builds on productive discussion and new data arising from a workshop on innate immunity against HIV held at the European Commission in Brussels, together with recent observations from the literature.

    The Shedding of CD62L (L-Selectin) Regulates the Acquisition of Lytic Activity in Human Tumor Reactive T Lymphocytes

    CD62L/L-selectin is a marker found on naïve T cells and further distinguishes central memory (Tcm, CD62L+) from effector memory (Tem, CD62L−) T cells. The regulation of CD62L plays a pivotal role in controlling the traffic of T lymphocytes to and from peripheral lymph nodes. CD62L is shed from the cell membrane following T cell activation; however, the physiological significance of this event remains to be elucidated. In this study, we used in vitro generated anti-tumor antigen T cells and melanoma lines as a model to evaluate the dynamics of CD62L shedding and the expression of CD107a as a marker of lytic activity. Upon encountering matched tumor lines, antigen-reactive T cells rapidly lost CD62L expression, and this was associated with the acquisition of CD107a. Using a CD62L ELISA, we confirmed that this transition was mediated by the shedding of CD62L when T cells encountered specific tumor antigen. Introducing a shedding-resistant mutant of CD62L into the tumor antigen-reactive T cell line JKF6 impaired CD107a acquisition following antigen recognition, and this correlated with decreased lytic activity as measured by 51Cr release assays. The link between the shedding of CD62L from the surface of anti-tumor T cells and the acquisition of lytic activity suggests a new function for CD62L in T cell effector functions and anti-tumor activity.