BMC Bioinformatics
Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues. Results: Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available. Conclusions: The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. 
In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes.
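As an illustration of the preprocessing step named in the abstract above, the following is a minimal sketch of quantile normalization applied to non-log intensities. It is not the authors' implementation: the function name `quantile_normalize`, the column-per-sample layout, and the simple handling of ties (by sort order) are assumptions made for this example.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of
    non-log expression values (one list per sample).

    Hypothetical helper for illustration; ties are resolved by sort
    order rather than by averaging, which is a simplification.
    """
    if not columns:
        return []
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must contain the same number of genes")

    # Reference distribution: the mean across samples of the values
    # that share the same rank.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n)
    ]

    normalized = []
    for col in columns:
        # Indices of the genes ordered from lowest to highest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
    print(quantile_normalize(samples))
```

After normalization every sample has the same empirical distribution of values, while the rank order of genes within each sample is preserved.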
Identification of T-cell antigens specific for latent Mycobacterium tuberculosis infection
BACKGROUND: T-cell responses against dormancy-, resuscitation-, and reactivation-associated antigens of Mycobacterium tuberculosis are candidate biomarkers of latent infection in humans. METHODOLOGY/PRINCIPAL FINDINGS: We established an assay based on two rounds of in vitro restimulation and intracellular cytokine analysis that detects T-cell responses to antigens expressed during latent M. tuberculosis infection. Comparison between active pulmonary tuberculosis (TB) patients and healthy latently M. tuberculosis-infected donors (LTBI) revealed significantly higher T-cell responses against 7 of 35 tested M. tuberculosis latency-associated antigens in LTBI. Notably, T cells specific for Rv3407 were exclusively detected in LTBI but not in TB patients. The T-cell IFN-γ response against Rv3407 in individual donors was the most influential factor in discrimination analysis that classified TB patients and LTBI with 83% accuracy using cross-validation. Rv3407 peptide pool stimulations revealed distinct candidate epitopes in four LTBI. CONCLUSIONS: Our findings further support the hypothesis that the latency-associated antigens can be exploited as biomarkers for LTBI.
Biomarker discovery in heterogeneous tissue samples - taking the in-silico deconfounding approach
Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues. Results: Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available. Conclusions: The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason.
In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes.
Improved analysis of bacterial CGH data beyond the log-ratio paradigm
Background: Existing methods for analyzing bacterial CGH data from two-color arrays are based on log-ratios only, a paradigm inherited from expression studies. We propose an alternative approach, where microarray signals are used in a different way and sequence identity is predicted using a supervised learning approach. Results: A data set containing 32 hybridizations of sequenced versus sequenced genomes has been used to test and compare methods. A ROC analysis has been performed to illustrate the ability to rank probes with respect to Present/Absent calls. Classification into Present and Absent is compared with that of a Gaussian mixture model. Conclusion: The results indicate that our proposed method is an improvement over existing methods with respect to ranking and classification of probes, especially for multi-genome arrays.
Accelerated search for biomolecular network models to interpret high-throughput experimental data
Background: The functions of human cells are carried out by biomolecular networks, which include proteins, genes, and regulatory sites within DNA that encode and control protein expression. Models of biomolecular network structure and dynamics can be inferred from high-throughput measurements of gene and protein expression. We build on our previously developed fuzzy logic method for bridging quantitative and qualitative biological data to address the challenges of noisy, low-resolution high-throughput measurements, i.e., from gene expression microarrays. We employ an evolutionary search algorithm to accelerate the search for hypothetical fuzzy biomolecular network models consistent with a biological data set. We also develop a method to estimate the probability of a potential network model fitting a set of data by chance. The resulting metric provides an estimate of both model quality and dataset quality, identifying data that are too noisy to identify meaningful correlations between the measured variables. Results: Optimal parameters for the evolutionary search were identified based on artificial data, and the algorithm showed scalable and consistent performance for as many as 150 variables. The method was tested on previously published human cell cycle gene expression microarray data sets. The evolutionary search method was found to converge to the results of exhaustive search. The randomized evolutionary search was able to converge on a set of similar best-fitting network models on different training data sets after 30 generations running 30 models per generation.
Consistent results were found regardless of which of the published data sets were used to train or verify the quantitative predictions of the best-fitting models for cell cycle gene dynamics. Conclusion: Our results demonstrate the capability of scalable evolutionary search for fuzzy network models to address the problem of inferring models based on complex, noisy biomolecular data sets. This approach yields multiple alternative models that are consistent with the data, yielding a constrained set of hypotheses that can be used to optimally design subsequent experiments.
Functional Correlations of Pathogenesis-Driven Gene Expression Signatures in Tuberculosis
Tuberculosis remains a major health threat and its control depends on improved measures of prevention, diagnosis and treatment. Biosignatures can play a significant role in the development of novel intervention measures against TB and blood transcriptional profiling is increasingly exploited for their rational design. Such profiles also reveal fundamental biological mechanisms associated with the pathology of the disease. We have compared whole blood gene expression in TB patients, as well as in healthy infected and uninfected individuals in a cohort in The Gambia, West Africa, and validated previously identified signatures showing high similarities of expression profiles among different cohorts. In this study, we applied a unique combination of classical gene expression analysis with pathway and functional association analysis integrated with intra-individual expression correlations. These analyses were employed for identification of new disease-associated gene signatures, identifying a network of Fc gamma receptor 1 signaling with correlating transcriptional activity as a hallmark of gene expression in TB. Remarkable similarities to characteristic signatures in the autoimmune disease systemic lupus erythematosus (SLE) were observed. Functional gene clusters of immunoregulatory interactions involving the JAK-STAT pathway, sensing of microbial patterns by Toll-like receptors, and IFN signaling provide detailed insights into the dysregulation of critical immune processes in TB, involving active expression of both pro-inflammatory and immunoregulatory systems. We conclude that transcriptomics (i) provides a robust system for identification and validation of biosignatures for TB and (ii) application of integrated analysis tools yields novel insights into functional networks underlying TB pathogenesis.
Maximum expected accuracy structural neighbors of an RNA secondary structure
BACKGROUND: Since RNA molecules regulate genes and control alternative splicing by allostery, it is important to develop algorithms to predict RNA conformational switches. Some tools, such as paRNAss, RNAshapes and RNAbor, can be used to predict potential conformational switches; nevertheless, no existing tool can detect general (i.e., not family-specific) entire riboswitches (both aptamer and expression platform) with accuracy. Thus, the development of additional algorithms to detect conformational switches seems important, especially since the difference in free energy between the two metastable secondary structures may be as large as 15-20 kcal/mol. It has recently emerged that RNA secondary structure can be more accurately predicted by computing the maximum expected accuracy (MEA) structure, rather than the minimum free energy (MFE) structure. RESULTS: Given an arbitrary RNA secondary structure S₀ for an RNA nucleotide sequence a = a₁, ..., aₙ, we say that another secondary structure S of a is a k-neighbor of S₀ if the base pair distance between S₀ and S is k. In this paper, we prove that the Boltzmann probability of all k-neighbors of the minimum free energy structure S₀ can be approximated with accuracy ε and confidence 1 - p, simultaneously for all 0 ≤ k < K, by a relative frequency count over N sampled structures, provided that N > N(ε,p,K) = Φ⁻¹(p/2K)²/(4ε²), where Φ(z) is the cumulative distribution function (CDF) for the standard normal distribution. We go on to describe the algorithm RNAborMEA, which for an arbitrary initial structure S₀ and for all values 0 ≤ k < K, computes the secondary structure MEA(k), having maximum expected accuracy over all k-neighbors of S₀. Computation time is O(n³ · K²), and memory requirements are O(n² · K). We analyze a sample TPP riboswitch, and apply our algorithm to the class of purine riboswitches.
CONCLUSIONS: The approximation of RNAbor by sampling, with rigorous bound on accuracy, together with the computation of maximum expected accuracy k-neighbors by RNAborMEA, provide additional tools toward conformational switch detection. Results from RNAborMEA are quite distinct from other tools, such as RNAbor, RNAshapes and paRNAss, hence may provide orthogonal information when looking for suboptimal structures or conformational switches. Source code for RNAborMEA can be downloaded from http://sourceforge.net/projects/rnabormea/ or http://bioinformatics.bc.edu/clotelab/RNAborMEA/
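To give a feel for the sample-size expression N(ε,p,K) quoted in the abstract above, the following sketch evaluates it numerically. It assumes the expression is read as Φ⁻¹(p/(2K))² / (4ε²); the function names `sample_size_bound` and `min_samples` are hypothetical and are not part of RNAborMEA.

```python
import math
from statistics import NormalDist


def sample_size_bound(eps, p, K):
    """Evaluate N(eps, p, K) = Phi^{-1}(p / (2K))^2 / (4 eps^2).

    Assumption: p/2K in the abstract means p / (2 * K) and the
    denominator is 4 * eps^2. Phi^{-1} is the inverse CDF of the
    standard normal distribution.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if K < 1:
        raise ValueError("K must be at least 1")
    z = NormalDist().inv_cdf(p / (2 * K))
    return (z * z) / (4 * eps * eps)


def min_samples(eps, p, K):
    """Smallest integer strictly greater than the bound."""
    return math.floor(sample_size_bound(eps, p, K)) + 1


if __name__ == "__main__":
    # Accuracy 0.01, confidence 0.95, for several values of K.
    for K in (1, 10, 50):
        print(K, min_samples(0.01, 0.05, K))
```

Under this reading the bound grows with 1/ε² but only slowly with K, because K enters through the normal quantile alone.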
Systemic Inflammation in Preclinical Ulcerative Colitis
Background & Aims: Preclinical ulcerative colitis is poorly defined. We aimed to characterize the preclinical systemic inflammation in ulcerative colitis, using a comprehensive set of proteins. Methods: We obtained plasma samples biobanked from individuals who developed ulcerative colitis later in life (n = 72) and matched healthy controls (n = 140) within a population-based screening cohort. We measured 92 proteins related to inflammation using a proximity extension assay. The biologic relevance of these findings was validated in an inception cohort of patients with ulcerative colitis (n = 101) and healthy controls (n = 50). To examine the influence of genetic and environmental factors on these markers, a cohort of healthy twin siblings of patients with ulcerative colitis (n = 41) and matched healthy controls (n = 37) were explored. Results: Six proteins (MMP10, CXCL9, CCL11, SLAMF1, CXCL11 and MCP-1) were up-regulated (P < .05) in preclinical ulcerative colitis compared with controls based on both univariate and multivariable models. Ingenuity Pathway Analyses identified several potential key regulators, including interleukin-1β, tumor necrosis factor, interferon-gamma, oncostatin M, nuclear factor-κB, interleukin-6, and interleukin-4. For validation, we built a multivariable model to predict disease in the inception cohort. The model discriminated treatment-naïve patients with ulcerative colitis from controls with leave-one-out cross-validation (area under the curve = 0.92). Consistently, MMP10, CXCL9, CXCL11, and MCP-1, but not CCL11 and SLAMF1, were significantly up-regulated among the healthy twin siblings, even though their relative abundances seemed higher in incident ulcerative colitis. Conclusions: A set of inflammatory proteins is up-regulated several years before a diagnosis of ulcerative colitis.
These proteins were highly predictive of an ulcerative colitis diagnosis, and some seemed to be up-regulated already at exposure to genetic and environmental risk factors.
Evolution of Competitive Ability: An Adaptation Speed vs. Accuracy Tradeoff Rooted in Gene Network Size
Ecologists have increasingly come to understand that evolutionary change on short
time-scales can alter ecological dynamics (and vice-versa), and this idea is
being incorporated into community ecology research programs. Previous research
has suggested that the size and topology of the gene network underlying a
quantitative trait should constrain or facilitate adaptation and thereby alter
population dynamics. Here, I consider a scenario in which two species with
different genetic architectures compete and evolve in fluctuating environments.
An important trade-off emerges between adaptive accuracy and adaptive speed,
driven by the size of the gene network underlying the ecologically-critical
trait and the rate of environmental change. Smaller, scale-free networks confer
a competitive advantage in rapidly-changing environments, but larger networks
permit increased adaptive accuracy when environmental change is sufficiently
slow to allow a species time to adapt. As the differences in network
characteristics increase, the time-to-resolution of competition decreases. These
results augment and refine previous conclusions about the ecological
implications of the genetic architecture of quantitative traits, emphasizing a
role of adaptive accuracy. Along with previous work, in particular that
considering the role of gene network connectivity, these results provide a set
of expectations for what we may observe as the field of ecological genomics
develops.