93 research outputs found

    Matched Ascertainment of Informative Families for Complex Genetic Modelling

    Family data are used extensively in quantitative genetic studies to disentangle the genetic and environmental contributions to various diseases. Many family studies base their analyses on population-based registers containing a large number of individuals grouped into small family units. For binary trait analyses, exact marginal likelihood is a common approach, but because of the computational demands of such enormous data sets, it allows only a limited number of effects in the model. This makes it particularly difficult to perform joint estimation of variance components for a binary trait and the potential confounders. We have developed a data-reduction method of ascertaining informative families from population-based family registers. We propose a scheme where the ascertained families match the full cohort with respect to some relevant statistics, such as the risk to relatives of an affected individual. The ascertainment-adjusted analysis, which we implement using a pseudo-likelihood approach, is shown to be efficient relative to the analysis of the whole cohort and robust to mis-specification of the random effect distribution.

    Meta-analysis of gene expression profiles in breast cancer: toward a unified understanding of breast cancer subtyping and prognosis signatures

    INTRODUCTION: Breast cancer subtyping and prognosis have been studied extensively by gene expression profiling, resulting in disparate signatures with little overlap in their constituent genes. Although a previous study demonstrated a prognostic concordance among gene expression signatures, it was limited to only one dataset and did not fully elucidate how the different genes were related to one another, nor did it examine the contribution of well-known biological processes of breast cancer tumorigenesis to their prognostic performance. METHOD: To address the above issues and to further validate these initial findings, we performed the largest meta-analysis of publicly available breast cancer gene expression and clinical data, comprising 2,833 breast tumors. Gene coexpression modules of three key biological processes in breast cancer (namely, proliferation, estrogen receptor [ER], and HER2 signaling) were used to dissect the role of constituent genes of nine prognostic signatures. RESULTS: Using a meta-analytical approach, we consolidated the signatures associated with ER signaling, ERBB2 amplification, and proliferation. Previously published expression-based nomenclature of breast cancer 'intrinsic' subtypes can be mapped to the three modules, namely, the ER-/HER2- (basal-like), the HER2+ (HER2-like), and the low- and high-proliferation ER+/HER2- subtypes (luminal A and B). We showed that all nine prognostic signatures exhibited a similar prognostic performance in the entire dataset. Their prognostic abilities are due mostly to the detection of proliferation activity. Although ER- status (basal-like) and ERBB2+ expression status correspond to bad outcome, they seem to act through elevated expression of proliferation genes and thus contain only indirect information about prognosis. Clinical variables measuring the extent of tumor progression, such as tumor size and nodal status, still add independent prognostic information to proliferation genes.
    CONCLUSION: This meta-analysis unifies various results of previous gene expression studies in breast cancer. It reveals connections between traditional prognostic factors, expression-based subtyping, and prognostic signatures, highlighting the important role of proliferation in breast cancer prognosis.

    Estimating Parameters of Speciation Models Based on Refined Summaries of the Joint Site-Frequency Spectrum

    Understanding the processes and conditions under which populations diverge to give rise to distinct species is a central question in evolutionary biology. Since recently diverged populations have high levels of shared polymorphisms, it is challenging to distinguish between recent divergence with no (or very low) inter-population gene flow and older splitting events with subsequent gene flow. Recently published methods to infer speciation parameters under the isolation-migration framework are based on summarizing polymorphism data at multiple loci in two species using the joint site-frequency spectrum (JSFS). We have developed two improvements of these methods based on a more extensive use of the JSFS classes of polymorphisms for species with high intra-locus recombination rates. First, using a likelihood-based method, we demonstrate that taking into account low-frequency polymorphisms shared between species significantly improves the joint estimation of the divergence time and gene flow between species. Second, we introduce a local linear regression algorithm that considerably reduces the computational time and allows for the estimation of unequal rates of gene flow between species. We also investigate which summary statistics from the JSFS allow the greatest estimation accuracy for divergence time and migration rates for low (around 10) and high (around 100) numbers of loci. Focusing on cases with low numbers of loci and high intra-locus recombination rates, we show that our methods for the estimation of divergence time and migration rates are more precise than existing approaches.

    A hierarchical Bayesian model for understanding the spatiotemporal dynamics of the intestinal epithelium

    Our work addresses two key challenges, one biological and one methodological. First, we aim to understand how proliferation and cell migration rates in the intestinal epithelium are related under healthy, damaged (Ara-C treated) and recovering conditions, and how these relations can be used to identify mechanisms of repair and regeneration. We analyse new data, presented in more detail in a companion paper, in which BrdU/IdU cell-labelling experiments were performed under these respective conditions. Second, in considering how to more rigorously process these data and interpret them using mathematical models, we use a probabilistic, hierarchical approach. This provides a best-practice approach for systematically modelling and understanding the uncertainties that can otherwise undermine the generation of reliable conclusions: uncertainties in experimental measurement and treatment, difficult-to-compare mathematical models of underlying mechanisms, and unknown or unobserved parameters. Both spatially discrete and continuous mechanistic models are considered and related via hierarchical conditional probability assumptions. We perform model checks on both in-sample and out-of-sample datasets and use them to show how to test possible model improvements and assess the robustness of our conclusions. We conclude, for the present set of experiments, that a primarily proliferation-driven model suffices to predict labelled cell dynamics over most time-scales.

    Dormancy within Staphylococcus epidermidis biofilms : a transcriptomic analysis by RNA-seq

    The proportion of dormant bacteria within Staphylococcus epidermidis biofilms may determine their inflammatory profile. Previously, we have shown that S. epidermidis biofilms with higher proportions of dormant bacteria have reduced activation of murine macrophages. RNA-sequencing was used to identify the major transcriptomic differences between S. epidermidis biofilms with different proportions of dormant bacteria. To accomplish this goal, we used an in vitro model where magnesium allowed modulation of the proportion of dormant bacteria within S. epidermidis biofilms. Significant differences were found in the expression of 147 genes. A detailed analysis of the results was performed based on direct and functional gene interactions. Biological processes among the differentially expressed genes were mainly related to oxidation-reduction processes and acetyl-CoA metabolic processes. Gene set enrichment revealed that the translation process is related to the proportion of dormant bacteria. Transcription of mRNAs involved in oxidation-reduction processes was associated with higher proportions of dormant bacteria within S. epidermidis biofilm. Moreover, the pH of the culture medium did not change after the addition of magnesium, and genes related to magnesium transport did not seem to impact entrance of bacterial cells into dormancy.
    The authors thank Stephen Lorry at Harvard Medical School for providing CLC Genomics software. This work was funded by Fundação para a Ciência e a Tecnologia (FCT) and COMPETE grants PTDC/BIA-MIC/113450/2009, FCOMP-01-0124-FEDER-014309, FCOMP-01-0124-FEDER-022718 (FCT PEst-C/SAU/LA0002/2011), QOPNA research unit (project PEst-C/QUI/UI0062/2011), and CENTRO-07-ST24-FEDER-002034. The following authors had an individual FCT fellowship: VC (SFRH/BD/78235/2011) and AF (2SFRH/BD/62359/2009).

    Context Matters: The Illusive Simplicity of Macaque V1 Receptive Fields

    Even in V1, where neurons have well characterized classical receptive fields (CRFs), it has been difficult to deduce which features of natural scene stimuli they actually respond to. Forward models based upon CRF stimuli have had limited success in predicting the response of V1 neurons to natural scenes. As natural scenes exhibit complex spatial and temporal correlations, this could be due to surround effects that modulate the sensitivity of the CRF. Here, instead of attempting a forward model, we quantify the importance of the natural scenes surround for awake macaque monkeys by modeling it non-parametrically. We also quantify the influence of two forms of trial-to-trial variability. The first is related to the neuron’s own spike history. The second is related to ongoing mean field population activity reflected by the local field potential (LFP). We find that the surround produces strong temporal modulations in the firing rate that can be both suppressive and facilitative. Further, the LFP is found to induce a precise timing in spikes, which tend to be temporally localized on sharp LFP transients in the gamma frequency range. Using the pseudo R² as a measure of model fit, we find that during natural scene viewing the CRF dominates, accounting for 60% of the fit, but that taken collectively the surround, spike history and LFP are almost as important, accounting for 40%. However, overall only a small proportion of V1 spiking statistics could be explained (R² ~5%), even when the full stimulus, spike history and LFP were taken into account. This suggests that under natural scene conditions, the dominant influence on V1 neurons is not the stimulus, nor the mean field dynamics of the LFP, but the complex, incoherent dynamics of the network in which neurons are embedded.
    National Institutes of Health (U.S.) (K25 NS052422-02); National Institutes of Health (U.S.) (DP1 ODOO3646

    Reproducible Cancer Biomarker Discovery in SELDI-TOF MS Using Different Pre-Processing Algorithms

    BACKGROUND: There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on the data pre-processing algorithm used. Until now, no widely accepted agreement has been reached. RESULTS: In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. One factor is that some DE peaks selected from one peak profile were not detected as peaks in other profiles, and the second factor is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large scale of the tests and small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR) control approach and that the reproducibility of DE peak detection could thereby be increased. CONCLUSIONS: Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers.
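
    The stratified FDR idea mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how peaks were stratified or which FDR procedure was used, so the following assumes the Benjamini-Hochberg step-up procedure applied separately within each stratum; the function names, p-values, and stratum labels are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


def stratified_bh(pvalues, strata, alpha=0.05):
    """Apply Benjamini-Hochberg separately within each stratum (assumed scheme)."""
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    rejected = [False] * len(pvalues)
    for indices in groups.values():
        flags = benjamini_hochberg([pvalues[i] for i in indices], alpha)
        for i, flag in zip(indices, flags):
            rejected[i] = flag
    return rejected


# Hypothetical p-values for ten peaks: three in stratum "A", seven in "B".
pvals = [0.001, 0.012, 0.04, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = ["A", "A", "A", "B", "B", "B", "B", "B", "B", "B"]

pooled = benjamini_hochberg(pvals)
stratified = stratified_bh(pvals, labels)
print("pooled rejections:", sum(pooled))
print("stratified rejections:", sum(stratified))
```

    In this toy input the small stratum carries most of the signal, so testing it on its own recovers peaks that the pooled correction misses. Whether stratification helps in practice depends on how the strata are chosen.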

    Shortening of 3′UTRs Correlates with Poor Prognosis in Breast and Lung Cancer

    A major part of the post-transcriptional regulation of gene expression is affected by trans-acting elements, such as microRNAs, binding the 3′ untranslated region (UTR) of their target mRNAs. Proliferating cells partly escape this type of negative regulation by expressing shorter 3′ UTRs, depleted of microRNA binding sites, compared to non-proliferating cells. Using large-scale gene expression datasets, we show that a similar phenomenon takes place in breast and lung cancer: tumors expressing shorter 3′ UTRs tend to be more aggressive and to result in shorter patient survival. Moreover, we show that a gene expression signature based only on the expression ratio of alternative 3′ UTRs is a strong predictor of survival in both tumors. Genes undergoing 3′ UTR shortening in aggressive tumors of the two tissues significantly overlap, and several of them are known to be involved in tumor progression. However, the pattern of 3′ UTR shortening in aggressive tumors in vivo is clearly distinct from analogous patterns involved in proliferation and transformation.