
    Sparse regulatory networks

    In many organisms the expression level of each gene is controlled by the activation levels of known "Transcription Factors" (TFs). A problem of considerable interest is that of estimating the "Transcription Regulation Network" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, which greatly increases the difficulty of the problem. Previous experimental work often provides partial information about the TRN. For example, certain TFs may be known to regulate a given gene, or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates that there will be very few connections between TFs and genes. Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that can directly utilize prior information about the network structure in conjunction with observed gene expression data to estimate the TRN. Our approach uses $L_1$ penalties on the network to ensure a sparse structure. This has the advantage of being computationally efficient as well as making far fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates. Comment: Published at http://dx.doi.org/10.1214/10-AOAS350 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
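
    As a rough illustration of how an $L_1$ penalty produces a sparse set of TF-to-gene connections, the sketch below fits one gene's expression against a set of TF activity profiles by cyclic coordinate descent. It is not the method of the paper: it assumes the TF activities are known (the abstract stresses that they are usually unobserved), and the function names, the toy data, and the penalty value `lam` are all hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: list of n rows, each a list of p TF activity values (assumed known here).
    y: list of n expression values for a single gene.
    Returns the list of p connection strengths b.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # Mean squared value of each column, used to rescale the update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores TF j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b


# Toy data (hypothetical): three TFs, four experiments, orthogonal design.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
y = [2.0, 0.1, -1.5, 0.0]
weights = lasso_coordinate_descent(X, y, lam=0.1)
```

    In this toy design the weak second connection is shrunk to exactly zero while the two strong ones survive in reduced form, which is the sparsity-inducing behaviour the abstract relies on.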

    The dysbindin-containing complex (BLOC-1) in brain: developmental regulation, interaction with SNARE proteins and role in neurite outgrowth.

    Previous studies have implicated DTNBP1 as a schizophrenia susceptibility gene and its encoded protein, dysbindin, as a potential regulator of synaptic vesicle physiology. In this study, we found that endogenous levels of the dysbindin protein in the mouse brain are developmentally regulated, with higher levels observed during embryonic and early postnatal ages than in young adulthood. We obtained biochemical evidence indicating that the bulk of dysbindin from brain exists as a stable component of biogenesis of lysosome-related organelles complex-1 (BLOC-1), a multi-subunit protein complex involved in intracellular membrane trafficking and organelle biogenesis. Selective biochemical interaction between brain BLOC-1 and a few members of the SNARE (soluble N-ethylmaleimide-sensitive factor attachment protein receptor) superfamily of proteins that control membrane fusion, including SNAP-25 and syntaxin 13, was demonstrated. Furthermore, primary hippocampal neurons deficient in BLOC-1 displayed neurite outgrowth defects. Taken together, these observations suggest a novel role for the dysbindin-containing complex, BLOC-1, in neurodevelopment, and provide a framework for considering potential effects of allelic variants in DTNBP1 (or in other genes encoding BLOC-1 subunits) in the context of the developmental model of schizophrenia pathogenesis.

    OperonDB: a comprehensive database of predicted operons in microbial genomes

    The fast pace of bacterial genome sequencing and the resulting dependence on highly automated annotation methods have driven the development of many genome-wide analysis tools. OperonDB, first released in 2001, is a database containing the results of a computational algorithm for locating operon structures in microbial genomes. OperonDB has grown from 34 genomes in its initial release to more than 500 genomes today. In addition to increasing the size of the database, we have redesigned our operon-finding algorithm and improved its accuracy. The new database is updated regularly as additional genomes become available in public archives. OperonDB can be accessed at: http://operondb.cbcb.umd.ed

    Genetic structure and introgression in riparian populations of Populus alba L.

    White poplar (Populus alba) is a widespread species of the northern hemisphere. Introgressed populations or hybrid zones with the related European aspen (Populus tremula) have been suggested as potential venues for the identification of functionally important variation for germplasm conservation, restoration efforts and tree breeding. Data on the genetic diversity and structure of introgressed P. alba are available only for sympatric populations from central Europe. Here, clonality, introgression and spatial genetic patterns were evaluated in three riparian populations of P. alba along the Ticino, Paglia-Tevere and Cesano river drainages in Italy. Samples of all three populations were typed for five nuclear microsatellite markers and 137 polymorphic amplified fragment length polymorphisms. Microsatellite-based inbreeding coefficients (FIS) were significantly positive in all three populations. Genetic diversity was consistently highest in Ticino, the population with the highest level of introgression from P. tremula. Population differentiation (FST) was low between the Ticino valley in northern Italy and the Cesano valley in central Italy and between the central Italian populations of Cesano and Paglia-Tevere, consistent with a role of the Apennine mountain range as a barrier to gene flow between adjacent drainage areas. Introgression was not the primary determinant of within-population spatial genetic structure (SGS) in the studied populations.

    Phenotypic plasticity, QTL mapping and genomic characterization of bud set in black poplar

    BACKGROUND: The genetic control of important adaptive traits, such as bud set, is still poorly understood in most forest tree species. Poplar is an ideal model tree to study bud set because of its indeterminate shoot growth. Thus, a full-sib family derived from an intraspecific cross of P. nigra with 162 clonally replicated progeny was used to assess the phenotypic plasticity and genetic variation of bud set at two sites with contrasting environmental conditions. RESULTS: Six crucial phenological stages of bud set were scored. Night length appeared to be the most important signal triggering the onset of growth cessation. Nevertheless, the effect of other environmental factors, such as temperature, increased during the process. Moreover, a considerable role of genotype × environment (G × E) interaction was found in all phenological stages, with the lowest temperature appearing to influence the sensitivity of the most plastic genotypes. Descriptors of growth cessation and bud onset explained the largest part of the phenotypic variation of the entire process. Quantitative trait loci (QTL) for these traits were detected. For the four selected traits (the onset of growth cessation (date2.5), the transition from shoot to bud (date1.5), the duration of bud formation (subproc1) and bud maturation (subproc2)), eight and sixteen QTL were mapped on the maternal and paternal map, respectively. The identified QTL, each characterized by a small or modest effect, highlighted the complex nature of the traits involved in the bud set process. Comparison between the map location of QTL and the P. trichocarpa genome sequence allowed the identification of 13 gene models, 67 bud set-related expressional and six functional candidate genes (CGs). These CGs are functionally related to relevant biological processes, environmental sensing, signaling, and cell growth and development. Some strong QTL had no obvious CGs, and hold great promise to identify unknown genes that affect bud set. CONCLUSIONS: This study provides a better understanding of the physiological and genetic dissection of bud set in poplar. The putative QTL identified will be tested for associations in P. nigra natural populations. The identified QTL and CGs will also serve as useful targets for poplar breeding.

    Factor analysis for gene regulatory networks and transcription factor activity profiles

    BACKGROUND: Most existing algorithms for the inference of the structure of gene regulatory networks from gene expression data assume that the activity levels of transcription factors (TFs) are proportional to their mRNA levels. This assumption is invalid for most biological systems. However, one might be able to reconstruct unobserved activity profiles of TFs from the expression profiles of target genes. A simple model is a two-layer network with unobserved TF variables in the first layer and observed gene expression variables in the second layer. TFs are connected to regulated genes by weighted edges. The weights, known as factor loadings, indicate the strength and direction of regulation. Of particular interest are methods that produce sparse networks, networks with few edges, since it is known that most genes are regulated by only a small number of TFs, and most TFs regulate only a small number of genes. RESULTS: In this paper, we explore the performance of five factor analysis algorithms, Bayesian as well as classical, on problems with biological context using both simulated and real data. Factor analysis (FA) models are used in order to describe a larger number of observed variables by a smaller number of unobserved variables, the factors, whereby all correlation between observed variables is explained by common factors. Bayesian FA methods allow one to infer sparse networks by enforcing sparsity through priors. In contrast, in the classical FA, matrix rotation methods are used to enforce sparsity and thus to increase the interpretability of the inferred factor loadings matrix. However, we also show that Bayesian FA models that do not impose sparsity through the priors can still be used for the reconstruction of a gene regulatory network if applied in conjunction with matrix rotation methods. Finally, we show the added advantage of merging the information derived from all algorithms in order to obtain a combined result. 
    CONCLUSION: Most of the algorithms tested are successful in reconstructing the connectivity structure as well as the TF profiles. Moreover, we demonstrate that if the underlying network is sparse it is still possible to reconstruct hidden activity profiles of TFs to some degree without prior connectivity information.
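
    The two-layer model described in the background can be written in the usual factor-analysis form. The notation below is an assumption made for illustration ($x$ for the observed expression of $p$ genes, $f$ for the $k$ unobserved TF activities, $\Lambda$ for the loadings, $\Psi$ for the noise covariance); the abstract itself does not fix any symbols.

```latex
% Standard factor-analysis model (notation assumed, not taken from the paper)
\begin{align}
  x &= \Lambda f + \varepsilon,
    & f &\sim \mathcal{N}(0, I_k),
    & \varepsilon &\sim \mathcal{N}(0, \Psi), \quad \Psi \text{ diagonal}, \\
  \operatorname{Cov}(x) &= \Lambda \Lambda^{\top} + \Psi .
\end{align}
```

    Because $\Psi$ is diagonal, every off-diagonal entry of $\operatorname{Cov}(x)$ comes from $\Lambda \Lambda^{\top}$, which is the sense in which all correlation between observed variables is explained by the common factors. A sparse network corresponds to a loadings matrix $\Lambda$ in which most entries are zero.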

    BiForce Toolbox: powerful high-throughput computational analysis of gene-gene interactions in genome-wide association studies

    Genome-wide association studies (GWAS) have discovered many loci associated with common disease and quantitative traits. However, most GWAS have not studied the gene–gene interactions (epistasis) that could be important in complex trait genetics. A major challenge in analysing epistasis in GWAS is the enormous computational demand of analysing billions of SNP combinations. Several methods have been developed recently to address this, some using computers equipped with particular graphics processing units, most restricted to binary disease traits and all poorly suited to general usage on the most widely used operating systems. We have developed the BiForce Toolbox to address the demand for high-throughput analysis of pairwise epistasis in GWAS of quantitative and disease traits across all commonly used computer systems. BiForce Toolbox is a stand-alone Java program that integrates bitwise computing with multithreaded parallelization and thus allows rapid full pairwise genome scans via a graphical user interface or the command line. Furthermore, BiForce Toolbox incorporates additional tests of interactions involving SNPs with significant marginal effects, potentially increasing the power of detection of epistasis. BiForce Toolbox is easy to use and has been applied in multiple studies of epistasis in large GWAS data sets, identifying interesting interaction signals and pathways.
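
    The general idea behind bitwise computing for pairwise SNP scans can be sketched as follows: each SNP is stored as three bit masks (one per genotype class), and the 3 × 3 joint genotype table for a SNP pair is obtained with bitwise AND plus a population count. This is only an illustration of the technique, not the BiForce implementation (which is a Java program); the function names and the toy genotypes are hypothetical.

```python
def encode_snp(genotypes):
    """Encode a SNP as three integer bit masks, one per genotype class (0, 1, 2).

    Bit i of masks[g] is set when individual i carries genotype g.
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def joint_counts(masks_a, masks_b):
    """Return the 3 x 3 table of joint genotype counts for two encoded SNPs.

    Each cell needs one AND and one population count, regardless of how
    the individuals are ordered.
    """
    return [[bin(ma & mb).count("1") for mb in masks_b] for ma in masks_a]


# Toy genotypes for six individuals (hypothetical data).
snp_a = encode_snp([0, 1, 2, 1, 0, 2])
snp_b = encode_snp([0, 1, 1, 1, 2, 2])
table = joint_counts(snp_a, snp_b)
```

    The resulting table is the input to whichever interaction test is applied to the pair; the bitwise step only replaces the per-individual counting loop.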

    An Unbiased Estimator of Gene Diversity in Samples Containing Related Individuals

    Gene diversity is sometimes estimated from samples that contain inbred or related individuals. If inbred or related individuals are included in a sample, then the standard estimator for gene diversity produces a downward bias caused by an inflation of the variance of estimated allele frequencies. We develop an unbiased estimator for gene diversity that relies on kinship coefficients for pairs of individuals with known relationship and that reduces to the standard estimator when all individuals are noninbred and unrelated. Applying our estimator to data simulated based on allele frequencies observed for microsatellite loci in human populations, we find that the new estimator performs favorably compared with the standard estimator in terms of bias and similarly in terms of mean squared error. For human population-genetic data, we find that a close linear relationship previously seen between gene diversity and distance from East Africa is preserved when adjusting for the inclusion of close relatives.
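
    A minimal sketch of a kinship-adjusted estimator is given below. It assumes the adjustment takes the form (1 − Σ p̂²) / (1 − Φ̄), where Φ̄ is the mean of the kinship matrix over all pairs of sampled individuals, self-pairs included; this form is an assumption chosen because it reduces to the standard estimator for noninbred, unrelated individuals, as the abstract requires. The function names and toy genotypes are hypothetical.

```python
from collections import Counter


def observed_homozygosity(genotypes):
    """Sum of squared sample allele frequencies for diploid genotypes."""
    alleles = [a for g in genotypes for a in g]
    total = len(alleles)
    return sum((c / total) ** 2 for c in Counter(alleles).values())


def gene_diversity_standard(genotypes):
    """Standard estimator: 2n / (2n - 1) * (1 - sum of squared frequencies)."""
    two_n = 2 * len(genotypes)
    return two_n / (two_n - 1) * (1 - observed_homozygosity(genotypes))


def gene_diversity_kinship(genotypes, kinship):
    """Kinship-adjusted estimator (assumed form, see lead-in).

    kinship[j][k] is the kinship coefficient of individuals j and k;
    the diagonal is (1 + f_j) / 2, i.e. 0.5 for a noninbred individual.
    """
    n = len(genotypes)
    mean_kinship = sum(sum(row) for row in kinship) / (n * n)
    return (1 - observed_homozygosity(genotypes)) / (1 - mean_kinship)


# Toy sample of three diploid individuals (hypothetical data).
sample = [("A", "B"), ("A", "A"), ("B", "C")]
unrelated = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
# Same sample, but individuals 0 and 1 treated as full siblings (kinship 0.25).
with_sibs = [[0.5, 0.25, 0.0], [0.25, 0.5, 0.0], [0.0, 0.0, 0.5]]

h_standard = gene_diversity_standard(sample)
h_unrelated = gene_diversity_kinship(sample, unrelated)
h_sibs = gene_diversity_kinship(sample, with_sibs)
```

    With the unrelated kinship matrix the two estimators coincide; once relatives are declared, the adjusted value is larger, offsetting the downward bias the abstract describes.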

    Contribution of common and rare variants to bipolar disorder susceptibility in extended pedigrees from population isolates.

    Current evidence from case/control studies indicates that genetic risk for psychiatric disorders derives primarily from numerous common variants, each with a small phenotypic impact. The literature describing apparent segregation of bipolar disorder (BP) in numerous multigenerational pedigrees suggests that, in such families, large-effect inherited variants might play a greater role. To identify the roles of rare and common variants in BP, we conducted genetic analyses in 26 pedigrees from Colombia and Costa Rica ascertained for bipolar disorder 1 (BP1), the most severe and heritable form of BP. In these pedigrees, we performed microarray SNP genotyping of 838 individuals and high-coverage whole-genome sequencing of 449 individuals. We compared polygenic risk scores (PRS), estimated using the latest BP1 genome-wide association study (GWAS) summary statistics, between BP1 individuals and related controls. We also evaluated whether BP1 individuals had a higher burden of rare deleterious single-nucleotide variants (SNVs) and rare copy number variants (CNVs) in a set of genes related to BP1. We found that, compared with unaffected relatives, BP1 individuals had higher PRS estimated from BP1 GWAS statistics (P = 0.001 ~ 0.007) and displayed a modest increase in burdens of rare deleterious SNVs (P = 0.047) and rare CNVs (P = 0.002 ~ 0.033) in genes related to BP1. We did not observe rare variants segregating in the pedigrees. These results suggest that small-to-moderate effect rare and common variants are more likely to contribute to BP1 risk in these extended pedigrees than a few large-effect rare variants.
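
    In its simplest form a polygenic risk score is a weighted sum of risk-allele dosages, with per-SNP weights taken from GWAS summary statistics. The sketch below shows only that basic sum; the study's actual PRS pipeline is not described in the abstract, and the function name, SNP identifiers, and weights here are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages for one individual.

    dosages: dict mapping SNP id to the number of risk alleles carried (0, 1 or 2).
    weights: dict mapping SNP id to a per-allele effect size from GWAS summary
             statistics (for example a log odds ratio).
    SNPs without a dosage for this individual are skipped, which is one
    simple way to handle missing genotypes.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Hypothetical effect sizes and one individual's genotypes.
effect_sizes = {"snp1": 0.2, "snp2": -0.1, "snp3": 0.05}
individual = {"snp1": 2, "snp2": 1, "snp3": 0}
score = polygenic_risk_score(individual, effect_sizes)
```

    Scores computed this way for affected individuals and for their unaffected relatives can then be compared, which is the kind of contrast the abstract reports.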