138 research outputs found

    A Biomedically Enriched Collection of 7000 Human ORF Clones

    Get PDF
    We report the production and availability of over 7000 fully sequence verified plasmid ORF clones representing over 3400 unique human genes. These ORF clones were derived using the human MGC collection as template and were produced in two formats: with and without stop codons. Thus, this collection supports the production of either native protein or proteins with fusion tags added to either or both ends. The template clones used to generate this collection were enriched in three ways. First, gene redundancy was removed. Second, clones were selected to represent the best available GenBank reference sequence. Finally, a literature-based software tool was used to evaluate the list of target genes to ensure that it broadly reflected biomedical research interests. The target gene list was compared with 4000 human diseases and over 8500 biological and chemical MeSH classes in ∼15 Million publications recorded in PubMed at the time of analysis. The outcome of this analysis revealed that relative to the genome and the MGC collection, this collection is enriched for the presence of genes with published associations with a wide range of diseases and biomedical terms without displaying a particular bias towards any single disease or concept. Thus, this collection is likely to be a powerful resource for researchers who wish to study protein function in a set of genes with documented biomedical significance

    Assessment of clusters of transcription factor binding sites in relationship to human promoter, CpG islands and gene expression

    Get PDF
    BACKGROUND: Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large output of results for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. Also unclear is the identification of TFs that have TFBS concentrated in promoters and to what level this occurs. This study hopes to answer some of these questions. RESULTS: We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group occupied 11 percent. After reducing the effect of CpG islands (CGI) against the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups including those strongly correlated with CGI and those not correlated with CGI. CONCLUSION: Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions

    ChromaSig: A Probabilistic Approach to Finding Common Chromatin Signatures in the Human Genome

    Get PDF
    Computational methods to identify functional genomic elements using genetic information have been very successful in determining gene structure and in identifying a handful of cis-regulatory elements. But the vast majority of regulatory elements have yet to be discovered, and it has become increasingly apparent that their discovery will not come from using genetic information alone. Recently, high-throughput technologies have enabled the creation of information-rich epigenetic maps, most notably for histone modifications. However, tools that search for functional elements using this epigenetic information have been lacking. Here, we describe an unsupervised learning method called ChromaSig to find, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. Applying this algorithm to nine chromatin marks across a 1% sampling of the human genome in HeLa cells, we recover eight clusters of distinct chromatin signatures, five of which correspond to known patterns associated with transcriptional promoters and enhancers. Interestingly, we observe that the distinct chromatin signatures found at enhancers mark distinct functional classes of enhancers in terms of transcription factor and coactivator binding. In addition, we identify three clusters of novel chromatin signatures that contain evolutionarily conserved sequences and potential cis-regulatory elements. Applying ChromaSig to a panel of 21 chromatin marks mapped genomewide by ChIP-Seq reveals 16 classes of genomic elements marked by distinct chromatin signatures. Interestingly, four classes containing enrichment for repressive histone modifications appear to be locally heterochromatic sites and are enriched in quickly evolving regions of the genome. The utility of this approach in uncovering novel, functionally significant genomic elements will aid future efforts of genome annotation via chromatin modifications

    Discovery and characterization of chromatin states for systematic annotation of the human genome

    Get PDF
    A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal 'chromatin states' in human T cells, based on recurrent and spatially coherent combinations of chromatin marks. We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.National Science Foundation (U.S.). (Award 0905968)National Human Genome Research Institute (U.S.) (Award U54-HG004570)National Human Genome Research Institute (U.S.) (Award RC1-HG005334

    Sequence-Based Prediction of Type III Secreted Proteins

    Get PDF
    The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins has up until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperon-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with sensitivity of ∼71% and selectivity of ∼85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins if the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting convergent evolutionary processes shaping the type III secretion signal. The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will facilitate further studies on and improve our understanding of type III secretion and its role in pathogen–host interactions

    Gene Characterization Index: Assessing the Depth of Gene Annotation

    Get PDF
    We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied for assessing the characterization status of individual genes, thus serving the advancement of both genome annotation and applied genomics research by rapid and unbiased identification of groups of uncharacterized genes for diverse applications such as directed functional studies and delineation of novel drug targets.The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. By evaluating the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. Removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation.The Gene Characterization Index is intended to serve as a tool to the scientific community and granting agencies for focusing resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/

    Adaptive Evolution in Zinc Finger Transcription Factors

    Get PDF
    The majority of human genes are conserved among mammals, but some gene families have undergone extensive expansion in particular lineages. Here, we present an evolutionary analysis of one such gene family, the poly–zinc-finger (poly-ZF) genes. The human genome encodes approximately 700 members of the poly-ZF family of putative transcriptional repressors, many of which have associated KRAB, SCAN, or BTB domains. Analysis of the gene family across the tree of life indicates that the gene family arose from a small ancestral group of eukaryotic zinc-finger transcription factors through many repeated gene duplications accompanied by functional divergence. The ancestral gene family has probably expanded independently in several lineages, including mammals and some fishes. Investigation of adaptive evolution among recent paralogs using dN/dS analysis indicates that a major component of the selective pressure acting on these genes has been positive selection to change their DNA-binding specificity. These results suggest that the poly-ZF genes are a major source of new transcriptional repression activity in humans and other primates

    Inositol Hexakisphosphate-Induced Autoprocessing of Large Bacterial Protein Toxins

    Get PDF
    Large bacterial protein toxins autotranslocate functional effector domains to the eukaryotic cell cytosol, resulting in alterations to cellular functions that ultimately benefit the infecting pathogen. Among these toxins, the clostridial glucosylating toxins (CGTs) produced by Gram-positive bacteria and the multifunctional-autoprocessing RTX (MARTX) toxins of Gram-negative bacteria have distinct mechanisms for effector translocation, but a shared mechanism of post-translocation autoprocessing that releases these functional domains from the large holotoxins. These toxins carry an embedded cysteine protease domain (CPD) that is activated for autoprocessing by binding inositol hexakisphosphate (InsP6), a molecule found exclusively in eukaryotic cells. Thus, InsP6-induced autoprocessing represents a unique mechanism for toxin effector delivery specifically within the target cell. This review summarizes recent studies of the structural and molecular events for activation of autoprocessing for both CGT and MARTX toxins, demonstrating both similar and potentially distinct aspects of autoprocessing among the toxins that utilize this method of activation and effector delivery

    Cross-Species Analyses Identify the BNIP-2 and Cdc42GAP Homology (BCH) Domain as a Distinct Functional Subclass of the CRAL_TRIO/Sec14 Superfamily

    Get PDF
    The CRAL_TRIO protein domain, which is unique to the Sec14 protein superfamily, binds to a diverse set of small lipophilic ligands. Similar domains are found in a range of different proteins including neurofibromatosis type-1, a Ras GTPase-activating Protein (RasGAP) and Rho guanine nucleotide exchange factors (RhoGEFs). Proteins containing this structural protein domain exhibit a low sequence similarity and ligand specificity while maintaining an overall characteristic three-dimensional structure. We have previously demonstrated that the BNIP-2 and Cdc42GAP Homology (BCH) protein domain, which shares a low sequence homology with the CRAL_TRIO domain, can serve as a regulatory scaffold that binds to Rho, RhoGEFs and RhoGAPs to control various cell signalling processes. In this work, we investigate 175 BCH domain-containing proteins from a wide range of different organisms. A phylogenetic analysis with ∼100 CRAL_TRIO and similar domains from eight representative species indicates a clear distinction of BCH-containing proteins as a novel subclass within the CRAL_TRIO/Sec14 superfamily. BCH-containing proteins contain a hallmark sequence motif R(R/K)h(R/K)(R/K)NL(R/K)xhhhhHPs (‘h’ is large and hydrophobic residue and ‘s’ is small and weekly polar residue) and can be further subdivided into three unique subtypes associated with BNIP-2-N, macro- and RhoGAP-type protein domains. A previously unknown group of genes encoding ‘BCH-only’ domains is also identified in plants and arthropod species. Based on an analysis of their gene-structure and their protein domain context we hypothesize that BCH domain-containing genes evolved through gene duplication, intron insertions and domain swapping events. Furthermore, we explore the point of divergence between BCH and CRAL-TRIO proteins in relation to their ability to bind small GTPases, GAPs and GEFs and lipid ligands. Our study suggests a need for a more extensive analysis of previously uncharacterized BCH, ‘BCH-like’ and CRAL_TRIO-containing proteins and their significance in regulating signaling events involving small GTPases

    Mutation of the Zebrafish Nucleoporin elys Sensitizes Tissue Progenitors to Replication Stress

    Get PDF
    The recessive lethal mutation flotte lotte (flo) disrupts development of the zebrafish digestive system and other tissues. We show that flo encodes the ortholog of Mel-28/Elys, a highly conserved gene that has been shown to be required for nuclear integrity in worms and nuclear pore complex (NPC) assembly in amphibian and mammalian cells. Maternal elys expression sustains zebrafish flo mutants to larval stages when cells in proliferative tissues that lack nuclear pores undergo cell cycle arrest and apoptosis. p53 mutation rescues apoptosis in the flo retina and optic tectum, but not in the intestine, where the checkpoint kinase Chk2 is activated. Chk2 inhibition and replication stress induced by DNA synthesis inhibitors were lethal to flo larvae. By contrast, flo mutants were not sensitized to agents that cause DNA double strand breaks, thus showing that loss of Elys disrupts responses to selected replication inhibitors. Elys binds Mcm2-7 complexes derived from Xenopus egg extracts. Mutation of elys reduced chromatin binding of Mcm2, but not binding of Mcm3 or Mcm4 in the flo intestine. These in vivo data indicate a role for Elys in Mcm2-chromatin interactions. Furthermore, they support a recently proposed model in which replication origins licensed by excess Mcm2-7 are required for the survival of human cells exposed to replication stress
    corecore