9 research outputs found

    P-Match: transcription factor binding site search by combining patterns and weight matrices

    P-Match is a new tool for identifying transcription factor (TF) binding sites in DNA sequences. It combines pattern matching and weight matrix approaches, thus providing higher recognition accuracy than either method alone. P-Match is closely interconnected with the TRANSFAC® database. In particular, P-Match uses the matrix library as well as the sets of aligned known TF binding sites collected in TRANSFAC®, and can therefore search for a large variety of different TF binding sites. Using the results of extensive tests of recognition accuracy, we selected three sets of optimized cut-off values that minimize either false negatives, or false positives, or the sum of both errors. Comparison with weight matrix approaches such as the Match™ tool shows that P-Match generally provides superior recognition accuracy in the area of low false negative errors (high sensitivity). As in Match™, users of P-Match can save user-specific profiles that include selected subsets of matrices with corresponding TF binding sites or user-defined cut-off values. Furthermore, a number of tissue-specific profiles compiled by the TRANSFAC® team are provided. A public version of the P-Match tool is available at
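
    The weight-matrix half of this idea, together with the role of the cut-off, can be sketched in a few lines of Python. This is a generic count-matrix scan under stated assumptions, not P-Match's actual scoring: the function names, the example sites, and the min-max normalization are illustrative choices, and the pattern-matching component is omitted.

```python
from collections import Counter


def build_count_matrix(sites):
    """Count nucleotides per column of equal-length aligned binding sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("aligned sites must all have the same length")
    return [Counter(site[i] for site in sites) for i in range(width)]


def normalized_score(matrix, window):
    """Min-max normalized score in [0, 1], using raw counts as weights."""
    score = sum(column[base] for column, base in zip(matrix, window))
    best = sum(max(column[b] for b in "ACGT") for column in matrix)
    worst = sum(min(column[b] for b in "ACGT") for column in matrix)
    if best == worst:  # uninformative matrix: every window scores the same
        return 1.0
    return (score - worst) / (best - worst)


def scan(matrix, sequence, cutoff):
    """Return (start, window, score) for every window scoring >= cutoff."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = normalized_score(matrix, window)
        if score >= cutoff:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned sites, used only to exercise the functions.
example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
example_matrix = build_count_matrix(example_sites)
example_hits = scan(example_matrix, "CCTGACTCAGG", cutoff=0.9)
```

    Lowering `cutoff` admits more windows (fewer false negatives, more false positives), and raising it does the opposite, which is the trade-off the three optimized cut-off sets address.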

    Functional classification of protein domain superfamilies for protein function annotation

    Proteins are made up of domains that are generally considered to be independent evolutionary and structural units with distinct functional properties. It is now well established that analysis of the domains in a protein provides an effective approach to understanding protein function using a 'domain grammar'. Towards this end, evolutionarily related protein domains have been classified into homologous superfamilies in the CATH and SCOP databases. An ideal functional sub-classification of the domain superfamilies into 'functional families' can not only help in the function annotation of uncharacterised sequences but also provide a useful framework for understanding the diversity and evolution of function at the domain level. This work describes the development of a new protocol (FunFHMMer) for identifying functional families in CATH superfamilies that makes use of sequence patterns only and hence is unaffected by the incompleteness of function annotations, annotation biases or misannotations existing in the databases. The resulting family classification was validated using known functional information and was found to generate more functionally coherent families than other domain-based protein resources. A protein function prediction pipeline exploiting the functional annotations provided by the domain families was developed and validated using a database rollback benchmark set of proteins and an independent assessment by CAFA 2. The functional classification was found to capture the functional diversity of superfamilies well in terms of sequence, structure and protein context. This aided studies on the evolution of protein domain function both at the superfamily level and in specific proteins of interest. The conserved positions in the functional family alignments were found to be enriched in catalytic site residues and ligand-binding site residues, which led to the development of a functional site prediction tool. Lastly, the function prediction tools were assessed for annotation of moonlighting functions of proteins, and a classification of moonlighting proteins was proposed based on their structure-function relationships.
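
    The idea of picking out conserved alignment positions can be illustrated with a per-column Shannon entropy. This is a minimal sketch, not the scoring used by FunFHMMer or the functional site prediction tool (the abstract does not describe it); the function names, the toy alignment, and the 0.5-bit threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None  # all-gap column: no conservation estimate
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_positions(alignment, max_entropy=0.5):
    """Return 0-based column indices whose entropy is at most max_entropy."""
    positions = []
    for index, column in enumerate(zip(*alignment)):
        entropy = column_entropy(column)
        if entropy is not None and entropy <= max_entropy:
            positions.append(index)
    return positions


# Toy alignment of four equal-length sequences (hypothetical data).
toy_alignment = ["HDSAK", "HESGK", "HDSLK", "HDS-K"]
toy_conserved = conserved_positions(toy_alignment)
```

    Columns where every sequence carries the same residue have zero entropy and are reported; variable columns are not.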

    Bioinformatics analysis of ZBED6, a novel transcription factor in mammals

    The identification of a regulatory mutation in intron 3 of insulin-like growth factor 2 was a major finding. This single-nucleotide mutation in a non-coding region abrogates the interaction with the nuclear factor ZBED6, resulting in a 3-4% increase in the muscle mass of domesticated pigs. The mutation was observed in an evolutionarily conserved CpG island that is hypomethylated in skeletal muscle. ZBED6 is derived from a domesticated DNA transposon present exclusively in placental mammals. Chromatin immunoprecipitation (ChIP) sequencing in mouse C2C12 myoblasts using an anti-ZBED6 antibody identified 2499 ZBED6 binding fragments. A de novo motif search on the ZBED6 binding fragments yielded the consensus sequence 5´-GCTCG-3´. The ZBED6 binding fragments are associated with more than 1200 genes with annotated functions in biological processes, transcriptional regulation, neurogenesis, cell signaling and muscle development. In the present study we performed a bioinformatics analysis of ZBED6 using the TRANSFAC database (professional version 2010.1) in order to identify other transcription factors that co-regulate the expression of ZBED6 target genes. The ChIP data (ZBED6 target genes) and microarray expression data (siRNA-silenced ZBED6) from mouse C2C12 cells were used to find binding sites for transcription factors in the promoter regions. The genes associated with ZBED6 showed significant overrepresentation of binding sites for the transcription factors SP1, ZF5, E2F1, ZBED6, AP2alpha and KROX in their promoter regions. The majority of the factors found have GC-rich binding sites and belong to zinc finger families. The factors identified have roles in tumor suppression. The microarray expression data analysis showed that binding sites for the MEF2 and SRF transcription factors are significantly present in the promoters of co-pressed genes. The ZBED6 binding sites located 500 kb away from a known transcription start site (TSS) showed OCT1 and IRF1 binding sites. It is possible that these factors act at enhancer elements for many ZBED6 target genes. A few long non-coding RNAs were also identified in the vicinity of ZBED6 binding sites located 500 kb away from a known TSS.
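
    Locating occurrences of the reported consensus 5´-GCTCG-3´ in a sequence is a simple exact-match scan on both strands. The sketch below assumes plain string matching with no mismatches allowed; the function names and the test sequence are hypothetical and are not taken from the study, which used TRANSFAC matrices.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(sequence, motif="GCTCG"):
    """Return sorted (start, strand) pairs for exact motif matches.

    Minus-strand hits are reported at the plus-strand coordinate where the
    reverse complement of the motif begins. Overlapping hits are included.
    """
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical promoter fragment containing one site on each strand.
motif_hits = find_motif("AAGCTCGTTCGAGCAA")
```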

    Developing a computational approach to investigate the impacts of disease-causing mutations on protein function

    This project uses bioinformatics protocols to explore the impacts of non-synonymous mutations (nsSNPs) in proteins associated with diseases, including germline diseases, rare diseases and somatic diseases such as cancer. New approaches were explored for determining the impacts of disease-associated mutations on protein structure and function. Whilst this work has mainly concentrated on the analysis of cancer mutations, the methods developed are generic and could be applied to other types of disease mutations. Different types of disease-causing mutations have been studied, including germline disease mutations, somatic cancer mutations in oncogenes and tumour suppressors, and known activating and inactivating mutations in kinases. The proximity of disease-associated mutations has been analysed with respect to known functional sites reported by CSA and IBIS, along with predicted functional sites derived from the CATH classification of domain structure superfamilies. The latter are called FunSites, and are highly conserved residues within a CATH functional family (FunFam), which is a functionally coherent subset of a CATH superfamily. Such sites include key catalytic residues as well as specificity-determining residues and interface residues. Clear differences were found between oncogene, tumour suppressor and germline mutations, with oncogene mutations more likely to be located close to FunSites. We identified functional families that are highly enriched in disease mutations and exploited structural data to identify clusters within proteins in these families that are enriched in mutations (using our MutClust program). We examined the tendency of these clusters to lie close to the functional sites discussed above. For selected genes, the stability effects of disease mutations in cancer have also been investigated, with a particular focus on activating mutations in FGFR3. These studies, which were supported by experimental validation, showed that activating mutations implicated in cancer tend to stabilise the active form of FGFR3, leading to its abnormal activity and to oncogenesis. Mutationally enriched CATH FunFams were also used to identify cancer driver genes, which were then subjected to pathway and GO biological process analysis.

    Genetic networks of antibacterial responses of eukaryotic cells. Bioinformatics analysis and modeling

    This work describes the development of new methods for constructing promoter models, one of the necessary steps in the construction of regulatory networks. Identification of characteristic promoter features shows the role of specific transcription factors (TFs) in triggering the response, which in turn sheds light on the signaling pathways activating these TFs. Treating reported results of microarray analyses together with other available information about the genes expressed in the cellular systems under consideration, we search for distinguishing features of the promoters of coexpressed genes. Applying such promoter models makes it possible to identify additional candidate genes belonging to the same regulatory network. Four novel approaches are presented in this work: (i) the subtractive approach to matrix generation; (ii) the distance distribution approach; (iii) the "seed" sets approach; (iv) the complementary pairs approach. These approaches help to solve serious problems in promoter model construction, such as the doubtful reliability of positive training sets ("seed" sets approach) and the lack of knowledge about the exact signaling pathways triggering gene expression (complementary pairs approach). The subtractive approach to matrix generation allows positional weight matrices (PWMs) to be refined for heterogeneous sets of binding sites, and thus improves the PWM search for single TF binding sites (TFBSs). A significant improvement in the specificity of promoter analysis has been achieved by applying statistical methods for characterizing TFBS combinations at over-represented distances rather than merely identifying single potential TFBSs (distance distribution approach). The newly developed methods were applied to the description of four defensive eukaryotic systems in terms of transcription regulation. The resulting models enabled us to gain better insights into the pathways of the corresponding signaling networks.
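
    The distance distribution idea (looking for pairs of TFBSs at over-represented distances) can be illustrated with a binned histogram comparison between a foreground and a background promoter set. This is only a sketch under stated assumptions: the thesis's actual statistics are not given in the abstract, and the function names, the bin size, the pseudocount and the simple enrichment ratio used here in place of a proper statistical test are all hypothetical.

```python
from collections import Counter


def distance_histogram(promoters, bin_size=10):
    """Histogram of binned |a - b| distances between sites of two factors.

    promoters is a list of (positions_a, positions_b) tuples, one per
    promoter, holding the hit positions of factor A and factor B.
    """
    histogram = Counter()
    for positions_a, positions_b in promoters:
        for a in positions_a:
            for b in positions_b:
                histogram[abs(a - b) // bin_size] += 1
    return histogram


def enriched_bins(foreground, background, bin_size=10, min_ratio=2.0):
    """Return {bin: ratio} for bins over-represented in the foreground."""
    fg = distance_histogram(foreground, bin_size)
    bg = distance_histogram(background, bin_size)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    enriched = {}
    for distance_bin, count in fg.items():
        fg_fraction = count / fg_total
        # Pseudocount of 1 avoids division by zero for unseen background bins.
        bg_fraction = (bg[distance_bin] + 1) / (bg_total + 1)
        ratio = fg_fraction / bg_fraction
        if ratio >= min_ratio:
            enriched[distance_bin] = ratio
    return enriched


# Hypothetical hit positions, used only to exercise the functions.
coexpressed = [([100], [125]), ([40], [62]), ([300], [321])]
controls = [([10], [200]), ([50], [55]), ([0], [75]), ([20], [45])]
pair_enrichment = enriched_bins(coexpressed, controls)
```

    With real data, the ratio would be replaced by a significance test, and the bin size would need tuning to the expected spacing between sites.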

    Annotation of Cytochrome P450 Genes In Harmonia axyridis And a Comparative Study of CYP Genes in Harmonia axyridis and Tribolium castaneum

    Our knowledge of beetle cytochrome P450 (CYP) genes was primarily obtained from studies of the model beetle and grain pest Tribolium castaneum. To gain additional insight into beetle CYPs and ultimately to inform our understanding of beetle CYP evolution, we identified and annotated all of the CYP genes present in a new draft genome of Harmonia axyridis using traditional and automated methods for gene annotation. Overall, we identified somewhat fewer CYPs in H. axyridis (at least 94 genes and 3 pseudogenes representing 17 families and 42 subfamilies) than the number of known CYPs in T. castaneum (137 plus 2 slight variants and 10 pseudogenes). The H. axyridis CYPs could be divided into 4 distinct clans: Mito, CYP2, CYP3 and CYP4. These clans are major (monophyletic) groups with strong support for most relationships, and the comparison illustrates the presence of CYP blooms in T. castaneum that are lacking in H. axyridis. Several additional CYPs that are present in H. axyridis are missing in T. castaneum. The Mito clan of H. axyridis contains 6 genes in 5 families and 6 subfamilies. We found 7 genes in the CYP2 clan, in 5 families and 6 subfamilies. We found 2 distinct families (4 and 349) and a minimum of 22 genes in the CYP4 clan in H. axyridis. Interestingly, both H. axyridis and T. castaneum carry CYP4G genes, which are candidate resistance genes for insecticides, including permethrins. CYP4G function has been associated with pesticide resistance. The CYP3 clan has 59 genes in 5 families in H. axyridis: CYP6, CYP9, CYP345, CYP435 and CYP436. These 5 families in CYP3 are classified into at least 21 subfamilies. Our work on the automated annotation of CYP genes involved several software programs, the most efficient and sensitive of which are Augustus, GenScan and Fgenesh. Although it is likely that a few CYP genes remain to be identified in the H. axyridis genome, our ongoing work suggests that the vast majority of CYPs have been identified.

    Annual Report


    Conception de miARN artificiels basée sur la caractérisation de la boucle de régulation miR-20/E2F

    miRNAs are powerful regulators of gene expression in mammals. These small RNAs of around 20 nucleotides are involved in several cellular processes and diseases. miRNAs recognize their targets mainly through a region comprising nucleotides 2-8, known as the seed. This characteristic gives them the potential to inhibit hundreds of messenger RNAs. Our first goal was to better characterize the complex network involving miRNAs in the regulation of gene expression. To achieve this, we studied the relation between a family of transcription factors, the E2Fs, and a family of miRNAs, the miR-17-92 cluster. Our results suggest a negative feedback loop involving miR-17, miR-20a, E2F1, E2F2 and E2F3. In this loop, E2F1, 2 and 3 activate the transcription of the two miRNAs, which in return inhibit their translation. The inhibition of the antiapoptotic function of E2F1 by miR-17 and miR-20 in a prostate cancer context could explain the previously reported oncogenic potential of the miR-17-92 cluster. Studying the miR-20/E2F feedback loop made us realize how powerful the ability of miRNAs to inhibit several targets is. To overcome the lack of efficient tools able to inhibit the expression of multiple genes simultaneously, our second goal was to develop MultiTar, an algorithm that designs artificial miRNAs targeting a set of predetermined genes. MultiTar was validated in silico, using known targets of endogenous miRNAs, and in vivo, taking advantage of our experience with the E2F context. We designed artificial miRNAs against E2F1-3 and expressed them in both normal human fibroblasts and prostate cancer cells, where they inhibited cell proliferation and induced cellular senescence. The observed phenotypes were precisely those known for inhibition of E2F activities. Hence, MultiTar can efficiently design artificial microRNAs able to target multiple genes and is thus a flexible tool that can address multigenic diseases and complex cellular processes. The use of multitargets could be an alternative that overcomes the limits of drugs or siRNAs, which are generally designed to regulate only one target.
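
    Seed-based target recognition (nucleotides 2-8 of the miRNA pairing with a complementary site in the mRNA) can be sketched as a reverse-complement string search. This is not the MultiTar algorithm, whose rules are not described in the abstract; it assumes perfect Watson-Crick seed pairing only, and the function names and all sequences below are arbitrary, hypothetical examples rather than real miRNAs or transcripts.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Target site (5'->3' on the mRNA) pairing with miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]  # positions 2-8, 1-based
    return seed.translate(RNA_COMPLEMENT)[::-1]


def seed_match_positions(mirna, transcript):
    """0-based start positions of perfect seed matches in the transcript."""
    site = seed_site(mirna)
    transcript = transcript.upper().replace("T", "U")
    return [
        start
        for start in range(len(transcript) - len(site) + 1)
        if transcript[start:start + len(site)] == site
    ]


def targets_all(mirna, transcripts):
    """True if the miRNA has at least one seed match in every transcript."""
    return all(seed_match_positions(mirna, t) for t in transcripts)


# Arbitrary illustrative sequences (not real miRNAs or 3' UTRs).
candidate = "UACGGAUCCAGUUCGAAUCGGA"
utr_one = "AAGAUCCGUCCGAUCCGUAA"
utr_two = "CCCCGAUCCGUCCCC"
utr_three = "AAAAAAAAAAAA"
positions_in_utr_one = seed_match_positions(candidate, utr_one)
```

    A multi-target design along these lines would search for candidate seeds for which `targets_all` holds across the chosen genes; real designs would also have to weigh off-target matches and site context.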