69 research outputs found

    Evolution of motif variants and positional bias of the cyclic-AMP response element

    BACKGROUND: Transcription factors regulate gene expression by interacting with their specific DNA binding sites. Some transcription factors, particularly those involved in transcription initiation, always bind close to transcription start sites (TSS). Others have no such preference and are functional on sites even tens of thousands of base pairs (bp) away from the TSS. The Cyclic-AMP response element (CRE) binding protein (CREB) binds preferentially to a palindromic sequence (TGACGTCA), known as the canonical CRE, and also to other CRE variants. CREB can activate transcription at CREs thousands of bp away from the TSS, but in mammals CREs are found far more frequently within 1 to 150 bp upstream of the TSS than in any other region. This property is termed positional bias. The strength of CREB binding to DNA is dependent on the sequence of the CRE motif. The central CpG dinucleotide in the canonical CRE (TGACGTCA) is critical for strong binding of CREB dimers. Methylation of the cytosine in the CpG can inhibit binding of CREB. Deamination of the methylated cytosines causes a C to T transition, resulting in a functional, but lower affinity CRE variant, TGATGTCA. RESULTS: We performed genome-wide surveys of CREs in a number of species (from worm to human) and showed that only vertebrates exhibited a CRE positional bias. We performed pair-wise comparisons of human CREs with orthologous sequences in mouse, rat and dog genomes and found that canonical and TGATGTCA variant CREs are highly conserved in mammals. However, when orthologous sequences differ, canonical CREs in human are most frequently TGATGTCA in the other species and vice-versa. We have identified 207 human CREs showing such differences. CONCLUSION: Our data suggest that the positional bias of CREs likely evolved after the separation of urochordata and vertebrata. 
Although many canonical CREs are conserved among mammals, there are a number of orthologous genes that have canonical CREs in one species but the TGATGTCA variant in another. These differences are likely due to deamination of the methylated cytosines in the CpG and may contribute to differential transcriptional regulation among orthologous genes.
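
The motif and positional-bias definitions in this abstract can be illustrated with a minimal Python sketch. It is not the authors' genome-wide survey pipeline; the function name `find_cres`, the `window` parameter, and the convention that the input string ends at the base immediately upstream of the TSS are all assumptions made for illustration. The two motifs and the 150 bp window come from the abstract.

```python
import re

CANONICAL = "TGACGTCA"  # palindromic: identical to its own reverse complement
VARIANT = "TGATGTCA"    # C-to-T transition at the central CpG

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cres(upstream, window=150):
    """Locate canonical and variant CREs in an upstream sequence.

    Assumption: `upstream` is written 5'->3' and its last base sits
    immediately before the TSS. Returns sorted (distance, kind, in_window)
    tuples, where distance is the number of bp from the motif start to the TSS.
    """
    seq = upstream.upper()
    # The variant is not palindromic, so its reverse complement is searched too.
    motifs = {CANONICAL: "canonical", VARIANT: "variant", revcomp(VARIANT): "variant"}
    hits = []
    for motif, kind in motifs.items():
        # A lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?={motif})", seq):
            distance = len(seq) - match.start()
            hits.append((distance, kind, 1 <= distance <= window))
    return sorted(hits)


example = "TGATGTCA" + "A" * 192 + "TGACGTCA" + "A" * 42
print(find_cres(example))
```

Counting how many hits fall inside versus outside the window across many promoters is one simple way to quantify a positional bias of the kind described above.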

    Discovery of Functional Genes for Systemic Acquired Resistance in Arabidopsis Thaliana through Integrated Data Mining

    Various data mining techniques combined with sequence motif information in the promoter region of genes were applied to discover functional genes that are involved in the defense mechanism of systemic acquired resistance (SAR) in Arabidopsis thaliana. A series of K-Means clustering with difference-in-shape as distance measure was initially applied. A stability measure was used to validate this clustering process. A decision tree algorithm with the discover-and-mask technique was used to identify a group of most informative genes. Appearance and abundance of various transcription factor binding sites in the promoter region of the genes were studied. Through the combination of these techniques, we were able to identify 24 candidate genes involved in the SAR defense mechanism. The candidate genes fell into 2 highly resolved categories, each category showing significantly unique profiles of regulatory elements in their promoter regions. This study demonstrates the strength of such integration methods and suggests a broader application of this approach.
    Peer reviewed: Yes; NRC publication: Yes

    Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm

    BACKGROUND: Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. RESULTS: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. CONCLUSIONS: Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
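
The order-preserving idea on the time dimension can be sketched as follows. This is only an illustration of the general concept, not the published OPTricluster algorithm: the function names, the nested-dictionary data layout, and the simple rule of comparing patterns across all samples at once are assumptions introduced here.

```python
from collections import defaultdict


def order_pattern(series):
    """Return the time-point indices sorted by expression value.

    Two series share an order-preserving pattern when this tuple is equal,
    regardless of the absolute expression levels. Ties keep time order.
    """
    return tuple(sorted(range(len(series)), key=lambda t: series[t]))


def compare_samples(data):
    """Split genes into conserved and divergent groups across samples.

    Assumed layout: data[sample][gene] is a list of expression values over
    time. A gene is 'conserved' if every sample shows the same ordering of
    time points, otherwise 'divergent'.
    """
    samples = sorted(data)
    genes = sorted(data[samples[0]])
    conserved = defaultdict(list)
    divergent = []
    for gene in genes:
        patterns = {order_pattern(data[s][gene]) for s in samples}
        if len(patterns) == 1:
            conserved[patterns.pop()].append(gene)
        else:
            divergent.append(gene)
    return dict(conserved), divergent


toy = {
    "s1": {"g1": [1, 2, 3], "g2": [0.5, 4, 9], "g3": [3, 2, 1]},
    "s2": {"g1": [2, 5, 7], "g2": [1, 2, 8], "g3": [1, 2, 3]},
}
print(compare_samples(toy))
```

A fuller treatment would examine subsets of samples rather than all samples together, which is presumably where a combinatorial approach on the sample dimension comes in, and would tolerate near-equal values instead of relying on strict ordering.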

    The Role of Data Pre-Processing in Intelligent Data Analysis

    This paper first provides a brief overview of some frequently encountered real world problems in data analysis. These are problems that have to be solved through data pre-processing so that the nature of the data is better understood and the data analysis is performed more accurately and efficiently. The architecture of a data analysis tool for which a data pre-processing mechanism has been developed and tested is also explained. An example is then given of the use of this data pre-processing mechanism for two purposes: (i) to filter out a set of semiconductor data, and (ii) to find out more about the nature of these data and make the induction process more efficient.
    NRC publication: Yes

    Integrated Software Provides Workcell Autonomy

    NRC publication: Yes

    Use of Decision-Tree Induction for Process Optimization and Knowledge Refinement of an Industrial Process

    Development of expert systems involves knowledge acquisition which can be supported by applying machine learning techniques. This paper presents the basic idea of using decision-tree induction in process optimization and development of the domain model of electrochemical machining (ECM). It further discusses how decision-tree induction is used to build and refine the knowledge base of the process. The idea of developing an intelligent supervisory system with a learning component (IMAFO, Intelligent MAnufacturing FOreman) that is already implemented is briefly introduced. The results of applying IMAFO for analyzing data from the ECM process are presented. How the domain model of the process (electrochemical machining) is built from the initial known information and how the results of decision-tree induction can be used to optimize the model of the process and further refine the knowledge base are shown. Two examples are given to demonstrate how new rules (to be included in the knowledge base of an expert system) are generated from the rules induced by IMAFO. The procedure to refine these types of rules is also explained.
    NRC publication: Yes

    Searching for patterns in imbalanced data: methods and alternatives with case studies in life sciences

    The prime motivation for pattern discovery and machine learning research has been the collection and warehousing of large amounts of data in many domains, such as life sciences and industrial processes. Unique problems arise, for example, where the data is imbalanced. The class imbalance problem corresponds to situations where the majority of cases belong to one class and a small minority belongs to the other, which in many cases is equally or even more important. To deal with this problem a number of approaches have been studied in the past. In this talk we provide an overview of some existing methods and present novel applications that are based on identifying the inherent characteristics of one class vs the other. We present the results of a number of studies focusing on real data from life science applications.
    Peer reviewed: Yes; NRC publication: Yes
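
A small Python sketch shows why class imbalance is a problem in practice. The abstract does not name specific methods, so everything here is an assumption chosen for illustration: the toy labels, the function names, and inverse-frequency class weighting, which is one commonly used remedy rather than the approach presented in the talk.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` cases that were predicted as positive."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in relevant) / len(relevant)


def inverse_frequency_weights(y_true):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y_true)
    n, k = len(y_true), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Toy data: 5 minority cases (label 1) and 95 majority cases (label 0).
y_true = [1] * 5 + [0] * 95
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # high, despite missing every minority case
print(recall(y_true, y_pred, positive=1))
print(inverse_frequency_weights(y_true))
```

The always-majority classifier scores well on accuracy while finding none of the minority cases, which is why per-class measures such as recall, or reweighting of the rare class, are often preferred on imbalanced data.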